AI for Science Benchmark Leaderboard

How well do AI agents perform on real scientific tasks? We evaluate AI models on benchmarks covering a wide range of scientific domains and task types, and publish the results here.

Preliminary results

The results on this page are preliminary and cover a small set of models and benchmarks. We plan to expand both, adding more models, AI agents, and benchmarks, toward a comprehensive evaluation of AI for science.


Overall Leaderboard

Models ranked by average score across all benchmarks.
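
As a rough illustration of this ranking rule, the sketch below takes an unweighted mean of each model's per-benchmark scores and sorts models from highest to lowest. It assumes every model has a score on every benchmark and that all benchmarks count equally; the page does not state how missing scores or weighting are handled. The function name `rank_by_average`, the model and benchmark names, and the score values are all hypothetical placeholders, not actual results.

```python
from statistics import mean


def rank_by_average(scores):
    """Rank models by their unweighted mean score across benchmarks.

    `scores` maps model name -> {benchmark name -> score}.
    Assumption: every model has a score for every benchmark.
    Returns a list of (model, average) pairs, best first.
    """
    averages = {
        model: mean(per_benchmark.values())
        for model, per_benchmark in scores.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical placeholder data, not real leaderboard results.
example_scores = {
    "model_a": {"bench_1": 0.5, "bench_2": 0.75},
    "model_b": {"bench_1": 0.75, "bench_2": 1.0},
}

ranking = rank_by_average(example_scores)
for position, (model, average) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {average:.3f}")
```

A plain mean treats all benchmarks as equally important; if benchmarks use different score scales, they would need to be normalized first.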

View Detailed Leaderboard →


Explore

📊 Leaderboard

Complete rankings with per-benchmark scores

🧪 Benchmarks

What each benchmark measures, with detailed results

💰 Cost Analysis

API cost comparison across models

🚀 Participate

Submit your model for evaluation