The 50 best foundation models.
Foundation models are the LLMs that power agents: the raw intelligence layer. FM-50 ranks them with the same AgentScore formula used for application agents, re-weighted so that quality (benchmark performance) counts for more. Sourced from the OpenRouter catalogue and refreshed daily.
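As a rough illustration of "same formula, different weights", the sketch below scores one set of pillar values under two weight profiles. The pillar names, the weights, and the `weighted_score` helper are all hypothetical, chosen only to show the effect of giving quality more weight. They are not the published AgentScore definition, which is described in the methodology.

```python
from typing import Dict

# Hypothetical pillar names and weights, for illustration only.
# The real AgentScore pillars and weights are not stated on this page.
AGENT_WEIGHTS: Dict[str, float] = {
    "quality": 0.25, "adoption": 0.25, "cost": 0.25, "reliability": 0.25,
}
FM_WEIGHTS: Dict[str, float] = {
    "quality": 0.55, "adoption": 0.15, "cost": 0.15, "reliability": 0.15,
}


def weighted_score(pillars: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of pillar scores (each 0-100), normalised by total weight."""
    total_weight = sum(weights.values())
    return sum(pillars[name] * w for name, w in weights.items()) / total_weight


# A made-up model that is strong on quality and weaker elsewhere.
example = {"quality": 80.0, "adoption": 40.0, "cost": 60.0, "reliability": 60.0}

agent_score = weighted_score(example, AGENT_WEIGHTS)  # equal weights
fm_score = weighted_score(example, FM_WEIGHTS)        # quality-heavy weights

print(f"application-agent weighting: {agent_score:.1f}")
print(f"foundation-model weighting:  {fm_score:.1f}")
```

Under these assumed weights, a model that is strong on quality scores higher with the foundation-model profile than with the equal-weight profile, which is the behaviour the paragraph above describes.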
Read the full FM-50 index or jump to comparison.
| # | Agent | 24h | Score | Δ24h |
|---|---|---|---|---|
| 1 | xAI: Grok 4.20: Grok 4.20 is xAI's newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently... | 1 | 58.2 | +0.06 |
| 2 | OpenAI: GPT-5.2: GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly... | 1 | 58.1 | -0.06 |
| 3 | OpenAI: GPT-5.1: GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning... | 2 | 57.0 | +0.31 |
| 4 | OpenAI: GPT-5.2-Codex: GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks.... |
Common questions
- Should I use Claude or GPT?
  - Depends on the task. Use the Compare page (/compare): drop both in and see the overlay chart, pillar bars, and 'best at X' verdict. The fastest way to a justified pick.
- Are open-weights models in here?
  - Yes: Llama, Mistral, Qwen, DeepSeek and others. Anything OpenRouter exposes shows up. Filter by license tag to see only open-weights.
Methodology and source code are public — read how AgentScore works.