Top · FM-50

The 50 best foundation models.

Foundation models are the LLMs that power agents: the raw intelligence layer. FM-50 ranks them with the same AgentScore formula, tuned so that quality (benchmark performance) carries more weight than it does for application agents. The list is sourced from the OpenRouter catalogue and refreshed daily.
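As a rough illustration of what "the same formula with more weight on quality" could mean, here is a minimal Python sketch of a weighted composite score. The pillar names (quality, adoption, momentum, reliability), the weights, and the example values are all hypothetical and are not taken from AgentScore; the published methodology is the authority on the real formula.

```python
# Illustrative sketch only: pillar names, weights, and values are assumptions,
# not the actual AgentScore pillars or weights.

# Hypothetical weighting for application agents: all pillars count equally.
APP_AGENT_WEIGHTS = {
    "quality": 0.25,
    "adoption": 0.25,
    "momentum": 0.25,
    "reliability": 0.25,
}

# Hypothetical weighting for foundation models: quality dominates.
FOUNDATION_MODEL_WEIGHTS = {
    "quality": 0.55,
    "adoption": 0.15,
    "momentum": 0.15,
    "reliability": 0.15,
}


def weighted_score(pillars, weights):
    """Return the weighted average of pillar values (each on a 0-100 scale)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * pillars[name] for name in weights)
    # Dividing by the total weight keeps the result on the 0-100 scale
    # even if the weights do not sum exactly to 1.
    return weighted_sum / total_weight


# Made-up pillar values for a single model.
example = {"quality": 80.0, "adoption": 40.0, "momentum": 50.0, "reliability": 60.0}

print(round(weighted_score(example, APP_AGENT_WEIGHTS), 1))
print(round(weighted_score(example, FOUNDATION_MODEL_WEIGHTS), 1))
```

With these made-up numbers, the same model scores about 57.5 under the equal weighting and about 66.5 under the quality-heavy weighting, which shows how a retuned weight set alone can reorder a ranking.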

Read the full FM-50 index or jump to comparison.

#	Agent	24h	Score	Δ24h
1	xAI: Grok 4.20: Grok 4.20 is xAI's newest flagship model with industry-leading speed and agentic tool calling capabilities. It combines the lowest hallucination rate on the market with strict prompt adherence, delivering consistently...	1	58.2	+0.06
2	OpenAI: GPT-5.2: GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly...	1	58.1	-0.06
3	OpenAI: GPT-5.1: GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning...	2	57.0	+0.31
4	OpenAI: GPT-5.2-Codex: GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

Common questions

Should I use Claude or GPT? It depends on the task. On the Compare page (/compare), add both models to see the overlay chart, pillar bars, and a 'best at X' verdict. That is the fastest way to a justified pick.
Are open-weights models in here? Yes: Llama, Mistral, Qwen, DeepSeek and others. Anything OpenRouter exposes shows up. Filter by license tag to see only open-weights models.

Methodology and source code are public. Read how AgentScore works.