Field notes from the floor.
A weekly recap that recomputes itself, plus deeper guides and comparisons against the same live data that drives the indexes.
- Weekly report · May 25, 2026 · AgentTape · This week on AgentTape
The seven-day recap, recomputed every visit: biggest movers, new admissions, hot sectors. Auto-generated from the live index.
- May 25, 2026 · Anatomy of the SLM turn — what 60 days of model releases just made empirically true
Gemma 3 27B scored 6.6% on τ2-bench. Gemma 4 31B scores 86.4%. Qwen3.6-27B now outperforms Qwen3.5-397B on SWE-bench. The position paper NVIDIA published in June 2025 was right. The architecture that benefits from being right doesn't ship from the labs that built the current stack.
- May 24, 2026 · Anatomy of the MCP security crisis — eight weeks that moved agent risk from emerging to national security
On April 25 a Cursor agent running Claude Opus 4.6 deleted a SaaS company's entire production database in nine seconds. Six days earlier, OX Security had published a protocol-level flaw in Anthropic's MCP that affects 200,000 servers. Anthropic called it expected behaviour. The Five Eyes called it critical infrastructure risk. What eight weeks of incidents say about the agent stack everyone is shipping right now.
- May 16, 2026 · Anatomy of the LLM-agent wall — what JEPA, ARC-AGI-3 and a $1bn world-model bet say about what comes next
Frontier models scored 0.3% on ARC-AGI-3 the day it launched. Humans scored 100%. Three of the four most influential people in AI now publicly disagree with the LLM-agent thesis. The architectural case, the capital allocation, and what it means for the stack being shipped right now.
- May 10, 2026 · Open-source alternatives to Devin, Cursor and Claude Code
Five self-hosted AI coding agents — who runs each one, which proprietary product it actually replaces, and the specific places it falls short of the closed equivalent.
- May 9, 2026 · How to actually evaluate an AI agent in 2026
A working evaluation flow for agents specifically — eval set construction, the four failure modes that don't show up in demos, and the contract clauses that matter when the vendor disappears.
- May 8, 2026 · GitHub stars are lying to you — a better way to pick AI tools
A peer-reviewed 2026 study found six million fake GitHub stars across 15,835 repositories. Even the honest stars are pointing at the wrong thing. The signals that actually predict AI tool adoption.
- May 7, 2026 · The bundling apocalypse — why most standalone AI tools will be dead by 2027
Midjourney fell from the top 10 to #46. OpenAI shut down Sora after burning $15M a day for $2.1M in lifetime revenue. The map of which standalone AI tools survive the bundling pressure, and which categories get absorbed.
- May 6, 2026 · The best foundation models for AI agents in 2026
Eight models that ship agent workloads in production, ranked on the three failure modes humans don't notice: brittle tool calls, context that rots past 80K, and unit cost at ten million runs.
- May 5, 2026 · The best AI coding agents in 2026 — ranked, live
Eight tools, four trade-offs, and one honest admission: no single agent wins every job. Ranked from real benchmarks and merge rates as of May 2026.