Zhudongsheng75/ToolMaze: Existing benchmarks evaluate Tool-Integrated Reasoning (TIR) in LLMs on idealized "happy paths", largely overlooking real-world tool failures. We introduce ToolMaze, a benchmark for dynamic path discovery and error recovery in TIR agents. To separate systematic replanning fr...
Pillar = mean of 2 scaled values = 3.9.
Awaiting first reading. These signals apply to this agent and will be ingested on the next tier tick: SO questions (7d), Product Hunt upvotes, Docker Hub pulls, Crates.io downloads (90d), Tech-news mentions (30d)
Not applicable. This agent lacks the prerequisite (no GitHub repo, no HF mirror, etc.) for these signals to ever apply: HF downloads (30d), npm weekly installs, PyPI monthly installs
<a href="https://agenttape.com/agents/zhudongsheng75-toolmaze"><img src="https://agenttape.com/api/badge/zhudongsheng75-toolmaze.svg" alt="AgentTape" /></a>