MCP solved a real problem: how does an AI agent access external tools? But it created a new one. More tools don’t mean more capability. Past a certain point, they mean less.
The problem in numbers
A single MCP server with 135 tools consumes roughly 126,000 tokens just for tool descriptions, before the agent does anything. With a 200K context window, that leaves less than 40% for actual work.
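The arithmetic is blunt. A back-of-the-envelope sketch using the numbers above:

```ts
// Rough context budget for a 135-tool MCP server (figures from above).
const toolCount = 135;
const descriptionTokens = 126_000; // all tool descriptions combined, ≈933 each
const contextWindow = 200_000;

const remaining = contextWindow - descriptionTokens; // 74,000 tokens
console.log(`${((remaining / contextWindow) * 100).toFixed(0)}% left for work`); // ≈ 37%
```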
That would be acceptable if agents delivered better results with more tools. They don’t. Redis measured it: with 50+ available tools, selection accuracy was 42%. Filtered down to the 3-5 relevant ones, it jumped to 85%. Token usage dropped from 23,000 to 400. Latency from 3.4 seconds to 0.4.
This isn’t an isolated case. Chroma Research tested 18 LLMs and found the same pattern across all of them: more context, worse results. A packed context window at ~113K tokens pushed accuracy down by roughly 30% compared to a focused 300-token version. The model doesn’t get smarter when you feed it more. It gets more distracted.
What MCP doesn’t solve
MCP defines how tools are provided and called. It’s an infrastructure layer, comparable to a REST API specification. But MCP doesn’t know about workflows. It doesn’t know that tool B should only be called after tool A. It has no concept of dependencies, sequences, or fallback strategies.
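To see the gap concretely, here is a minimal sketch using the TypeScript MCP SDK (the tool names and handlers are invented, and the SDK surface may differ across versions). Each tool is a self-contained registration; nothing in the protocol expresses that deployApi should only run after createApi:

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const server = new McpServer({ name: "apim", version: "1.0.0" });

// Two independent registrations. The protocol carries a name, a description,
// and an input schema. No ordering, no dependencies, no fallback strategy.
server.tool(
  "createApi",
  "Create a new API definition",
  { name: z.string() },
  async ({ name }) => ({ content: [{ type: "text", text: `created ${name}` }] })
);

server.tool(
  "deployApi",
  "Deploy an existing API",
  { apiId: z.string() },
  async ({ apiId }) => ({ content: [{ type: "text", text: `deployed ${apiId}` }] })
);
```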
Consider a typical API management scenario. You have an MCP server connected to a platform like Gravitee. Every API operation becomes an MCP tool. A platform with 80+ endpoints means 80+ tool descriptions in context. The agent sees: createApi, updateApi, deployApi, createPlan, publishPlan, createApplication, subscribeApplication, and 73 more. All flat, all equal weight.
When a developer says “Create a new API with rate limiting”, the agent has to assemble the right sequence from 80 tools: create API, create plan, publish plan, deploy API. In that order. Without orchestration, it guesses. And it mostly guesses wrong, because 80 equally weighted tool descriptions don’t communicate sequence.
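An orchestration layer has to carry that ordering itself. One possible shape for it, as a sketch (the structure is hypothetical, not part of MCP):

```ts
// A workflow pins down what 80 flat tool descriptions cannot: order.
interface WorkflowStep {
  tool: string;         // MCP tool to call
  dependsOn?: string[]; // steps that must have succeeded first
}

const createApiWithRateLimiting: WorkflowStep[] = [
  { tool: "createApi" },
  { tool: "createPlan", dependsOn: ["createApi"] }, // plan needs the API id
  { tool: "publishPlan", dependsOn: ["createPlan"] },
  { tool: "deployApi", dependsOn: ["publishPlan"] },
];
```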
The pattern: progressive disclosure
Multiple tools converged on the same solution independently.
Claude Code introduced a skills system. Skills load only their metadata (~30-50 tokens); the rest comes on demand. Instead of everything sitting permanently in context, skills are filtered by intent.
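A skill’s entry point is a single markdown file, and only its frontmatter loads up front. A sketch of what that looks like (name and contents invented):

```markdown
---
name: api-management
description: Create, configure, and deploy APIs. Use for requests about APIs, plans, or rate limits.
---
Full instructions, workflow steps, and reference data live below the
frontmatter and only enter context when the skill is actually invoked.
```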
Cursor migrated from monolithic .cursorrules to modular .mdc files. Rules are loaded via glob patterns only when they match the current file. Working on app/models/? You get the database conventions. Working on the frontend? You don’t.
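A minimal .mdc rule might look like this (globs and rule text invented for illustration):

```markdown
---
description: Database model conventions
globs: ["app/models/**/*.rb"]
alwaysApply: false
---
- Use ActiveRecord validations for user-facing errors.
- Every model gets a corresponding spec in spec/models/.
```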
GitHub Copilot uses path-specific .instructions.md files with YAML frontmatter. An applyTo property controls which instructions apply to which directories.
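For example (path pattern and instruction text invented):

```markdown
---
applyTo: "app/models/**"
---
Follow the team's ActiveRecord conventions; prefer scopes over raw SQL.
```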
The pattern is always the same: don’t load everything, load the right thing at the right time. The agent first gets a compact overview (what skills exist?), recognizes the intent of the request, then selectively loads the matching skill — including the necessary workflow steps and reference data.
Back to the API management example: instead of 80 tool descriptions permanently in context, a skill “API Management” defines a routing table. “Create an API” matches a workflow that describes the 4-5 relevant tools in the right order and provides the necessary reference data (target environment, naming conventions, default rate limits). Context shrinks from ~80,000 tokens to ~5,000. The agent knows what to do.
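A sketch of what that routing table could look like (all names, patterns, and defaults are illustrative, not any real platform’s API):

```ts
// Intent routing: match the request, load only that workflow's tools.
interface Workflow {
  match: RegExp;     // which requests this workflow handles
  tools: string[];   // the only tool descriptions loaded into context
  reference: string; // target env, naming conventions, default limits
}

const apiManagementSkill: Workflow[] = [
  {
    match: /create|new api|rate limit/i,
    tools: ["createApi", "createPlan", "publishPlan", "deployApi"],
    reference: "env: dev; naming: kebab-case; default rate limit: 100 req/s",
  },
  // ...more workflows: update an API, subscribe an application, etc.
];

function route(request: string): Workflow | undefined {
  return apiManagementSkill.find((w) => w.match.test(request));
}

// route("Create a new API with rate limiting") puts 4 tool descriptions
// in context instead of 80.
```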
Why this matters now
Context windows are growing. Both Gemini and Claude now offer 1M tokens. You might think the problem solves itself. It doesn’t. The Chroma study shows: more context only helps when it’s relevant. Irrelevant context actively hurts, regardless of window size. Or as Anthropic puts it: context is a finite resource with diminishing returns.
Jenova AI puts the practical threshold at 5-7 tools per interaction. Beyond that, selection errors increase exponentially.
There’s a second trend reinforcing this: agentic architectures increasingly rely on specialized agents for individual tasks rather than one general-purpose agent that does everything. And specialized agents often run on smaller, faster models that don’t come with 1M-token windows. An agent on Haiku or a similarly lean model might have 32K-128K of context. Every token consumed by tool descriptions counts.
What this means in practice
If you’re building AI agents or running MCP servers, spend as much time thinking about tool routing as you do about tool availability. An MCP server offering 100 tools isn’t a feature. It’s a performance problem waiting to surface.
The rule of thumb: if your agent sees more than 10 tools for a given request, you need an orchestration layer. Not because MCP is bad — MCP is the right infrastructure. But infrastructure alone isn’t enough. You need intelligence on top.