HN – Show HN: MCP-compress-router

When you have multiple MCP servers, every request to the LLM will include all of their tools and descriptions, which can quickly eat up your token limit and increase costs. The thing is, most of the time, you don't need all of them.

For example, let’s take three popular MCP servers: Notion, GitHub, and Pylance. Together they add about 26K tokens of overhead to every turn. Assuming an average 50-turn coding session and Opus pricing, that overhead costs about $0.93 per session.

`mcp-compress-router` does something very simple: it sits in front of all your MCP servers and exposes just two tools, `get_tool_schema` and `invoke_tool`. The `get_tool_schema` description lists the tool names and arguments for all downstream MCP server tools, so the agent knows what's available. Whenever the agent needs a tool, it first calls `get_tool_schema` to read that tool's full description and argument schema, and then calls `invoke_tool`, which forwards the call to the downstream MCP server.
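To make the two-tool pattern concrete, here is a minimal sketch in Python. It is not the project's actual code: the in-memory `DOWNSTREAM` registry, the tool names (`github.create_issue`, `notion.search`), the handlers, and the `tool_input` parameter name are all made up for illustration, and a real router would discover tools from live MCP servers instead.

```python
from typing import Any

# Hypothetical stand-in for tools discovered from downstream MCP servers.
DOWNSTREAM: dict[str, dict[str, Any]] = {
    "github.create_issue": {
        "description": "Create an issue in a repository.",
        "input_schema": {
            "type": "object",
            "properties": {"repo": {"type": "string"}, "title": {"type": "string"}},
            "required": ["repo", "title"],
        },
        # Fake handler; a real router would forward the call over MCP.
        "handler": lambda args: {"created": f"{args['repo']}: {args['title']}"},
    },
    "notion.search": {
        "description": "Search pages in a workspace.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
        "handler": lambda args: {"results": [f"page about {args['query']}"]},
    },
}


def compressed_listing(level: str = "high") -> str:
    """Build the short listing embedded in the get_tool_schema description."""
    lines = []
    for name, tool in DOWNSTREAM.items():
        if level == "max":
            lines.append(name)  # tool names only
        else:
            arg_names = ", ".join(tool["input_schema"]["properties"])
            lines.append(f"{name}({arg_names})")  # names plus argument names
    return "\n".join(lines)


def get_tool_schema(tool_name: str) -> dict[str, Any]:
    """Return the full description and argument schema for one tool."""
    tool = DOWNSTREAM[tool_name]
    return {
        "name": tool_name,
        "description": tool["description"],
        "input_schema": tool["input_schema"],
    }


def invoke_tool(tool_name: str, tool_input: dict[str, Any]) -> Any:
    """Forward the call to the downstream tool."""
    return DOWNSTREAM[tool_name]["handler"](tool_input)


print(compressed_listing("high"))
schema = get_tool_schema("notion.search")
print(invoke_tool("notion.search", {"query": "roadmap"}))
```

The agent only ever sees the short listing up front; the full schema for a tool is paid for once, at the moment that tool is actually needed.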

The savings are pretty serious. The three-server example above goes from about 26K tokens down to about 900 tokens with the "max" compression level (just tool names), or to about 2000 tokens with the "high" compression level (the default: tool names plus argument names). Either way, you cut the per-turn tool overhead by 90%+.
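For anyone who wants to check the math, here is the arithmetic as a quick Python sketch. It uses only the approximate figures from this post (26K baseline, 900 and 2000 compressed, 50 turns); the function and variable names are just illustrative.

```python
BASELINE_TOKENS = 26_000                    # per-turn overhead, three servers uncompressed
COMPRESSED_TOKENS = {"max": 900, "high": 2_000}
TURNS = 50                                  # assumed session length


def savings(baseline: int, compressed: int) -> float:
    """Fraction of per-turn tool overhead removed by compression."""
    return 1 - compressed / baseline


for level, tokens in COMPRESSED_TOKENS.items():
    pct = savings(BASELINE_TOKENS, tokens) * 100
    saved = (BASELINE_TOKENS - tokens) * TURNS
    print(f"{level}: {pct:.1f}% smaller, {saved:,} fewer tokens over {TURNS} turns")

# max: 96.5% smaller, 1,255,000 fewer tokens over 50 turns
# high: 92.3% smaller, 1,200,000 fewer tokens over 50 turns
```

This ignores the extra `get_tool_schema` calls the agent makes when it actually uses a tool, so real-world savings depend on how many distinct tools a session touches.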
