Case Study: How a Single Manifest Change Boosted LLM Accuracy from 33% to 83% in a Key Use Category
The Challenge
We were testing a simple MCP todo server with three tools: createTodo, listTodos, and a completion tool. The original completion tool was named completeTodo and carried the bare description: “Mark a todo item as completed.”
In testing, LLMs routinely failed on prompts like “finish,” “done,” or “tick that box,” which we call ambiguous prompts. Accuracy for the completion tool hovered around 33% in these cases.
The Change
We updated the manifest:
- Renamed the tool from completeTodo to markTodoComplete
- Expanded the description with natural synonyms (“finish, done, tick, check off, mark”), explicit ID requirements, and a workflow hint (“use listTodos first if you need to find IDs”). This gave the LLM insight into how the tool fits into a workflow.
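To make the change concrete, here is a minimal sketch of the before/after tool definitions. The field names follow the usual MCP tool-definition shape (`name`, `description`); the exact wording is illustrative, not the verbatim manifest from our server.

```typescript
// Hypothetical before/after tool definitions (illustrative wording,
// not the verbatim manifest from the case study).
const before = {
  name: "completeTodo",
  description: "Mark a todo item as completed.",
};

const after = {
  name: "markTodoComplete",
  description:
    "Mark a todo item as complete (finish, done, tick, check off, mark). " +
    "Requires the todo's id; use listTodos first if you need to find ids.",
};
```

The new description packs in the synonyms users actually type, states the ID requirement, and points at listTodos for the lookup step, so the model can plan the two-call workflow on its own.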
The Results
- Before: Tool accuracy (completeTodo): ~33% for completion tasks
- After: Tool accuracy (markTodoComplete): ~83%
- Overall suite accuracy: jumped to 71%+ across tested LLMs
Testing
The MCPAlign test suite included prompts across categories:
- Golden questions: Clear, direct asks, the sweet spot we hope our users type
- Embedded questions: Requests embedded inside longer strings of text, but clear and complete for the use case
- Ambiguous questions: Underspecified or unclear phrasing, simulating how users really interact in real-world use
- Multilingual: Spanish/French prompts, testing whether manifest clarity is language dependent
- Distractors: Irrelevant or off-topic prompts, pure noise, checking for accidental tool calls
In these tests the golden, multilingual, and distractor questions all scored very high. The only weak spot was ambiguous prompts, and that is where users really live: real people don't say things the way we as developers hope they do.
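As a sketch of how a suite like this can be scored, here is a tiny per-category accuracy calculation. It assumes each test case records which tool the model called, with null meaning no tool call (the correct behavior for distractors); MCPAlign's actual harness may track more detail.

```typescript
type TestResult = {
  category: string;            // e.g. "golden", "ambiguous", "distractor"
  expectedTool: string | null; // null = no tool should be called
  calledTool: string | null;   // null = the model called no tool
};

// Fraction of cases per category where the model called the expected tool
// (or correctly called nothing).
function accuracyByCategory(results: TestResult[]): Record<string, number> {
  const totals: Record<string, { correct: number; total: number }> = {};
  for (const r of results) {
    const t = (totals[r.category] ??= { correct: 0, total: 0 });
    t.total += 1;
    if (r.calledTool === r.expectedTool) t.correct += 1;
  }
  const out: Record<string, number> = {};
  for (const [cat, t] of Object.entries(totals)) out[cat] = t.correct / t.total;
  return out;
}

const acc = accuracyByCategory([
  { category: "golden", expectedTool: "markTodoComplete", calledTool: "markTodoComplete" },
  { category: "ambiguous", expectedTool: "markTodoComplete", calledTool: null },
  { category: "distractor", expectedTool: null, calledTool: null },
]);
// acc is { golden: 1, ambiguous: 0, distractor: 1 }
```

Breaking accuracy out per category is what surfaced the ambiguous-prompt weakness; a single aggregate number would have hidden it behind the strong golden and distractor scores.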
The Insight
The improvement wasn't a model upgrade. The exact same LLMs suddenly appeared smarter because the manifest provided better semantic positioning for the tool. Tool naming and description quality directly influence MCP success.
Takeaway
AI performance isn’t just about the model — it’s about how we communicate with the model through manifests. Good names and rich descriptions make tools usable.
Published: 9/25/2025