Case Study: How a Single Manifest Change Boosted LLM Accuracy from 33% to 83% in a Key Use Category
The Challenge
We were testing a simple MCP todo server with three tools: createTodo, listTodos, and a completion tool. The original completion tool was named completeTodo and carried the bare description: “Mark a todo item as completed.”
In testing, LLMs routinely failed on prompts like “finish,” “done,” or “tick that box,” which we call ambiguous prompts. Accuracy for the completion tool hovered around 33% in these cases.
The Change
We updated the manifest:
- Renamed the tool from completeTodo to markTodoComplete
- Expanded the description with natural synonyms (“finish, done, tick, check off, mark”), explicit ID requirements, and a workflow hint (“use listTodos first if you need to find IDs”). This gave the LLM insight into how the tool fits into a workflow.
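To make the change concrete, here is a minimal sketch of the before/after tool definitions. The field names follow the usual MCP tool-definition shape (`name`, `description`); the exact wording is illustrative, not the verbatim manifest from our server.

```typescript
// Hypothetical before/after tool definitions (illustrative wording,
// not the verbatim manifest from the case study).
const before = {
  name: "completeTodo",
  description: "Mark a todo item as completed.",
};

const after = {
  name: "markTodoComplete",
  description:
    "Mark a todo item as complete (finish, done, tick, check off, mark). " +
    "Requires the todo's id; use listTodos first if you need to find ids.",
};
```

The new description packs in the synonyms users actually type, states the ID requirement, and points at listTodos for the lookup step, so the model can plan the two-call workflow on its own.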
The Results
- Before: Tool accuracy (completeTodo): ~33% for completion tasks
- After: Tool accuracy (markTodoComplete): ~83%
- Overall suite accuracy: jumped to 71%+ across tested LLMs
Testing
The MCPAlign test suite included prompts across categories:
- Golden questions: Clear, direct asks, the sweet spot we hope our users type
- Embedded questions: Requests embedded inside longer strings of text, but clear and complete for the use case
- Ambiguous questions: Underspecified or unclear phrasing, simulating how users really interact in real-world use
- Multilingual: Spanish/French prompts, testing whether manifest clarity is language dependent
- Distractors: Irrelevant or off-topic prompts, pure noise, checking for accidental tool calls
In these tests the golden, multilingual, and distractor questions all scored very high. The only weak spot was ambiguous prompts, and that is where users really live: real people don't say things the way we as developers hope they do.
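As a sketch of how a suite like this can be scored, here is a tiny per-category accuracy calculation. It assumes each test case records which tool the model called, with null meaning no tool call (the correct behavior for distractors); MCPAlign's actual harness may track more detail.

```typescript
type TestResult = {
  category: string;            // e.g. "golden", "ambiguous", "distractor"
  expectedTool: string | null; // null = no tool should be called
  calledTool: string | null;   // null = the model called no tool
};

// Fraction of cases per category where the model called the expected tool
// (or correctly called nothing).
function accuracyByCategory(results: TestResult[]): Record<string, number> {
  const totals: Record<string, { correct: number; total: number }> = {};
  for (const r of results) {
    const t = (totals[r.category] ??= { correct: 0, total: 0 });
    t.total += 1;
    if (r.calledTool === r.expectedTool) t.correct += 1;
  }
  const out: Record<string, number> = {};
  for (const [cat, t] of Object.entries(totals)) out[cat] = t.correct / t.total;
  return out;
}

const acc = accuracyByCategory([
  { category: "golden", expectedTool: "markTodoComplete", calledTool: "markTodoComplete" },
  { category: "ambiguous", expectedTool: "markTodoComplete", calledTool: null },
  { category: "distractor", expectedTool: null, calledTool: null },
]);
// acc is { golden: 1, ambiguous: 0, distractor: 1 }
```

Breaking accuracy out per category is what surfaced the ambiguous-prompt weakness; a single aggregate number would have hidden it behind the strong golden and distractor scores.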
The Insight
The improvement wasn't a model upgrade. The exact same LLMs suddenly appeared smarter because the manifest provided better semantic positioning for the tool. Tool naming and description quality directly influence MCP success.
Takeaway
AI performance isn’t just about the model — it’s about how we communicate with the model through manifests. Good names and rich descriptions make tools usable.
Published: 9/25/2025