We’ve all seen it happen: the LLM starts going down the wrong path and makes dozens of failed or wasted tool calls that don’t actually get it closer to its goal.

Even though models can self-correct and find a new path on a subsequent retry, self-correction can hide repeated failures that make the agent slower, more expensive and more difficult to evaluate across the dataset. In this post, I look at what wasted tool calls do to the trace, when retries are required and when they’re avoidable, and the cost of avoidable failures in practice.

Need the workflow? See my step-by-step guide for debugging wasted tool calls in LLM logs.

Key takeaways

  • Failed tool calls add cost, latency, noise and weaker prompt signal.
  • A wasted tool call makes the trace noisier even when the next run succeeds.
  • Required retried tool calls help the agent retrieve information it didn’t have.
  • Avoidable retries usually stem from prompts that are inaccurate or not specific enough.
  • If the model keeps self-correcting and eventually finds a new path, the trace can look healthier than the prompt really is.

A wasted tool call produces a noisy trace

Even when the agent eventually finds a new path, that doesn’t erase the earlier failure(s). The trace still shows the failed calls and the subsequent successful one. This makes it harder to review the run and determine whether the agent reached the result efficiently. In a production dataset, that pattern can repeat across many traces, even when the final outputs look fine.
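One way to surface this pattern at scale is to flag runs that ended in success but still contain failed calls. Here’s a minimal sketch; the trace shape (dicts with `run_id`, `final_status` and `tool_calls` keys) is a hypothetical logging schema, not any particular framework’s format:

```python
# Sketch: flag runs whose trace contains failed tool calls even though
# the run itself ended in success. The trace schema is an assumption.

def noisy_successful_runs(traces):
    """Return runs that succeeded but contain at least one failed call."""
    flagged = []
    for trace in traces:
        failed = [c for c in trace["tool_calls"] if c["status"] == "error"]
        if failed and trace["final_status"] == "success":
            flagged.append({"run_id": trace["run_id"],
                            "failed_calls": len(failed)})
    return flagged

traces = [
    {"run_id": "r1", "final_status": "success",
     "tool_calls": [{"tool": "read_file", "status": "error"},
                    {"tool": "read_file", "status": "ok"}]},
    {"run_id": "r2", "final_status": "success",
     "tool_calls": [{"tool": "search", "status": "ok"}]},
]

print(noisy_successful_runs(traces))  # only r1 is flagged
```

Run over a dataset, a report like this shows how often “successful” runs are hiding wasted work.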

Required retries vs. avoidable retries

Some tool-call failures are required because the agent has to make the call to retrieve information it doesn’t have. If a file lookup fails because the file doesn’t exist, that failure may still be useful because it tells the agent something it needed to know. A path check can work the same way. Other calls are genuine failures because the agent queried the wrong file, took a dead-end path or went down a rabbit hole. The line between these cases can be fuzzy.

Other failed tool calls are avoidable. These occur when the prompt is inaccurate or not specific enough. Think of incorrect parameters, wrong attribute names on objects and shell syntax issues such as unescaped pipe characters.
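One rough way to operationalize this split is a heuristic classifier over the error messages in the trace. This is a sketch only: the hint strings and the three categories are illustrative assumptions, and real error text varies by tool:

```python
# Sketch: heuristically split tool-call failures into "required" (the
# failure itself carried new information) and "avoidable" (the call was
# malformed). The error-message patterns are illustrative assumptions.

REQUIRED_HINTS = ("no such file", "not found", "does not exist")
AVOIDABLE_HINTS = ("invalid parameter", "unknown attribute", "syntax error")

def classify_failure(error_message):
    msg = error_message.lower()
    if any(hint in msg for hint in AVOIDABLE_HINTS):
        return "avoidable"   # the prompt or call was malformed
    if any(hint in msg for hint in REQUIRED_HINTS):
        return "required"    # the failure told the agent something new
    return "unclear"         # the line between the two can be fuzzy

print(classify_failure("ls: /tmp/report.csv: No such file or directory"))
print(classify_failure("bash: syntax error near unexpected token `|'"))
```

The “unclear” bucket matters: it keeps the fuzzy cases visible for manual review rather than forcing everything into one of the two categories.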

The hidden cost of failed tool calls in agent traces

Avoidable wasted tool calls carry costs that are easy to miss if you only look at whether the run succeeded:

  • More cost: more calls mean more tokens, more compute and more spend
  • More latency: retries slow the agent down even when the run succeeds
  • More trace noise: the extra failure and retry make the trace harder to review
  • More unhelpful work: the model may recover on a subsequent attempt, but it still spends extra calls on a path that was never going to help
  • Weaker prompt signal: recovery masks prompt defects, so a successful run is a weaker indicator of whether the prompt is doing what you think it is
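The first three costs above can be tallied directly from a trace. A minimal sketch, assuming per-call `tokens` and `latency_s` fields exist in your logs (they’re hypothetical field names, not a standard format):

```python
# Sketch: tally the overhead that failed tool calls add to a run.
# The per-call "tokens" and "latency_s" fields are assumed log fields.

def retry_overhead(tool_calls):
    failed = [c for c in tool_calls if c["status"] == "error"]
    return {
        "wasted_calls": len(failed),
        "wasted_tokens": sum(c["tokens"] for c in failed),
        "wasted_latency_s": round(sum(c["latency_s"] for c in failed), 2),
    }

calls = [
    {"tool": "read_file", "status": "error", "tokens": 420, "latency_s": 1.3},
    {"tool": "read_file", "status": "error", "tokens": 410, "latency_s": 1.2},
    {"tool": "read_file", "status": "ok",    "tokens": 450, "latency_s": 1.4},
]

print(retry_overhead(calls))
# {'wasted_calls': 2, 'wasted_tokens': 830, 'wasted_latency_s': 2.5}
```

Summed across a dataset, numbers like these make the cost of avoidable retries concrete even when every run’s final output looks fine.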

For my step-by-step workflow, see how to debug wasted tool calls in LLM logs.

FAQ

Why are failed tool calls in LLM logs worth debugging?

Failed tool calls in LLM logs are worth debugging because although the model may correct itself and find a new path on a subsequent attempt, the retry still adds cost, latency and noise to the trace. The failure can also point to a prompt issue that keeps repeating across the dataset.

Why does a retried tool call produce a noisy trace?

A retried tool call produces a noisy trace because even when the run eventually succeeds, the successful call doesn’t erase the failed one that came before it. This makes reviewing the trace and diagnosing issues more difficult.

What’s the difference between wasted tool calls that are required and ones that are avoidable?

Some failed tool calls are required because the failure provides the model with new information, for example, that a file doesn’t exist or a path is wrong. Avoidable retries happen when the prompt is inaccurate or not specific enough, which leads to underspecified tool calls.

When a wasted tool call is avoidable, what does that usually indicate?

When a failed tool call is avoidable, the prompt usually needs to be updated. The unnecessary noise in the trace weakens the signal on whether the prompt is well specified.