Your AI agent may be doing the maths correctly and still be wrong

Gary Lloyd
May 28

The world has gone a bit mad for AI agents.

Claude Co-work, Codex, Copilot agents … increasingly complex workflows are being handed over to systems built on large language models.

Yet somewhere along the way, we seem to have gone strangely quiet about hallucinations and other forms of AI error.

In the early days of ChatGPT, if I analysed survey data, it was obvious that deterministic code was being written and executed. The calculations were automatically visible and reproducible.

Today, with agentic workflows, much of that is becoming hidden behind polished outputs and automation.

That means we need to think more carefully about where errors can creep in and how to mitigate them.

The diagram below uses a simple example: an AI agent analysing survey data from a spreadsheet.

The point is this: the greatest risk is not necessarily incorrect maths. The code the agent writes to achieve its goal may be perfectly sound.

The bigger danger is often in the choices made along the way:

• Misunderstanding intent

• Choosing the wrong grouping or denominator

• Skipping validation steps

• Overclaiming patterns in the data

• Producing confident interpretations that outrun the evidence
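To make the denominator point concrete, here is a minimal sketch in Python. The survey rows and the `share_satisfied` function are hypothetical, invented for illustration only. The arithmetic is correct in both cases, yet the headline figure depends entirely on a choice the agent may never surface: whether people who skipped the question count in the denominator.

```python
# Hypothetical survey rows: (respondent_id, answer).
# None means the respondent skipped the question.
responses = [
    (1, "satisfied"), (2, "satisfied"), (3, "dissatisfied"),
    (4, None), (5, "satisfied"), (6, None),
    (7, "dissatisfied"), (8, "satisfied"), (9, None), (10, None),
]


def share_satisfied(rows, include_skipped):
    """Return the fraction answering 'satisfied' under an explicit denominator choice."""
    satisfied = sum(1 for _, answer in rows if answer == "satisfied")
    if include_skipped:
        # Denominator: everyone who received the survey question.
        denominator = len(rows)
    else:
        # Denominator: only those who actually answered.
        denominator = sum(1 for _, answer in rows if answer is not None)
    if denominator == 0:
        raise ValueError("no rows in the chosen denominator")
    return satisfied / denominator


of_all = share_satisfied(responses, include_skipped=True)
of_answered = share_satisfied(responses, include_skipped=False)

print(f"Satisfied, of all respondents:    {of_all:.0%}")
print(f"Satisfied, of those who answered: {of_answered:.0%}")
```

With this made-up data, the same four satisfied respondents become 40% or 67% depending on the denominator. Neither number is a maths error. The risk is reporting one of them without saying which.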

And this is a simple example. Many organisations are now building agentic workflows that are vastly more complex.

That means we need guardrails: having the AI write deterministic code where possible, visible assumptions, audit trails, robustness checks and human review at key points. ChatGPT’s own advice is to use prompts that instruct the AI to write deterministic code where appropriate, rather than giving a best guess or what it thinks you want to hear.
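As a rough sketch of what those guardrails might look like in practice, the Python below computes an average survey score deterministically and returns an audit record alongside the result. Everything here is an assumption made for illustration: the `analyse_scores` function, the 1 to 5 score range, the field names and the sample rows are all hypothetical, and a real workflow would need checks suited to its own data.

```python
import json
from statistics import mean


def analyse_scores(rows, score_min=1, score_max=5):
    """Average the scores deterministically and return (result, audit record)."""
    audit = {"assumptions": [], "checks": [], "needs_human_review": False}

    # Visible assumptions: written down rather than silently applied.
    audit["assumptions"].append(
        f"valid scores are integers from {score_min} to {score_max}"
    )
    audit["assumptions"].append("blank scores are excluded, not treated as zero")

    answered = [r["score"] for r in rows if r["score"] is not None]
    out_of_range = [s for s in answered if not score_min <= s <= score_max]
    valid = [s for s in answered if score_min <= s <= score_max]

    # Audit trail: record what was read and what was dropped.
    audit["checks"].append({"check": "rows read", "value": len(rows)})
    audit["checks"].append(
        {"check": "blank scores", "value": len(rows) - len(answered)}
    )
    audit["checks"].append(
        {"check": "out-of-range scores", "value": len(out_of_range)}
    )

    # Human review: flag the run instead of quietly carrying on.
    if out_of_range or not valid:
        audit["needs_human_review"] = True

    result = round(mean(valid), 2) if valid else None
    return result, audit


# Hypothetical rows, including one blank and one impossible score.
rows = [{"score": 4}, {"score": 5}, {"score": None}, {"score": 9}, {"score": 3}]
result, audit = analyse_scores(rows)

print(result)
print(json.dumps(audit, indent=2))
```

The design choice is that the function never returns a bare number. The result always travels with the assumptions behind it and a flag saying whether a person should look before anyone acts on it.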

If you read posts on here, from vendors and especially from AI “experts” on YouTube, you might think you are falling behind because you don’t have “an army of AI agents”. However, much of the hype centres on low-risk personal automation, such as responding to emails or creating social media content. In an organisational setting, where decisions rest on the analysis, you need to evaluate risks at every step of a workflow and make them visible with clear audit trails.

Agentic workflows are here to stay, but they are not magic, so tread carefully. As an analogy, a single spreadsheet error at JP Morgan once under-estimated losses by hundreds of millions of dollars, led to losses in the billions (yes, honestly BILLIONS - search for “The London Whale” to fact-check) and triggered a US Senate Committee investigation.

Finally, don’t forget that the most significant impact of AI will come from reinventing how we meet customer needs, without being tied to the legacy of existing processes.
