Sort of Calculator
Ever asked AI to do simple maths and watched it confidently get the wrong answer?
That’s because LLMs aren’t actually very good calculators - they’re just really good at pretending to be. Usually, when an AI gets maths right, it’s because it’s secretly using external tools like code interpreters. But what happens when you don’t let it cheat?
In this widget, we’ve forced the model to behave like a simple calculator using only what it has learnt from its training data - no external tools, no writing and running code, and no built-in calculator.
You can try the basic operations: addition (+), subtraction (−), multiplication (×), and division (÷). Expect mistakes, especially with bigger or more unusual numbers - you might be surprised by how good (or bad) this type of AI actually is with numbers.
Why Does This Happen?
Because LLMs don’t actually do maths. They generate text by guessing what words (or numbers) are likely to come next, based on patterns they’ve seen before.
They’ve seen “2 + 2 = 4” so many times that they’ve learnt to repeat it. But if you ask for something like “17 × 43” or “the square root of 242”, they might just guess — and guess incorrectly.
They’re not calculating. They’re predicting.
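To make that distinction concrete, here is a deliberately simplified Python sketch - a toy lookup table with a random fallback, nothing like a real neural network. The seen_in_training table and predict_answer function are invented for illustration; the point is only that familiar sums get "remembered" while unfamiliar ones get guessed.

```python
import random

# Hypothetical "training data": sums the model has seen many times.
# (This table and predict_answer are invented for illustration.)
seen_in_training = {
    "2 + 2 =": "4",
    "10 - 3 =": "7",
    "5 * 5 =": "25",
}

def predict_answer(prompt: str) -> str:
    """Return the memorised continuation for a familiar prompt;
    otherwise emit a confident-looking guess - no arithmetic happens."""
    if prompt in seen_in_training:
        return seen_in_training[prompt]
    return str(random.randint(100, 999))  # plausible-looking, probably wrong

print("2 + 2 =", predict_answer("2 + 2 ="))      # right: it's memorised
print("17 * 43 =", predict_answer("17 * 43 ="))  # a guess, almost certainly not 731
```

Real LLMs are vastly more sophisticated than a lookup table, but the failure mode is the same: the answer comes from patterns, not from arithmetic.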
But I’ve Seen AI Get Maths Right Before…
If you’ve used ChatGPT or other AI systems and seen them solve maths problems correctly, that’s because they’re getting help behind the scenes. These systems quietly use external tools like code interpreters to actually do the calculations.
What it looks like when you ask ChatGPT a complex maths question
It’s a bit like a magician with a hidden assistant - the performance looks impressive, but there’s more going on than meets the eye.
And here’s something else to consider: efficiency. A simple Casio calculator does maths instantly on a tiny amount of energy - often just a small solar panel. Meanwhile, an LLM uses enormous amounts of electricity to do the same thing - poorly.
What ChatGPT does behind the scenes: it writes a program
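We can't see exactly what runs inside these systems, but for the earlier examples ("17 × 43" and "the square root of 242") the hidden program might look something like this illustrative Python sketch:

```python
import math

# Illustrative only: the sort of short program a code interpreter
# might run so the answers are computed, not predicted.
print(17 * 43)         # 731, exact
print(math.sqrt(242))  # 15.5563... (the square root of 242)
```

Once the calculation is handed to actual code, the answer is exact every time - which is precisely the help the bare model in this widget doesn't get.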
So What?
This widget shows that LLMs are not reliable calculators on their own. Understanding this helps explain why AI can seem inconsistent: it excels at pattern recognition and language tasks but struggles with precise logical operations that require step-by-step reasoning rather than pattern matching.
Reflections
- When did the model get the answer right? When did it get it wrong?
- What kinds of problems was it better at?
- Why do you think it sounds so confident, even when it’s wrong?
- When might it be risky to trust an AI with numbers?
- What other kinds of tasks might work the same way, where it sounds right, but isn’t?
Recommended Learning
- AI Is Usually Bad At Math. Here’s Why It Matters - a Forbes piece on the implications of AI being bad at maths.
- The Future of LLMs and maths - Can LLMs Master Math?
- Accuracy Problems & LLMs - The Problem with Mathematical Accuracy in Large Language Models