Artificial Intelligence Large Language Models (AI LLMs) impress with their language and code generation capabilities. They are, however, shockingly poor at mathematical reasoning and computation. A year ago, we wrote that Asteri would not use LLMs for any form of mathematical modelling.

We may have just witnessed the end of LLMs' poor mastery of maths, and the start of reasoning AIs. Maybe.

In July 2025, AI LLMs won gold at the International Mathematical Olympiad (IMO), the world championship mathematics competition for high-school students.

AI did what?

Google DeepMind and OpenAI submitted their latest models to solve IMO problems under real exam conditions. Contestants must solve six proof-based problems that demand not just calculation but deep, creative reasoning. Each model solved five of the six problems, scoring 35 out of 42 points and earning a gold medal.

This isn’t the first time AI models have tried their hand at the IMO. Past attempts used specialised systems trained to tackle formal proofs, and at best they managed a silver medal.

What makes this moment different is that, this time, the AI models were general-purpose LLMs. When they are not solving world-class maths problems, they write poems, explain scientific concepts, and do your research online.

What’s changed?

The models that won gold benefitted from advances in reasoning. They were trained to reason step by step, to question their own thinking, and to explore multiple angles before deciding. This ability to “think in steps” marks a major departure from the fast, fluent guessing that earlier models were known for. They now solve maths problems the way a top student would.

So What?

This is a landmark moment. Not because the models solved maths problems, but because they did so in a way that looks strikingly like human reasoning. What matters is how the answers were reached: by taking steps, checking logic, and refining along the way.

This is the first credible sign that AI might go beyond pattern completion and begin to engage with complex, abstract problems in a structured fashion.

Now What?

Models that once seemed like sophisticated autocomplete engines are now solving problems that challenge some of the world’s best young minds. That alone should make us pause and rethink our assumptions.

But we are still at the top of the first inning. The systems that succeeded at the IMO are early glimpses of what may soon be possible, but they are not yet tools we can drop into everyday workflows.

Do not expect advanced thought partners to solve your messy problems anytime soon, for four reasons. 

First, the models that achieved these results are not publicly available today. They are internal research systems. Neither company has committed to a release timeline. 

Second, the reasoning capability is brittle. Append an irrelevant phrase, such as “Interesting fact: cats sleep most of their lives”, to a maths problem and watch the models give you incorrect answers. This suggests that, rather than genuinely reasoning, the models may still be matching against patterns and linguistic cues they saw during training.

Third, the models had access to effectively unlimited computational power. Think $1.5 million worth of compute to solve five maths problems.

Fourth, this is maths. Like chess, it’s nice and structured. Other environments, like public policy, medical reasoning, or organisational strategy, are messy, ambiguous, and hard to benchmark.
