Brilliant Until It Isn't

Author:
Mike Merry
Published:
January 27, 2026
Reading Time:
6 minutes

It is a truth, mostly unacknowledged, that an LLM of good fortune cannot extrapolate beyond its training dataset.

This is a pretty severe limitation, and it implies that LLMs themselves cannot push the boundary of human knowledge. But I think we also systematically underestimate how hard truly novel thought is - much of what looks new is really filling in the details of existing knowledge rather than pushing its bounds further.

I’ve gone all in on AI. I have local LLMs running, and they, plus Claude, support my butler, Jeeves, who has his own private butlery in Notion where he maintains his own notes. I run multiple Claude Code windows in parallel, whilst also using Cursor in both agentic and auto-complete modes.

The pattern is consistent - if I’m effectively retracing other people’s steps, but applied to my own work, it’s like having a solid rocket motor strapped on. Push the button and hold on, because you’re flying, and just be careful to stay in control. But as soon as you go off the map there’s a wall you crash into, as if the laws of physics just changed underneath you.

It’s brilliant…

I’ve been rolling out a couple of ISO 27001 implementations this year. One was an experiment for myself: establishing a right-sized framework for a two-person startup with a moderately sensitive product. With Claude Code I was able to produce a full, beautifully written and simple framework - up to date with the latest standards, tailored to my own stack and my own preferences. I wrote and reviewed the policies within a morning, and they’re great: solid but pragmatic best practice.

When building out software, the more I’m working on problems that are very well trodden - especially productivity tools like to-do apps, calendar apps and the like - the more I can just point and shoot and it just works. But I’ve also been building a life-cycle-assessment product, which has far fewer in-production examples and only one open-source project out there. The tools need a lot of hand-holding in this domain.

… Until it isn’t.

At the end of last year, I hit the wall hard. I’m working on a way to train explainable neural networks, and I’ve got a moderately crazy idea of how to do it. I’m bringing together a few ideas from disparate branches of mathematics and trying to overlay scientific-method principles on top. In short, I don’t think this is represented in the training set.

And it shows. Claude, Gemini, Cursor and ChatGPT all fell flat trying to put the ideas into code. There’s nothing in the training set to support it. And I was still figuring out my own thoughts on this. I had more or less stalled at the same point for six months.

In November, a couple of calls resolved two major blockers in my thinking. I suddenly had the perspective shift and the missing piece of the puzzle to make it all work. In record time I was able to pull all of the thoughts together, formalise them and put them into a paper. (It’s now on arXiv if you want to read it - link below - but be warned, it’s pretty dense.)

Once the paper was written, I dropped it straight into Claude Code, added a little extra instruction, and later that day I had a working first version of what I was building, starting again from scratch.

In short, I wrote an entire paper in order to specify a new build.

Something old, something new…

The paper is certainly not something brand new. Rather, like so much of academia, it’s >90% existing work with maybe one new insight that pulls it together in a new way. In this case, I had existing work from graph theory, automatic differentiation, explainable AI and epistemology, combined with two observations: that explanations are contextual to their audience, and that combining small explanations requires its own form of explanation.

I was able to author the two observations, and direct how all the component parts then came together. But Claude was exceptionally good at then doing all the detailed working to formalise it and ensure that it was presented consistently.

What differentiates your work?

There is a wonderful framework from domain-driven design that describes generic, supporting and core domains. Your core domain is both hard to replicate and different from your competitors’. This also aligns with business strategy - it is what Helmer describes as Power, Bezos describes as a moat, etc.

AI has made replicating what other people do much easier. This is now commoditised, and in a race to the bottom on price, the AI will almost certainly win.

Understanding the part that makes you different, where it’s not represented in the training set, where it’s not just executing better, but is actually a different, better product - that is at the heart of business strategy right now. Your customers won’t choose you over your competitor because your accounts are tidier. You need to provide true, differentiated value.

Sharpen your edge, let AI tackle the rest.

P.S.: For those feeling brave - https://arxiv.org/pdf/2512.17316

We find the boundaries.
You sharpen your edge.