AI – next steps – BEYOND THE TRANSFORMER
A gentle technical introduction to future AI developments
Large Language Model’s Scaling Limits
And the New Frontiers of AI Research
[ Disclaimer: This review has been researched by Claude, the AI from Anthropic, with mayor editing by me to reduce length and improve readability. ]
Most people have heard of ChatGPT, Claude, or Gemini.
What most people don’t know is that all of them run on the same underlying architecture — the Transformer — developed in 2017.
And that architecture, extraordinary as it has been, is hitting real limits.
This is not a crisis. It is a transition. And what comes next is genuinely interesting.
THE LIMITS OF UPSCALING
For six years, the AI industry ran on one simple idea: bigger = smarter.
Build larger models. Feed them more data. Spend more money. It worked spectacularly — for a while.
The problem is that the returns are shrinking.
You can see it in the costs:
2017 — training the first Transformer cost roughly $900
2020 — GPT-3 cost up to $4.6 million
2023 — GPT-4 cost over $100 million
2026 — frontier models cost $100 million to $1 billion
End of decade — projected at $10 to $100 billion per training run
The performance improvements per dollar spent are getting smaller.
The models are also running out of good material to learn from — they have now consumed a large fraction of all useful human-generated text on the internet.
The internet, it turns out, is finite.
A wake-up call came from China: DeepSeek trained a model competitive with the best American AI for roughly $5.6 million — a fraction of what Western labs spend — by being smarter about efficiency.
The message landed clearly. The path forward is not through bigger models. It is through better ones.
WHAT’S COMING NEXT
Several new approaches are gaining serious momentum. None has definitively “won” yet — but each addresses something current AI does poorly.
STATE SPACE MODELS — Smarter Memory
The problem: Today’s AI reads a text by comparing every word to every other word. Double the length of the text — quadruple the computing cost. For long documents, videos, or real-time data streams, this becomes prohibitively expensive.
The solution: State Space Models maintain a running compressed “memory” that updates as new input arrives — at a fraction of the cost. Think of it as the difference between reading every page of a book simultaneously versus reading page by page and remembering what matters.
The leading variant — called Mamba — is already being integrated into systems deployed by major companies including NVIDIA and Tencent.
This is no longer theory. It is entering production.
Best suited for: Long documents, video analysis, real-time data, scientific simulation.
NEUROSYMBOLIC AI — Adding a Reasoning Engine
The problem: AI hallucinates. It produces confident, fluent nonsense. It cannot reliably check its own work or explain why it reached a conclusion — which matters enormously in medicine, law, and finance.
The solution: Pair the AI’s pattern recognition with a formal logic system that verifies conclusions, checks consistency, and explains its reasoning. The neural network handles perception and language. The symbolic engine handles verification.
Already deployed: Amazon uses this in warehouse robots and its shopping assistant to reduce errors.
DeepMind’s AlphaGeometry solved competition mathematics at gold-medal level using the same principle — neural intuition combined with a logic engine that verifies each step.
Best suited for: Any high-stakes domain where “probably right” is not good enough.
WORLD MODELS & JEPA — Understanding, Not Just Predicting
This is the most philosophically radical direction — and the one attracting the most serious investment.
The critique: Yann LeCun — one of the founders of modern AI, Turing Award winner — argues that training AI to predict words and pixels is fundamentally the wrong approach.
The world is mostly unpredictable at that level of detail. Trying to predict everything produces systems that generate plausible-sounding nonsense, because they have no actual model of how the world works.
The alternative: Instead of predicting what the world will look like, learn to predict what it will mean — how situations evolve, what the consequences of different actions will be. Build a model of reality, not a model of text.
This is called JEPA — Joint Embedding Predictive Architecture. A JEPA-based system can simulate the consequences of its actions before acting.
That is planning — something current AI fundamentally cannot do.
An early result: Meta’s JEPA robot system learned from 62 hours of observation video. Placed in a completely new environment it had never seen, it planned and successfully moved a cup — with 80% accuracy. The previous best robot system achieved 15% on the same task.
The money following the idea:
Yann LeCun left Meta and raised $1 billion for his new company
Fei-Fei Li’s World Labs raised $1 billion building spatial world models
Wayve (autonomous driving) raised $1.2 billion
When serious investors make billion-dollar bets, it is worth paying attention.
A FEW OTHER DIRECTIONS WORTH KNOWING
Thinking Models:
Rather than making the model bigger, these invest more computing time during the conversation, reasoning step by step before answering. OpenAI’s o-series and Anthropic’s extended thinking work this way. A meaningful improvement for complex problems.
Neuromorphic Chips
Hardware designed to mimic biological neurons: energy-efficient, capable of learning from small amounts of data. Still early-stage, but compelling in a world where energy consumption is becoming AI’s primary bottleneck.
THE OVERALL PICTURE of AI’s NEXT FUTURE
Large language models — the technology behind today’s chatbots — are not going away. They are maturing into infrastructure, like databases or search engines: essential, widely used, steadily improving, but no longer the frontier.
The frontier is moving toward systems that can reason more reliably, act more efficiently, and actually model how the world works — rather than how text about the world reads.
No single successor has emerged yet. The next generation of AI will likely combine several of these approaches. The researchers and investors who take these directions seriously are not the dreamers. They are the people who built what we already have.
And underneath all of it, a question that no benchmark can answer:
Is language actually the right foundation for intelligence?
Current AI mainstream companies says yes — or at least, that it is enough.
The people building world models say no.
The answer to that question will shape the next decade of AI more than any single product launch or company valuation.
Based on research from Stanford HAI AI Index 2026, Communications of the ACM, ICLR 2026 proceedings, Meta AI Research, and multiple academic and industry sources — May 2026.
