Many have caught on to the truth that AGI-through-pure-LLM-scaling is probably not going to happen. And many have identified continual learning as the key difference between LLMs and a generally intelligent agent. If you’ve ever used Claude Code, you will be acutely aware of how sharply effective context length limits LLMs’ general utility, and if only we had something that wouldn’t run out of context, we could all finally be unemployed.
A tempting (and sensible) attempt to solve this is continual midtraining. For example, Anthropic could collect successful Claude Code traces and fold them back into the SFT stage for its next model, releasing updates on a monthly basis. This might produce very powerful coding agents, but it will not give them the ability to fully automate jobs. Why? Because all this procedure does is continually improve the LLM’s world model, which is distinct from its world state. Its world state exists only within its position-embedded KV cache.
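To make the distinction concrete, here is a toy sketch (not any real API; every name here is made up) of why folding traces back into training only touches the weights. The KV cache is per-session, bounded by the context window, and gone the moment the session ends:

```python
# Toy illustration of "world model" (weights, persist across sessions) vs.
# "world state" (KV cache, per-session and bounded by the context window).
# All names and numbers are illustrative, not any real model's internals.

from collections import deque

CONTEXT_WINDOW = 8  # tiny context window, for illustration

class ToyLLM:
    def __init__(self):
        self.weights = {"knowledge": 0.0}                   # world model
        self.kv_cache = deque(maxlen=CONTEXT_WINDOW)        # world state

    def step(self, token):
        # Each forward pass appends a (key, value) pair; once the window is
        # full, the oldest entries silently fall off the front.
        self.kv_cache.append(("k_" + token, "v_" + token))
        return len(self.kv_cache)

    def continual_midtrain(self, traces):
        # Folding successful traces back into training nudges the weights...
        self.weights["knowledge"] += 0.01 * len(traces)
        # ...but recovers nothing about the state of any past session.
        self.kv_cache.clear()

llm = ToyLLM()
for t in ["fix", "bug", "in", "auth", "module", "add", "tests", "then", "deploy", "now"]:
    llm.step(t)
print(len(llm.kv_cache))  # 8 -- the first two tokens were already evicted
```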
Humans have both a forever-evolving world model and world state. We efficiently compress our experiences into improvements to our baseline world model over a lifetime, while still being able to retrieve memories in order, at resolutions weighted by their importance.
This problem of compressing experience into a world model (or a value function) over an infinite horizon is what needs to be solved. Continual midtraining is only a band-aid.
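For a picture of what “compressing experience into a value function” looks like in its most classical form, think TD learning: an unbounded stream of transitions gets folded into a fixed-size estimate and then thrown away. The sketch below is purely illustrative, with a made-up toy environment and constants; it shows the shape of the problem, not a solution to it:

```python
# A minimal TD(0) sketch: an effectively infinite stream of experience is
# compressed into a fixed-size value table. Environment and constants are
# toy assumptions for illustration only.

import random

ALPHA = 0.1   # learning rate
GAMMA = 0.99  # discount factor

# Fixed-size "model": one value per state, no matter how long we run.
values = {s: 0.0 for s in range(5)}

def step(state):
    """Toy random walk: stay or advance; reward 1.0 for reaching state 4."""
    next_state = min(state + random.choice([0, 1]), 4)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward

state = 0
for _ in range(100_000):  # an unbounded stream, as far as memory is concerned
    next_state, reward = step(state)
    # Fold this one transition into the estimate, then discard it forever.
    td_target = reward + GAMMA * values[next_state]
    values[state] += ALPHA * (td_target - values[state])
    state = 0 if next_state == 4 else next_state

print({s: round(v, 2) for s, v in values.items()})
```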