Not really, because the LLM loop doesn't have the ability to get updates from the agent live. It would have to somehow be integrated all the way down the stack.
LLMs can have whatever abilities we build for them. The fact that we currently start their context with a static prompt, which we keep feeding in on every iteration of the token-prediction loop, is a choice. We don't have to keep doing that if other options are available.
Late to reply, but yes: hence my point that it would need to be integrated all the way down the stack.
But also, LLMs (or their current implementations) rely heavily on prompt caching for efficiency; without it, costs are much higher. You can do neat tricks with it, but since the cache is keyed on the context prefix, you're generally limited to playing with the end of the context to avoid breaking things.
I think some agents do append small context snippets to the end of the conversation for the model to use. You can do things like: conversation messages + context snippets + new message, and then once the model replies, make the next turn conversation + new message + reply + ... This breaks the cache only from the latest snippet onward (not too bad) and lets you give the model current, up-to-date information. I believe this is how things like the "mode" or "what time is it now" are handled.
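A rough sketch of that turn construction (the function names and message shapes are illustrative, not any particular agent's API). Ephemeral snippets go at the end of the request but are never committed to the history, so each turn's request shares a long byte-identical prefix with the prior conversation and a prefix-keyed cache only misses on the short tail:

```python
# Hedged sketch, not real agent code: keep ephemeral snippets out of the
# committed history so the cacheable prefix stays stable across turns.

def build_request(conversation, snippets, new_message):
    """One turn's request: stable prefix + ephemeral snippets + new input."""
    return conversation + snippets + [new_message]

def commit_turn(conversation, new_message, reply):
    """Persist only message + reply; snippets are dropped so the next
    turn's request still extends a stable, cacheable prefix."""
    return conversation + [new_message, reply]

def shared_prefix_len(a, b):
    """How many leading messages two requests share (what a prefix cache can reuse)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

history = [{"role": "user", "content": "hi"},
           {"role": "assistant", "content": "hello"}]

# Turn 1: inject a fresh snippet (e.g. the current time) before the new message.
turn1 = build_request(history,
                      [{"role": "system", "content": "time: 12:00"}],
                      {"role": "user", "content": "what now?"})

# Commit only the message and reply, not the snippet.
history = commit_turn(history,
                      {"role": "user", "content": "what now?"},
                      {"role": "assistant", "content": "this"})

# Turn 2: a new snippet with updated information.
turn2 = build_request(history,
                      [{"role": "system", "content": "time: 12:01"}],
                      {"role": "user", "content": "and next?"})

# The cache miss starts at turn 1's snippet position, i.e. near the end of
# the old context; everything before it is reusable.
print(shared_prefix_len(turn1, turn2))  # → 2
```

The trade-off is exactly the one described above: each turn invalidates the cache from its snippet onward, but since snippets sit near the end of the context, only a short tail gets recomputed.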