My bet: they use formal methods (like an interpreter running code to validate, o...

andix · on Sept 13, 2024

My understanding from the beginning was that it’s an agent approach (a self-prompting feedback loop).

They might’ve tuned the model to perform better with an agent workload than their regular chat model.

JasonSage · on Sept 13, 2024

I think it could be some of both. By giving access to the chain of thought, you would be able to see what the agent is correcting/adjusting for, allowing you to compile a library of vectors the agent is aware of and gaps which could be exploitable. Why expose the fact that you’re working to correct for a certain political bias and not another?