A lot of comments here are dismissing this post because the relevant code was isolated. But that's exactly what Anthropic did with Mythos! They describe their (very lean) harness in the Mythos blog post on red.anthropic.com [1]. The harness first assigns each file in the given codebase an importance value. It then spawns one Claude Code instance per file, pointing each at the codebase with a prompt telling it to focus on that file.
So no, the fact that the posters isolated the relevant code does not invalidate their findings.
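To make that concrete, here is a rough sketch of a harness shaped like the one described above. It is my own illustration, not Anthropic's code: `score_file` and `run_agent` are hypothetical callables you would supply yourself (the second one stands in for however you launch an agent), and the prompt wording is made up.

```python
from pathlib import Path
from typing import Callable


def build_prompt(target: Path) -> str:
    # Hypothetical prompt wording; the real harness's prompt is not reproduced here.
    return (
        "Look for security vulnerabilities in this codebase. "
        f"Focus on the file {target.as_posix()}."
    )


def run_harness(
    root: Path,
    score_file: Callable[[Path], float],   # assumption: returns an importance value
    run_agent: Callable[[Path, str], str],  # assumption: runs one agent, returns its report
) -> dict[str, str]:
    """Rank every file by importance, then run one agent per file."""
    files = [p for p in root.rglob("*") if p.is_file()]
    # Most important files first, so they are handled first if the budget runs out.
    ranked = sorted(files, key=score_file, reverse=True)

    reports: dict[str, str] = {}
    for path in ranked:
        rel = path.relative_to(root)
        # Each agent sees the whole codebase but is told which file to focus on.
        reports[rel.as_posix()] = run_agent(root, build_prompt(rel))
    return reports
```

The point is that "focus on this file" is already a form of isolation, just applied to every file instead of one.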
I mean, you can still scale that, right? Have a lighter model go through every function looking for vulnerabilities, then pass its output to a bigger model like Opus to pick out the critical ones.
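Something like the following, as a minimal sketch. `cheap_scan` and `strong_classify` are hypothetical stand-ins for calls to the lighter and the bigger model; no real model API is assumed, and the "critical" label is just an illustrative convention.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    function: str  # name of the function that was flagged
    note: str      # what the lighter model said about it


def triage(
    functions: dict[str, str],                    # function name -> source text
    cheap_scan: Callable[[str], Optional[str]],   # assumption: None means "nothing suspicious"
    strong_classify: Callable[[Finding], str],    # assumption: returns a severity label
) -> list[Finding]:
    """Two-stage pass: cheap model over everything, strong model over the hits."""
    candidates = []
    for name, source in functions.items():
        note = cheap_scan(source)
        if note is not None:
            candidates.append(Finding(name, note))

    # Only flagged functions reach the expensive model.
    return [f for f in candidates if strong_classify(f) == "critical"]
```

The expensive model only ever sees what the cheap one flagged, so cost scales with the number of candidates rather than the size of the codebase.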
[1] https://red.anthropic.com/2026/mythos-preview/