A Figma-like dashboard for turning Claude Code, Gemini CLI, and Codex into an OpenClaw, with security measures to break the lethal trifecta while running on a VM.
But it's not quite there in terms of usability. I agree that is the hardest part of the equation. It's something I'm constantly experimenting with and haven't found the solution to it yet. Open to feedback!
I don't think that is entirely fair... I don't see them stating anywhere that they are measuring coding capabilities... "Using complex games to probe real intelligence."
And this seems very much in line with the methodology in ARC-AGI-3.
The results here, in the OP article and in https://www.designarena.ai all tell a similar story: Kimi K2.6 is up and in the SOTA mix.
The task was writing a "bot" to play the game. The title is "Kimi K2.6 just beat Claude, GPT-5.5, and Gemini in a coding challenge." How does that not imply measuring coding capabilities?
You could have the actual output of the agent turned into TTS using the model of your choice with TalkiTo… or listen to whatever weird sounds this makes. Seems like this is copying that viral Mac moan app. 2026 is weird.
"Let's be honest here: there is no benefit to alcohol (for example wine) and is only detrimental." - That is a pretty extreme statement and easily falsifiable.
There are many studies a quick google away that show a much more nuanced take, e.g. [0] and [1]. But the strongest evidence is that our most successful societies and civilizations have been intentionally drinking alcohol for ~10,000 years [2]. If it were only detrimental, I'm pretty sure it would have worked its way out by now. I do acknowledge there are real negative effects.
OrcaBot does this with a VM too, but where the author mentions the risk of GitHub keys being leaked, OrcaBot uses a key broker to ensure the LLM never has access to any keys. That even extends to the API keys for the LLMs themselves.
https://orcabot.com/blog#breaking-the-lethal-trifecta
Really cool. A tangential task that seems to be coming up more and more is masking sensitive data in these calls for security and privacy. Is that something you considered as a feature?
The SQLite database is ephemeral — stored in the OS temp directory (/tmp/context-mode-{pid}.db) and scoped to the session process. Nothing persists after the session ends. For sensitive data masking specifically: right now the raw data never leaves the sandbox (it stays in the subprocess or the temp SQLite store), and only stdout summaries enter the conversation. But a dedicated redaction layer (regex-based PII stripping before indexing) is an interesting idea worth exploring. Would be a clean addition to the execute pipeline.
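To make the redaction idea concrete, here is a minimal sketch of what a regex-based PII-stripping pass before indexing could look like. The `PII_PATTERNS` table and the `redact()` helper are my own illustrative names and patterns, not part of the actual tool:

```python
import re

# Hypothetical patterns for a pre-indexing redaction layer; real-world
# PII detection would need a much broader (and tested) pattern set.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk-|ghp_)[A-Za-z0-9]{20,}\b"),
}

def redact(text: str) -> str:
    """Replace each match with a typed placeholder before the row is indexed."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text
```

The nice property is that it slots in at a single choke point (the execute pipeline, just before rows hit the temp SQLite store), so the rest of the system never sees raw values.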
Yes — the database is tied to the MCP server process, so it's created fresh on each claude launch and lost when you exit; resuming a session starts a new process with a new empty database.
“so the API use is in addition to the subscription, but it can't be helped.” - I beg to differ. OrcaBot.com is a claw that runs on vanilla Claude Code, so you can do all of that with your regular subscription. Disclosure: I’m the author. The only reason the other claws can’t offer that is that they front it with their own AI.
That's pretty cool. When I first tried this, I did it with a bash loop around `claude -p`, and you can get quite far with that! But overall, I think I'd rather use their tools the way they've set them up to be used and pay them their $500/month total or whatever. I'm probably going to stick to this approach, but your thing is pretty neat, so thank you for sharing.
Just a heads up: I tried to use the "continue with Google" button on your site, but I'm running into "Bot verification failed". Using stock Chrome, not running a VPN either.
Thanks for mentioning that. The bot filter has been causing trouble, so I definitely need to go and look at it. I debated disabling it, but any basic bot that starts a dashboard spins up a VM I pay for! Changing browsers might be a workaround in the meantime?
This is amazing. I agree with your take except "You’re not actually zeroizing the secrets"... I think it is actually calling zeroize() explicitly after use.
Can I get your review/roast on my approach with OrcaBot.com? DM me if I can incentivize you.. Code is available:
enveil = encrypt-at-rest, decrypt-into-env-vars and hope the process doesn't look.
OrcaBot = secrets never enter the LLM's process at all. The broker is a separate process that acts as a credential-injecting reverse proxy. The LLM's SDK thinks it's talking to localhost (the broker adds the real auth header and forwards to the real API). The secret crosses a process boundary that the LLM cannot reach.
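The broker pattern described above can be sketched in a few lines. This is my own illustrative version, not OrcaBot's actual code; `UPSTREAM` and `REAL_API_KEY` are assumed placeholders, and a real broker would also handle streaming, other HTTP methods, and TLS:

```python
import http.server
import urllib.request

UPSTREAM = "https://api.example.com"  # the real API the broker forwards to
REAL_API_KEY = "sk-real-key"          # loaded only in the broker process

def inject_auth(headers):
    """Drop whatever auth the sandboxed client sent; add the real credential."""
    clean = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    clean["Authorization"] = f"Bearer {REAL_API_KEY}"
    return clean

class Broker(http.server.BaseHTTPRequestHandler):
    """The LLM's SDK points at localhost; the broker forwards upstream."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        req = urllib.request.Request(
            UPSTREAM + self.path, data=body,
            headers=inject_auth(dict(self.headers)), method="POST")
        with urllib.request.urlopen(req) as resp:
            self.send_response(resp.status)
            self.end_headers()
            self.wfile.write(resp.read())
```

The key property is that `REAL_API_KEY` only ever exists in the broker's address space; the sandboxed process can send any placeholder header it likes and it gets stripped.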
I think we're both right about zeroize. Added a reply to clarify. In short, yes, the key and password are getting zeroized, but not the actual secrets. Which seems like the thing that matters in this context, at least given the tool's stated aims.
OrcaBot: There's a lot there! Ambitious project. Cute name, who doesn't love orcas? I don't see anything screamingly bad, of the variety that would inspire me to write essays about random people's code.
Some thoughts: The line between dev mode and production is thin and lightly enforced; given the overall security approach, you could firm that up. The within-VM shared workspace undermines the isolated PTYs. If your rate-limiting middleware fails, you allow all requests through (fail-open). `SECRETS_ENCRYPTION_KEY` is the one ring, and it has no versioning or rotation mechanism.
In general it seems like a good approach! But there are spots where one thing being misconfigured could blow the entire system open. I suggest taking a pass through it with that in mind. Good luck.