"Our current (April 2026) object usage is: 14 million objects, 119GB"
I mean, I appreciate the openness about the scale, but for context, my home's personal backup managed via restic to S3 is 370GB. Fewer objects, but still, we're not talking a big install here.
This is pretty much like that story of, if it fits on your laptop, it's not big data.
My phone has 327.1G of storage in use right now :-) (years ago a friend of mine who was also old enough to complain about "640k" suggested that we should just denominate storage in dollars, so "reasonable" amounts scaled with Moore's law...)
I remember when people would vigorously complain that Toolkit X was simply unsuitable for any task because it did not conform to the operating system's standard visual appearance.
Now I struggle to even define what an "operating system's standard visual appearance" is. Apple's still the best but not what they used to be on that front even so.
I'll still die on this hill, but I think that the reason there's a computer literacy problem is because we moved away from following OS conventions (when they existed) and into bespoke, branded UIs for everything, and then eventually to web where every site and webapp behaves differently.
In the early days, if you learned the OS, those usage patterns and skills transferred to every app on that OS. They all looked roughly the same, shared the same menus, same shortcuts, same icons, etc. You didn't have to learn how to use Apps x, y, and z. You just had to learn Windows (to an extent).
Then marketing got involved, and then the web, and then suddenly every piece of software had to stand out and look and behave as unique as possible, throwing years of HIG research out the window.
Notice that several examples in the Claude Design demo video involve typing, in English, things that could be accomplished through UI controls, if the user only knew where to find them.
Not all OSes, unfortunately. I'm in the camp that says conforming to the Gnome HIGs is a bad idea.
Just today I had the disk usage analyzer (baobab) open and was navigating inside directories. I wanted to go up a directory and clicked the "<-" left arrow in the headerbar, which instead went "back" a screen, discarding all the work done scanning the filesystem.
If this app had a traditional menubar and a toolbar this wouldn't have happened.
This is a common type of experience I have every time I use a Gnome app. It almost feels like someone deliberately researched how to make desktop apps as counter-intuitive as possible and implemented that as the policy for some reason.
I miss the days when there was no "standard visual appearance" for the OS (e.g. DOS). I liked the diversity of interfaces.
Years ago, I remarked to a friend that I'd spent half of my (computing) life post-high speed Internet, yet almost all my happy memories are from before that. It was the same for him, and we both explored why that was.
The homogeneity of interfaces was actually one of the reasons we came up with on why doing work at a computer is a lot less appealing.
That may be true, and had you asked me half a lifetime ago, I would have likely said "The old days were better".
But:
I would have still said I enjoyed using computers. And I wouldn't have said "Today's interface sucks" (well, other than my HW not being able to keep up with eye candy...)
I simply don't enjoy using the computer these days. And I do think the interface sucks. Pretty much anything that involves using the web browser sucks - be it a local app or a web app.
I don't remember people complaining about Winamp being a non-standard UI, but if it were slow then there'd be tons of complaints - and many of the "fancy" UIs were terribly slow (or the programs were, hard for a user to tell the difference).
Acknowledging that we still only have marketing material, it is their claims about Mythos' ability to auto-generate working exploits that actually change the cost/benefit tradeoffs. Their own Mythos docs showed that it is only a marginal improvement over current models at generating hypotheses about exploits; the difference was finding the exploits automatically (and correctly).
I confirmed this, roughly, against some of my own code bases. I pointed Opus 4.6 at some internal code bases. It came up with a list of possibilities; the quality of the possibilities was quite mixed and the exploit code generally worthless. So I did at least spot-check that aspect of their marketing, and it checked out.
The problem is that this changes the attacker versus defender calculus. Right now, the world is basically a big pile of swiss cheese, but we are not all being continuously popped for full access to everything, because exploitation is fundamentally blocked on human attackers analyzing the output of tools, validating the exploits, and then deciding whether or not to use them.
That "whether or not to use them" calculus is also profoundly affected by the fact that they can generally model the exploits they've taken to completion as being fairly likely to uniquely belong to them and not be fixed by the target software, so they have the capability to sit on them because they are not rotting terribly quickly. It is well known that intelligence agencies, when deciding whether or not to attack something, also consider the impact of the possibility of leaking the mechanism they used to attack the user and possibly losing it for future attacks as a result. A particularly well-documented discussion of this in a historical context can be found around how the Allies used the fact they had broken Enigma, but had to be careful exactly how they used the information they obtained that way, lest the Axis work out what the problem was and fix it. All that calculus is still in play today.
The fundamental problem with the claims Mythos made isn't that it can find things that may be vulnerabilities; the fundamental sea change they are claiming is a hugely increased effectiveness in generating the exploits. There's a world of difference in the cost/benefit calculus for attackers and defenders between getting a cheap list of things humans can consider, which was only a quantitative change over the world we've lived in up to this point, and the humans being handed a list of verified (and likely pre-weaponized with just a bit more prompting) vulnerabilities, where the humans at most have to test a bit in the lab before putting it in the toolbelt. That is a qualitative change in the attacker's capabilities.
There is also the second-order effect that if everybody can do this, the attackers will stop assuming that they can sit on exploits until a particularly juicy target worth the risk of burning the exploit comes up. That gets shifted on two fronts: exploits are cheaper, so there's less need to worry about burning a particular one, and in a world where everyone has Mythos, everyone is scanning everything all the time with this more sophisticated exploiting firepower and just as likely to find the exploit as the nation-state attackers are, so the attackers need to calculate that they should use the exploits now, even if it's a lower-value attack, because there may not be a later.
If, if, if, if, if the marketing is even half true, this really is a big deal, but it's the automated exploit generation that is the sea change, not just finding the vulnerabilities. And especially not finding the same vulnerabilities as Mythos but also including them in a list of many other vulnerabilities that are either not real or not practically exploitable, which then bottlenecks on human attention to filter through them. Matching Mythos, or at least Mythos' marketing, means you pushed a button (i.e., simple prompt, not knowing in advance what the vuln is, just feeding it a mass of data) and got an exploit. Push button, get big unfiltered list of possible vulnerabilities is not the same. Push button, get correct vulnerability is closer, but still not the same. The problem here is specifically "push button, get exploit".
A dollar is still a useful unit as "the fraction of the economy that can be controlled by currency". It's true that printing a huge pile of it and throwing it at GPUs wouldn't instantly convert into more GPUs, but it would meaningfully represent that other things are being squeezed out to allocate more resources to GPU production even so. That such reallocation is inefficient, arguably immoral, and highly questionable in the long term versus other options wouldn't stop that from being true.
Reducing the amount of time I spend on the average code has meant I'm spending more time adding my above-average contributions to the code base. Amdahl's law, basically. Reducing the amount of time spent on one task means the percentage of time spent on the others increases.
How stable that is on the long term, I don't know any more than the next guy, but it is where I'm contributing now.
Code isn't going anywhere. Code is multiple orders of magnitude cheaper and faster than an LLM for the same task, and that gap is likely to widen rather than contract because the bigger the AI gets the sillier it gets to use it to do something code could have done.
Compare the actual operations done for code to add 10 8-digit numbers to an LLM on the same task. Heck, I'll even say, forget the possibility the LLM may be wrong. Just compare the computational resources deployed. How many FLOPS for the code-based addition? How many for the LLM? That's a worst-case scenario in some ways but it also gives you a good sense of what is going on.
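To make that comparison concrete, here's a back-of-the-envelope sketch. The model size and token count are my own assumptions (a 70B-parameter model, ~20 output tokens), and the "2 FLOPs per parameter per token" rule of thumb is a common rough estimate for transformer inference, not a measurement:

```python
# Illustrative comparison: adding 10 eight-digit numbers directly
# vs. the rough compute an LLM would burn answering the same question.

numbers = [12345678, 23456789, 34567890, 45678901, 56789012,
           67890123, 78901234, 89012345, 90123456, 10987654]

# Direct computation: 9 integer additions, full stop.
direct_ops = len(numbers) - 1
total = sum(numbers)

# LLM: roughly 2 * parameters FLOPs per generated token (rule of thumb).
# Both numbers below are assumptions for illustration.
params = 70e9   # hypothetical 70B-parameter model
tokens = 20     # hypothetical short answer
llm_flops = 2 * params * tokens

print(f"sum = {total}")
print(f"direct ops: {direct_ops}")
print(f"LLM FLOPs (rough): {llm_flops:.0e}")
print(f"ratio: ~{llm_flops / direct_ops:.0e}x")
```

Even granting generous error bars on the assumptions, the gap is around eleven orders of magnitude, which is the point.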
Humans may stop looking at it but it's not going anywhere.
I think grandparent comments were talking about how Codex designers try to push LLMs to displace the interface to code, not necessarily code itself. In that view, code could stay as the execution substrate, but the default human interaction layer moves upward, the way higher-level languages displaced direct interaction with lower-level ones. From an HCI perspective, raw computational efficiency is not the main question; the bottleneck is often the human, so the interface only has to be fast and reliable enough at human timescales.
AI being what it is, at this point you might be able to ask it for a token to put in a web page at .well-known, put it in as requested, and let it see it, and that might actually just work without it being officially built in.
I suggest that because I know for sure the models can hit the web; I don't know about their ability to do DNS TXT records as I've never tried. If they can then that might also just work, right now.
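The flow described above can be sketched in a few lines. Note this is an ad-hoc pattern, not any official protocol: the `.well-known` path name is made up, and the actual HTTPS fetch is omitted so the verification step stays a pure function:

```python
import secrets
from urllib.parse import urljoin

# Hypothetical path; no such registered well-known URI exists (yet).
WELL_KNOWN_PATH = "/.well-known/ai-site-verification"

def issue_token() -> str:
    """Token the assistant asks the site owner to publish."""
    return secrets.token_urlsafe(32)

def challenge_url(site: str) -> str:
    """Where the assistant would look for the published token."""
    return urljoin(site, WELL_KNOWN_PATH)

def verify(fetched_body: str, expected_token: str) -> bool:
    """First non-blank line of the fetched .well-known file must match."""
    lines = [ln.strip() for ln in fetched_body.splitlines() if ln.strip()]
    return bool(lines) and lines[0] == expected_token

# In practice the assistant would fetch challenge_url(site) over HTTPS
# and pass the response body to verify(); the fetch is omitted here.
```

The MITM concern in the replies below is exactly why the fetch step matters: this only proves control of the domain to whoever actually performed the HTTPS request.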
A smart AI would realise that I can MITM its web access such that it sees a .well-known token that isn't actually there. I assume that the model doesn't have CA certificates embedded into it, and relies on its harness for that.
In this context we are talking explicitly about cloud-hosted AIs. If you control it locally you have a lot of options to force it to do things.
MITMing the cloud AI on the modern internet is non-trivial, and probably harder and less reliable than just talking your way around the guardrails anyhow.
> In this context we are talking explicitly about cloud-hosted AIs.
Looking upthread, we seem to be talking about Claude. Claude is cloud-hosted inference but the harness is local if you're using Claude Code, and can be MITM'd there.
I think even Claude Web can run arbitrary Linux commands at this point.
I tried using it to answer some questions about a book, but the indexer broke. It figured out what file type the RAG database was and grepped it for me.
Another cross-check I've run is, are the claims Anthropic is making for Mythos that far out of line with the current state of AI coding assistants?
To which my answer is clearly, no, not even remotely. If Anthropic is outright lying about what Mythos can do, someone else will have it in a year.
In fact the security world would have to seriously consider the possibility that even if Mythos didn't exist, nation states have the equivalent in hand already. And of course, if Mythos does exist, nation states have it now. The odds that Anthropic (and every other AI vendor) isn't penetrated enough by every major intelligence agency such that they have access to their choice of model approach zero.
I wonder about the overlap between people being skeptical of Mythos' capabilities, and those who are too skeptical of AI to have spent any time with it because they assume it can't be any good. If you are not aware of what frontier models routinely do, you may not realize that Mythos is just an evolution of existing capabilities, not a revolution. Even just taking a publicly-available frontier model, pointing it at a code base and telling it to "find the vulnerabilities and write exploits" produces disturbingly good results. I can see the weaknesses referenced by the Mythos numbers, especially around the actual writing of the exploits, but it's not like the current frontier models fall on their face and hallucinate wildly for this task. Most everything they produce when I try this is at least a "yeah, that's worth thinking about" rather than an instant dismissal.
I've said for decades that, in principle, cybersecurity is advantage defender. The defender has to leave a hole. The attackers have to find it. We just live in a world with so many holes that dedicated attackers rarely end up bottlenecked on finding holes, so in practice it ends up advantage attacker.
There is at least a possibility that a code base can be secured by a (practically) finite number of tokens until there are no more holes in it, for reasonable amounts of money.
This also reminds me of what I wrote here: https://jerf.org/iri/post/2026/what_value_code_in_ai_era/ There's still value in code tested by the real world, and in an era of "free code" that may become even more true than it is now, rather than the initially-intuitive less valuable. There is no amount of testing you can do that will be equivalent to being in the real world, AI-empowered attackers and all.
>in principle, cybersecurity is advantage defender
I disagree.
The defender must be right every single time. The attacker only has to get lucky, and thanks to scale they can try all day, every day in most large organizations.
My understanding of defense in depth is that it is a hedge against this. By using multiple uncorrelated layers (e.g. the security guard shouldn’t get sleepier when the bank vault is unlocked) you are transforming a problem of “the defender has to get it right every time” into “the attacker has to get through each of the layers at the same time”.
It is a hedge; that said, it only reduces the probability of an event and does not eliminate it.
To use your example, if the odds of the guard being asleep and the vault being unlocked are both 1% we have a 0.0001 chance on any given day. Phew, we're safe...
Except that Google says there are 68,632 bank branch locations in the US alone. That means it will happen roughly 7 times on any given day someplace in America!
Now apply that to the scale of the internet. The attackers can rattle the locks in every single bank in an afternoon for almost zero cost.
The poorly defended ones have something close to 100% odds of being breached, and the well defended ones have low odds on any given day, but over a long enough timeline it becomes inevitable.
To again use your bank example: if we only have one bank, but keep those odds, then over about 191 years the event will happen 7 times. Or to restate that number, it is likely to happen roughly once every 27 years. You'll have about 25% odds of it happening in any 7-year span.
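The arithmetic above checks out, assuming the two layers fail independently; a few lines make it easy to verify:

```python
# Checking the bank-example arithmetic, assuming independent layers.

p_guard_asleep = 0.01
p_vault_open = 0.01
p_breach_day = p_guard_asleep * p_vault_open  # independent layers multiply

branches = 68_632
expected_per_day = branches * p_breach_day    # ~6.9 breaches/day nationwide

# Single bank: expected events over 191 years, and odds over a 7-year span.
days_per_year = 365.25
events_191y = p_breach_day * 191 * days_per_year
p_breach_7y = 1 - (1 - p_breach_day) ** (7 * days_per_year)

print(f"expected breaches/day across all branches: {expected_per_day:.1f}")
print(f"expected events at one bank over 191 years: {events_191y:.1f}")
print(f"probability of at least one event in 7 years: {p_breach_7y:.0%}")
```

The 7-year figure comes out closer to 23% than 25%, but the conclusion is the same: rare per target, inevitable in aggregate.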
For any individual target, it becomes unlikely, but also still inevitable.
From an attacker's perspective this means the game is rigged in their favor. They have many billions of potential targets, and the cost of an attack is close to zero.
From a defender's perspective it means realizing that even with defense in depth the breach is still going to happen eventually, and that the bigger the company is, the more likely it is.
Cyber is about mitigating risk, not eliminating it.
Well, the attacker has something to lose too. It's not like the defender has to be perfect or else attacks will just happen, it takes time/money to invest in attacking.
The cost to your average ransomware crew can be rounded down to zero, because it's pretty darn close. They use automated tools running on other people's computers and utilizing other people's connectivity. The tools themselves for most RaaS (ransomware as a service) affiliates are also close to zero cost, as they pay the operator a percentage of profits.
The time is a cost, but at scale any individual target is a pretty minor investment since it's 90%+ automated. Also, these aren't folks that are otherwise highly employable. The opportunity cost to them is also usually very low.
The last attacker I got into a conversation with was interesting. Turns out, he was a 16 year old from Atlanta GA using a toolkit as an affiliate. He claimed he made ~100k/year and used the money on cars and girls. I felt like he was inflating that number to brag. His alternative probably would have been McDonalds, and as a minor if he got caught it would've been probation most likely. I told him to come to the blue team, we pay better.
At the end of the day, that guy is spending all of his finite hacking time setting up and maintaining these exploits and stolen infra. His marginal cost of breaching you is 0 if you're already vulnerable to the exact same exploit he already set up, but that's a big if, and someone else spent their finite time making toolkits. Otherwise you'd expect everything on the Internet that has any kind of vuln to be breached already.
Anyway I'm curious about the 16yo. Is it that he has special skills, or is it just that minors will do that dirty work for cheaper, given lower consequences and fewer other opportunities?
> I'm curious about the 16yo. Is it that he has special skills, or is it just that minors will do that dirty work for cheaper, given lower consequences and fewer other opportunities?
I was only able to keep him talking for about 20 minutes, so I can only speculate, but he was using off-the-shelf RaaS tools that he had modified to make them more convincing. I actually got him talking by pointing out that a trick he'd done with the spoofed email headers from "coinbase" was clever, so he was definitely skilled for someone so young. He also had done his homework and knew a bit about me.
It's likely he was recruited just because he was too young for prison, but that he was relatively successful because he was clever.
All you have to do is prompt your AI with a writing sample. I generally give it something I wrote from my blog. It still doesn't write like I do and it seems to take more than that to get rid of the emdashes, but it at least kicks it out of "default LLM" and is generally an improvement.
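If you're wiring this into a script rather than pasting by hand, the trick is just prepending the sample to the prompt. A minimal sketch, where the wording of the instructions is my own and not any particular vendor's recommended format:

```python
# Style-priming sketch: prepend a writing sample to the task prompt.
# The delimiter and instruction wording are arbitrary choices.

def build_style_prompt(writing_sample: str, task: str) -> str:
    return (
        "Below is a sample of my writing. Match its voice, rhythm, and "
        "vocabulary in your reply. Avoid em-dashes.\n\n"
        "--- WRITING SAMPLE ---\n"
        f"{writing_sample}\n"
        "--- END SAMPLE ---\n\n"
        f"Task: {task}"
    )

prompt = build_style_prompt(
    "I'll still die on this hill, but I think the reason...",
    "Draft a short blog post about UI conventions.",
)
```

As noted above, this doesn't make the model write like you; it just nudges it out of the default register.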