On a related note - would it be easier, instead of doing a benchmark sweep across the whole NxN set of start-end pairs for which layers to modify, to instead measure cross-correlation between outputs of all layers? Shouldn't that produce similar results?
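For what it's worth, here is a minimal sketch of what I mean, under assumptions of my own: `hidden_states` is a hypothetical list holding each layer's output for the same tokens, and "cross-correlation" is taken to be the mean per-token cosine similarity (one possible choice among several). It's pure Python on toy data, not tied to any particular model, and whether the resulting matrix actually tracks the benchmark sweep is exactly the open question.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def layer_similarity(hidden_states):
    """Return an N x N matrix of mean per-token cosine similarity between layers.

    hidden_states[i][t] is the output vector of layer i for token t
    (assumed layout; every layer must cover the same tokens).
    """
    n = len(hidden_states)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            scores = [cosine(u, v) for u, v in zip(hidden_states[i], hidden_states[j])]
            sim[i][j] = sim[j][i] = sum(scores) / len(scores)
    return sim


# Toy data: 3 "layers", 2 tokens, 2 dimensions.
toy = [
    [[1.0, 0.0], [0.0, 1.0]],  # layer 0
    [[2.0, 0.0], [0.0, 3.0]],  # layer 1: same directions as layer 0
    [[0.0, 1.0], [1.0, 0.0]],  # layer 2: orthogonal to layer 0
]
sim = layer_similarity(toy)
for row in sim:
    print(row)
```

This needs one forward pass per input instead of one benchmark run per start-end pair. Pairs with similarity near 1 would be the first ones I'd compare against the sweep, but that's a heuristic and not a replacement for measuring.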
I think that "hovering the code" and "broken rhythm" are correlated, but still separate issues. There are parts of my code (e.g. vibe-coded mockups) that I never look at and don't care about. There are others where I check everything in detail (as if I were reviewing someone's PR), and I think I have a very good grasp of that code.
But the broken rhythm problem persists regardless, and I find it is becoming more and more serious as LLMs are able to work on their own for longer and longer.
It might be that what we're experiencing now is just an uncanny valley, where they're not yet good enough for us to manage them the way we work with other developers, but are good enough to let us switch our attention away from them while they work. But that freed-up attention is mostly wasted, as the time between interactions isn't long enough to e.g. work on something else, or read a book.
It's a stupid analogy, but currently it's similar to having a bathroom break every couple of minutes, and if this continues, most developers will probably start doomscrolling more and more.
I was wondering recently if there are some productive activities that might fit well into this rhythm, but I haven't found any yet. I guess sourdough baking is one such example, but there's only so much bread you can eat...
There's a whole industry of "illusions" humans fall for: optical, wordplay (including large parts of comedy), the Penn & Teller type, etc. Yet no one claims these are indicators that humans lack some critical capability.
The surface of "illusions" for LLMs is very different from our own, and it's very jagged: change a few words in the above prompt and you get very different results. Note that human illusions are very jagged too, especially in the optical and auditory domains.
No good reason to think "our human illusions" are fine, but "their AI illusions" make them useless. It's all about how we organize the workflows around these limitations.
This is a very "distant" suggestion if you enjoyed Antimemetics, but The Unconsoled by Kazuo Ishiguro is another one of my favourites, and it too explores this idea of unreliable and inconsistent memories, although from a completely different angle.
This makes no sense to me. There are plenty of artists out there (e.g. El Anatsui), not to mention whole professions such as architects, who do not interact directly with what they are building, and yet can have a profound relationship with the final product.
Discovering the right problem to solve is not necessarily coupled to being "hands on" with the "materials you're shaping".
In my company, [enterprise IT] architects are separated into two kinds.
People with a CV longer than my arm who know/anticipate everything that could fail and have reached a level of understanding that I personally call "wisdom".
And theorists, who read books and norms, who focus mostly on the nominal case, and have no idea [and no interest] in how the real world will be a hard brick wall that challenges each and every idea you invent.
Not being hands-on, and more importantly not LISTENING to the hands-on people and learning from them, is a massive issue in my surroundings.
So thinking hard on something is cool. But making it real is a whole different story.
I don't see why his involvement, explaining to his team how exactly to build a piece, is any different from a developer explaining to an LLM how to build a certain feature, when it comes to the level of "being hands on".
Obviously I am not comparing his final product with my code, I am simply pointing out how this metaphor is flawed. Having "workers" shape the material according to your plans does not reduce your agency.
> I don't see why his involvement, explaining to his team how exactly to build a piece, is any different from a developer explaining to an LLM
Because everyone under him knows that a big enough mistake is a quick way to unemployment or legal action. So the whole team is pretty much aligned. A developer using an LLM may as well try to herd cats.
First, that's quite a sad view of incentive structures. Second, you can't be serious in thinking that "worker worried they might be fired" puts the person in charge closer to the "materials" and more "hands on" with the project.
I always send myself messages as bookmarks, notes, and general "here's an idea I might want to come back to later". It's an awful system when it comes to discoverability (I do it in Messenger so search is... bad), but I still do it as it's the most convenient option at the moment when I need to store something somewhere.
So I built a simple Telegram bot which automatically stores anything I send as a text embedding into a vector database, and allows me to search over it in that same chat (same process that powers the AI Q&A assistants these days).
If I post a link, it automatically scrapes it and stores the text as chunks for better search, extracts text from YouTube videos (still WIP), turns images into text with vision models, etc.
One thing I'm unhappy about is not being able to easily edit any notes I search for later, but it's miles ahead of my previous "system". Hopefully I can open source this when I clean it up - if anyone is interested, let me know.
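In the meantime, the store-and-search loop described above can be roughly sketched like this. It's not the bot's actual code, just the general shape under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `NoteStore` is a hypothetical in-memory replacement for the vector database, and the Telegram and scraping parts are left out entirely.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: L2-normalised word counts as a sparse dict."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for normalised inputs."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def chunk(text, size=50):
    """Split text into pieces of at most `size` words so long pages stay searchable."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class NoteStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (vector, original text) pairs

    def add(self, text, size=50):
        # Every incoming message (or scraped page) is chunked, embedded and stored.
        for piece in chunk(text, size):
            self.items.append((embed(piece), piece))

    def search(self, query, k=3):
        # Rank stored chunks by similarity to the query and return the top k texts.
        q = embed(query)
        scored = sorted(self.items, key=lambda item: dot(q, item[0]), reverse=True)
        return [text for _, text in scored[:k]]


store = NoteStore()
store.add("sourdough starter feeding schedule")
store.add("telegram bot vector database idea")
print(store.search("vector database", k=1))
```

With a real embedding model the search also matches on meaning rather than exact words, which is the whole point over Messenger's search; the toy version here only matches shared words.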
How do you look at hiring "experienced people" vs. "enthusiastic interns" on something like this? More generally, how quickly do you think the team will grow, and what should the ratio be between the "old" and the "young"?
Very hard to guess how it might all shake out. I would say that both Jeremy and I have an almost fanatical belief in the power of uncredentialed outsiders. So I would guess we will be looking for curious, open-minded generalists more than any specific age or experience level. I do expect we will grow headcount rather slowly, but that doesn't mean we will launch infrequently.