ibestvina · 2026-03-23T22:08:53 1774303733

On a related note: instead of running a benchmark sweep across the whole NxN set of start-end pairs for which layers to modify, would it be easier to measure the cross-correlation between the outputs of all layers? Shouldn't that produce similar results?
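A rough sketch of what that cheaper measurement could look like, assuming the hidden states of every layer have already been captured for the same input tokens as plain nested lists (the function names are hypothetical). It flattens each layer's outputs, computes a Pearson correlation for every pair of layers, and lists (start, end) pairs above a threshold. Whether such a matrix actually tracks the benchmark sweep is exactly the open question above; this only shows that it costs one forward pass rather than N*N benchmark runs.

```python
import math
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if std_a == 0.0 or std_b == 0.0:
        return 0.0
    return cov / (std_a * std_b)


def layer_correlation_matrix(layer_outputs: List[List[List[float]]]) -> List[List[float]]:
    """layer_outputs[l][t] is the (hypothetical) hidden-state vector of token t after layer l.

    Each layer's outputs are flattened over tokens and hidden dimensions, and
    every pair of layers is compared with a single correlation score.
    """
    flat = [[x for token in layer for x in token] for layer in layer_outputs]
    n = len(flat)
    return [[pearson(flat[i], flat[j]) for j in range(n)] for i in range(n)]


def similar_layer_pairs(matrix: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """(start, end) layer pairs, start < end, whose outputs correlate at or above threshold."""
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if matrix[i][j] >= threshold]


# Toy example: layer 1 is a scaled copy of layer 0, layer 2 is reversed.
layers = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 4.0], [6.0, 8.0]],
    [[4.0, 3.0], [2.0, 1.0]],
]
matrix = layer_correlation_matrix(layers)
print(similar_layer_pairs(matrix, 0.99))
```

Flattening everything into one vector is the crudest possible choice; per-token cosine similarity or a representation-similarity measure would be natural alternatives to try.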

ibestvina · 2026-03-12T21:08:06 1773349686

I think that "hovering the code" and "broken rhythm" are correlated, but still separate issues. There are parts of my code (e.g. vibe-coded mockups) that I don't look at at all, and I don't care about them. There are others where I check everything in detail (same as if I were reviewing someone's PR), and I think I have a very good grasp of that code.

But the broken rhythm problem persists regardless, and I find that issue to become more and more serious as LLMs are able to work for longer and longer on their own.

It might be that what we're experiencing now is just an uncanny valley, where they're not yet good enough for us to manage them the way we manage other developers, but are good enough to let us switch our attention away from them while they work. But that attention is mostly wasted, as the time between interactions isn't long enough to, e.g., work on something else or read a book.

It's a stupid analogy, but currently it's similar to having a bathroom break every couple of minutes, and if this continues, most developers will probably start doomscrolling more and more.

I was wondering recently if there are some productive activities that might fit well into this rhythm, but I haven't found any yet. I guess sourdough baking is one such example, but there's only so much bread you can eat...

pauletienney · 2026-03-12T21:17:40 1773350260

We might be in a kind of uncanny valley. Models may become good and "independent" enough to be compared to a colleague.

I feel the solution is either to drastically reduce the time between interactions with the agent OR to increase it by a lot (every 2 or 3 hours).

Maybe we do not have the right workflow yet. Maybe the work with an agent should be more async.

I guess we will figure it out.

ibestvina · 2026-02-16T10:22:55 1771237375

There's a whole industry of "illusions" humans fall for: optical, word plays (including large parts of comedy), the Penn & Teller type, etc. Yet no one claims these are indicators that humans lack some critical capability.

The surface of "illusions" for LLMs is very different from our own, and it's very jagged: change a few words in the above prompt and you get very different results. Note that human illusions are very jagged too, especially in the optical and auditory domains.

No good reason to think "our human illusions" are fine, but "their AI illusions" make them useless. It's all about how we organize the workflows around these limitations.

raincole · 2026-02-16T10:26:56 1771237616

> No good reason to think "our human illusions" are fine, but "their AI illusions" make them useless.

I was about to argue that human illusions are fine because humans will learn from their mistakes after being corrected.

But then I remember what online discussions about the Monty Hall problem look like...

ibestvina · 2026-02-16T10:34:15 1771238055

Exactly! I now feel bad for not thinking of that example, thank you.

ibestvina · 2026-02-13T17:41:28 1771004488

This is a very "distant" suggestion if you enjoyed Antimemetics, but The Unconsoled by Kazuo Ishiguro is another one of my favourites, and it too explores this idea of unreliable and inconsistent memories, although from a completely different angle.

ibestvina · 2026-02-04T09:21:38 1770196898

This makes no sense to me. There are plenty of artists out there (e.g. El Anatsui), not to mention whole professions such as architects, who do not interact directly with what they are building, and yet can have a profound relationship with the final product.

Discovering the right problem to solve is not necessarily coupled to being "hands on" with the "materials you're shaping".

lolive · 2026-02-04T10:21:22 1770200482

In my company, [enterprise IT] architects are separated into two kinds. People with a CV longer than my arm who know/anticipate everything that could fail and have reached a level of understanding that I personally call "wisdom". And theorists, who read books and norms, who focus mostly on the nominal case, and have no idea [and no interest] in how the real world will be a hard brick wall that challenges each and every idea you invent.

Not being hands-on, and more importantly not LISTENING to the hands-on people and learning from them, is a massive issue in my surroundings.

So thinking hard on something is cool. But making it real is a whole different story.

Note: as Steve used to say, "real artists ship".

darepublic · 2026-02-04T09:33:31 1770197611

you think El Anatsui would concur that they didn't interact directly with what they were building? "hands on", "material you're shaping" is a metaphor

ibestvina · 2026-02-04T09:45:20 1770198320

I don't see why his involvement, explaining to his team how exactly to build a piece, is any different from a developer explaining to an LLM how to build a certain feature, when it comes to the level of "being hands on".

Obviously I am not comparing his final product with my code, I am simply pointing out how this metaphor is flawed. Having "workers" shape the material according to your plans does not reduce your agency.

skydhash · 2026-02-04T12:02:42 1770206562

> I don't see why his involvement, explaining to his team how exactly to build a piece, is any different from a developer explaining to an LLM

Because everyone under him knows that a mistake big enough is a quick way to unemployment or legal actions. So the whole team is pretty much aligned. A developer using an LLM may as well try to herd cats.

ibestvina · 2026-02-04T12:09:01 1770206941

First, that's quite a sad view of incentive structures. Second, you can't be serious in thinking that "worker worried they might be fired" puts the person in charge closer to the "materials" and more "hands on" with the project.

ibestvina · on Jan 28, 2024

Gates of Heaven (1978)

https://www.imdb.com/title/tt0077598/?ref_=ext_shr

ibestvina · on Dec 13, 2023

I always send myself messages as bookmarks, notes, and general "here's an idea I might want to come back to later". It's an awful system when it comes to discoverability (I do it in Messenger so search is... bad), but I still do it as it's the most convenient option at the moment when I need to store something somewhere.

So I built a simple Telegram bot which automatically stores anything I send as a text embedding in a vector database, and allows me to search over it in that same chat (the same process that powers the AI Q&A assistants these days).

If I post a link, it automatically scrapes it and stores the text as chunks for better search, extracts text from YouTube videos (still WIP), turns images into text with vision models, etc.
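A minimal sketch of the store-and-search loop described above, not the actual bot. Assumptions: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, the "vector database" is an in-memory list scored by cosine similarity, and the chunk size and overlap are arbitrary. The Telegram wiring, link scraping, and image/video handling are left out.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import List, Tuple

DIM = 256  # arbitrary vector size for the toy embedding


def toy_embed(text: str) -> List[float]:
    """Hashed bag-of-words, L2-normalised. Hypothetical stand-in for a real embedding model."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        word = word.strip(".,!?;:\"'()")
        if not word:
            continue
        # md5 keeps the bucket stable across runs (unlike Python's salted hash()).
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into overlapping word windows so long scraped pages stay searchable."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


@dataclass
class NoteStore:
    entries: List[Tuple[str, List[float]]] = field(default_factory=list)

    def add(self, text: str) -> None:
        # Every chunk is stored alongside its embedding.
        for chunk in chunk_words(text):
            self.entries.append((chunk, toy_embed(chunk)))

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str]]:
        # Vectors are unit-length, so the dot product equals cosine similarity.
        q = toy_embed(query)
        scored = [(sum(a * b for a, b in zip(q, emb)), chunk) for chunk, emb in self.entries]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]


store = NoteStore()
store.add("Idea: sourdough starter schedule for weekends")
store.add("Read later: article on vector databases and embeddings")
print(store.search("vector embeddings", k=1))
```

Swapping `toy_embed` for a real embedding model and the list for an actual vector database changes the quality of the matches, not the shape of the loop.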

One thing I'm unhappy about is not being able to easily edit any notes I search for later, but it's miles ahead of my previous "system". Hopefully I can open source this when I clean it up - if anyone is interested, let me know.

nbbaier · on Dec 13, 2023

Absolutely interested in this!

rozatoo · on Dec 14, 2023

Would love to have a look

ibestvina · on Dec 12, 2023

How do you look at hiring "experienced people" vs. "enthusiastic interns" on something like this? More generally, how quickly do you think the team will grow, and what the ratio should be between the "old" and the "young"?

eries · on Dec 12, 2023

Very hard to guess how it might all shake out. I would say that both Jeremy and I have an almost fanatical belief in the power of uncredentialed outsiders. So I would guess we will be looking more for curious, open-minded generalists than for any specific age or experience level. I do expect we will grow headcount rather slowly, but that doesn't mean we will launch infrequently.

ibestvina · on Nov 16, 2023

Could you give an example of content where such critical distillation would be useful to you?

ibestvina · on Nov 6, 2023

Interested in this as well - do you already work on this somewhere out in the open?