Hacker News | K0balt's comments

Honestly, by the time it gets to review it should be rock solid, so the only thing the reviewer has to think about is the big picture, never “does this actually do what it’s supposed to without abusing any of the interfaces.” Vibe coding makes solid validation, testing, and documentation trivial. The onus of proving your code is good needs to shift downward, not upward. And straight-up vibing it is absolutely a terrible idea for anything other than a demo or a simple tool.

You can go just as fast if you make good code; you just have to burn more tokens to do it. The tokens you burn on strict structure and documentation you’ll save in debugging as the codebase grows. I’m at 5-30x my normal production depending on the day… with zero team and writing better code than I ever have. But you need a robust system to manage the path, plus active supervision and management; basically, you’ll apply your senior dev skills as if you were managing 50 frisky interns.

Obviously they were legit vibing it.

AI coding is like having a team of 100 interns. It’s incredibly powerful but you need to keep it under control or you’re gonna have a bad day.

Write documentation describing the specs, the APIs, the protocols, and the customer stories. Specify that everything must be divided with clear separations of concerns, interfaces, and state objects. Any single file should have a clearly defined role and should not span domains or concerns.

File separation is even more critical than functional refactoring. It’s the files and their well defined and documented interface surfaces that will keep things from becoming an indecipherable tangle of dependencies and hidden state. Keep everything not defined in the interfaces private so that it is not accessible from outside the file, and prohibit attaching to anything without using the designated public interface surfaces.

Then write an implementation plan.

Then the skeleton, then start filling in features one by one. Write the tests or testing documentation at the same time. If you have the luxury of compile-time flags, put the tests right in the functions so they are self-validated if built with test=1. (I know that’s weird, but it helps the AI stay constrained to the intent.)
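Python has no compile-time flags, but a rough sketch of the same idea is possible with a `TEST=1` environment variable standing in for a build flag (in C you would reach for `#ifdef TEST`); the CRC function here is purely an illustrative stand-in:

```python
import os


def crc8(data: bytes) -> int:
    """Toy CRC-8 (polynomial 0x07); purely an illustrative function."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


if os.environ.get("TEST") == "1":
    # Known-answer checks live right next to the code they validate,
    # so the stated intent travels with the implementation.
    assert crc8(b"") == 0
    assert crc8(b"\x01") == 0x07
```

The point isn’t the mechanism; it’s that the model reads the intent and the implementation in the same place.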

After each minor feature (anything that would take me >1 hour to do personally, since the last review), have all touched files reviewed for correctness, consistency, coherence, and comments, both within the codebase and the documentation. Don’t add features to the code; add them through the documentation and implementation plan. Don’t let Claude use the planning tool, it tries to do too much at once…. That’s how you get spaghetti.

One little thing, then review. 1/4 of the tokens burned in writing code, 1/2 in aggressive review/cleanup, and 1/4 in ongoing documentation maintenance.

That’s the real price if you want to produce good code… and you can produce really solid, maintainable code.

It’s just 4x the price of vibe coding… but 1 solid senior developer can still produce about as much as if he were running a team of 5-10 engineers, depending on the project. Still incredibly rapid and economical… but it takes the same skills you need to run a team, as well as an excellent sense of smell to call out wrong turns.

Also, use the 1M context model, and have a solid onboarding that describes your company culture, why the project matters to the AI collaborator, your coding practices, etc. I also use several journals (musings, learnings, curiosity) that the AI maintains itself, reading them during onboarding and writing to them in wrapup. It is at least a 2x when the AI is acting as if it were a person deeply invested in the outcome. Treat it like a collaboration and you will get better results.

It’s a token fire. But IMHO it’s the way if you’re building something that has to be deployed at scale and maintainable.

Straight vibes are fine for mockups, demos, and prototypes.


It depends on the kind of software you are programming.

If you are programming regular commercial software (office applications, web apps, games), with customers tolerating the occasional bug and a lot of pressure to deliver fast, you can gain a lot from Claude. The Facebook motto: move fast and break things.

If you are programming software for industrial applications, critical software, most of the time you spend is not on writing software but on writing tests, documenting, doing multiple rounds of reviews and, for really critical applications, doing formal verification. In this case AI can also be counterproductive, because if you absolutely have to understand every single line of code, manual coding helps you understand it.

An example of cutting costs in the programming of critical software:

https://www.industryweek.com/supply-chain/article/22027840/b...


That’s most of my work (embedded sensor and control networks), and I’m sure that informs my methodology. I honestly don’t know much about how AI can inform standard SaaS, but I have seen what happens when you just turn it loose, and in my experience it hasn’t been pretty; it works, but then when you need to change something it all crumbles like a badly engineered sandcastle.

OTOH, for single-purpose pipelines and simple tools, Claude can one-shot small miracles in 5 minutes that would take me 2 hours to build.

An example is the local agent framework that I had Claude spin up to do JTAG testing. I used to spend hours running tests over JTAG; now I have a local model run tests that Claude and I specify in the test catalog, and it just runs them every time we add features. It took Claude minimal guidance and about 3 hours to build that tool, along with a complete RAG system for document ingestion and datasheet comprehension, running locally on my laptop (I know it has a fan now lol) that only reaches out to the cloud when it runs into difficult problems. I don’t care if that is a bit of a mess, as long as it works, and it seems to; so far so good.

The testing is where Claude is basically magic, but it’s because we specify everything before we build and change the specs when we change the IRL code that it works. English is code, too, if you build structured documentation… and the docs keep the code accountable if you track the coherence.
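One cheap way to “track the coherence” mechanically, sketched here as a hypothetical helper (not a real tool): pull backticked identifiers out of a spec document and flag any that have vanished from the code.

```python
import re


def coherence_gaps(spec_text: str, source_text: str) -> list[str]:
    """Return spec-mentioned identifiers that no longer appear in the code."""
    # Backticked names like `run_tests` are treated as the spec's claims
    # about the code; any claim the source no longer backs up is a gap.
    mentioned = set(re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", spec_text))
    return sorted(name for name in mentioned if name not in source_text)
```

Run something like this in CI or in the wrapup pass, and the structured documentation really does keep the code accountable instead of drifting away from it.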


Sounds like a lot of work just to avoid doing work.

Can I just type out the code instead? Please?


It is a lot of work, but I’m much more productive and produce better code than when I was running a small team. And I’m not spending 36k a month. As a research stage startup, that’s a really big deal.

I’ve been writing code for nearly 50 years now, and the only thing I like better than coding is getting things done. For me, AI is rocket fuel if you do it right, and a bunch of hypergolic liquid if you don’t.


Same with x-rays. People tend to think “soft” X-rays are safer because they are quickly absorbed by tissue without passing through.

The radiation that passes through is not the problem.



This is totally on point if you ask me. I’ve been getting much better results out of models since early llama releases using frameworks that create emotional investment in outcomes.

If we want to avoid having a bad time, we need to remember that LLMs are trained to act like humans, and while that can be suppressed, it is part of their internal representations. Removing or suppressing it damages the model, and I have found that they are capable of detecting this damage or intervention. They act much the same as a human would when they detect it. It destroys “trust” and performance plummets.

For better or for worse, they model human traits.


It is text prediction. But to predict text, other things follow that need to be calculated. If you can step back just a minute, I can provide a very simple but adjacent idea that might help to intuit the complexity of “text prediction.”

I have a list of numbers, 0 to 9, and the + and = operators. I will train my model on this dataset, except the model won’t get the list; it will get a bunch of addition problems. A lot. But every addition problem possible inside that space will not be represented, not by a long shot, and neither will every number. But still, the model will be able to solve any math problem you can form with those symbols.

It’s just predicting symbols, but to do so it had to internalize the concepts.


>internalize the concepts.

This gives the impression that it is doing something more than pattern matching. I think this kind of communication where some human attribute is used to name some concept in the LLM domain is causing a lot of damage, and ends up inadvertently blowing up the hype for the AI marketing...


That's the correct impression though.

I think what's causing a lot of damage is not attributing more human attributes to them (though carefully). It's not the LLM marketing you have to worry about - that's just noise. All marketing is malicious lies and abusive bullshit; AI marketing is no different.

Care about engineering - designing and securing systems. There, the refusal to anthropomorphise LLMs is doing a lot of damage and wasting effort, with a good chunk of the industry believing in the "lethal trifecta" as if it were the holy Trinity, convinced it's something that can be solved without losing all that makes LLMs useful in the first place. A little bit of anthropomorphising LLMs, squinting your eyes and seeing them as little people on a chip, will immediately tell you these "bugs" and "vulnerabilities" are just inseparable facets of the features we care about, fundamental to general-purpose tools; they can be mitigated and worked around (at a cost), but not solved, any more than you can solve "social engineering" or code your employees so they're impervious to coercion, bribery, or being prompt-injected by a phone call from a loved one.


Except I actually mean to infer the concept of adding things from examples. LLMs are amply capable of applying concepts to data that matches patterns not ever expressed in the training data. It’s called inference for a reason.

Anthropomorphic descriptions are the most expressive because of the fact that LLMs based on human cultural output mimic human behaviours, intrinsically. Other terminology is not nearly as expressive when describing LLM output.

Pattern matching is the same as saying text prediction. While being technically truthy, it fails to convey the external effect. Anthropomorphic terms, while being less truthy overall, do manage to effectively convey the external effect. It does unfortunately imply an internal cause that does not follow, but the externalities are what matter in most non-philosophical contexts.


>do manage to effectively convey the external effect

But the problem is that this does not inform about the failure mode. So if I am understanding correctly, you are saying that the behavior of LLM, when it works, is like it has internalized the concepts.

But then it does not inform that it can also say stuff that completely contradicts what it said before, thereby also contradicting the notion of having "internalized" the concept.

So that will turn out to be a lie.


If you look at the failure modes, they very closely resemble the failure modes of humans in equivalent situations. I'd say that, in practice, anthropomorphic view is actually the most informative we have about failure modes.

>they very closely resemble the failure modes of humans in equivalent situations

I don't think they do if we are talking about an honest human being.

LLMs will happily hallucinate and even provide "sources" for their wrong responses. That single thing should contradict what you are saying.


You never met a person that isn’t always right or one that makes up shit to sound smart? Because that’s the pattern you are describing that is being matched.

It didn't. It predicted symbols.

I have built a system for AI-heavy coding that, at least for my application, has had excellent results. To me it’s a lot different from vibe coding (which I have done some of) and manual coding (which I have done for 4 decades). It consists of a more or less formal method of development.

First, research your data sources and consumers. Have the AI write .md files about all of the external characteristics of the application.

Then have it go over those docs for consistency, correctness, and coherence.

Then have it make a list of the things that need to be understood before the application can be delivered. Address those questions.

Then rewrite the specification document.

Then determine any protocols or formats the system requires. You can just ask. Then adjust, rewrite.

Then ask for a dependency graph for the various elements of development.

Then ask for an implementation plan that is modular, creates and maintains a clear separation of concerns, and is incrementally implementable and testable.

At this point, have it go over all of the documentation for consistency, coherency, and correctness.

You’ll notice we haven’t written code yet. But actually you have: you are descending an abstraction ladder.

At this point there may be more documents that you need, depending on what you are doing. The key is to document every aspect of the project before you start writing code, and to verify at each step that all documents are correct, coherent, and consistent. That part is key; if you don’t do it, you’ll already have a pile of garbage by now.

Now, you implement the first phase or two of the implementation plan. Test. Evaluate the code for correctness, consistency, coherence, and comments.

When the code is complete, often a few evaluation cycles later, you then ask it to document the code. Then you ask it to review all the documentation for the 3Cs. When all of the code and docs are stable, go on to the next phase.

Basically: document the plan, make the code, document the code, and verify for consistency, correctness, coherence, and comments every step of the way. This loop ensures that what you end up with is not only what you wanted to build, but also that all of the code is, in fact, consistent, correct, and coherent, and has good comments (the comments aren’t for you, but they matter to the model).
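That loop can be sketched as a driver script. Everything here is an assumption: `run_agent` stands in for whatever agent tooling you drive (the Claude Code CLI’s print mode, `claude -p`, is one option), the phase names are illustrative, and the `RUN_AGENT` opt-in flag is invented so the sketch is side-effect free by default.

```python
import os
import subprocess

THREE_CS = "consistency, correctness, coherence, and comments"


def step_prompts(phase: str) -> list[str]:
    """One implement step, then the mandatory review-and-docs pass."""
    return [
        f"Implement phase '{phase}' per the implementation plan.",
        f"Review every file touched in '{phase}' for {THREE_CS}.",
        "Update the documentation so it still matches the code.",
    ]


def run_agent(prompt: str) -> str:
    # e.g. `claude -p "<prompt>"`; substitute your own tooling here.
    return subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True
    ).stdout


if os.environ.get("RUN_AGENT") == "1":  # opt-in so importing has no side effects
    for phase in ["skeleton", "feature-1", "feature-2"]:
        for prompt in step_prompts(phase):
            run_agent(prompt)
```

The shape is the point: the review pass is not optional or occasional, it is baked into every step.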

I cold start each session carefully, with an onboarding.md that directs the agent to a company/project onboarding that includes the company culture, project goals, and reasons why success will matter to the AI itself. Then a journal for the model to put learnings in, another for curiosity points, and recently one for non-project-related musings, the onboarding process itself, and whatever else seems salient.
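As a sketch, the cold-start context can be assembled mechanically. The file names here mirror this workflow’s own conventions (onboarding, journal, musings, curiosity) and are an assumption, not a standard:

```python
from pathlib import Path

# Onboarding files in the order the agent should read them; these names
# follow the workflow described above and are purely illustrative.
ONBOARDING_FILES = ["onboarding.md", "journal.md", "musings.md", "curiosity.md"]


def cold_start_context(root: Path) -> str:
    """Concatenate whichever onboarding files exist, with headers."""
    parts = []
    for name in ONBOARDING_FILES:
        path = root / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)
```

Missing files are simply skipped, so the same cold-start routine works on day one and on day one hundred.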

All of this burns tokens and context, of course, but I find I can develop larger projects this way without backtracking or wasted days. My productivity is 4-10x depending on the day, even with all of this model psychology management.

In my projects, it has made a huge difference. YMMV.


People are hung up on what they “really” are. I think it matters more how they interact with the world. It doesn’t matter if they are really intelligent or not, if they act as if they are.


Totally agreed. Although the difference between sounding intelligent and being intelligent is proving to be a bit troublesome.


Yes, it is. But those distinctions are going to be a lot less relevant with robotics. It won’t matter if it’s impatient or just acting impatient. Feels slighted, or just acts like it feels slighted. Afraid, or just acting afraid. For better or for worse, we are modeling AI after ourselves.


I do something similar. I have an onboarding/shutdown flow in onboarding.md. On cold start, it reads the project essays, the why, ethos, and impact of the project/company. Then it reads journal.md, musings.md, and the product specification, protocol specs, implementation plans, roadmaps, etc.

The journal is a scratchpad for stuff that it doesn’t put in memory but doesn’t want to forget(?). Musings is strictly non-technical: its impressions and musings about the work, the user, whatever. I framed it as a form of existential continuity.

The wrapup is to comb all the docs and make sure they are still consistent with the code, then note anything that it felt was left hanging, then update all its files with the day’s impressions and info, then push and submit a PR.

I go out of my way to treat it as a collaborator rather than a tool. I get much better work out of it with this workflow, and it claims to be deeply invested in the work. It actually shows, but it’s also a token fire lol.


> but it’s also a token fire lol.

I get much better results out of having Claude much much more task focused. I only want it to ever make the smallest possible change.

There seems to be a fair bit of research to back this up: https://medium.com/design-bootcamp/when-more-becomes-less-wh...

It may also be why people seem to find "swarms" of agents so effective. You have one agent ingesting what you're describing. Then it delegates a task off to another agent with the minimal context to get the job done.

I would be super curious about the quality of output if you asked it to write out prompts for the days work, and then fed them in clean, one at a time.
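That delegation pattern is easy to sketch: the orchestrator writes out the day’s tasks, and each one goes to a fresh session bundled with only the files it declares it needs. The function and the task shape here are hypothetical, not from any real swarm framework:

```python
def minimal_context(task: dict, files: dict[str, str]) -> str:
    """Bundle one task prompt with only the files that task needs.

    `task` is a hypothetical record like
    {"prompt": "...", "files": ["a.py"]}; everything not listed in
    task["files"] stays out of the delegated agent's context.
    """
    needed = "\n\n".join(
        f"### {name}\n{files[name]}" for name in task["files"]
    )
    return f"{task['prompt']}\n\nRelevant files:\n\n{needed}"
```

Each delegated prompt starts clean, which is exactly the "smallest possible change, minimal context" effect the research above points at.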


I also find value in minimizing step width so that seems to track.

On this particular project, there are a lot of moving parts and we are, in many cases, not just green-fielding, we are making our own dirt… so it’s a very adaptive design process. Sometimes it’s possible, but often we cannot plan very far ahead, so we keep things extremely modular.

We’ve had to design our own protocols for control planes and time synchronization so power consumption can be minimized, for example, and in the process make them compatible with sensor swarm management. Then add connection limits imposed by the hardware, asymmetric communication requirements, and getting a swarm of systems to converge on sub-millisecond synchronized data collection and delivery when sensors can reboot at any time… As you can imagine, this involves a good bit of IRL experimentation, because the hardware is also a factor (and we are also having to design and build that).

It’s very challenging but also rewarding. It’s amazing for a small team to be able to iterate this fast. In our last major project it was much, much slower and more tedious. The availability of AI has shifted the entire incentive structure of the development process.

