I've set a few rules for working with coding agents:
1. If I use a coding agent to generate code, it should be something I am absolutely confident I can code correctly myself given the time (gun to my head test).
2. If it isn't, I can't move on until I completely understand what it is that has been generated, such that I would be able to recreate it myself.
3. I can create debt (I believe this is being called Cognitive Debt) by breaking rule 2, but it must be paid in full for me to declare a project complete.
Accumulating debt increases the chances that code I generate afterwards is of lower quality, and it also feels like the debt is compounding.
I'm also not really sure how these rules scale to serious projects. So far I've only been applying these to my personal projects. It's been a real joy to use agents this way though. I've been learning a lot, and I end up with a codebase that I understand to a comfortable level.
While this is a legitimate set of rules for maintaining code sanity and a solid mental model of how a codebase may grow, it’s always challenging to stick to them in a workplace where expectations around delivery speed have changed drastically with the onset of AI. The sweet spot is staying connected to the codebase without becoming a limiting factor for the team.
That's kind of what I figured, sadly. I haven't experienced it personally yet since I got let go from my last job about 14 months ago, but it makes so much sense given how management is so willing to sacrifice quality for speed.
Another frustrating thing that has emerged from this is where managers “vibe code” half-baked ideas for a couple of hours and then hand it off as if they’ve meaningfully contributed to the implementation. Suddenly you’re expected to reverse engineer incoherent prompts, inconsistent code, and random abstractions that nobody fully understands.
In their mind they’ve already done the “architectural heavy lifting” and accelerated the team.
More often than not it just adds cognitive overhead: you spend more time deciphering and cleaning up garbage than you would have spent building the thing properly from scratch.
Vouching for this comment because my friend confided in me a week ago that her manager also does this and is like “oh yeah, here’s 80% done, you just do the rest so we can ship it,” when a large part of it is slop that needs to be rewritten because there wasn’t enough guidance and pushback during generation.
Writing tests against a bad implementation usually doesn't work well. In this scenario I would have an LLM look at the changes in the branch and try to create a markdown document of the changes, why it thinks they were made, etc. and then review that doc with the manager and do a new implementation from scratch after aligning.
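A minimal sketch of that first step, assuming nothing beyond git; the prompt wording and the file name are mine, and you'd paste the result into whatever agent you use:

    import subprocess

    # Collect everything the branch changed relative to main.
    diff = subprocess.run(
        ["git", "diff", "main...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

    # A review prompt asking the LLM to turn the diff into a reviewable doc.
    prompt = (
        "Write a markdown document describing these changes: what changed, "
        "why you think each change was made, and what abstractions were "
        "introduced. Flag anything inconsistent or unclear.\n\n" + diff
    )

    with open("change_review_prompt.txt", "w") as f:
        f.write(prompt)

Reviewing that document with the manager is then a conversation about intent, not about the slop itself.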
Unless the tests are written against logic that is in and of itself subtly wrong, and even the structure of the code and the methods it exposes is wrong - in which case your unit tests would have to be rewritten too, because the units themselves are structured badly.
It’s a valid direction to look in; it just doesn’t address the root issue of throwing slop over the wall while also having unrealistic expectations due to not knowing any better.
Yep. It’s very healthy to be suspicious of code. Any code. Whether generated or not. That’s where the bugs are.
If there’s one thing that’s disturbing about AI proponents, it’s how trusting they are of code. One change in the business domain and most of the code may turn from useful to actively harmful. Then you have to rewrite it. Good luck doing that well if you’re not really familiar with the code.
I am lucky to have never worked in a team where my manager wouldn't expect strong push back in this scenario. Many of the corporate environments described on here seem dystopian, this included.
To make a bit of a counter argument here - it's really hard to stick to 100% quality at speed 1.0, when your opponent argues for 90% at 2.5. That's the story the AI fast-movers are telling, and from a business perspective, it's hard to counter it (regardless of whether that speed increase actually materialises).
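For what it's worth, whether 90% at 2.5x actually wins depends on whether the missing 10% compounds. A toy model (my assumption, not anything claimed above):

    # If each change is only 90% "right" and the defects compound
    # multiplicatively, effective quality decays geometrically even
    # while raw output stays at 2.5x speed.
    for n in (1, 5, 10, 20):
        print(n, round(0.9 ** n, 3))   # 0.9, 0.59, 0.349, 0.122

If the debt really does compound the way the OP suspects, the business case for raw speed is much weaker than it first looks.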
I was trying to follow similar rules, until one day I had to solve a hard mathematical problem. Claude is a PhD-level mathematician, I am not. I, however, know exactly the properties of the desired solution and how to test that it’s correct. So I decided to keep Claude’s solution over my basic, naive one. I mentioned that in the pull request and everyone agreed it was the right call. Would you allow exceptions like that in your rules? What if AI becomes that much better at coding than you, not just at advanced mathematics? Would you then stop writing code by hand completely, since that would be the less optimal option, despite losing your ability to judge the code directly at that point (though, as in my example, you can hopefully still judge the tests)? I think these are the more interesting questions right now.
Unfortunately, it is not, and many of its attempts at mathematical proofs have major flaws. You shouldn't trust its proofs unless you are already able to evaluate them--which I think is pretty much all the OP is saying.
Trust isn’t binary, and I can trust things I don’t understand well enough to use them. OP was talking about needing to understand, which is a much higher bar than being able to validate something well enough to use it for a task.
I definitely wouldn't put math in my code I didn't understand just because Claude says so. I am not astonished that everyone agreed, that's why shit is going to hit the fan pretty badly pretty soon due to AI coding.
There is one exception to this: If the AI also delivers the proof of why the math is correct, in a machine-checked format, and I understand the correctness theorem (not necessarily its proof). Then I would use it without hesitation.
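As a toy illustration of that split (Lean 4 here; the theorem name is made up): you only read and trust the statement, while the proof, however it was produced, is checked by the kernel rather than by eyeball.

    -- The *statement* is what you read and trust; the proof term
    -- (trivial here, potentially long and machine-generated in
    -- general) is verified by the proof checker, not by you.
    theorem my_add_comm (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b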
I always found it weird, when helping people with Excel formulas, how few of them even try to check maths they don't understand, let alone try to understand it.
I struggle to remember even relatively simple maths like working out "what percentage of X is Y", so if I write a formula like that I'll put in some simple values like 12 and 6, or 10,000 and 2,456, just to confirm I haven't got the values backwards or something (see the sketch below). I've been shown sheets where someone put in a formula they didn't understand, checked it with numbers they couldn't easily eyeball, and just assumed it was right because it was roughly in their ballpark / they had no idea what the end result should be.
Then again, I've also seen sheets where a 10% discount column always had a larger number than the standard price, so even obviously wrong things aren't always checked.
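The same eyeball check in code form (a sketch; the function name and values are just illustrative):

    def pct_of(x: float, y: float) -> float:
        """What percentage of x is y?"""
        return y / x * 100

    # Plug in values simple enough to eyeball before trusting the formula.
    assert pct_of(12, 6) == 50.0                      # 6 is half of 12
    assert round(pct_of(10_000, 2_456), 2) == 24.56   # roughly a quarter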
I don't disagree, but whoever never put math they don't fully understand in their code gets to throw the first stone.
I've reached solutions by trial and error too, and tried to rationalize them later, quite a few times. And it's easier to rationalize a working solution, however adversarial you claim to be in your rationalization.
I don't see using gen AI for the (not so) “brute force” exploration of the solution space as all that different from trial and error plus after-the-fact rationalization.
How did you test that the solution is correct? Is the set of possible inputs a low-ish finite number?
Normally with mathematical problems you have to prove the solution correct. Testing is not sufficient, unless you can test all possible inputs exhaustively.
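When the input space really is small and finite, the exhaustive check is trivial; a toy example (not the poster's actual problem):

    from itertools import product

    def reference(a: int, b: int) -> int:
        return (a + b) % 16        # the naive version you fully understand

    def clever(a: int, b: int) -> int:
        return (a + b) & 0xF       # the generated version under test

    # Exhaustively verify agreement over the entire finite input domain.
    assert all(reference(a, b) == clever(a, b)
               for a, b in product(range(16), repeat=2))

Anything beyond that and you're back to needing a proof, or at least property-based testing plus a clear argument for why the properties pin down correctness.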
You do realize you can ask Claude about the things you don't understand?
"PhD level" just means you finished a bachelor and masters degree and are now doing a bit of original research as an employed research assistant.
Claude isn't "PhD level" anything. This shows a complete lack of understanding. Claude has read every single textbook in existence, so it can surface knowledge locked away in book chapters that people haven't read in years (nobody really reads those dense books on niche topics from start to finish).
Since Claude has infinite patience, you can just keep asking until you get it.
Man, people really overestimate training. Claude did not 'read' any of that either. I wish frontier models behaved like people who had read and remembered everything they've trained on, but they don't.
I’ve also heard it being called “comprehension debt,” which I like a little more because I think it’s more precise: the specific debt being accrued is exactly a lack of comprehension of the code.
Comprehension debt just sounds like there are things you don’t (yet) understand.
Cognition debt means your lack of understanding compounds and the cognition “space” required to clear it increases accordingly.
An increasing comprehension debt that can be paid off one bit at a time within reasonable cognition space takes linear time to clear.
Cognition debt takes exponential time to clear the more of it you have. If it reaches a point where you simply don’t have the space for the cognition overhead required to understand the problem, you probably need to start over from your specifications.
I like that too. However, “cognitive debt” points to the possibility of cognitive overload, that the code can become so complex and inscrutable that it may become impossible to comprehend. “Comprehension debt” sounds a bit weaker in that respect, that it’s just a matter of catching up with one’s comprehension.
This is fine if it’s more enjoyable for you; that’s what’s important in personal projects most of the time.
But we don’t apply the same rules to dependencies, the work of colleagues, external services, or any of the layers down to the silicon when trying to get work done.
Why is AI suddenly different?
We just have to weigh this by risk and reward. What’s the downside if it’s wrong, and how likely is an error to be found in testing and review? What is the benefit gained if it’s all fine? It's the same calculation as for libraries and external services.
A complex set of financial rules in a non-upgradable crypto contract with no testing?
A viewer for your internal log data to visualise something?
It is and has always been immensely helpful to understand what you are doing in any context.
There are some programmers who treat the job as just plumbing together what is to them completely incomprehensible black boxes, who treat the computer as a mystery machine that just does things "somehow", but these programmers will almost always be hacks that spend their entire career producing mediocre code.
There are things such a programmer can build, but they are very limited by their lack of in depth understanding, and it is only a tiny fraction of what a more competent programmer can put together.
To get beyond being a hack, you need to understand the entire stack, including the code that you didn't write, including both libraries, frameworks and the OS, and including the hardware, the networking layers, and so forth. You don't have to be an expert at these things by any means, but you do need to understand them and be comfortable treating them as transparent boxes that you may have to go in and fiddle with at some point to get where you need to go. Sometimes you need to vendor a dependency and change it. Sometimes you need to drop it entirely and replace it with something more fit for purpose you built yourself.
> To get beyond being a hack, you need to understand the entire stack, including the code that you didn't write, including both libraries, frameworks and the OS, and including the hardware, the networking layers, and so forth.
I think maybe you overestimate your own knowledge here. It's one thing to understand general principles and design, or to understand a contextually-relevant vertical or whatever. It's another to demand comprehensive (even if not expert) familiarity with non-trivial projects, especially those created by many developers over long time spans. It's not just a question of intelligence or dedication or even just time spent working on a project.
The amount of software even your typical piece of code relies on is staggering and shifting, and it's only getting more complicated. A good chunk of software engineering and programming language research has been focused on making it practical to operate in such a complex environment - an environment that nobody fully understands - which is a major part of why modularity exists. Making software by "plumbing together [...] black boxes" is exactly what such research aspired to accomplish, because it allows different developers to focus on different scopes and on the domain they're working on. Software engineering is a practical field, and any system that requires full knowledge to operate, modify, and extend is either relatively small (maybe greenfield and written by a sole developer) or impractical to work with.
So I would say there's a wide gap between "lazy guy who doesn't give a shit" and "guy who thinks he can understand everything". Both lack the humility and wisdom needed to know the limits of their knowledge, to circumscribe what they need to understand, and to operate within the space those limits afford. (Both extremes remind me of cocky junior devs. On one hand, you have the junior dev who carelessly churns out "hot shit" garbage code by plumbing things together with no grasp or appreciation of sound design; on the other, the dev who makes a big show of "rigor" completely detached from the actual realities and needs of the project. In each case, the dev is failing to engage intelligently with the subject matter.)
Well I mean I've built an internet search engine from scratch[1] and I'm making a living off this successfully enough to have completely left the wagie existence for the foreseeable future, so I think I at least kinda walk the talk.
I'm far from the best at anything and make no claims toward knowing everything, but I do think I have reasonable breadth in my experience and work, and I don't think I could have built something like this otherwise.
[1] ... which is something that does not decompose neatly into black boxes and must to a large degree be built from first principles, as goddamn nothing off the shelf scales well enough to deal with multi-terabyte workloads at even a fraction of the speed a bespoke solution can.
ok, I kind of knew that already LOL, but I don't have any questions that are more specific so I can't really complain. just gotta get after it i guess.
Yeah I don't know if there is such a thing as good advice in this regard, except the stuff that everyone is saying.
I guess "build something you want", the Temu-bought knockoff of the previous advice. It's not quite as bad advice as it sounds, as it's at least some validation of an idea, and much easier than playing 17D chess trying to predict the zeitgeist.
The luck surface area[1] also isn't talked about as much as it should be, but it's a good mental model if you're seeking serendipitous life-changing outcomes, and one I can get behind.
AI is different because it's a tool, and the user of the tool is responsible for the work performed.
An outsourced developer isn't a "tool". They're a human being, and responsible for their actions. They're being paid, and they either act responsibly or they get replaced.
A vibe coder is a human using a tool. The human is responsible for code quality, and if it's not good enough, they need to keep using the tool to make it better. That means understanding the tool's output.
If an artist used Photoshop to create a billboard ad that was ugly, they don't get to blame Photoshop. They have to keep using the tool until their output is good.
I don’t find “but who is to blame, ultimately?” all that useful.
So you figure out that someone you paid is at fault, instead of someone they hired. Your contract is with them, so what really changes? To you as a customer, what process or anything else is really different between a company with a manager who asks a team of devs and a company that asks an AI agent?
Maybe it changes who gets fired or sued, or whether one insurance or another pays out - but broadly I think none of what I said about project work really changes.
Product owners, and hell, even customers, have long been able to get software they don't understand all the details of (customers never even get to see the code), purely driven by natural language.
I'd think that depends on the model of responsibility at play.
For example, suppose I hire a building contractor to build a house, and the electrician he subcontracts makes a mistake.
From my perspective, the prime contractor is equally responsible for that mistake regardless of whether he used a subcontractor, or did the work himself but used a broken tool.
This doesn't make the electrician any less of a "person" in the deeply important ways, but it's not a distinction that's relevant to my handling of the problem.
I had a similar approach, but in the end I don't think it's feasible to actually follow rule 2 sufficiently. It sounds good in theory, but in practice you'll always take some mental shortcuts that you may not even be aware of. Try digging into an unknown codebase to fix some issue and compare how much will stay in your head a week later if you do it yourself versus if you "completely understand" what an agent did for you. When I do it myself, it contributes to my general knowledge and I mostly retain the important parts in my head even if I lose the details over time; when I try to own what an agent did as if it were mine, it feels like I understand it well at the time, after putting in some effort, but then I forget it all very fast. Ultimately I decided that having an LLM help me there is actually detrimental to my goals most of the time, and that's without even considering some of the other concerns raised by sibling comments here, such as time and business pressures.
This is great until the "gun to your head" is your skip-level manager demanding that a feature be implemented by the end of the week, and they know you can just "generate it with AI", so that timeline is actually realistic now, whereas two years ago it would have required careful planning, testing, and execution.
Your manager is unknowingly helping you create a form of job security for yourself, with all the technical debt and bugs being accumulated.
He might not understand it, and it might not be the type of work you want to do, but someone is going to have to fix those issues. And the longer they wait, the bigger the task gets.
That isn't new, though. Managers often pushed unrealistic timelines and showed a lack of care about tech debt well before vibe coding; it's just that the timelines were different, and the magnitude will be bigger this time. But we also have LLMs to help clean it up faster, I guess.
The bet that management is making is that the AI will continue to improve and that it will be able to fix those issues on the cheap - so far this has proven to be true for us. We use AI to generate code at scale, that code has issues at scale, so we use AI to fix those issues.
The question is: is it a job you actually still want once the poo pile reaches critical mass, you're the only one with a shovel, and the deadline is "yesterday"?
That is absolutely true. Unfortunately, this ship has sailed and we are not closing Pandora's box anymore. We'll have to adapt.
But we still hold good cards in hand.
Do they want their pile of steaming slop fixed, or not? Because no amount of complaints about the deadline being "yesterday" is going to change the fact that time will be needed to fix the accrued technical debt, whether they like it or not. And if AI dug you in that deep to start with, the solution is not to dig deeper.
I suspect some companies are going to find that out the hard (costly) way.
If the manager is unreasonable, you were always going to have a problem with them eventually. Nothing you can do will fix this.
If the manager is reasonable, you can explain that there isn't time to properly check the AI's work, that it frequently makes obscure mistakes, and that catching them takes time.
At this point, if they still insist you just hand over the AI's work as-is, they've made a decision that is their fault. You've done what you can.
And when the shit hits the fan, we're back to whether they're reasonable or not. If they are, you explained what could happen and it did. If they force responsibility on you, they aren't reasonable and were never going to listen to you. That time bomb was always going to go off.
The problem is that this mode of operation for them works - they get the features made in a fraction of the time it used to take, the feature does what it says on the tin, they feel good about pushing the product in a specific direction. If something goes wrong, the AI can fix it, too.
I'm not sure that there's really a "bomb" hiding in here anywhere. The issue is that it IS "reasonable" now to expect big features to be done within a week.
Does that matter that much in practice? I bet lots of customers are okay with software that crashes 10x as much if it costs 10x less. There's already a ton of shitty software that still sells.
I agree with this, though it also depends on the nature of the project.
I had a project idea which I coded with the help of AI, and it grew quite large, to the point where there were uncharted areas in the code, mostly because I reviewed too shallowly or moved too fast.
It was fine, as that project never took off, but if I did the same thing on my breadwinning project I would lose the joy.
I just had a Claude episode. Instead of trying to fix the bug, it edited the data to hide the bug in the sample run. This kind of BS behavior is not rare. Absolutely, if you do not understand every bit of what's going on, you end up with a pile of BS.
I often break these rules for one specific aspect of my personal projects: if it has a web frontend, I don't want to know what kind of CSS magic the agent used to make it look as it looks. I'll happily accept whatever unmaintainable AI slop it produces, because I don't want to spend any time figuring out if my understanding of flexbox (or anything else related) is wrong again, and why.
This is about how I use it. I initially use it to carve out an architecture and iterate through various options. That saves me a lot of time iterating through different language features and approaches. Once I get that, I have it scaffold things out, and I go in and tidy them up to my personal liking and standards. From there, I start iterating through implementations. I generally have been implementing stuff myself, but I've gotten better at scaffolding out functions/methods through code instead of text. Then I ask it to finish things off. That falls into your first category of letting it implement stuff that I already know I could do. Not sure if it's faster, but it's a lower cognitive load for me, since I can start thinking about the next steps without being concerned about straightforward code.
This all works pretty great. Where it starts going off the rails is if I let it use a library I'm not >=90% comfortable with. That's a good use of these tools, but if I let it plow through feature requests, I end up accumulating debt, as you pointed out.
For my uses, I'm still finding the right balance. I'm not terribly sure it makes me faster. What I do think it helps with is longer focused sections because my cognitive load is being reduced. So I can get more done but not necessarily faster in the traditional sense. It's more that I can keep up momentum easier, which does deliver more over time.
I'm interested in multi agent systems, but I'm still not sure of the right orchestration pattern. These AI tools still can go off the rails real quick.
It's not worth fighting it at work. If the idiots you work for want everything vibe coded and delivered at 5 * 2025 speed then just vibe code and try to leave the company ASAP. That's where I am right now. Of course I might end up somewhere just as ridiculous or maybe not be able to even find another job. Shitty times we live in right now.
You’re going to be the least productive developer in any work setting from this point on. There are people checking in 50k lines of solid, TDD-verified, non-bloated, performance-instrumented feature code per day. Your 200 lines aren't going to cut it for very long.
That may indeed look like quite the speed-up. But the accumulated errors and entropy in such an enterprise will eventually cause a cave-in, at which point the productivity metrics don't look so good.
this guy is resigned to feel worthless compared to other mathematicians (suggesting he become cannon fodder in some type of mathematical sacrifice. i wonder if that analogy even makes sense in the field xD).
but, he desperately wants to become a great mathematician who creates completely original work.
from my experience, people tend to or even want to limit themselves. they think they know the ceiling of their capabilities and it becomes some self fulfilling prophecy.
if you really care about doing something great like this guy does, don't limit yourself. push until you achieve the greatness you want to achieve.
it's like that one saying, aim for the stars and you might land on a cloud. you will be surprised at how capable you actually are
this discussion is so stupid. no one who isn't a moron is offloading all work and thought to LLMs. no one who isn't a moron is seriously afraid of their thinking and learning skill "atrophying", whatever tf that means.
it's clear that LLMs are unique in that you actually do have the capability to turn your brain off and blindly trust whatever it does for you. but it should be equally clear that that's a stupid approach. people will still use their minds, and this use gets empowered with proper use of LLMs. it's that simple. ffs, we take the fact that they pass the Turing Test routinely for granted now. let's not forget that this technology is legitimately incredible. it stands to reason that you are seriously handicapping yourself by not trying to use it.