The checkpoint pattern you describe is exactly right. I've been dealing with this as well. Instead of vibe coding, it's vibe system engineering, and I don't care for it. So I thought about it and came up with a framework to describe and reason about different pipelines. I based it on the types of LLM failures I was seeing in my own pipeline: omissions, incorrect output, and output that is inconsistent with existing work.

I wanted something I could use to objectively decide whether one test (or gate, as I call them) is better than another, and how the gates work together as a system.

My personal tool encodes a workflow that has stages and gates. The gates enforce the handoff between stages. Once I did this, first-pass approval went from ~73% to over 90%, just from adding structured checks at stage boundaries.
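To make the stage/gate idea concrete, here is a rough sketch of how I'd describe it in code. This is not my actual tool: `Step`, `GateFailure`, `run_pipeline`, `draft_stage`, and `no_missing_sections` are all made-up names for illustration, and the artifact is assumed to be a plain dict. The gate shown checks for only one failure type (omissions).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types: a stage transforms an artifact; a gate inspects it and
# returns a list of problems (an empty list means the handoff is allowed).
Stage = Callable[[Dict], Dict]
Gate = Callable[[Dict], List[str]]


@dataclass
class Step:
    name: str
    stage: Stage
    gates: List[Gate]


class GateFailure(Exception):
    """Raised when any gate at a stage boundary reports a problem."""


def run_pipeline(steps: List[Step], artifact: Dict) -> Dict:
    for step in steps:
        artifact = step.stage(artifact)
        # Run every gate at this boundary and collect all reported problems.
        problems = [p for gate in step.gates for p in gate(artifact)]
        if problems:
            # Block the handoff: the next stage never sees a failing artifact.
            raise GateFailure(f"{step.name}: " + "; ".join(problems))
    return artifact


def no_missing_sections(artifact: Dict) -> List[str]:
    # Omission gate (assumed artifact layout): every required section must exist.
    present = artifact.get("sections", {})
    return [f"missing section: {s}" for s in artifact.get("required", []) if s not in present]


def draft_stage(artifact: Dict) -> Dict:
    # Stand-in for an LLM call; it only ever produces a "summary" section.
    return {**artifact, "sections": {"summary": "placeholder text"}}
```

Calling `run_pipeline([Step("draft", draft_stage, [no_missing_sections])], {"required": ["summary", "risks"]})` raises `GateFailure` because the stub stage never writes a `risks` section, so the omission is caught at the boundary instead of downstream.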

My hope is that we can have a common vocabulary to talk about this, so I wrote up the data and the framework that fell out of it: https://michael.roth.rocks/research/trust-topology/