AI agents amplify whatever process you give them. Weak process produces slop faster. Strong process produces quality faster. Same tool, different outcomes. The difference is verification.

I’m watching two patterns emerge as teams adopt AI agents for development. One pattern: chat with an agent, “build me authentication,” agent generates code, developer reviews in chat, “looks good,” ships to production, discovers security issues and performance problems. Fast output, declining quality. The “Beyond the Vibes” problem at scale. The other pattern: work happens in stages (requirements, design, tests, code), chat used at every stage but never alone, tools verify, humans approve, PR gates between stages. Slower per feature, quality maintained, speed compounds over time.

Both teams use chat constantly. One skips verification, one builds it at every stage. One produces slop faster than manual coding could. One produces quality faster than manual coding could. AI didn’t change whether verification matters. It amplified how much it matters.

The surface-level assumption

The prevailing view of AI-generated slop focuses on code. AI produces buggy implementations, insecure patterns, low-quality code. The solution, according to this view, is better code review, more testing, stricter linters. Focus verification at the code stage. Catch the problems in the generated implementation.

What verified progression reveals is different. Hallucinations and mistakes happen at every stage: requirements, design, tests, code. The staged process catches them wherever they occur. Requirements mistakes caught before design. Design mistakes caught before tests. Test mistakes caught before implementation. Implementation mistakes caught before deployment. Each stage costs less to fix than the next.

This matters because vibe coding jumps straight to code without verified requirements or design. All problems (requirements, design, tests, implementation) are discovered simultaneously in production. Expensive, wasteful, produces slop. The pattern verified progression enables is different: problems are caught at the stage where they occur. With verification at each stage, they’re caught early and are cheap to fix (hours, not days). Without staged verification, they’re discovered in production (days to fix, or incidents).
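The staged progression can be sketched as a chain of gates. This is an illustrative model, not a real tool: the `Stage` class and `next_unlocked_stage` helper are hypothetical, and the artifact names follow the example later in this article.

```python
# A minimal sketch of staged gates: each artifact must be approved and
# locked before the next stage may start. The Stage class and pipeline
# are hypothetical; spec.md and plan.md follow the article's example.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    artifact: str           # the document or suite this stage produces
    approved: bool = False  # flipped only by a human reviewer at the PR gate

PIPELINE = [
    Stage("requirements", "spec.md"),
    Stage("design", "plan.md"),
    Stage("tests", "acceptance suite"),
    Stage("code", "implementation"),
]

def next_unlocked_stage(pipeline):
    """Work may only proceed at the first stage whose predecessors are all approved."""
    for stage in pipeline:
        if not stage.approved:
            return stage
    return None  # everything approved: ready to merge

# Design is blocked until requirements are approved:
assert next_unlocked_stage(PIPELINE).name == "requirements"
PIPELINE[0].approved = True  # architect approves spec.md at the PR gate
assert next_unlocked_stage(PIPELINE).name == "design"
```

The point of the model: skipping a gate doesn’t make a stage disappear, it just moves its problems downstream to a more expensive stage.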

How verified progression works

A real example makes this concrete: a team building a domain API to orchestrate three backend services. A gateway routes to an Apache Camel orchestrator, which coordinates three existing backend systems (validation, enrichment, storage). It runs on Kubernetes and needs to handle production traffic patterns.

Requirements stage

Chat coordinates the exploration. “What are we solving? Need unified API orchestrating three backend systems. Who are the users? External clients. What constraints matter? Latency targets, idempotency, failure handling.” The agent explores edge cases (idempotency, sparse fieldsets, retrieval patterns), documents requirements in spec.md, surfaces assumptions about backend responsibilities.

Tools verify: requirements completeness checks run, stakeholder review happens, conflict detection with existing platform APIs. The human approves: architect reads the spec, verifies business need, confirms feasibility, spots gaps in error handling requirements. PR gate: spec.md approved and locked before design starts.
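A completeness check like the one above can be as simple as verifying that spec.md contains the sections reviewers expect before the PR gate opens. This is a hypothetical sketch: the section names are illustrative, not a standard.

```python
# Hypothetical completeness check for spec.md: flag missing sections
# before the requirements PR gate. Section names are illustrative.
REQUIRED_SECTIONS = [
    "## Problem",
    "## Users",
    "## Constraints",      # latency targets, idempotency, failure handling
    "## Error handling",   # the gap the architect spotted in the example
]

def missing_sections(spec_text: str) -> list[str]:
    """Return the required sections that spec.md does not yet contain."""
    return [s for s in REQUIRED_SECTIONS if s not in spec_text]

spec = "## Problem\n...\n## Users\n...\n## Constraints\n..."
assert missing_sections(spec) == ["## Error handling"]
```

A check like this can’t judge whether the requirements are right (that stays with the human reviewer), but it cheaply blocks specs that are structurally incomplete.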

What this caught: a missing requirement for handling cases where the identifier from one backend isn’t immediately available. Caught at the requirements stage, not after designing the orchestration flow. Cost if missed: it would have been discovered during implementation. Days to redesign the orchestration versus hours to update the spec.

Design stage

Chat coordinates architectural exploration. “Given the requirements, what’s the orchestration approach? Sequential versus parallel? Who owns what data? What are the failure modes when backends time out or return errors?” The agent drafts plan.md, explores alternatives, documents system-of-record responsibilities (validation rules, reference data, persistence and idempotency), and maps failure modes (timeout → 503 with retry hints, validation failure → 400 with detail, enrichment failure → graceful degradation).

Tools verify: AI review spots architectural issues (sequential calls would add latency), pattern consistency checks flag deviation from existing patterns, dependency analysis shows orchestration dependencies. The human approves: lead architect reviews plan, verifies against locked spec, checks fit with broader strategy, pushes back on agent’s sequential approach, chooses parallel enrichment where possible.

PR gate: plan.md and architecture approved and locked before tests.

What this caught: the agent suggested a simpler sequential orchestration that would have added 500ms+ of latency. Architectural review caught the parallel opportunity before implementation. Cost if missed: it would have been discovered in performance testing. A week to refactor versus hours to revise the plan.

Test stage

Chat coordinates test generation. “Given this design, generate acceptance test cases. Cover submission flow, retrieval patterns, idempotency, error handling, sparse fieldsets.” Agent generates comprehensive test suite: happy path, error cases, edge cases, orchestration boundaries.

Tools verify: requirements coverage (all requirements have tests), spec coverage (critical design elements covered), test quality analysis flags brittle tests. The human approves: senior developer reviews tests, verifies they match locked plan, spots mechanical tests (checking implementation details, not behavior), rewrites three tests to verify from client perspective.

PR gate: tests approved and locked before implementation.

What this caught: tests that would have passed but didn’t verify that parallel enrichment actually worked (they mocked the orchestrator instead of the backends). A human spotted the gap the tools missed. Cost if missed: false confidence. Everything would have passed in CI but failed in production. Days to debug versus hours to fix the tests.
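The difference between the two mocking strategies can be sketched in a few lines. Stubbing only the backends lets the test observe what the orchestration actually does; stubbing the orchestrator itself observes nothing. Everything here is hypothetical, and the route is simplified to a sequential stand-in for the real Camel flow.

```python
# Stub the backends, not the orchestrator: a test that mocks the
# orchestrator passes trivially and proves nothing about the route.
# FakeBackend and orchestrate are hypothetical, simplified stand-ins.
class FakeBackend:
    def __init__(self):
        self.calls = []          # record every request this backend receives

    def handle(self, request):
        self.calls.append(request)
        return {"status": "ok"}

def orchestrate(validation, enrichment, storage, payload):
    # Simplified stand-in for the real Camel route.
    validation.handle(payload)
    enrichment.handle(payload)
    return storage.handle(payload)

def test_enrichment_actually_invoked():
    validation, enrichment, storage = FakeBackend(), FakeBackend(), FakeBackend()
    orchestrate(validation, enrichment, storage, {"id": "42"})
    # This fails if the route silently skips enrichment — the exact
    # failure a mocked orchestrator would have hidden.
    assert enrichment.calls == [{"id": "42"}]

test_enrichment_actually_invoked()
```

This is the client-perspective rewrite the senior developer made in the example: assert on observable behavior at the boundaries, not on implementation internals.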

Code stage

Chat coordinates implementation. “Implement orchestrator to pass these tests. Use locked design: parallel enrichment where possible, sequential where dependencies require it.” Agent has everything: verified requirements (spec.md), approved approach (plan.md), concrete tests. Implementation happens overnight.

Tools verify: all tests pass, linters are clean, the security scan is clean, OpenAPI validation confirms the implementation matches the spec. The human approves: a developer examines the implementation, verifies it against the locked plan, confirms the tests pass, and spot-checks error handling. They find an edge case where a backend timeout returned a generic 500 instead of a 503 with retry hints.

PR gate: code approved and merged after error handling fix.

What this caught: an error handling gap where backend failures weren’t giving clients actionable information. Tools missed it; a human spotted it while reviewing error paths. Cost if missed: a production incident. Customers seeing unhelpful errors, operations scrambling to diagnose.
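The fix amounts to an explicit mapping from backend failures to client responses. A sketch of that mapping, assuming the status codes from the design stage; the exception names and the `Retry-After` value are hypothetical.

```python
# The edge case found in review: backend timeouts surfaced as a generic
# 500 instead of a 503 with retry hints. Status codes follow the design
# in the article; exception names and the retry delay are hypothetical.
class BackendTimeout(Exception): pass
class ValidationError(Exception): pass

def to_client_response(exc: Exception) -> tuple[int, dict]:
    """Map a backend failure to an actionable client response."""
    if isinstance(exc, BackendTimeout):
        # Actionable: the client knows it can retry, and when.
        return 503, {"error": "backend timeout", "headers": {"Retry-After": "2"}}
    if isinstance(exc, ValidationError):
        return 400, {"error": "validation failed", "detail": str(exc)}
    # The generic fallback the review caught timeouts falling into.
    return 500, {"error": "internal error"}

assert to_client_response(BackendTimeout())[0] == 503
```

Making the mapping explicit also makes it testable, which is how a gap like this gets caught at the test stage next time instead of in code review.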

The economics of staged verification

The cost structure is clear. Early stages (requirements, design): cheap to fix. Hours to rewrite docs, not code. Late stages (code, production): expensive to fix. Days to refactor, or production incidents. Skipping early verification means all problems hit simultaneously at the most expensive stage.

Verified progression enables several things. Each stage builds verified artifacts for the next: by the code stage, you have locked requirements, an approved design, and concrete tests. More of the good conversations happen early: requirements and design discussions with domain experts who can shape the work before implementation costs accumulate. Learnings feed back: patterns that worked, failure modes discovered, architectural decisions documented for next time. And it requires strong context at each stage (system knowledge, domain understanding, integration patterns): the reviewer needs to know enough to spot problems.

Vibe coding fails for predictable reasons. It skips straight to code with unclear requirements, uncertain design, weak tests. Fast output becomes slow when you count the rework. It amplifies process weakness: bad requirements become bad code, faster. Verified progression works because it amplifies process strength: verified requirements become verified design, verified tests, good code. Faster.

The supervision paradox

Everyone uses chat. Some verify at every stage, others don’t. Same tool, dramatically different outcomes.

Verification at each stage requires judgment. Is this requirement right? Is this design sound? Are these tests meaningful? Juniors and new hires are supervising agents before they’ve built that judgment and context. Can they verify requirements they haven’t learned to write? Can they spot flawed designs they haven’t learned to create? Can they identify mechanical tests when they haven’t learned what good tests look like?

“The more we learn, the more we realize how little we know.” People deeply using AI realize how complex verification is. Module boundaries, integration assumptions, business context that doesn’t live in code. Meanwhile juniors are supervising before they’ve learned enough to realize how little they know.

Some experienced developers dismiss AI entirely after encountering vibe coding stories and “Beyond the Vibes” discourse on social media. They should know better than to discard something this existential without deep evaluation. The volume of dismissal might reveal that they sense the threat. This isn’t mobile; there’s nowhere to hide. Every domain, every role, every specialization is in scope. But dismissing out of anxiety instead of engaging deeply is exactly the wrong response. Developers using verified progression can supervise more work than developers rejecting AI entirely. The gap compounds.

Chat didn’t make verification optional. It made verification essential at every stage. Vibe coding is what happens when speed feels more important than rigor, until the rework costs more than the speed gained. AI amplifies process quality: weak process produces slop faster, strong process produces quality faster. The choice isn’t whether to use AI. The choice is whether to verify.