From Vibe Coding to Harness Engineering
It's been barely a year since Andrej Karpathy coined "vibe coding," and the entire discipline of AI-assisted software development has already gone through three distinct phases. The pace of change is remarkable. And if you're still thinking about AI coding the way you were in December, you're already behind.
A quick timeline
Early 2025: Vibe coding takes off. Karpathy described it as giving in to the vibes, letting AI generate code from natural language prompts and not worrying too much about understanding every line. It worked great for prototypes and MVPs. It also produced mountains of technical debt, inconsistent architecture and code that nobody could maintain at scale.
Mid-to-late 2025: Spec-driven development emerges. Teams figured out that the problem wasn't AI capability; it was the lack of upfront clarity. Tools like AWS Kiro and GitHub Spec Kit formalized this: write structured specifications first (requirements, design constraints, acceptance criteria), then hand them to agents for implementation. The spec became the source of truth, not the code. This was a real step forward. It brought engineering discipline back into an AI-native workflow.
February 2026: Harness engineering arrives. Mitchell Hashimoto gave it the name in a blog post about his AI adoption journey. Days later, OpenAI published a detailed account of building a million-line codebase with zero manually written code. Anthropic released research on effective harnesses for long-running agents. The term stuck because it described something practitioners were already doing but didn't have language for.
What is harness engineering?
Harness engineering is the discipline of designing the environment, constraints, feedback loops and tooling that keep AI agents productive and on track. It's not about writing better prompts or even better specs. It's about building the system around the agent so it can do reliable work autonomously.
Think of it this way. If prompt engineering is telling the horse where to go, and context engineering is giving it a map and road signs, harness engineering is building the saddle, the reins, the fences and the road itself so ten horses can run safely at the same time.
The core idea, as Hashimoto put it: every time an agent makes a mistake, you engineer a solution so it never makes that mistake again. That compounding effect is what makes harness engineering so powerful. Every improvement applies to every future agent run.
Patterns that are working
Across organizations experimenting with this approach, a few common patterns keep showing up.
Agent-readable documentation as system of record. AGENTS.md and CLAUDE.md files at the project root that contain build commands, coding rules, architectural constraints and forbidden patterns. These aren't for humans. They're deterministic instructions injected into the agent's context at runtime. If knowledge lives in Slack threads or someone's head, it doesn't exist to the agent.
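What goes in such a file varies by team; a minimal sketch might look like this (the commands, rules and paths here are illustrative, not taken from any of the teams mentioned):

```markdown
# AGENTS.md — read by agents before every run

## Build and test
- Build: `make build`
- Run all tests: `make test` (must pass before any commit)

## Coding rules
- Strict typing everywhere; no untyped escape hatches.
- All database access goes through the data layer; never import the driver directly.

## Forbidden patterns
- No new third-party dependencies without prior approval.
- Never edit generated files.
```

The point is that every rule here is checkable and unambiguous, so it survives being read by a machine with no shared context.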
Enforced architectural constraints. Custom linters, structural tests and strict dependency rules that agents can't violate. OpenAI's team enforced a rigid layered architecture where each domain had fixed dependencies and interfaces. The constraints are what allow speed without decay.
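A structural test is one cheap way to make layering rules machine-enforced rather than advisory. Here is a sketch in Python; the layer names and the dependency table are hypothetical, not OpenAI's actual architecture:

```python
import ast
import pathlib

# Hypothetical layering: each layer may import only from the layers listed.
ALLOWED = {
    "api": {"service", "domain"},
    "service": {"domain"},
    "domain": set(),  # the core depends on nothing else in the app
}

def violations(root="src"):
    """Yield (file, imported_name) pairs that break the dependency rules."""
    for path in pathlib.Path(root).rglob("*.py"):
        layer = path.relative_to(root).parts[0]
        if layer not in ALLOWED:
            continue
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                names = [node.module or ""]
            else:
                continue
            for name in names:
                top = name.split(".")[0]
                # Importing another layer that isn't in this layer's allow-list
                if top in ALLOWED and top != layer and top not in ALLOWED[layer]:
                    yield (str(path), name)

def test_layering():
    assert list(violations()) == []
```

Run as part of the normal test suite, a check like this means an agent physically cannot merge a change that violates the architecture; it gets the same red test any human would.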
Garbage collection agents. Background agents that periodically scan for documentation drift, architectural violations and accumulated cruft. OpenAI's team used to spend every Friday manually cleaning up AI-generated slop. That didn't scale. Automating the cleanup did.
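One concrete, scriptable slice of that cleanup job is detecting documentation drift. A minimal sketch, assuming a convention (invented here for illustration) where docs reference source files as backticked paths:

```python
import pathlib
import re

# Matches backticked file references like `src/payments/ledger.py` in markdown.
PATH_RE = re.compile(r"`([\w./-]+\.(?:py|ts|go|md))`")

def stale_references(docs_dir="docs", repo_root="."):
    """Return (doc_file, referenced_path) pairs where the target is missing."""
    stale = []
    root = pathlib.Path(repo_root)
    for doc in pathlib.Path(docs_dir).rglob("*.md"):
        for ref in PATH_RE.findall(doc.read_text()):
            if not (root / ref).exists():
                stale.append((str(doc), ref))
    return stale

if __name__ == "__main__":
    for doc, ref in stale_references():
        print(f"{doc}: references missing file {ref}")
```

A real garbage collection agent would go further (feeding findings back to an LLM to propose fixes), but even a dumb scan like this, run nightly, catches drift before it misleads the next agent run.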
Separation of planning and execution. This carries forward from spec-driven development but gets more formalized. The human specifies intent and reviews plans. The agent executes. Mixing the two is where things go sideways.
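That separation can be made explicit in the harness itself rather than left to discipline. A minimal sketch, with `propose_plan` and `execute_step` standing in for whatever agent calls your stack actually uses (both names are hypothetical):

```python
def run_task(task, propose_plan, execute_step, approve):
    """Planning and execution as separate phases with a human gate between.

    propose_plan(task) -> list of step descriptions (agent, read-only phase)
    approve(plan)      -> bool (human reviews intent before anything runs)
    execute_step(step) -> result (agent, allowed to change code)
    """
    plan = propose_plan(task)      # agent plans; no side effects yet
    if not approve(plan):          # human reviews the plan, not a 5,000-line diff
        return None
    return [execute_step(step) for step in plan]  # agent executes approved plan
```

The design choice worth noting: the human gate sits between phases, so review happens at the level of intent, where humans are fast and accurate, instead of at the level of generated code, where they aren't.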
Implications for organizations
We've come a long way since December. The last three months have seen improvements in agent capability and reliability at a level we haven't seen before. OpenAI's team averaged 3.5 merged PRs per engineer per day across a million-line codebase. That's not a toy demo. That's production software with real users.
The takeaway is simple: if you aren't already moving toward an agentic SDLC, it's time. Not next quarter. Now. The organizations seeing the biggest gains aren't the ones with the best models. They're the ones investing in the harness: the documentation, the architectural guardrails, the feedback loops and the verification systems that make agents reliable at scale.
Vibe coding got us started. Spec-driven development added discipline. Harness engineering is where we build the infrastructure for AI to do serious, sustained, trustworthy work. The engineer's job isn't disappearing. It's shifting from writing code to designing the environments where agents write it well.
Join the discussion on LinkedIn.