If AI-assisted development is so amazing, why aren't we seeing the gains?
The tools work. The gains don't show up. That's the uncomfortable reality of AI-assisted software development in the enterprise right now.
In February, the National Bureau of Economic Research published a study of 6,000 executives across the US, UK, Germany and Australia. The finding: roughly 90% of firms report that AI has had zero measurable impact on productivity or employment over the past three years. Apollo's chief economist Torsten Slok put it simply: "AI is everywhere except in the incoming macroeconomic data."
Economists have a name for this. In 1987, Nobel laureate Robert Solow observed that despite massive investment in information technology, productivity growth had actually slowed. That became known as Solow's productivity paradox. Almost four decades later, we're watching it happen again with AI.
McKinsey's 2025 State of AI survey tells the same story from the enterprise side: 88% of organizations use AI in at least one business function, but only about a third have scaled beyond pilots. Just 7% report full enterprise-wide scaling.
Google's 2024 DORA report found that a 25% increase in AI adoption correlated with a 1.5% drop in delivery throughput and a 7.2% drop in delivery stability. Developers feel faster. The systems they ship into are not.
Power tools don't teach you the trade
Here's an analogy that anyone who's assembled IKEA furniture by hand can relate to. The first time you build a bookshelf, you're reading instructions, figuring out which piece goes where, second-guessing every step. Hand that first-timer a power drill and they won't finish faster. They'll just strip screws faster.
But if you've built ten bookshelves? A power drill changes your life. You know what goes where. The tool accelerates execution because the knowledge is already there.
A lot of people building software with AI tools today have never truly built software. They're learning architecture, dependency management and production operations in real time, with an AI agent that's happy to generate code but has no opinion on whether the design makes sense.
The output looks like software. It compiles. It might even pass tests. But it doesn't hold up under load, under change, or under the scrutiny of a production environment.
METR's 2025 study captured this vividly: experienced open-source developers using AI tools were actually 19% slower on their own repositories, even though they believed they were 20% faster. That's a 39-percentage-point gap between perception and reality.
In February 2026, METR released an update. Their new data suggests developers may be getting a genuine speedup, possibly around 18%. But they had to abandon their own methodology because too many developers refused to participate without AI tools.
The tools have become so embedded that people can't imagine working without them, whether or not they're actually faster. The efficiency illusion runs deep. For people without that baseline experience, the illusion is even more dangerous because there's no instinct telling them something is off.
The bottleneck just moves
Even when AI does make individual developers faster, the gains often evaporate at the organizational level. Faros AI analyzed telemetry from over 10,000 developers and found that AI-assisted engineers merge 98% more pull requests.
Sounds incredible. But PR review time increased 91%. The bottleneck didn't disappear. It moved downstream. This is what Amdahl's Law predicts: speeding up one part of a system buys you only as much as that part's share of the whole. AI accelerates code generation, but code generation was never the constraint.
Review, testing, deployment, alignment, decision-making: those are the real bottlenecks and AI doesn't touch them. Faster typing doesn't fix a three-week approval process.
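The arithmetic behind Amdahl's Law makes the point concrete. A minimal sketch, with purely illustrative numbers (the 20% coding share and 3x speedup are assumptions, not figures from any of the studies above):

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is
    accelerated by a factor s (Amdahl's Law)."""
    return 1 / ((1 - p) + p / s)

# Suppose coding is 20% of end-to-end delivery time and AI makes
# that part 3x faster. The whole pipeline speeds up by only ~15%.
print(round(amdahl_speedup(0.20, 3.0), 2))  # 1.15

# Even an infinitely fast coding step caps out at 1 / 0.8 = 1.25x,
# because the other 80% (review, testing, approvals) is untouched.
print(round(amdahl_speedup(0.20, 1_000_000), 2))  # 1.25
```

However fast the accelerated fraction gets, the unaccelerated 80% sets a hard ceiling on the whole system.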
Klarna learned this the hard way. In 2024, the Swedish fintech announced that its OpenAI-powered chatbot was doing the work of 700 customer service agents. They bragged about the savings publicly.
By mid-2025, CEO Sebastian Siemiatkowski admitted the company had gone too far, that the focus on cost and efficiency had produced lower quality. Klarna started rehiring humans. They're not alone: Forrester found that 55% of companies that executed AI-driven layoffs now regret the decision.
The pattern repeats: the tool does the task, but the task was never the whole job.
The old guard isn't wrong to be skeptical
On the other end, veteran engineers who've been writing production software for decades are watching this unfold with suspicion. Some of it is fear of replacement. Most of it is pattern recognition.
I've seen this before. Years ago I led a digital transformation where we replaced green-screen terminals with a modern development stack. I had developers who hated me for months.
It took over a year for some of them to come around and admit the new environment was genuinely better. The resistance wasn't irrational. It was the cost of unlearning decades of muscle memory and trusting that something unfamiliar could actually work.
The same dynamic is playing out now. JetBrains found that 48% of developers prefer to stay hands-on for core tasks like testing and code review. Stack Overflow's 2025 survey shows 46% of developers actively distrust AI tool output.
ManpowerGroup's 2026 Global Talent Barometer found that while regular AI use increased 13% in 2025, confidence in the technology's utility dropped 18%. People are using it more and trusting it less.
These aren't Luddites. They're practitioners who've seen enough hype cycles to know that speed without reliability is a liability, not a feature.
Enterprises aren't built for this
Here's the structural problem. The organizations seeing the biggest gains from AI-assisted development are small teams with deep domain knowledge. Solopreneurs, startups, two-person shops building focused products.
They can move fast because they own the whole problem: requirements, architecture, implementation, deployment.
Google is the large-company exception that proves the rule. They went from 25% AI-generated code in late 2024 to roughly 50% by early 2026. Sundar Pichai reported a 10% engineering velocity gain across the company.
But Google has something most enterprises don't: tens of thousands of engineers with deep expertise in their own codebase, mature review infrastructure and the organizational DNA to absorb AI output at scale. That gain comes from a company that already knew how to build software before AI showed up.
Most enterprises don't work that way. A typical project involves dozens of people across multiple teams, governance layers, review processes and handoff points. AI tools accelerate the coding part, but coding was never the bottleneck.
CIBC, one of Canada's largest banks, has been deliberate about this. Their approach to AI adoption starts with people, not tools: step-by-step rollout, AI literacy programs, governance guardrails.
That's not timidity. That's a regulated institution understanding that in financial services, moving fast and breaking things is not an option. KPMG Canada's 2025 GenAI survey of Canadian banks and insurers confirmed the same pattern: success depends on clean data, governance and genuine workflow redesign, not just tool deployment.
The DORA researchers put it well: improving the development process does not automatically improve software delivery. Not without the fundamentals: small batch sizes, solid testing, clear ownership.
What actually works
The last time we went through this, with the IT revolution of the 1980s, it took roughly a decade for productivity gains to show up. They eventually did, once organizations restructured around the technology instead of bolting it onto existing processes. Productivity growth surged from 1995 to 2005 after years of stagnation.
The same pattern is emerging now. The organizations getting real results aren't the ones with the best models or the most tool licenses. They're the ones rethinking how work gets done.
Small, empowered teams with domain expertise. Clear separation between planning and execution. Architectural constraints that agents can't violate. Think Navy SEALs, not battalions.
In my previous post on harness engineering, I described the patterns: agent-readable documentation, enforced architectural guardrails, automated cleanup, humans specifying intent while agents execute.
Those patterns work because they address the real constraint. The tool isn't the problem. The environment the tool operates in is.
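One of those guardrails can be as simple as a CI check that fails the build when code crosses an architectural boundary, so an agent physically cannot merge a violation. A minimal sketch; the `ui`/`db` layer names and the rule itself are hypothetical, not from any specific project:

```python
import pathlib
import re
import sys

# Hypothetical layering rule: nothing under ui/ may import from db/ directly.
FORBIDDEN = {"ui": re.compile(r"^\s*(from|import)\s+db\b")}

def violations(root: str) -> list[str]:
    """Return 'file:line' entries where a layer breaks its rule."""
    found = []
    for layer, pattern in FORBIDDEN.items():
        for path in pathlib.Path(root, layer).rglob("*.py"):
            for lineno, line in enumerate(path.read_text().splitlines(), 1):
                if pattern.match(line):
                    found.append(f"{path}:{lineno}")
    return found

if __name__ == "__main__":
    bad = violations(".")
    for entry in bad:
        print("layering violation:", entry)
    sys.exit(1 if bad else 0)  # nonzero exit fails the CI job
```

Run it as a required CI step and the constraint holds no matter who, or what, wrote the code.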
McKinsey's data is unequivocal on this point: the single strongest predictor of enterprise-level AI impact is whether an organization fundamentally redesigned its workflows when deploying AI. Not the model. Not the budget. Workflow redesign.
If your teams aren't seeing gains from AI-assisted development, the diagnosis is probably one of three things: people using power tools without knowing the trade, experienced people refusing to pick up the tools, or an organization that hasn't restructured around what the tools make possible. Fix those, and the gains show up. Ignore them, and you'll keep watching Solow's paradox play out in your own backlog.
Join the discussion on LinkedIn.