A friend of mine, a seasoned engineer, recently told me he felt the frontier AI models have stopped improving.
“Honestly? It’s plateaued. A year ago it felt about the same as now. It can write some code, but it cannot do what I do.”
He wasn’t being dismissive. He uses these tools daily. He’d tried multiple models and multiple workflows, and yet he felt the models weren’t getting much smarter.
I hear this from smart, technical people constantly. Not people ignoring AI. People who use it every day and have formed a considered opinion. And their experience is genuine. I don’t think they’re wrong about what they’ve seen.
And yet, we keep hearing frontier models are getting exponentially smarter. Where is the disconnect?
Some people think otherwise
Dario Amodei, Anthropic’s CEO, said in May 2025 that AI could wipe out 50% of entry-level white-collar jobs within five years. Unemployment could spike to 10-20%.
“You can’t just step in front of the train and stop it.” — Dario Amodei, Axios interview, May 2025
There were 55,000 AI-related layoffs in the US in 2025. Andrej Karpathy, an OpenAI cofounder, said he hadn’t written a line of code himself in months, directed AI agents for sixteen hours a day, and still felt like he was falling behind what was possible.
“I’m just like in the state of psychosis of trying to figure out what’s possible, trying to push it to the limit.” — Andrej Karpathy, No Priors podcast, March 21, 2026
The disconnect is widening, not shrinking
On the other hand, more and more people feel they’re being forced to use AI at work and aren’t seeing the benefits. That frustration has data behind it.
An NBER working paper surveyed nearly 6,000 executives across the US, UK, Germany, and Australia. 69% of those firms use AI. But 89% report zero productivity change over three years. 90% report no employment impact. Among top executives, two-thirds use AI for 1.5 hours a week at most.
“69% of firms actively use AI … nine-in-ten reporting no impact on employment or productivity.” — NBER Working Paper 34836: “Firm Data on AI”
An MIT report from August 2025 found that 95% of enterprise generative AI pilots fail to demonstrate P&L impact. Companies bolt AI onto existing processes, run a pilot, see no results, conclude the technology is overhyped.
“About 5% of AI pilot programs achieve rapid revenue acceleration; the vast majority stall, delivering little to no measurable impact on P&L.” — MIT NANDA, “State of AI in Business 2025”
The “nothing is happening” camp is the overwhelming majority. So where is all this panic coming from?
One job has clearly changed
Linus Torvalds, creator of Linux, famously a “do it by hand” craftsman who reviews kernel patches line by line, is now using an AI-powered IDE for personal projects. Karpathy hasn’t written a single line of code since December 2025.
“It’s like some powerful alien tool has been thrown into the world without an instruction manual. Everyone is groping for how to use it, and this magnitude 9 career earthquake has already shaken the entire industry.” — Andrej Karpathy, December 27, 2025
Peter Steinberger, creator of OpenClaw (360,000+ GitHub stars), came back from a three-year coding hiatus and racked up a superhuman volume of code contributions using AI.
GitHub contributions tell the story:
| Year | Steinberger | Nick (me) |
|---|---|---|
| 2023 | 14 | 24 |
| 2024 | 47 | 435 |
| 2025 | 122,249 | 2,109 |
| 2026 (4 months) | 428,849 | 1,621 |
Source: github.com/steipete, github.com/thisnick
Steinberger went from 47 contributions in all of 2024 to 428,849 in the first four months of 2026. That works out to roughly 3,500 a day. My own jump is smaller but follows the same shape: from 24 in 2023 to on pace for ~4,800 this year.
What software devs are actually doing differently
What distinguishes the most productive engineers from the rest is how much autonomy they hand the AI. Instead of using it to answer questions, summarize data, or just generate code they paste in, these engineers let the agent use their computer directly. It runs programs, executes tests, debugs its own failures, and deploys the result. They give it near-full access, with minimal supervision.
The difference isn’t copilot vs. better copilot. It’s copilot vs. autonomous agent with a clear objective.
| | Copilot mode | Agent mode |
|---|---|---|
| Human role | Write prompt, review output, paste in | Define objective, review result |
| AI role | Generate a single response | Write, run, test, debug, iterate |
| Failure handling | Human monitors and catches issues | AI has the tools to monitor and resolve problems |
| Task duration | Seconds | Minutes to hours |
| Iteration | Human-in-the-loop every step | AI loops until objective met |
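That loop is easy to picture in code. Below is a minimal, self-contained sketch of the two modes. Everything here is a toy stub invented for illustration — `ask_model` and `run_tests` stand in for a real model API and test harness, not any actual library.

```python
from dataclasses import dataclass

# Toy stand-ins for real agent tools; all names are hypothetical.
@dataclass
class TestResult:
    passed: bool
    log: str = ""

ATTEMPTS = []  # records each objective the "model" was given

def ask_model(objective: str) -> str:
    """Stub model: pretend each call produces a better patch."""
    ATTEMPTS.append(objective)
    return f"patch-{len(ATTEMPTS)}"

def run_tests(patch: str) -> TestResult:
    """Stub test runner: the third patch finally passes."""
    if patch == "patch-3":
        return TestResult(passed=True)
    return TestResult(passed=False, log=f"{patch} failed: off-by-one")

def agent(objective: str, max_iterations: int = 10) -> bool:
    """Agent mode: loop write -> run -> read the failure -> retry,
    with no human in the loop until the objective is met."""
    for _ in range(max_iterations):
        patch = ask_model(objective)
        result = run_tests(patch)
        if result.passed:
            return True                     # human only reviews the result
        objective += f"\nprevious attempt: {result.log}"
    return False                            # give up and escalate

# Copilot mode, by contrast, is a single ask_model() call:
# the human runs the tests and feeds failures back by hand.
```

The whole difference between the two columns of the table is who sits inside that `for` loop.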
AI wants to be an employee, not a smartass
AI progress can be summarized in two charts:
Intelligence (MMLU benchmark score, %)
Autonomy (METR time horizon: length of task a model can complete unsupervised at 50% success)
Charts built from METR Horizon v1.1 data (time horizon) and the Stanford AI Index 2025 (benchmark scores).
The benchmark line (intelligence) is flat, basically saturated. The time horizon line (autonomy) is exponential, doubling every 7 months for six years, and recently accelerating to every 4 months. The current frontier sits at about 10 hours of autonomous work at 50% reliability.
At the start of last year, a frontier model could autonomously complete a one-hour task. Today, it can handle ten. In the span of a year, it went from intern to mid-level employee.
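The arithmetic behind that jump is just compounding. A quick sketch, assuming a clean 4-month doubling time (reality is noisier): at that rate a 1-hour horizon grows about 8x in a year, the same ballpark as the one-to-ten-hour jump described above.

```python
# Sketch of the METR-style doubling math. The 4-month doubling time
# and 1-hour starting horizon come from the text; real-world growth
# is noisier than this clean exponential.

def horizon_after(months: float, start_hours: float = 1.0,
                  doubling_months: float = 4.0) -> float:
    """Task horizon after `months` of exponential growth."""
    return start_hours * 2 ** (months / doubling_months)

one_year = horizon_after(12)    # 1h grows ~8x in a year
two_years = horizon_after(24)   # ~64h, roughly two work-weeks
```

At this rate the interesting question is not the current horizon but how few doublings separate "an afternoon of work" from "a quarter of work."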
The root cause of the gap
Karpathy posted a thread recently that actually diagnoses the gap. Two groups are talking past each other, and there’s a specific shape to the disagreement.
First, a lot of people are still judging AI by the free tier they poked at a year ago.
“A lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much… these free and old/deprecated models don’t reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.” — Andrej Karpathy, April 17, 2026
But even paying $200/month isn’t enough. The frontier is uneven.
“A lot of the capabilities are relatively ‘peaky’ in highly technical areas. Typical queries around search, writing, advice, etc. are not the domain that has made the most noticeable and dramatic strides in capability.” — Andrej Karpathy
To really feel the exponential progress, you have to push these models to their limits, and today that limit lives in coding and mathematics. But that doesn’t mean the rest of us are left behind.
Coding agents are not only for coding
Karpathy is right about where the frontier sits today. But I don’t think that means non-technical work is stuck on the sidelines.
At Workflowly, we automate paperwork for customers. Our first attempt was the obvious one: point an agent at their systems and let it do the work. Results were OK, but we hit a wall fast. The operational knowledge (how this customer handles this kind of case, what the exceptions look like, local policy, office conventions) was too much to cram into a model’s context. We also hit what Cognition AI calls “context anxiety”: when a model senses it’s running out of context, it cuts corners and wraps up prematurely. Quality drops off a cliff.
So we reframed the problem. Instead of asking AI to do the paperwork, we asked AI to write code that does the paperwork.
We turned the operational knowledge into APIs with searchable docs. We put case-specific knowledge into files the agent could grep, read, and process like a junior engineer exploring a codebase. The paperwork problem became a coding problem. And coding is exactly what frontier agents have gotten shockingly good at.
Agent performance jumped immediately. The same model that was drowning in context before could now repeatedly solve cases by writing code to hit the right APIs and search through case files for precedent.
What worked for us generalizes. Most business processes can be reframed as a coding problem if you squint at them the right way. Doing your taxes, automating expense reports, pulling up client history, reconciling a messy spreadsheet: any of these becomes tractable the moment you treat it as software to be written rather than work to be done. You don’t have to do the work yourself, and you don’t have to hire an engineer to do it either. Let AI write the software for you.
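To make the reframing concrete, here is a toy sketch of the pattern. Every name in it — the case files, `search_cases`, `file_claim` — is hypothetical and invented for illustration; it shows the shape of "the agent writes a small program that consults searchable precedent" rather than any real system.

```python
# Toy sketch of "AI writes code that does the paperwork".
# The data and function names are hypothetical, for illustration only.

CASE_FILES = {
    "case-001.txt": "travel reimbursement, approved, policy 4.2 applied",
    "case-002.txt": "meal expense over limit, escalated to manager",
    "case-003.txt": "travel reimbursement, missing receipt, rejected",
}

def search_cases(keyword: str) -> list[str]:
    """Grep-style precedent search over case files, the way an
    agent explores a codebase."""
    return [name for name, text in CASE_FILES.items() if keyword in text]

def file_claim(kind: str, has_receipt: bool) -> str:
    """The 'generated code': a small deterministic program the agent
    writes once, instead of re-reasoning through every case in context."""
    precedents = search_cases(kind)
    if not has_receipt and any("missing receipt" in CASE_FILES[p]
                               for p in precedents):
        return "rejected: precedent requires a receipt"
    return "approved per policy 4.2"
```

The operational knowledge lives in greppable files and small functions instead of the model's context window, which is exactly what relieved the context pressure for us.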
If you want to get the best out of today’s frontier models, start thinking of yourself as a builder. You might still be a marketer, a doctor, or an accountant. But alongside that job, you also ship small tools that make your work easier. With AI behind you, that’s a superpower anyone can have now.