DiffSight

How Do You Review Code in the AI Era?

AI now ships code faster than any human can read it. So who actually reviews it? And how?

In 2025, the debate was still alive: can AI actually ship code, from small apps to enterprise scale? In 2026, the debate is closed.

The latest models from Anthropic and OpenAI, paired with the leap in agents and harnesses, settled it. AI will write our code.

Model capability is no longer the bottleneck. What's left is how fast large enterprises accept this reality, and how AI providers will meet their demands on privacy and security.

So where does that leave us? AI writes our code, but more importantly, the way we use it has shifted.

Less than a year ago, most developers were running AI in Copilot mode: autocomplete on steroids, suggestions you accept or reject one keystroke at a time. Today, it owns entire tickets. It plans, it codes, it runs its own introspection, it listens to CI feedback, it iterates. It does the job of a developer, or rather, of hundreds of developers, scalable on demand.

See the problem?

Thousands of PRs, opened by AI, every day, across every org and every public repo. Code is now near-unlimited, at near-zero marginal cost (thanks to Anthropic and OpenAI burning billions to keep Claude Code and Codex subscriptions cheap).

If you still don't see it, OSS already does. Maintainers are drowning. Some projects have started adopting explicit "no AI" contribution policies just to stay afloat, not because they're anti-AI or because the AI doesn’t produce correct code, but because they can't review the firehose of low-effort PRs landing in their inbox.

Because at the end of the chain, someone has to read this code. Approve it — or approve the choice not to read it. Understand it. Keep it in line with the architecture. Make sure it solves a real product problem.

And that someone is human.

A human can't be parallelized into Kubernetes pods. A human can't spawn sub-agents in their sleep.

Which leaves us with the question this whole article is about:


How do you review code in the AI era?

The vision pushed by industry leaders (CodeRabbit, Cursor, Anthropic, GitHub) is that code review itself will be simplified and automated by AI.

Long before AI, we already had a stack of deterministic tools doing exactly that. SAST scanners, dependency scanners, Semgrep, E2E test suites. All of them hunting for vulnerabilities and regressions. On top of that sat the human reviewer: the last line of defense against bugs, architectural drift, and backdoors.
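To make that baseline concrete: a typical SAST check is just a declarative pattern. Here is a minimal, hypothetical Semgrep rule (sketched for illustration, not taken from any real ruleset):

```yaml
rules:
  # Hypothetical rule: flag string literals assigned to `password` in Python code
  - id: hardcoded-password
    pattern: password = "..."
    message: Possible hardcoded credential; load secrets from the environment instead.
    languages: [python]
    severity: ERROR
```

Checks like this are cheap, deterministic, and deliberately narrow, which is exactly why a human reviewer sat on top of them.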

The pitch is simple. A properly prompted agent can replace that last barrier, and unblock what is, today, the single biggest bottleneck in the SDLC.

Cursor isn't hiding it. A few weeks ago, they acquired Graphite. An explicit bet that code review is the next frontier to conquer.

The result?

[Screenshot: an AI code review thread from Greptile]

PR threads where AI reviews AI, on diffs of thousands of lines, in codebases of hundreds of thousands. Beautiful, confident comments. Approvals stamped in seconds.

AI Fatigue

See the problem?

Code review, if you've ever actually done it, was never really about hunting security flaws, regressions, or coding style violations. It was about understanding the change. About what shipping this code does to the system you own.

That back-and-forth in the PR was how ownership got transmitted. A form of cognitive handoff between engineers. The reviewer wasn't just gatekeeping. They were absorbing the change into their own mental model of the codebase.

The notion of cognitive debt from AI isn't new. Developers are reporting AI fatigue, and the research is starting to back it up. The question worth asking is whether these new AI review tools actually solve the problem, or are just a comfortable bandage. A virtual peer good enough to reassure the developer, sign the approval, and move on.

Because at the end of it, ownership of the code cannot be transmitted without someone actually reading the code.

AI Limit

And even if we set ownership and cognitive fatigue aside, there's a harder truth.

AI is not infallible. AI is a very sophisticated system that answers a prompt. But who writes the prompt? A human. And humans are fallible. We know this.

Even with a perfect AI and a flawless execution of every instruction, the prompt itself is a non-deterministic spec language. "Build me a clean, scalable architecture." "Add a feature flag for this." "Refactor this module." Hand that to an autonomous system and you will, by construction, ship bad implementations. Not sometimes. Necessarily.

And given all this, we don't even need to argue from first principles. The evidence is already in.

Despite the new generation of models, we've never seen this many critical bugs ship in major software. GitHub. Cloudflare. AWS. The postmortems keep pointing in the same direction: AI-generated code that no one really read or understood.

Code ownership is essential. AI review tools are useful, the same way SAST scanners and E2E tests are useful. They flag things. They catch a class of issues. They have their place in the stack.

But the review itself stays human. And that's a prediction that will still hold in twenty years.

—————————————————————————————————

This is exactly why we built DiffSight.

Not as an AI replacement for engineers.

But as a code review client designed around the human reviewer.

Built on a simple principle:

ownership of code requires that someone actually reads it.

Our job is to make that process bearable again.

What DiffSight Does

DiffSight uses AI where AI genuinely helps.

The goal is not to remove the reviewer.

The goal is to augment their cognition.

DiffSight is in closed beta. Join the waitlist at diffsight.dev.