“I’m not really coding anymore.” I heard this from a developer last week, and it landed with an uncomfortable weight. Most of their day is reviewing agent output. Evaluating pull requests. Directing architectural changes. Approving refactorings they didn’t write. The hands-on keyboard time has shrunk to maybe 20% of what it used to be. The same is true for me. Most of my work is evaluating system designs, directing technical decisions, reviewing approaches. I don’t write much code anymore either. Does that mean we’re not creators?

The identity of “developer” has always been tied to creation. Typing code. Building things. Making systems work with your hands. There’s a particular satisfaction in writing a clever algorithm, debugging a complex problem, watching your code run for the first time. That’s the work. That’s what makes you a developer. Except now, increasingly, that work happens elsewhere. Delegated to agents. Automated by processes that run overnight. The code still gets written. The systems still get built. But we’re not the ones typing most of it. So what are we?

The shift is from operator to supervisor, and here’s what that actually looks like in practice. Morning: I review overnight agent work—a refactoring that touched 70 files, documentation updates across three modules, test coverage improvements. I didn’t write any of it, but I need to evaluate all of it. I set direction for the next set of changes. The agent proposes an architectural approach, and I evaluate whether it makes sense, whether it fits the system, whether it will cause problems six months from now. I approve some parts, reject others, explain why. I correct mistakes not by rewriting the code myself, but by directing the next iteration: “This abstraction is too complex,” “These tests are brittle,” “This pattern doesn’t match the rest of the codebase.” The agent executes; I evaluate and redirect.

I decide what gets merged and what gets reworked. I still write code, but it’s less creation from scratch and more intervention and refinement—fixing the edge case the agent missed, adjusting the pattern to fit the broader architecture, writing the tricky bit that requires deeper context. The ratio has flipped. It used to be 80% typing, 20% reviewing; now it’s inverted. Most of my time is spent evaluating, directing, deciding—not typing. This feels like a loss, like I’ve been demoted from maker to manager, but that framing misses something important: supervision is technical work.

Consider what it takes to review that 70-file refactoring. I need to evaluate whether the pattern fits the existing codebase conventions—not just “does it work” but “is this how we do things here,” which requires understanding the architectural principles that aren’t written down anywhere, the implicit patterns that make the codebase coherent. I need to verify the refactoring preserves the original intent across contexts: the agent moved code around, extracted functions, renamed things, but did it maintain the subtle distinctions that mattered, or did it accidentally merge logic that should stay separate? I need to understand what the code was trying to do, not just what it does. I need to check if the abstraction level is appropriate for each module—too abstract and it becomes impossible to understand, too concrete and it becomes impossible to maintain, and the right level depends on context, on how the module is used, on what’s likely to change. That’s judgment, not mechanics.

I need to identify edge cases the agent missed: the null check that’s missing, the race condition that only happens under load, the assumption that breaks when the input format changes. Spotting these requires experience with what goes wrong, not just what should go right. Then I approve 90% and reject 10%, and I need to explain why—not just “this is wrong” but “this is wrong because it will cause this specific problem in this specific context.” The explanation is part of the work; it’s how the next iteration gets better.
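The “valid alone, invalid together” failure mode is easy to show in miniature. This is a hypothetical sketch, not code from the refactoring described above; the function name and checks are invented for illustration. Each argument passes its own validation, but the pair can still be wrong—and a test suite generated from the happy path tends to miss exactly that:

```python
def schedule_window(start, end):
    """Hypothetical validator: both guards below are the kind of
    check an automated refactoring can silently drop."""
    if start is None or end is None:
        # The missing null check: each argument must be present.
        raise ValueError("start and end are required")
    if end <= start:
        # Two individually valid timestamps that are invalid together.
        raise ValueError("end must be after start")
    return end - start
```

A happy-path suite asserts that `schedule_window(10, 20)` returns `10` and stops there; the reviewer’s job is to notice that the `end <= start` case has no test at all.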

This isn’t management. It’s deep technical judgment. I need to understand the codebase, architectural principles, type systems, dependency patterns—the skill requirement is the same as writing the code myself, maybe higher, because I need to spot what’s wrong without having written it.

Or take architectural evaluation. An agent creates a new abstraction layer, and I need to decide: does this simplify the system, or just add indirection? The agent can generate the code, can make the abstraction technically correct, but it can’t evaluate whether the abstraction is right—that requires understanding tradeoffs that only become visible with experience. Will this pattern make sense to someone reading it in six months? Not “is it documented” but “is it the kind of abstraction that makes sense in this system”—some abstractions clarify, others obscure, and the difference isn’t in the code itself, it’s in how it fits the mental model of the system. What are the performance implications? Not just “is it fast enough now” but “what happens when this scales”—the abstraction might be elegant but introduce latency we can’t afford, or it might seem heavyweight but actually improve performance by enabling better caching. You can’t know without understanding the system’s bottlenecks. Does this fit with the rest of the architecture? We have patterns, some explicit, most implicit, and adding a new abstraction that doesn’t fit those patterns creates friction—not immediately, but over time, as the codebase becomes harder to understand because it’s inconsistent. Maintaining consistency requires seeing the whole system, not just the local change.
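The difference between indirection and clarification can be made concrete. In this hypothetical sketch (the repository names and the soft-delete rule are invented for illustration), the first wrapper adds a hop without adding meaning; the second earns its existence by centralizing a rule every caller would otherwise have to remember:

```python
class UserRepository:
    """Pure indirection: forwards the call with the same signature
    and the same semantics. One more hop, no new meaning."""
    def __init__(self, db):
        self._db = db

    def get(self, user_id):
        return self._db.get(user_id)


class ActiveUserRepository:
    """Clarifying abstraction: encodes a decision callers would
    otherwise repeat—soft-deleted users are invisible here."""
    def __init__(self, db):
        self._db = db

    def get(self, user_id):
        user = self._db.get(user_id)
        if user is None or user.get("deleted"):
            return None
        return user
```

Both are technically correct, and an agent can generate either. Deciding which one the system needs is the part that requires knowing how callers actually use the data.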

What breaks if we accept this? What breaks if we reject it? Both choices have consequences. Accepting it means committing to maintaining this abstraction; rejecting it means the agent needs to find another approach, which takes time and might not be better. The decision isn’t “right or wrong,” it’s “which tradeoffs are we willing to accept.” The agent can generate code; it can’t (yet) evaluate whether the architecture is right. That requires experience, judgment, understanding of tradeoffs—supervisory work, but deeply technical.

Then there’s the value of “no.” An agent generates comprehensive test coverage—200 new tests, 95% coverage, looks great on paper—but when I review it, I reject 30% of the tests. These tests are brittle: they test implementation details, not behavior, which means if we refactor the code, these tests break even though the behavior hasn’t changed—that’s not useful, that’s maintenance burden disguised as quality. These tests are redundant: they test the same thing five different ways, and more tests doesn’t mean better coverage, it means slower CI and more noise when something actually breaks. These tests miss the actual edge cases: the agent generated tests for the happy path and obvious error cases, but the real bugs are in the subtle interactions—the case where two valid inputs combine in an invalid way, the race condition that only happens under specific timing. Those tests aren’t here. Knowing what not to do requires as much skill as knowing what to do, maybe more, because it’s easy to add code but hard to recognize when code shouldn’t be added. That recognition comes from experience with what causes problems, not just what seems like a good idea. The skill is different, not less.
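The implementation-detail failure is easiest to see side by side. A hypothetical example—the `Cache` class and its internal dict are invented here, not taken from the review described above. The first test breaks the moment the storage changes to, say, an LRU structure; the second survives any refactoring that preserves behavior:

```python
class Cache:
    """Minimal cache with an internal dict. The dict is an
    implementation detail, not part of the contract."""
    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key, default=None):
        return self._store.get(key, default)


def test_put_brittle():
    # Brittle: couples the test to the representation. Swapping the
    # dict for any other structure fails this test with behavior intact.
    c = Cache()
    c.put("a", 1)
    assert c._store == {"a": 1}


def test_put_behavior():
    # Robust: asserts only on observable behavior through the public API.
    c = Cache()
    c.put("a", 1)
    assert c.get("a") == 1
    assert c.get("missing", 0) == 0
```

Both tests pass today; only one of them is still worth having after the next refactoring.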

So does this mean we’ve been demoted? That the work has become less technical, less creative, less valuable? I don’t think so. The skill requirement hasn’t decreased—it’s shifted. Evaluation requires deeper understanding than creation in some ways. When you write code yourself, you’re making local decisions one at a time. You see the problem, you implement a solution, you move on. When you’re evaluating someone else’s code—or an agent’s code—you need to reconstruct their reasoning, understand why they made the choices they did, and determine whether those choices fit the broader system. You need to spot what’s wrong without having written it yourself. That’s harder, not easier.

Experience compounds differently in supervisory work. When you’re typing code, experience helps you write better implementations faster. When you’re evaluating code, experience helps you recognize patterns—both good and bad—across a much larger surface area. You’re not just evaluating this one refactoring. You’re evaluating whether this refactoring follows the patterns that make the rest of the system coherent. Whether it introduces technical debt we’ll regret in six months. Whether it simplifies or complicates future changes. That pattern recognition doesn’t come from typing more. It comes from having seen what works and what doesn’t, at scale, over time.

Agency remains, just expressed differently. You still decide what happens. The agent proposes, but you direct. You set the architectural vision. You determine what’s good enough and what needs rework. You decide which shortcuts are acceptable and which will cause problems. The agents execute the work, but you’re still the one making the calls. That’s not diminished agency. That’s leverage.

The craft is evolving, not disappearing. Orchestration is a skill. Knowing what to delegate matters. Not everything should be delegated. Critical bugs that require deep system understanding? You handle those yourself. Architectural decisions that set direction for months? You make those. But routine refactoring? Test coverage improvements? Documentation updates? Those can run overnight while you sleep. The skill is knowing which is which—and that distinction requires judgment, not just speed.

This progression has always existed in software development. Senior developers move toward supervision naturally. You review pull requests. You guide architectural decisions. You mentor junior developers. The supervisory role isn’t new. What’s new is how quickly you reach it, and the scale at which it operates. With human developers, you might supervise a team of five or ten. With agents, you’re effectively supervising output that would require a team ten times that size. The agent executes at scale; you evaluate constantly. The bottleneck isn’t implementation anymore—it’s your attention. How much can you meaningfully review? How many architectural decisions can you make in a day? How many iterations can you direct before your judgment degrades?

The developer role is reframing from creation to supervision, and that feels uncomfortable — we’re trained to value hands-on keyboard time, to equate typing with creation, to feel productive when we’re writing code. Reviewing doesn’t feel the same: it’s less tactile, less immediate, less satisfying in the moment. But the discomfort is about identity, not about the value of the work. The agent generates; you evaluate. The agent proposes; you direct.

I still write code — more than many in supervisory roles, but less than I used to. Increasingly, my value isn’t in the typing; it’s in the judgment: knowing what should be built, evaluating what was built, deciding what’s good enough, directing the next iteration, recognizing when the abstraction is wrong, when the test is brittle, when the refactoring introduces more complexity than it removes. The craft hasn’t diminished—it’s evolving, from keystrokes to supervision, same agency, different expression. The question isn’t whether this is real work—it is. The question is: how much supervision can one person effectively do? And that’s the constraint we’re learning to navigate.