Article 010: Outline (Distilled from Draft)
Title: What long-running agents change, and what they don’t
Key Message: Persistence matters more than intelligence. Agents extend work across time and expand what’s economically viable, not just improve capability.
Series: B — Reframing the work (Part 5 of 6)
Status: Drafted, publishes Monday 2026-03-23
Core thesis: Long-running agents change WHEN work happens (temporal) and HOW MUCH work becomes feasible (economic). Not about what agents can do, but when work happens and at what cost.
Structure
Opening: The wrong question
What most people ask:
- What can the agent do? (capability focus)
- Can it reason? Write production code? Handle full features?
- These are reasonable but not the most important questions
The real shift:
- Temporal and economic, not capability
- Long-running agents change when work happens AND how much work becomes feasible
- Not how well work is done, but when it happens and at what cost
- Distribution across time + volume at reasonable cost
The capability framing (why it misses the point)
The natural tendency:
- Evaluate agents by what they can replace
- Capability treated as a threshold (above it, no humans needed)
- Substitution model: capability rises, human involvement falls
What this misses:
- Agents don’t replace work, they change when work is done
- Not a substitute for a person
- A way to extend judgment across time
The persistence shift (core insight)
Example: Eight hours overnight
- You approve plan, lock tests, document constraints
- Agent works through night, commits at stages, surfaces blockers
- You wake to Git log, completed tasks, open questions
What changed:
- Not quality of your judgment
- Decisions still required your expertise
- But decisions made in afternoon, not 2am
- Agent moved exercise of judgment to different point in day
The real change:
- Work happens in hours when it previously couldn’t (you were asleep)
- Agent = way of front-loading human judgment for async application
- Not capability improvement, but temporal extension
The economic dimension (how much becomes viable)
The volume question:
- Not just “work happens overnight” (temporal)
- But “how much work becomes economically feasible?” (volume)
Without agents:
- 8 hours overnight = expensive night shift OR unsustainable personal hours OR work doesn’t happen
- Continuous operation = prohibitive human cost
- Parallel streams = need multiple people
With agents:
- 8 hours overnight = marginal compute cost
- Continuous operation = affordable
- Parallel streams = economically viable
What this enables:
- Not just redistributed work (same total, different times)
- MORE total work becomes feasible
- Work that wouldn’t happen at all (too expensive with human cost)
- Volume of work expands, not just timing
The shift:
- Human cost: linear with time (pay for hours)
- Compute cost: marginal per additional hour (doesn’t scale the same way)
- Work that was economically unviable becomes viable
- Not “move work to night,” but “do work that wouldn’t happen”
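A minimal sketch of the cost comparison above, under stated assumptions. The rates, the hours, and the names `CostModel` and `is_viable` are all hypothetical and chosen only for illustration; they are not figures from the draft. The agent path still charges human time for setup and review, since that effort is relocated rather than removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    # All three values are hypothetical inputs, not measured data.
    human_rate_per_hour: float    # cost of one human hour
    compute_rate_per_hour: float  # cost of one agent hour
    setup_review_hours: float     # human hours of setup + review per stream

    def human_cost(self, hours: float, streams: int = 1) -> float:
        """Humans do all the work: cost grows linearly with hours and streams."""
        return self.human_rate_per_hour * hours * streams

    def agent_cost(self, hours: float, streams: int = 1) -> float:
        """Agent executes; a human still pays for setup and review per stream."""
        human_part = self.human_rate_per_hour * self.setup_review_hours * streams
        compute_part = self.compute_rate_per_hour * hours * streams
        return human_part + compute_part


def is_viable(expected_value: float, cost: float) -> bool:
    """Work is worth attempting only if its expected value exceeds its cost."""
    return expected_value > cost


# Illustrative numbers only.
model = CostModel(human_rate_per_hour=100.0,
                  compute_rate_per_hour=2.0,
                  setup_review_hours=1.5)

overnight_hours = 8
task_value = 300.0  # hypothetical value of a task with uncertain ROI

print(is_viable(task_value, model.human_cost(overnight_hours)))  # human-only
print(is_viable(task_value, model.agent_cost(overnight_hours)))  # agent-run
```

With these assumed inputs, the same task falls below the viability line at human cost and above it at compute cost, which is the "do work that wouldn’t happen" case rather than the "move work to night" case.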
Examples:
- Running comprehensive test suites continuously (not just on commit)
- Exploring multiple architectural approaches in parallel
- Implementing fallback options “just in case”
- Maintaining documentation that stays in sync with code
- Work with uncertain ROI becomes affordable to attempt
What extends across time
Key question shifts:
- From “what can the agent do?”
- To “what decisions need to be made before the agent starts?”
Async work depends on clarity at handoff:
- Spec precise enough (ambiguities don’t block)
- Tests capture what “done” means
- Constraints explicit, not tacit
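One way to make the three handoff conditions checkable before the agent starts, sketched under assumptions. `Handoff` and `handoff_gaps` are hypothetical names invented for this illustration, not part of any tool; the check only reports what is missing and leaves the decision to proceed with the human.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    # Hypothetical record of what exists at the moment of handoff.
    spec_open_questions: list[str] = field(default_factory=list)
    tests_locked: bool = False
    constraints: list[str] = field(default_factory=list)


def handoff_gaps(handoff: Handoff) -> list[str]:
    """Return the reasons this handoff may block or mislead an async run."""
    gaps = []
    if handoff.spec_open_questions:
        count = len(handoff.spec_open_questions)
        gaps.append(f"spec has {count} unresolved question(s)")
    if not handoff.tests_locked:
        gaps.append("tests do not yet define 'done'")
    if not handoff.constraints:
        gaps.append("no constraints written down")
    return gaps


# Usage: a draft handoff with one open question and no explicit constraints.
draft = Handoff(spec_open_questions=["Which auth provider?"], tests_locked=True)
for gap in handoff_gaps(draft):
    print("Fix before handoff:", gap)
```

An empty result does not mean the spec is right, only that nothing is visibly missing; whether the artifacts capture the right requirements is still a human question.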
Async surfaces cost of vagueness:
- Synchronous: hit ambiguity, ask in 30 seconds (low cost)
- Async: agent makes a wrong assumption OR surfaces a blocker and waits for you (high cost)
What extends:
- Not just execution capacity
- Quality of artifacts (spec, plan, tests, constraints)
- Decisions at 4pm shape what happens at 2am
The skill shift:
- Less about moment of execution
- More about preparation that enables execution without you
- Less doing, more enabling
What doesn’t change
The temptation:
- If agent does more work, is human judgment less important?
The reality:
- Importance of judgment unchanged
- What changes: points at which it applies
- Not present for implementation, but present for decisions that shaped it
When things go wrong:
- Questions are still human questions
- Did spec capture right requirements?
- Did tests test right things?
- Was architecture sound?
Understanding still matters:
- Perhaps more so
- Need to evaluate overnight results
- Understand agent’s choices
- Recognize when something looks right but isn’t
- Can’t outsource understanding, only change when you apply it
The time structure of work
Synchronous development:
- Continuous: sit down, code, stop
- Bounded by presence
- Work starts/stops with you
With long-running agents:
- Handoff pattern, not continuous flow
- Setup phase: thinking, decisions, artifacts
- Execution phase: agent works, you do other things
- Review phase: return, assess, decide next
This doesn’t mean less work:
- Work differently distributed
- Setup requires careful thinking (vagueness propagates)
- Review requires genuine engagement (not rubber-stamping)
- Execution consumes clock hours, not human hours; human effort is relocated to setup and review
Natural fit for:
- Batch processing
- Multi-step implementation
- Long test runs
- Research compilation
Poor fit for:
- Fast iteration
- Exploratory coding
- Debugging (needs immediacy)
The judgment:
- Which kind of task am I doing?
- Which mode fits it?
- Judgment remains human
Close: The shape of change
What changed (not what you thought):
- Early framing: capable agents replace human tasks
- Actual shift: agents change when work happens and how much work becomes economically viable
- Temporal: extend reach of human decisions into periods when human isn’t present
- Economic: work at compute cost vs human cost dramatically changes what’s feasible
The dual shift:
- Calendar of work changes (temporal)
- Volume of work expands (economic)
- Not replacement, but extension across time + expansion of what’s affordable
What didn’t change:
- Need for human understanding
- Sound judgment at decision points
- Careful preparation of guiding artifacts
- Judgment isn’t automated, it’s relocated and amplified
That relocation and amplification matters:
- Full-day work → afternoon prep + overnight execution + one-hour review (temporal)
- Work that wouldn’t happen → becomes viable at compute cost (economic)
- Not less work, differently distributed AND more total work becomes affordable
- Not less judgment, judgment at different moments + judgment about what work to attempt
The real questions:
- Not “what can the agent do?”
- “What decisions need to be made before agent starts, and how do I make them well?”
- “What work becomes worth attempting at compute cost that wasn’t at human cost?”
- Agents extend time over which work can happen + expand volume of work that’s economically viable
- Thinking that makes both extensions valuable remains ours
Threads from earlier articles
| From | Theme | Connection |
|---|---|---|
| 002 | Work continues without you | Extends to “work continues because agent persists” |
| 005 | Continuity over speed | Persistence enables continuity across time |
| 008 | Verified progression | Setup phase = creating artifacts that guide execution |
| 009 | Git as memory | Artifacts in Git enable agent to work without you |
Key contrasts emphasized
- Capability vs temporal + economic shift
- Replace vs extend (time) + expand (volume)
- Substitution vs relocation + amplification
- Continuous vs handoff pattern
- Doing vs enabling
- Human cost vs compute cost
- Redistributed work vs more total work
- What can agent do vs what work becomes worth attempting