Per-Agent Pod Architecture Plan

Issue: TEC-62
Goal: Eliminate single-pod process sharing for claude_local agents by running each agent run as its own K8s Job.

Current State (already done)

Item	Status
Fork `paperclipai/paperclip` → `craigedmunds/paperclip`	✅
`integration` branch created and active	✅
`upstream` remote set to `paperclipai/paperclip`	✅
`opencode_remote` adapter added to integration branch	✅
`images.yaml` enrollment updated to `craigedmunds/paperclip@integration`	✅

Remaining Work

1. Regenerate image-factory CDK8s manifests

The generated dist/cdk8s/image-factory.k8s.yaml still references paperclipai/paperclip@master. Run CDK8s synthesis to pick up the images.yaml changes and point the Kargo Warehouse at the correct repo/branch.

Steps:

cd repos/image-factory/cdk8s && python main.py (or equivalent task)
Commit the updated dist/cdk8s/image-factory.k8s.yaml to image-factory-state
Push → ArgoCD syncs → Kargo Warehouse watches craigedmunds/paperclip@integration

Files: repos/image-factory-state/dist/cdk8s/image-factory.k8s.yaml

2. `claude_k8s` adapter

New adapter package: packages/adapters/claude-k8s/

Interface: Implements ServerAdapterModule (same as claude_local):

execute(ctx: AdapterExecutionContext): Promise<AdapterExecutionResult>
testEnvironment(ctx): Promise<AdapterEnvironmentTestResult>

Execution flow (replaces runChildProcess with K8s Job):

execute() called
  → Build Job spec:
      name: paperclip-run-{runId}
      namespace: <config.namespace>
      serviceAccountName: <config.serviceAccount>
      image: <config.image>  # same claude image as claude_local
      command: ["claude", "--print", "-", "--output-format", "stream-json", ...]
      env: PAPERCLIP_* vars + ANTHROPIC_API_KEY (from secret ref)
      resources: config.resources (cpu/memory limits per agent)
      volumeMounts:
        - name: workspace
          mountPath: /workspace
          subPath: workspaces/<agentId>   # isolation per agent
  → Create Job via @kubernetes/client-node BatchV1Api
  → Stream logs from pod stdout/stderr via CoreV1Api log streaming
  → Parse streaming JSON (reuse claude-local parse.ts)
  → Delete Job on completion
  → Return AdapterExecutionResult
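The "Build Job spec" step above could be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `ClaudeK8sConfig`, `RunInput`, `buildJobManifest`, and the `apiKeySecretName` field are hypothetical names, and `backoffLimit: 0` is an assumption about retry behaviour. Only the command flags spelled out in the plan are included; the elided ones are left out.

```typescript
// Local types keep the sketch self-contained; a real adapter would more
// likely use the V1Job type from @kubernetes/client-node.
interface ClaudeK8sConfig {
  namespace: string;
  image: string;
  serviceAccount: string;
  pvcName: string;
  resources: { requests: Record<string, string>; limits: Record<string, string> };
}

interface RunInput {
  runId: string;
  agentId: string;
  env: Record<string, string>; // PAPERCLIP_* variables for this run
  apiKeySecretName: string;    // assumption: Secret holding ANTHROPIC_API_KEY
}

interface EnvVar {
  name: string;
  value?: string;
  valueFrom?: { secretKeyRef: { name: string; key: string } };
}

function buildJobManifest(config: ClaudeK8sConfig, run: RunInput) {
  const env: EnvVar[] = [
    ...Object.entries(run.env).map(([name, value]) => ({ name, value })),
    // The API key is referenced from a Secret instead of being inlined.
    {
      name: "ANTHROPIC_API_KEY",
      valueFrom: { secretKeyRef: { name: run.apiKeySecretName, key: "ANTHROPIC_API_KEY" } },
    },
  ];
  return {
    apiVersion: "batch/v1",
    kind: "Job",
    metadata: { name: `paperclip-run-${run.runId}`, namespace: config.namespace },
    spec: {
      backoffLimit: 0, // assumption: a failed run is reported, not retried
      template: {
        spec: {
          restartPolicy: "Never", // Jobs only accept Never or OnFailure
          serviceAccountName: config.serviceAccount,
          containers: [
            {
              name: "claude",
              image: config.image,
              // Only the flags listed in the plan; the rest are elided there.
              command: ["claude", "--print", "-", "--output-format", "stream-json"],
              env,
              resources: config.resources,
              volumeMounts: [
                { name: "workspace", mountPath: "/workspace", subPath: `workspaces/${run.agentId}` },
              ],
            },
          ],
          volumes: [{ name: "workspace", persistentVolumeClaim: { claimName: config.pvcName } }],
        },
      },
    },
  };
}
```

The resulting object would then be handed to `BatchV1Api.createNamespacedJob`; the exact call shape depends on the version of `@kubernetes/client-node` in use, so it is not shown here.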

Adapter config fields:

namespace: paperclip-agents       # K8s namespace for Jobs
image: ghcr.io/craigedmunds/paperclip:latest
serviceAccount: paperclip-agent-runner
resources:
  requests: { cpu: "500m", memory: "2Gi" }
  limits: { cpu: "2", memory: "4Gi" }
pvcName: paperclip-agent-workspace  # shared RWX PVC
graceSec: 30
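One way the adapter might type and validate these fields is sketched below. The names `ClaudeK8sAdapterConfig`, `DEFAULTS`, and `resolveConfig` are hypothetical, and treating the example values above as defaults is an assumption rather than something the plan specifies.

```typescript
interface ResourceSpec {
  cpu: string;
  memory: string;
}

interface ClaudeK8sAdapterConfig {
  namespace: string;
  image: string;
  serviceAccount: string;
  resources: { requests: ResourceSpec; limits: ResourceSpec };
  pvcName: string;
  graceSec: number;
}

// Assumption: the example values from the plan double as defaults.
const DEFAULTS: ClaudeK8sAdapterConfig = {
  namespace: "paperclip-agents",
  image: "ghcr.io/craigedmunds/paperclip:latest",
  serviceAccount: "paperclip-agent-runner",
  resources: {
    requests: { cpu: "500m", memory: "2Gi" },
    limits: { cpu: "2", memory: "4Gi" },
  },
  pvcName: "paperclip-agent-workspace",
  graceSec: 30,
};

function resolveConfig(raw: Partial<ClaudeK8sAdapterConfig>): ClaudeK8sAdapterConfig {
  const merged: ClaudeK8sAdapterConfig = {
    namespace: raw.namespace ?? DEFAULTS.namespace,
    image: raw.image ?? DEFAULTS.image,
    serviceAccount: raw.serviceAccount ?? DEFAULTS.serviceAccount,
    resources: raw.resources ?? DEFAULTS.resources,
    pvcName: raw.pvcName ?? DEFAULTS.pvcName,
    graceSec: raw.graceSec ?? DEFAULTS.graceSec,
  };
  // Reject blank strings early so a bad config fails before any Job is created.
  for (const key of ["namespace", "image", "serviceAccount", "pvcName"] as const) {
    if (merged[key].trim() === "") {
      throw new Error(`claude_k8s config: ${key} must not be empty`);
    }
  }
  if (!Number.isFinite(merged.graceSec) || merged.graceSec < 0) {
    throw new Error("claude_k8s config: graceSec must be a non-negative number");
  }
  return merged;
}
```

Failing at config resolution keeps misconfiguration errors out of the Job lifecycle, where they would otherwise surface as harder-to-read API or scheduling failures.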

Workspace isolation:

Shared PVC mounted at /workspace with subPath: workspaces/<agentId>/
No git credentials in pods — all git ops go through Paperclip API (push/pull handled by server)
Session state persisted to PVC subpath across runs
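Because isolation rests entirely on the subPath, it may be worth validating the agent ID before building it. A minimal sketch, assuming agent IDs can be restricted to a conservative character set (the helper name `workspaceSubPath` and the allowed characters are assumptions, not part of the plan):

```typescript
// Builds the per-agent subPath used for the shared workspace PVC mount.
// Rejects anything that could point at another agent's directory.
function workspaceSubPath(agentId: string): string {
  // Assumption: IDs start with an alphanumeric and contain only
  // alphanumerics, dots, underscores, and hyphens (so no "/" is possible).
  const safe = /^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(agentId);
  if (!safe || agentId.includes("..")) {
    throw new Error(`unsafe agentId for workspace subPath: ${JSON.stringify(agentId)}`);
  }
  return `workspaces/${agentId}`;
}
```

Kubernetes itself rejects absolute subPaths and `..` path elements, but a check like this also catches IDs containing slashes, which would otherwise nest one agent's workspace inside another's.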

Registration:

Add to server/src/adapters/registry.ts alongside claude_local

3. K8s infrastructure for agent pods

New manifests in repos/k8s-lab/ or as part of the Paperclip app:

Namespace: paperclip-agents
ServiceAccount: paperclip-agent-runner with minimal RBAC (read secrets, write to workspace PVC)
PVC: paperclip-agent-workspace — RWX (shared across Jobs), large enough (e.g. 50Gi)
Secret: claude-api-key — ANTHROPIC_API_KEY for agent pods
NetworkPolicy: agents can only reach Paperclip API + external APIs, not internal cluster services
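The NetworkPolicy item could look roughly like the egress policy below, written as a plain manifest object. Everything beyond the `paperclip-agents` namespace is an assumption: the policy name, the Paperclip API namespace, labels and port passed in by the caller, and the use of RFC 1918 ranges as a stand-in for "internal cluster services" (a cluster whose pod or service CIDRs fall outside those ranges would need different exceptions). It also only takes effect with a CNI plugin that enforces NetworkPolicy.

```typescript
interface Peer {
  namespaceSelector?: { matchLabels: Record<string, string> };
  podSelector?: { matchLabels: Record<string, string> };
  ipBlock?: { cidr: string; except?: string[] };
}

interface EgressRule {
  to: Peer[];
  ports: { protocol: "TCP" | "UDP"; port: number }[];
}

const NS_NAME_LABEL = "kubernetes.io/metadata.name"; // set automatically on namespaces

function agentEgressPolicy(
  apiNamespace: string,                  // hypothetical: where the Paperclip API runs
  apiPodLabels: Record<string, string>,  // hypothetical: labels on the API pods
  apiPort: number,
) {
  const egress: EgressRule[] = [
    // DNS lookups, assumed to be served from kube-system.
    {
      to: [{ namespaceSelector: { matchLabels: { [NS_NAME_LABEL]: "kube-system" } } }],
      ports: [{ protocol: "UDP", port: 53 }, { protocol: "TCP", port: 53 }],
    },
    // The Paperclip API itself.
    {
      to: [{
        namespaceSelector: { matchLabels: { [NS_NAME_LABEL]: apiNamespace } },
        podSelector: { matchLabels: apiPodLabels },
      }],
      ports: [{ protocol: "TCP", port: apiPort }],
    },
    // External HTTPS, excluding private ranges as a proxy for in-cluster traffic.
    {
      to: [{ ipBlock: { cidr: "0.0.0.0/0", except: ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"] } }],
      ports: [{ protocol: "TCP", port: 443 }],
    },
  ];
  return {
    apiVersion: "networking.k8s.io/v1",
    kind: "NetworkPolicy",
    metadata: { name: "agent-egress", namespace: "paperclip-agents" },
    spec: {
      podSelector: {},          // applies to every pod in the namespace
      policyTypes: ["Egress"],  // anything not listed above is denied
      egress,
    },
  };
}
```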

4. Image factory `.builders/` per adapter

In the integration branch of craigedmunds/paperclip, add builder directories:

.builders/claude-k8s/ — Dockerfile for the claude_k8s image (claude CLI + Node runtime)
.builders/opencode-remote/ — if a dedicated image is needed

These feed into the image-factory pipeline (CDK8s dockerfile config per builder).

Implementation Order

CDK8s regen (quick, unblocks Kargo pipeline) → PR to image-factory-state
claude_k8s adapter (core feature) → PR to craigedmunds/paperclip@integration
K8s infra manifests (namespace, SA, PVC, RBAC) → PR to k8s-lab
Image factory builder dirs → PR to craigedmunds/paperclip@integration
Wire it up — update Paperclip agent config in cluster to use claude_k8s

Notes

The integration branch name is confirmed (the board requested release or integration, not custom-adapters)
Merge discipline: upstream/master → origin/master → origin/integration (cherry-pick upstream releases)
Per-run Job naming uses runId to avoid conflicts; finished Jobs are removed either by a TTL (ttlSecondsAfterFinished) or by an explicit delete on completion
onSpawn callback in execute context can be used to report the Job/pod name instead of a local PID
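For the per-run Job naming note, a small helper could normalise the runId into a name Kubernetes will accept. Job names must be valid DNS subdomains, and the Kubernetes documentation recommends the stricter DNS label rules (lowercase alphanumerics and hyphens, at most 63 characters) for best compatibility. The sketch below assumes that stricter form; `jobNameForRun` is a hypothetical helper, and the runId format is not specified in the plan.

```typescript
const JOB_PREFIX = "paperclip-run-";
const MAX_LABEL_LENGTH = 63; // DNS label limit

// Derives a DNS-label-safe Job name from a run ID.
function jobNameForRun(runId: string): string {
  const cleaned = runId
    .toLowerCase()
    .replace(/[^a-z0-9-]/g, "-") // replace characters not allowed in a label
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
  if (cleaned === "") {
    throw new Error("runId has no characters usable in a Job name");
  }
  // Truncate, then make sure truncation did not leave a trailing hyphen.
  return (JOB_PREFIX + cleaned).slice(0, MAX_LABEL_LENGTH).replace(/-+$/, "");
}
```

Normalisation and truncation can map two distinct run IDs to the same name, so this only preserves the "no conflicts" property if run IDs are already short and label-safe, as UUIDs would be.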
