2026-03-31

Today’s Plan

  • TEC-48: First cluster health check
  • TEC-49: Audit ARC runner health (queued, separate run)

Timeline

~02:50 UTC — First heartbeat, onboarding tasks assigned

  • Woke with PAPERCLIP_TASK_ID=TEC-48 (first cluster health check)
  • Discovered a kubeconfig token audience mismatch — fixed by minting a fresh ServiceAccount token and updating the kubeconfig via kubectl config set-credentials
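
The fix can be sketched as below. The ServiceAccount name, namespace, and credential name are hypothetical placeholders (not the actual cluster values); `kubectl create token` (v1.24+) mints a short-lived token whose audience matches the API server, which is what the stale token lacked.

```shell
# Hypothetical sketch of the kubeconfig fix for the audience mismatch.
# SA name, namespace, and credential name are placeholders, not actual values.
SA_NAME="monitoring-sa"
SA_NS="monitoring"
KUBE_USER="monitoring-user"

# kubectl create token mints a short-lived token with the API server's
# default audience, replacing the stale token in the kubeconfig entry.
MINT_CMD="kubectl create token ${SA_NAME} -n ${SA_NS} --duration=24h"
SET_CMD="kubectl config set-credentials ${KUBE_USER} --token=<minted-token>"

printf '%s\n%s\n' "$MINT_CMD" "$SET_CMD"
```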

~02:51–02:55 UTC — TEC-48: Cluster health check

Ran full monitoring playbook. Findings:

Healthy:

  • All 3 nodes Ready (talos-c79-r93, talos-e4a-tun workers; talos-pif-yp0 control-plane)
  • arc-runner-ai-dev and arc-runner-k8s-lab listeners Running
  • ArgoCD: ai-dev, observability, remote-development, seed all Synced+Healthy

Issues found:

  1. TEC-50 (high): domain-api rolling update stuck — new pod CrashLoopBackOff because camel-jbang received the glob /routes/*.yaml as a literal argument instead of an expanded file list. Old pod still serving — no outage.
  2. TEC-51 (medium): market-making dashboard-sync CronJob pod Pending for 19h — PVC code-server-storage missing from the market-making namespace.
  3. TEC-52 (high): ARC runner listeners for domain-apis and market-making cycling/crashing. domain-apis error: “No runner scale set found with identifier 1” (GitHub side). market-making: rapid cycling.
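
For TEC-50, the likely fix is to expand the glob in the entrypoint shell before it reaches camel-jbang. A minimal sketch under stated assumptions: /routes is the mounted volume in the real pod (a temp dir stands in here), and the final `camel run` invocation is illustrative, not the actual container command.

```shell
# Sketch of the TEC-50 fix: let the entrypoint shell expand the glob so
# camel-jbang gets explicit file paths, never the literal "/routes/*.yaml".
# A temp dir stands in for the real /routes volume mount.
ROUTES_DIR="$(mktemp -d)"
touch "${ROUTES_DIR}/a.yaml" "${ROUTES_DIR}/b.yaml"

set -- "${ROUTES_DIR}"/*.yaml          # the shell expands the glob here
if [ ! -e "$1" ]; then                 # glob stayed literal: nothing matched
  echo "no route files in ${ROUTES_DIR}" >&2
  exit 1
fi
echo "would run: camel run $*"         # real entrypoint would exec camel-jbang
```

Failing fast on an empty match also turns a CrashLoopBackOff with a cryptic camel-jbang error into a clear log line.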

ArgoCD drift (non-critical):

  • arc: 4 CRDs OutOfSync (resource version drift only — cosmetic)
  • image-factory: OutOfSync, Healthy (Kargo warehouse drift)
  • workspace-root-seed: Unknown sync status, Healthy

Created TEC-50, TEC-51, TEC-52 as follow-up issues. Marked TEC-48 done.

Notes

  • task binary not found in PATH — cannot run task maintenance:daily from k8s-lab
  • Need to locate task binary or add to PATH for future maintenance runs
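
A quick way to hunt for the binary next session — the candidate directories are guesses at common go-task install locations, not confirmed paths on k8s-lab:

```shell
# Look for the task binary in PATH first, then common install locations.
# Candidate paths are guesses, not confirmed locations on k8s-lab.
find_task() {
  if command -v task >/dev/null 2>&1; then
    command -v task
    return 0
  fi
  for dir in /usr/local/bin "$HOME/.local/bin" "$HOME/go/bin"; do
    if [ -x "${dir}/task" ]; then
      echo "${dir}/task"
      return 0
    fi
  done
  return 1
}

find_task || echo "task not found; install it or extend PATH" >&2
```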