Architecture Decision Document: OpenCode Slack Integration
This document is built collaboratively through step-by-step discovery. Sections are appended as we work through each architectural decision together.
Document Overview
Project: OpenCode Slack Integration
Status: Architecture In Progress
Last Updated: 2026-01-31
Architect: Winston (with Craig)
⚠️ NOTE: For the core session pattern (builder workspaces, lifecycle, message routing) shared across all chat platforms, see:
This document covers Slack-specific architecture, deployment automation, and K8s integration.
Input Documents Loaded
This architecture is informed by:
- PRD (prd.md) - Product requirements and user workflows
- Discovery Questions (discovery-questions.md) - Annotated requirements from user
- Architecture Questions Round 2 (architecture-questions-round2.md) - Technical clarifications
- API Contract (api-contract.md) - Slack ↔ Backend interface specification
- Work Package: Backend (work-package-backend.md) - Backend implementation scope
- Work Package: Slack (work-package-slack.md) - Slack connector implementation scope
- Workspace Pattern (workspace-pattern.md) - Multi-repo workspace approach (using par)
Project Context Analysis
Requirements Overview
Functional Requirements:
The OpenCode Slack Integration bridges asynchronous Slack-based interaction with OpenCode CLI sessions, supporting AI-assisted development across multiple parallel projects. The system encompasses:
- Slack Interface Layer (FR-1): Slash command workflow forms for task initiation with rich metadata (category, project, repos, priority)
- Intelligent Routing (FR-2): Two-dimensional routing decisions combining BMAD agent type (architect/PM/builder/party-mode) with AI model selection (Claude Code API vs Qwen 2.5 Coder 32B local)
- Workspace Orchestration (FR-3): Integration with par workspace manager to create isolated multi-repo builder environments
- Session Bridge (FR-4): Bidirectional communication between Slack threads and OpenCode sessions, with dual-UI visibility (Slack + OpenCode web UI)
- Deployment Automation (FR-5, FR-6): ConfigMap/PVC-based deployment to K8s lab namespaces with automated test execution
- Async Question Handling (FR-7): Interactive Slack buttons with configurable timeouts and recommended defaults
- Progress Visibility (FR-8): Milestone-based updates with periodic summaries for long-running work
- Multi-Project Concurrency (FR-9): Isolated workspaces and sessions enabling parallel work across projects
- Session Durability (FR-10): Git-backed state persistence supporting work spanning days/weeks
Non-Functional Requirements:
- Performance (NFR-1): Sub-3-second Slack acknowledgment, <30-second deployment, 1-second question forwarding
- Reliability (NFR-2): Webhook retry logic, git-backed state (zero loss tolerance), graceful failure recovery
- Security (NFR-3): Slack signature verification, K8s RBAC least-privilege, namespace isolation, secret protection
- Scalability (NFR-4): MVP scoped for single user with <10 concurrent sessions
Scale & Complexity:
- Primary domain: Backend orchestration + Chat platform integration + K8s deployment automation
- Complexity level: Medium-High
  - Real-time streaming (SSE for deployment progress)
  - Multi-session state management
  - Kubernetes namespace lifecycle orchestration
  - Dual AI model routing and availability management
  - Git-based distributed state synchronization
- Estimated architectural components: 8-10 major components
  - Slack connector
  - Backend API gateway
  - Session manager
  - OpenCode bridge
  - BMAD router with model selector
  - Workspace manager (integrates with par)
  - Deployment orchestrator
  - Test orchestrator
  - K8s namespace manager
  - Webhook event dispatcher
Technical Constraints & Dependencies
Critical Dependencies:
- OpenCode CLI: Session management, web UI integration, output parsing capabilities
- Par (Workspace Manager): Multi-repo workspace creation with worktree isolation
- Kubernetes Infrastructure: Namespace creation RBAC, ingress controller, cert-manager, PVC provisioning
- Slack Platform: Custom app installation, webhook endpoints, interactive components, workflow forms
- Git: State persistence, workspace branching, session resumability
Architectural Constraints:
- MVP Timeline: 1-2 week implementation window drives technology choices toward proven patterns
- Single User Scope: Simplified authentication, no multi-tenancy concerns for MVP
- No Image Builds: ConfigMap/PVC deployment strategy only (fast iteration over production-readiness)
- Session Persistence Requirement: State must survive gateway restarts, OpenCode crashes, multi-day inactivity
- Performance Target: <30 second deploy time influences manifest strategy and K8s interaction patterns
Technology Choices Implied by Requirements:
- Python/FastAPI (based on work packages, aligns with K8s client libraries)
- Server-Sent Events (SSE) for deployment progress streaming
- Git-backed YAML for session state
- Kustomize for K8s manifests (per infrastructure notes)
- Contract-first development (API contract enables parallel Slack/Backend work)
Cross-Cutting Concerns Identified
1. Session Lifecycle Management
- Creation: Workspace initialization via task builder:init, OpenCode spawn, state file creation
- Active state: Message forwarding, output parsing, question detection, progress tracking
- Persistence: Git commits for state checkpoints, resumability after crashes
- Cleanup: Workspace removal, namespace deletion (immediate vs deferred), branch cleanup
2. AI Model Selection & Availability
- Routing Heuristics: Task analysis determines agent type AND model recommendation
  - Architecture/complex reasoning → Claude Code (API)
  - Code generation/implementation → Qwen 2.5 Coder 32B (local)
- Availability Checking: Local model health checks, API quota validation
- Fallback Strategy: What happens when recommended model unavailable? (OPEN QUESTION)
- Cost/Performance Tradeoffs: API costs vs local compute, latency considerations
3. Concurrent Access Control
- Dual UI Problem: Sessions visible in both Slack and OpenCode web UI
- Ownership Semantics: Can Slack bot and human both send messages? (OPEN QUESTION)
- Conflict Resolution: Race conditions when both interfaces active simultaneously
- Visibility Strategy: Should web UI distinguish Slack-managed sessions? (OPEN QUESTION)
4. Git-Based State Synchronization
- State File Format: .session-state.yaml schema and versioning
- Commit Strategy: When to commit state (every milestone? every message? periodic?)
- Branching Strategy: Builder branches, question timeout branches (OPEN QUESTION)
- Merge/Cleanup: What happens on session completion? PR? Direct merge? (OPEN QUESTION)
5. Kubernetes Security Boundaries
- RBAC Design: Gateway service account permissions (namespace creation, manifest apply, ingress creation)
- Namespace Isolation: Lab namespaces must not access production resources
- Secret Management: Slack tokens, AI API keys, K8s credentials
- Network Policies: Ingress-only access to lab deployments
6. Webhook Reliability
- Retry Logic: Slack webhook failures must not lose events
- Event Ordering: Progress updates must arrive in sequence
- Idempotency: Repeated webhook deliveries must not duplicate actions
- Timeout Handling: Long-running operations must not block webhook responses
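The idempotency concern above can be illustrated with a small in-memory deduplicator. This is a minimal sketch, assuming each delivery carries a stable unique identifier; the class name, method name, and bounded-cache size are hypothetical, and a real gateway would likely need a store that survives restarts.

```python
from collections import OrderedDict


class EventDeduplicator:
    """Remembers recently seen event IDs so retried deliveries are ignored."""

    def __init__(self, max_entries: int = 1000) -> None:
        self._seen: "OrderedDict[str, bool]" = OrderedDict()
        self._max_entries = max_entries

    def first_delivery(self, event_id: str) -> bool:
        """Return True the first time an ID is seen, False on repeats."""
        if event_id in self._seen:
            return False
        self._seen[event_id] = True
        # Evict the oldest ID once the cache exceeds its bound.
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)
        return True
```

A handler would call `first_delivery(event_id)` before acting and return early when it yields False, so a retried webhook is acknowledged without repeating the action.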
Architectural Decisions Required
Based on this analysis, the following architectural decisions must be made:
- Component Architecture: Monolith vs microservices split between Slack connector and backend orchestrator
- Model Selection Algorithm: Heuristics for agent type + model recommendation with confidence scoring
- Model Fallback Strategy: Behavior when Claude API unavailable or Qwen local model down
- Session Ownership Model: Exclusive Slack bot control vs shared access with web UI
- Question Timeout Git Strategy: Branching approach when work continues before user responds
- Deployment Promotion Workflow: PR creation vs direct merge vs GitOps sync
- Slack Threading Strategy: When to create new threads vs continue in existing thread
- State Commit Frequency: Balance between durability and git noise
- Namespace Lifecycle Policy: Immediate cleanup vs time-based retention vs manual
- OpenCode Bridge Implementation: PTY wrapping, tmux scripting, or session API integration
Architectural Decisions
Decision 1: Component Architecture
Decision: Modular Monolith with Clean Module Boundaries
Rationale:
- Single deployment artifact reduces operational complexity for MVP
- Clean module boundaries enable parallel development
- API contract provides future optionality to split into microservices
- MVP timeline (1-2 weeks) favors shipping over premature optimization
- Single user scope means no scaling pressure
Implementation:
- Single FastAPI application
- Modules organized by API contract boundaries:
  - connectors/slack/ - Slack Socket Mode integration
  - core/ - Session management, BMAD routing
  - deployment/ - K8s orchestration
  - integrations/ - OpenCode SDK bridge
- Contract tests validate internal boundaries
- Can extract to microservices post-MVP if needed
Decision 2: Model Selection Algorithm
Decision: Keyword-Based Heuristic with Learning Loop
Rationale:
- Simple keyword matching achieves 70-80% accuracy (acceptable for MVP)
- User maintains full control with override capability
- Logging all decisions builds dataset for future ML model
- Fast to implement, easy to understand and debug
Algorithm:
def route_task(task_title, task_description):
    # Keyword analysis
    architecture_keywords = ["design", "architecture", "security", "scale"]
    business_keywords = ["feature", "user", "workflow", "product"]
    implementation_keywords = ["implement", "fix", "refactor", "optimize"]
    # One keyword list per candidate agent type
    keywords = {
        "architect": architecture_keywords,
        "pm": business_keywords,
        "builder": implementation_keywords,
    }
    # Scoring with complexity boost
    complexity = estimate_complexity(task_description)
    scores = calculate_scores(task_title, task_description, keywords, complexity)
    # Agent selection: highest-scoring agent type wins
    agent_type = max(scores, key=scores.get)
    # Model selection based on agent
    if agent_type in ["architect", "pm"]:
        model = "claude-code"  # Complex reasoning
    else:
        model = "qwen-coder"  # Code generation
    return RoutingDecision(
        agent=agent_type,
        model=model,
        confidence=calculate_confidence(scores),
        reasoning=generate_reasoning(scores, agent_type, model),
    )
Learning Loop:
- Log every routing decision + user override
- Build training dataset for future ML-based routing
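As a rough illustration of the keyword heuristic, the sketch below fills in the scoring helpers with plain substring counts. The keyword-to-agent mapping, the confidence formula (winning score divided by total hits), and the builder default when nothing matches are assumptions made for this example, not part of the decision; the complexity boost is omitted.

```python
from dataclasses import dataclass

# Keyword lists from the algorithm; mapping each list to an agent type
# is an assumption made for this sketch.
KEYWORDS = {
    "architect": ["design", "architecture", "security", "scale"],
    "pm": ["feature", "user", "workflow", "product"],
    "builder": ["implement", "fix", "refactor", "optimize"],
}


@dataclass
class RoutingDecision:
    agent: str
    model: str
    confidence: float
    reasoning: str


def route_task(task_title: str, task_description: str) -> RoutingDecision:
    text = f"{task_title} {task_description}".lower()
    # Count keyword hits per agent type (plain substring matching).
    scores = {
        agent: sum(text.count(word) for word in words)
        for agent, words in KEYWORDS.items()
    }
    total = sum(scores.values())
    if total == 0:
        # Hypothetical default: no keyword matched, route to the builder.
        agent, confidence = "builder", 0.0
    else:
        agent = max(scores, key=scores.get)
        confidence = scores[agent] / total
    # Model follows agent type: reasoning-heavy agents use the API model.
    model = "claude-code" if agent in ("architect", "pm") else "qwen-coder"
    return RoutingDecision(agent, model, confidence, f"keyword hits: {scores}")
```

Logging each returned `RoutingDecision` next to any user override would produce the dataset the learning loop describes.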
Decision 3: Model Fallback Strategy
Decision: Fail Fast with Opportunistic Recovery
Strategy:
- Health check fails → Post error to Slack with [Retry] [Switch to Alternative] buttons
- Background polling checks model health every 30-60 seconds
- If model recovers before user responds → Update Slack message, proceed with original
- If user responds first → Honor user choice, cancel polling
- Polling timeout: 2 minutes maximum
Implementation:
async def handle_model_unavailable(session_id, recommended_model, fallback_model):
    message_ts = await slack.post_message(
        text=f"⚠️ {recommended_model} currently unavailable.",
        buttons=["Retry", f"Switch to {fallback_model}"],
    )
    # Race: user response vs model recovery (first to finish wins)
    result = await race(
        poll_model_availability(recommended_model, max_duration=120),
        wait_for_user_response(message_ts),
    )
    return result.choice
Benefits:
- User never blocked
- System attempts self-healing
- Clear communication about fallback options
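The race between model recovery and the user's response could be built on asyncio along the following lines. This is a sketch under stated assumptions: `race` and `poll_model_availability` are named in the implementation above but not defined there, so the bodies, the `check` callable, and the "recovered"/"timeout" return values are hypothetical. How a polling timeout should be handled is left to the caller.

```python
import asyncio


async def race(*coros):
    """Return the result of whichever coroutine finishes first; cancel the rest."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Let cancellations settle so no task is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


async def poll_model_availability(check, interval=30.0, max_duration=120.0):
    """Poll `check()` until it reports healthy or the time budget runs out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + max_duration
    while loop.time() < deadline:
        if await check():
            return "recovered"
        await asyncio.sleep(interval)
    return "timeout"
```

Cancelling the losing task is what implements "cancel polling" when the user answers first.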
Decision 4: Session Ownership Model
Decision: Shared Session with Global Presence-Aware Notification Routing
Architecture:
- OpenCode session is source of truth
- Both Slack and Web UI can send messages to same session
- Global notification preference (not per-session):
  - Explicit preference (highest priority): /focus slack or /focus web
  - Automatic presence detection: Web activity → Web notifications, Idle → Slack
  - Default: Slack (mobile-first)
Notification Routing:
from datetime import datetime, timedelta
class GlobalNotificationRouter:
    def get_target(self) -> str:
        if self.user_preference:  # Explicit
            return self.user_preference
        # Web activity within the last 10 minutes counts as presence
        if self.last_web_activity and (datetime.now() - self.last_web_activity) < timedelta(minutes=10):
            return "web"  # Auto-detected presence
        return "slack"  # Default
Escalation Rules:
- Critical errors → Both channels (override preference)
- Urgent timeouts → Both channels
- Major milestones → Respect preference
- Questions → Respect preference
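One way to combine the preference order with the escalation rules is a single pure function that returns the list of target channels. This is a sketch: the event kind names (`critical_error`, `urgent_timeout`) and the function signature are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional

PRESENCE_WINDOW = timedelta(minutes=10)
# Hypothetical event kinds that override the preference and go to both channels.
ESCALATED_KINDS = {"critical_error", "urgent_timeout"}


def route_notification(
    kind: str,
    user_preference: Optional[str],
    last_web_activity: Optional[datetime],
    now: datetime,
) -> List[str]:
    """Return the channels a notification of the given kind should reach."""
    if kind in ESCALATED_KINDS:
        return ["slack", "web"]
    if user_preference:  # Explicit /focus choice has highest priority
        return [user_preference]
    if last_web_activity is not None and now - last_web_activity < PRESENCE_WINDOW:
        return ["web"]  # Auto-detected presence
    return ["slack"]  # Default
```

Passing `now` explicitly keeps the function deterministic, which makes the presence window straightforward to test.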
Decision 5: Question Timeout Git Strategy
Decision: Confidence-Based Speculative Branching on Sub-Branches
Strategy:
- Never block work
- Create speculative sub-branch from feature branch
- Continue work at confidence-appropriate pace
- Merge or discard based on user response
Branching Pattern:
feature/jwt-auth (base feature branch)
├── tag: question-Q123-asked
└── feature/jwt-auth-Q123-rs256 (speculative)
└── (work continues here)
Question Classification:
- Deferrable (Boolean NFRs): Default “No”, continue without feature
- Blocking (Multi-choice, Functional Requirements): Must get answer
Confidence-Based Pace:
- High (>80%): Full speed on speculative branch
- Medium (50-80%): Finish current task, pause
- Low (<50%): Minimal work, escalate quickly
On User Response:
- Matches recommendation → Squash merge to feature branch
- Different answer → Reset to tag, create new branch with correct choice
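The confidence thresholds and the merge-or-reset rule can be written down as small pure functions. This is a sketch; the function names and return labels are hypothetical, and only the thresholds and the branch and tag naming patterns come from the text above.

```python
def pace_for(confidence: float) -> str:
    """Map recommendation confidence (0.0-1.0) to a working pace."""
    if confidence > 0.80:
        return "full_speed"
    if confidence >= 0.50:
        return "finish_task_then_pause"
    return "minimal_work_escalate"


def question_tag(question_id: str) -> str:
    """Tag placed on the feature branch when the question is asked."""
    return f"question-{question_id}-asked"


def speculative_branch(feature_branch: str, question_id: str, choice: str) -> str:
    """Name of the sub-branch where work continues on the assumed answer."""
    return f"{feature_branch}-{question_id}-{choice}"


def resolve(recommended: str, answer: str) -> str:
    """Decide what to do with the speculative branch once the user answers."""
    return "squash_merge" if answer == recommended else "reset_to_tag"
```

With the example from the branching pattern, `speculative_branch("feature/jwt-auth", "Q123", "rs256")` yields `feature/jwt-auth-Q123-rs256`.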
Decision 6: Deployment Promotion Workflow
Decision: PR Creation with Manual Merge in GitHub Web UI
Workflow:
- User clicks “Approve & Promote” in Slack
- System pushes feature branch(es) to origin (multi-repo)
- System creates PR(s) via GitHub API
- PR description includes:
  - Test results summary
  - Lab deployment URL(s)
  - Agent decisions and reasoning
- User merges PR manually in GitHub web UI
- PR CI tests must pass (standard GitHub workflow)
- Builder namespace stays alive until manual cleanup
Multi-Repo Support:
📝 **Pull Requests Ready for Review**
**PRs Created:**
1️⃣ **domain-apis** - PR #123 (✅ Tests: 45/45 passing)
2️⃣ **ai-dev** - PR #87 (✅ Tests: 12/12 passing)
**Test Deployments:**
🌐 api-service: https://api-service.builder.lab.ctoaas.co
🌐 gateway: https://gateway.builder.lab.ctoaas.co
💻 Code Server: https://code.example.com/workspace/...
Decision 7: Slack Threading Strategy
Decision: Single Thread Per Session with Rich Multi-Repo PR Notifications
Pattern:
- One Slack thread per work session/task
- All progress, questions, and PR updates in same thread
- Linear history, full context visible
Multi-Repo PR Handling:
- Group all PRs for same task in single notification
- Track CI status per PR independently
- Update thread when any PR gets review feedback
GitHub Integration:
- PR review comments → Posted to thread (with repo/PR context)
- CI status per PR → Individual updates
- All PRs merged manually in GitHub web UI
Decision 8: State Commit Frequency
Decision: Gateway-Owned State with Periodic Git Commits
Architecture:
- Single writer: Gateway is sole writer of .session-state.yaml
- Immediate writes: State written to PVC filesystem on every change
- Periodic git commits: Only on significant events:
  - Session created
  - Question answered
  - PR created
  - Deployment created
  - Session ended
Implementation:
async def update_state(updates):
    # Always write to file (PVC)
    state = load_yaml(".session-state.yaml")
    state.update(updates)
    write_yaml(".session-state.yaml", state)
    # Commit to git on milestones only
    if should_commit(updates):
        git.add(".session-state.yaml")
        git.commit("chore: Session state checkpoint")
Benefits:
- Fast writes (filesystem)
- Audit trail (git commits on milestones)
- Durable (PVC + git backup)
- Portable (can recreate from git)
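The `should_commit` check used in the implementation is not defined in this document. A minimal sketch, assuming each update carries an `event` key naming what happened (the key and the event identifiers are hypothetical; the five milestones are the ones listed above):

```python
# Milestones that warrant a git commit, per the list of significant events.
MILESTONE_EVENTS = {
    "session_created",
    "question_answered",
    "pr_created",
    "deployment_created",
    "session_ended",
}


def should_commit(updates: dict) -> bool:
    """Commit only when the update records a milestone event."""
    return updates.get("event") in MILESTONE_EVENTS
```

Routine updates such as message counters would still reach the PVC on every write but would not create commits.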
Decision 9: Namespace Lifecycle Policy
Decision: Namespace Lifecycle Tied to Builder, On-Demand Cleanup
Policy:
- Namespace created when builder session starts
- Name: {builder-name}-lab
- Stays alive as long as builder exists
- No automatic deletion, no retention policies
- User explicitly cleans up: /opencode cleanup
Cleanup:
async def cleanup_builder(builder_name):
    # 1. Delete K8s namespace
    await k8s.delete_namespace(f"{builder_name}-lab")
    # 2. Remove git worktree
    await git.worktree_remove(f".builders/{builder_name}")
    # 3. Mark session ended
    await session_state.finalize(builder_name)
Benefits:
- User has full control
- No surprise deletions
- Namespaces stay alive for testing as long as needed
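Because the namespace name is derived from the builder name, it may be worth validating it before calling the cluster. Kubernetes namespace names must be RFC 1123 DNS labels (lowercase alphanumerics and hyphens, at most 63 characters). The helper below is a sketch; its name and error message are hypothetical.

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', starting and ending alphanumeric.
_DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")
_MAX_LABEL_LENGTH = 63


def lab_namespace(builder_name: str) -> str:
    """Derive the lab namespace name for a builder, rejecting invalid names."""
    name = f"{builder_name}-lab"
    if len(name) > _MAX_LABEL_LENGTH or not _DNS_LABEL.match(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return name
```

Both session start and `cleanup_builder` could use the same helper so creation and deletion always agree on the name.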
Decision 10: OpenCode Bridge Implementation
Decision: Separate K8s Service with TypeScript Bridge Plugin + Python Gateway (Slack Socket Mode)
Architecture:
Two K8s Services:
- OpenCode Pod (existing): Runs OpenCode + ttyd
  - Loads TypeScript bridge plugin
  - Plugin hooks permission.ask events
  - Forwards to Gateway via HTTP POST
- Gateway Pod (new): Python/FastAPI service
  - Receives events from bridge via internal K8s DNS
  - Posts to Slack via Socket Mode (outbound WebSocket)
  - Manages state, routing, K8s deployments
Network Flow:
OpenCode Plugin → http://gateway:8000 → Gateway
Gateway → Outbound WebSocket → Slack
No Public Ingress:
- Socket Mode uses outbound connection to Slack
- Internal K8s DNS for OpenCode ↔ Gateway
- Only existing ttyd ingress remains
Bridge Plugin (TypeScript):
const GatewayBridge: Plugin = async (): Promise<Hooks> => {
  return {
    "permission.ask": async (req, res) => {
      // Forward the permission request to the Gateway and wait for the user's answer
      const response = await fetch("http://gateway:8000/opencode/permission", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(req)
      });
      const answer = await response.json();
      res.status = answer.status;
    }
  };
};
Gateway Service (Python):
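import os
from fastapi import FastAPI
from slack_bolt.async_app import AsyncApp
from slack_bolt.adapter.socket_mode.async_handler import AsyncSocketModeHandler
# Slack Socket Mode (outbound WebSocket, no public ingress)
slack_app = AsyncApp(token=os.environ["SLACK_BOT_TOKEN"])
handler = AsyncSocketModeHandler(slack_app, os.environ["SLACK_APP_TOKEN"])
# HTTP API for the bridge plugin (kept separate from the Slack app object)
api = FastAPI()
# Receive from bridge
@api.post("/opencode/permission")
async def handle_permission(request: dict):
    # Post to Slack, wait for response
    answer = await ask_user_via_slack(request)
    return {"status": answer}
Shared PVC: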
- Both OpenCode and Gateway mount same PVC
- Gateway writes .session-state.yaml
- Both pods see the same filesystem
Implementation Strategy
Steel Thread: OpenCode Questions → Slack
Scope:
- Bridge plugin forwards permission.ask to Gateway
- Gateway posts question to Slack with buttons
- User clicks button
- Gateway returns answer to Bridge
- OpenCode proceeds
Two Builders:
- 0012-slack-gateway: Python Gateway service (Slack Socket Mode)
- 0012-bridge-plugin: TypeScript OpenCode bridge plugin
Contract: HTTP POST /opencode/permission interface
Timeline: 1-2 days to get questions flowing
Summary
Project: OpenCode Slack Integration (Project #0012)
Status: Architecture Complete, Ready for Implementation
Methodology: BMAD
Builders: 2 parallel workspaces created
Next Steps:
- Implement steel thread (questions → Slack)
- Validate OpenCode SDK integration
- Expand to full workflow (routing, deployment, testing)
Kubernetes Integration Architecture
Date Added: 2026-01-31
Context: Steel thread deployment to k8s lab environment
Component Deployment Model
Three K8s Components:
- LGTM Stack (k8s-lab/components/lgtm)
  - Namespace: lgtm
  - Image: grafana/otel-lgtm:latest (single all-in-one image)
  - Services: Grafana (3000), Loki API (3100), OTLP gRPC/HTTP (4317/4318)
  - Storage: PVC for Grafana/Loki data persistence
  - Ingress: lgtm.lab.ctoaas.co → Grafana UI
  - Purpose: Foundational observability for all ai-dev components
- OpenCode Slack Gateway (ai-dev/infrastructure/kustomize/components/opencode-slack-gateway)
  - Namespace: ai-dev
  - Image: Built via Taskfile (uv base image pattern), pushed to ghcr.io
  - Service: ClusterIP gateway.ai-dev.svc.cluster.local:8000
  - Secrets: ClusterExternalSecret referencing central secret store
    - SLACK_BOT_TOKEN
    - SLACK_APP_TOKEN
  - Storage: Mounts existing code-server-storage PVC (shared with codev pod)
  - State Management: Clones ai-dev-state repo to PVC at startup
  - Files: .session-state.yaml written to shared PVC
  - Resources: 256Mi-512Mi memory, 100m-500m CPU (similar to existing gateway component)
- Codev Pod Updates (k8s-lab/components/codev)
  - Namespace: code-server (existing)
  - Dockerfile Changes:
    - Bridge plugin copied into build context during CI
    - npm install && npm run build && npm link to install plugin globally
  - Env Var Addition: GATEWAY_URL=http://gateway.ai-dev.svc.cluster.local:8000
  - Storage: Existing code-server-storage PVC (shared with gateway)
  - Plugin Loading: OpenCode automatically loads linked plugin at startup
Shared Storage Architecture
PVC: code-server-storage (existing, ReadWriteMany)
- Mounted by: codev pod + gateway pod
- Contents:
  - Workspace repositories
  - ai-dev-state git repo (cloned by gateway)
  - .session-state.yaml files (written by gateway)
- Rationale: Single source of truth, both components see identical filesystem
Secrets Management
Pattern: ClusterExternalSecrets (established in .ai/steering/secret-management.md)
- Central secret store syncs to k8s secrets
- Namespace labels trigger sync to ai-dev namespace
- No manual secret creation required
- Gateway deployment references generated secrets
No Gateway↔Plugin Auth: Internal k8s traffic, deferred for MVP
Networking
DNS Resolution: Internal k8s DNS
- Bridge plugin → http://gateway.ai-dev.svc.cluster.local:8000/api/opencode/permission
- No ingress required (gateway uses Slack Socket Mode for outbound WebSocket)
External Access:
- Codev: Existing ttyd ingress (unchanged)
- LGTM: New ingress lgtm.lab.ctoaas.co
- Gateway: No public ingress (Socket Mode only)
Deployment Orchestration
ArgoCD Application: k8s-lab/other-seeds/ai-dev.yaml
- Source: https://github.com/craigedmunds/ai-dev
- Path: infrastructure/kustomize/components (direct reference, no overlay)
- Target: ai-dev namespace
- Sync: Auto-sync enabled (lab environment)
No Overlays: Lab environment uses components directly (simpler for steel thread)
Build Strategy
Gateway Image:
- Taskfile at ai-dev/services/gateway/Taskfile.yaml
- Pattern: Similar to k8s-lab/components/codev (uv base image)
- Registry: ghcr.io/craigedmunds/opencode-slack-gateway
- Versioning: VERSION file + -dev suffix for lab builds
Bridge Plugin Integration:
- Build Process: CI copies ai-dev/plugins/opencode-bridge to k8s-lab/components/codev build context
- Installation: Dockerfile runs npm install && npm run build && npm link during image build
- No NPM Registry: Direct source copy (simpler for monorepo-like setup)
State Persistence
Repository: https://github.com/craigedmunds/ai-dev-state
- Location: Cloned to shared PVC by gateway at startup
- Files: .session-state.yaml per session
- Commits: Periodic commits to git on milestones (per Decision 8)
- Durability: PVC (ephemeral) + git (durable backup)
Observability Integration
LGTM Stack:
- Gateway logs → Loki (via Docker logging driver or direct integration)
- OpenCode plugin logs → Loki
- Grafana dashboards for session tracking, question latency, Slack interactions
- Tempo for distributed tracing (future: trace question flow across components)
WebSocket Integration Architecture
Date Added: 2026-02-01
Context: Post-spike findings - OpenCode SDK capabilities, state persistence, plugin loading
Architectural Refinement: WebSocket Bidirectional Communication
Decision Context:
After spiking OpenCode SDK capabilities and state management, we identified critical architectural improvements:
- OpenCode SDK provides full session management API - No need for CLI shell-out or output parsing
- State persistence required - Pod restarts lose OpenCode auth and session history
- Plugin loading from PVC - Enables zero-rebuild iteration cycles
- HTTP insufficient for real-time streaming - Need bidirectional event flow
Decision 11: Bridge-Gateway Communication Protocol
Decision: WebSocket for Events, OpenCode SDK for Session Management
Refined Architecture:
┌─────────────────────────────────────────────┐
│ GATEWAY (Python) │
│ │
│ • Slack Socket Mode integration │
│ • State: session_id ↔ thread_ts │
│ • Event formatting for Slack │
│ • WebSocket server for Bridge │
└─────────────────────────────────────────────┘
▲
│
WebSocket (bidirectional)
│
▼
┌─────────────────────────────────────────────┐
│ BRIDGE PLUGIN (TypeScript) │
│ │
│ Inbound Commands (Gateway → Plugin): │
│ • session.create │
│ • session.message │
│ • session.list │
│ • session.abort │
│ │
│ Outbound Events (Plugin → Gateway): │
│ • session.created │
│ • message.part.streamed │
│ • tool.executed │
│ • agent.milestone │
│ • session.idle/active │
│ • permission.asked (via HTTP) │
│ │
│ Uses: @opencode-ai/sdk internally │
└─────────────────────────────────────────────┘
▲
│
OpenCode Event System
│
▼
┌─────────────────────────────────────────────┐
│ OPENCODE CORE │
│ • Session management via SDK │
│ • BMAD agent routing │
│ • Message processing │
└─────────────────────────────────────────────┘
Rationale:
- Single integration point: All OpenCode interaction flows through Bridge Plugin
- Real-time streaming: WebSocket enables immediate event propagation
- Bidirectional: Gateway can issue commands, Plugin streams events
- Type safety: TypeScript SDK provides strong contracts
- Plugin owns OpenCode: Abstracts SDK changes from Gateway
Gateway-to-Bridge Protocol:
Commands (Gateway → Bridge):
{
  type: 'session.create',
  request_id: 'req_123',
  workspace: '/workspace/builder-x',
  task: 'Build authentication',
  agent?: 'builder' | 'architect' | 'pm',
  model?: { provider: 'anthropic', model: 'claude-sonnet-4' }
}
Events (Bridge → Gateway):
{
  type: 'message.part.streamed',
  session_id: 'ses_abc123',
  message_id: 'msg_456',
  content: 'I will implement authentication using...',
  role: 'assistant'
}
Implementation Notes:
- Plugin establishes WebSocket connection to Gateway on startup
- Request/response pattern using request_id for correlation
- Event streaming for real-time updates (no polling)
- Reconnection logic with exponential backoff
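On the Gateway side, request_id correlation and the backoff schedule might look like the sketch below. It is transport-agnostic: `send` stands in for whatever writes a text frame to the WebSocket, and the class name, ID format, timeout, and backoff parameters are all assumptions for illustration.

```python
import asyncio
import itertools
import json


class BridgeChannel:
    """Correlates commands sent to the bridge with replies carrying the same request_id."""

    def __init__(self, send):
        self._send = send  # async callable that writes one text frame
        self._pending = {}
        self._ids = itertools.count(1)

    async def request(self, command: dict, timeout: float = 10.0) -> dict:
        request_id = f"req_{next(self._ids)}"
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        try:
            await self._send(json.dumps({**command, "request_id": request_id}))
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(request_id, None)

    def on_message(self, raw: str) -> bool:
        """Resolve a pending request; return False for unsolicited events."""
        msg = json.loads(raw)
        future = self._pending.get(msg.get("request_id"))
        if future is None or future.done():
            return False
        future.set_result(msg)
        return True


def backoff_delays(base: float = 1.0, cap: float = 30.0):
    """Yield reconnection delays that double up to a cap."""
    delay = base
    while True:
        yield delay
        delay = min(cap, delay * 2)
```

Messages for which `on_message` returns False are streamed events (for example `message.part.streamed`) and would be dispatched to the Slack formatter instead.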
Decision 12: State Persistence Strategy
Decision: Mount OpenCode State Directory to PVC
Problem:
OpenCode stores critical state in:
~/.local/share/opencode/
├── auth.json # OAuth tokens (Anthropic, etc.)
├── storage/
│ ├── session/ # Session metadata
│ ├── message/ # Message content
│ ├── part/ # File diffs, attachments
│ └── project/ # Project configs
Pod restarts = complete state loss (sessions, auth, history).
Solution:
Mount PVC to OpenCode state directories:
# Codev pod volumeMounts
volumeMounts:
  - name: code-server-storage
    mountPath: /workspace  # Code files (existing)
  - name: code-server-storage
    mountPath: /home/opencode/.local/share/opencode
    subPath: .opencode-data  # NEW: State persistence
Benefits:
- ✅ Sessions survive pod restarts
- ✅ Auth tokens persist (no re-login)
- ✅ Full conversation history retained
- ✅ Zero code changes required
- ✅ Works with existing PVC architecture
Directory Structure on PVC:
code-server-storage/
├── repos/ # Workspace code (existing)
├── ai-dev-state/ # Session state repo (existing)
├── .opencode-data/ # NEW: OpenCode state
│ ├── auth.json
│ └── storage/
│ ├── session/
│ ├── message/
│ └── ...
└── .opencode-plugins/ # NEW: Plugin code (see Decision 13)
Decision 13: Plugin Loading from PVC
Decision: Load Bridge Plugin from PVC Workspace
Problem:
Current approach bakes plugin into Docker image:
- Every plugin code change = Docker rebuild
- Slow iteration cycle (build → push → deploy → test)
- No hot-reload capability
Solution:
Install plugin from PVC-mounted workspace:
Dockerfile (one-time setup):
# Install OpenCode plugin SDK globally
RUN npm install -g @opencode-ai/plugin
Entrypoint script (dynamic plugin loading):
# On pod startup
cd /workspace/.opencode-plugins/opencode-bridge
npm install
npm link
cd ~/.config/opencode
npm link @opencode-bridge
# Start OpenCode (picks up linked plugin)
opencode web
Benefits:
- ✅ Edit plugin code → restart pod → new code loads
- ✅ No Docker rebuild needed
- ✅ Fast iteration (seconds vs minutes)
- ✅ Workspace-specific plugin versions possible
- ✅ Supports plugin development workflow
Plugin Directory on PVC:
/workspace/.opencode-plugins/opencode-bridge/
├── src/
│ ├── plugin.ts
│ ├── v2-client.ts
│ └── handlers/
├── package.json
├── tsconfig.json
└── node_modules/ # Installed at pod startup
Alternative Considered:
Direct symlink without npm link - rejected due to OpenCode’s plugin discovery mechanism expecting npm-style resolution.
Decision 14: OpenCode SDK Session Management
Decision: Use OpenCode V2 SDK for Programmatic Session Control
Discovery:
OpenCode exposes full HTTP API via TypeScript SDK:
import { createOpencodeClient } from '@opencode-ai/sdk/dist/v2/client.js';
const client = createOpencodeClient({
  baseURL: 'http://localhost:5400'
});
// Create session
const session = await client.session.create({
  directory: '/workspace/builder-project',
  title: 'Slack-initiated task',
  parentID: 'optional-parent-session'
});
// Send message
await client.session.prompt({
  sessionID: session.data.id,
  parts: [{ type: 'text', text: 'Build authentication feature' }],
  agent: 'builder',
  model: {
    providerID: 'anthropic',
    modelID: 'claude-sonnet-4'
  }
});
// List sessions (visible in OpenCode web UI)
const sessions = await client.session.list({
  directory: '/workspace'
});
// Get messages
const messages = await client.session.messages({
  sessionID: session.data.id
});
Architectural Impact:
- No CLI shell-out needed: Direct SDK calls replace opencode run subprocess
- Sessions integrate with web UI: Programmatically created sessions appear in OpenCode web interface
- Agent routing built-in: SDK supports agent: 'builder' parameter
- Model selection supported: Can specify provider/model per message
- Message retrieval: Can poll or stream responses via SDK
Bridge Plugin Implementation:
export const SlackBridgePlugin: Plugin = async (ctx) => {
  const opencode = createOpencodeClient();
  // `ws` is the WebSocket connection to the Gateway, opened at plugin startup (see Decision 11)
  ws.on('message', async (data) => {
    const msg = JSON.parse(data.toString());
    if (msg.type === 'session.create') {
      const session = await opencode.session.create({
        directory: msg.workspace,
        title: msg.task
      });
      // Start session with initial message
      await opencode.session.prompt({
        sessionID: session.data.id,
        parts: [{ type: 'text', text: msg.task }],
        agent: msg.agent || 'builder'
      });
      // Reply with the same request_id so the Gateway can correlate the response
      ws.send(JSON.stringify({
        type: 'session.created',
        request_id: msg.request_id,
        session_id: session.data.id
      }));
    }
  });
  return {};  // No hooks registered in this excerpt
};
Replaces: Earlier Decision 10, which assumed CLI-based integration. The SDK approach is cleaner and more maintainable.
Updated Component Responsibilities
Bridge Plugin (TypeScript):
- ✅ Single source of truth for OpenCode integration
- ✅ WebSocket client to Gateway
- ✅ Request/response for session CRUD via OpenCode SDK
- ✅ Event streaming for real-time updates
- ✅ Permission handling (HTTP for backwards compatibility)
- ✅ Loaded from PVC workspace (hot-reload capable)
Gateway (Python FastAPI):
- ✅ Slack integration (Socket Mode)
- ✅ State management (session ↔ thread mapping)
- ✅ Event formatting (OpenCode events → Slack UI)
- ✅ WebSocket server for Bridge connection
- ✅ State persistence to PVC-mounted git repo
OpenCode Core:
- ✅ Session state persisted to PVC
- ✅ Auth tokens persisted to PVC
- ✅ Accessible via SDK from Bridge Plugin
- ✅ Sessions visible in web UI
MVP End-to-End Flow (Updated)
1. Slack: /opencode start "Build auth"
↓
2. Gateway: Receives via Socket Mode
↓
3. Gateway → Bridge WS: { type: 'session.create', task: 'Build auth' }
↓
4. Bridge: client.session.create() via SDK
↓
5. Bridge: client.session.prompt() to send initial message
↓
6. Bridge → Gateway WS: { type: 'session.created', session_id: 'ses_123' }
↓
7. Gateway: Maps session_id ↔ slack_thread_ts
↓
8. OpenCode: Processes with BMAD agent
↓
9. OpenCode Events: Streamed to Bridge via event system
↓
10. Bridge → Gateway WS: { type: 'message.part.streamed', content: '...' }
↓
11. Gateway → Slack: Updates thread with agent output
State Persistence Throughout:
- OpenCode state → PVC (.opencode-data)
- Session state → PVC (ai-dev-state repo)
- Plugin code → PVC (.opencode-plugins)
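Step 7 of the flow relies on a two-way lookup between OpenCode sessions and Slack threads. A minimal in-memory sketch follows; the class and method names are hypothetical, and the key includes the channel because a Slack `thread_ts` is only unique within its channel. In the architecture described here, the mapping would also be persisted to the session state file.

```python
from typing import Dict, Optional, Tuple

ThreadKey = Tuple[str, str]  # (channel_id, thread_ts)


class SessionThreadMap:
    """Bidirectional mapping between session IDs and Slack threads."""

    def __init__(self) -> None:
        self._thread_by_session: Dict[str, ThreadKey] = {}
        self._session_by_thread: Dict[ThreadKey, str] = {}

    def link(self, session_id: str, channel_id: str, thread_ts: str) -> None:
        key = (channel_id, thread_ts)
        self._thread_by_session[session_id] = key
        self._session_by_thread[key] = session_id

    def thread_for(self, session_id: str) -> Optional[ThreadKey]:
        """Where to post events coming from a given session."""
        return self._thread_by_session.get(session_id)

    def session_for(self, channel_id: str, thread_ts: str) -> Optional[str]:
        """Which session a Slack reply in a given thread belongs to."""
        return self._session_by_thread.get((channel_id, thread_ts))
```

Bridge events use `thread_for` to find the thread to update, while Slack replies use `session_for` to find the session to message.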
Decision 15: UI-Initiated Session Configuration Collection
Date Added: 2026-02-02
Context: Session creation flow for sessions started in OpenCode UI (not via Slack)
Decision: Lazy, Orthogonal Configuration Collection with Optional Bundling
Problem:
When users create sessions directly in OpenCode UI, we need two distinct types of configuration:
- Builder Configuration (workspace management):
  - Project ID (from .ai/projectlist.md)
  - Workspace name
  - Repositories
  - Category
- Slack Configuration (notification routing):
  - Channel
  - Priority
  - Notification level
Architecture:
Two Independent, Lazy-Collected Configurations:
# Session starts with minimal state
session = {
    "id": "ses_123",
    "title": "Add rate limiting to auth API",
    "directory": "/workspace/repos/domain-apis",
    "type": "exploratory",   # Starts as exploratory
    "builder_config": None,  # Collected when needed
    "slack_config": None,    # Collected when needed
}

Collection Timing:
- Builder Config - Collected when the first write operation is attempted:
  - Triggered by: mcp_edit, mcp_write, or write-related bash commands
  - Collection UI: OpenCode modal if user present in web UI, Slack form if user in Slack
  - Required fields: Project ID, workspace name, repositories, category
  - Action: Initializes builder workspace via task builder:init BUILDER_NAME={workspace_name} REPOS={repos}
- Slack Config - Collected when attention leaves OpenCode:
  - Triggered by: First notification when user not in OpenCode web UI (presence detection)
  - Collection UI: Always Slack (DM or existing thread)
  - Required fields: Channel, priority
  - Action: Creates Slack thread, establishes notification routing
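The builder-config trigger depends on deciding whether a tool call is a write. The sketch below shows one conservative way to make that decision. It is a heuristic illustration only: `is_write_operation` is a hypothetical helper, the tool name `"bash"` and the `WRITE_COMMANDS` set are assumptions, and the command list is deliberately incomplete.

```python
import shlex

# Tool names taken from the trigger list above
WRITE_TOOLS = {"mcp_edit", "mcp_write"}

# Assumption: an illustrative, incomplete set of commands treated as writes
WRITE_COMMANDS = {"rm", "mv", "cp", "mkdir", "touch", "tee"}


def is_write_operation(tool: str, command: str = "") -> bool:
    """Return True if the tool call should trigger builder config collection."""
    if tool in WRITE_TOOLS:
        return True
    if tool != "bash":
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unparseable command (e.g. unbalanced quotes): treat as a write to be safe
        return True
    if not tokens:
        return False
    # Output redirection such as `> file` or `>> file`
    if any(tok.startswith(">") for tok in tokens):
        return True
    # Only the first word is checked; pipelines and subshells are not analyzed
    return tokens[0] in WRITE_COMMANDS
```

A false positive only costs one extra prompt, while a false negative lets a write happen outside a builder workspace, so the sketch leans toward prompting when a command cannot be parsed.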
Configuration Collection Implementation:
class SessionConfigManager:
    async def ensure_builder_config(self, session_id):
        """Ensure builder config exists, prompt if needed"""
        session = await get_session(session_id)
        if session.builder_config:
            return session.builder_config

        # Infer smart defaults from context
        project_id = infer_project_id(session)  # Match to projectlist.md; None if no match
        defaults = {
            "project_id": project_id,
            # Without an inferred project ID, leave the name for the user to fill in
            "workspace_name": f"{project_id}-{slugify(session.title)}" if project_id else None,
            "repositories": infer_repos(session.directory),
            "category": infer_category(session),
        }

        # Collect from appropriate UI (config is a dict of submitted form values)
        presence = await get_user_presence()
        if presence.active_in_opencode_ui:
            # User in OpenCode - prompt there
            config = await prompt_opencode_modal(
                title="Initialize Builder Workspace",
                message="Making changes requires a builder workspace:",
                fields=defaults,
                optional_section={
                    "enable_slack": False,
                    "channel": infer_channel(defaults["category"]),
                    "priority": "medium",
                },
            )
        else:
            # User in Slack or unknown - prompt in Slack
            config = await prompt_slack_form(
                title="🏗️ Initialize Builder Workspace",
                message=f"To make changes to '{session.title}', I need to set up a workspace:",
                fields=defaults,
                optional_section={
                    "enable_slack": True,  # Default YES in Slack
                    "channel": infer_channel(defaults["category"]),
                    "priority": "medium",
                },
            )

        # Initialize builder workspace
        workspace_path = await init_builder(
            builder_name=config["workspace_name"],
            repos=config["repositories"],
        )

        # Update session
        await update_session(session_id, {
            "builder_config": {**config, "workspace_path": workspace_path},
            "type": "work",
        })

        # If user opted in to Slack config, set that up too
        if config.get("enable_slack"):
            await setup_slack_config(session_id, config)
        return config

    async def ensure_slack_config(self, session_id):
        """Ensure Slack config exists, prompt if needed"""
        session = await get_session(session_id)
        if session.slack_config:
            return session.slack_config

        # Infer smart defaults (builder config may not exist yet)
        category = session.builder_config.get("category") if session.builder_config else None
        channel = infer_channel_from_category(category)

        # Always collect via Slack (since we're routing there)
        config = await prompt_slack_form(
            target="DM",
            title="📬 Setup Notifications",
            message=f"Where should I post updates for '{session.title}'?",
            fields={
                "channel": channel,
                "priority": "medium",
                "notification_level": "milestones",
            },
        )

        # Create thread in chosen channel
        thread_ts = await create_slack_thread(
            channel=config["channel"],
            session=session,
        )

        # Update session
        await update_session(session_id, {
            "slack_config": {**config, "thread_ts": thread_ts},
        })
        return config

Project ID Inference:
def infer_project_id(session):
    """Match session to project in .ai/projectlist.md"""
    project_list = load_project_list()

    # Strategy 1: Session title matches project title
    for project in project_list:
        if title_similarity(session.title, project.title) > 0.8:
            return project.id

    # Strategy 2: Detected repos match project repos
    detected_repos = detect_git_repos(session.directory)
    for project in project_list:
        if project_uses_repos(project, detected_repos):
            return project.id

    # Strategy 3: Category has single active project
    category = infer_category(session)  # Same call signature as in ensure_builder_config
    matching = [p for p in project_list if p.category == category and p.status == "implementing"]
    if len(matching) == 1:
        return matching[0].id

    # No confident match: caller must prompt the user
    return None

Optional Bundling (Shortcut):
When prompting for one config, offer an optional section to collect the other:
- In OpenCode Modal: “Optional: Set up Slack notifications now?” (unchecked by default)
- In Slack Form: “Optional: Set up Slack notifications?” (checked by default since user is already in Slack)
This reduces interruptions: the user can provide both configs at once if they want.
State Transitions:
Session States:
1. (exploratory, no_builder, no_slack)
→ Reading code, researching, Q&A
→ All work in OpenCode web UI
2. (work, builder_init, no_slack)
→ Making code changes
→ Work in OpenCode web UI
→ Notifications in OpenCode UI
3. (exploratory, no_builder, slack_configured)
→ Reading code, researching
→ User went offline
→ Notifications in Slack
4. (work, builder_init, slack_configured)
→ Making code changes
→ User went offline OR opted in early
→ Notifications in Slack
All Transition Paths:
Path A: UI → Build (online)
Create session → Attempt write → Prompt builder (OpenCode) → Continue
Path B: UI → Offline → Notify
Create session → User offline → Event needs attention → Prompt Slack → Thread created
Path C: UI → Build → Offline
Create session → Attempt write → Prompt builder (OpenCode) → User offline → Prompt Slack
Path D: UI → Offline → Build in Slack
Create session → User offline → Prompt Slack → Attempt write → Prompt builder (Slack)
Path E: Slack-initiated
/opencode start → Collect both configs in one form → Builder + Slack ready
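The four states and five paths above can be checked with a small state machine. This is a sketch under stated assumptions rather than the session store's actual model: `SessionState`, `apply_event`, `run_path`, and the event names `write_attempted`, `attention_left`, and `slack_start` are hypothetical, and optional bundling (opting in to Slack early from the builder prompt) is not modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SessionState:
    kind: str = "exploratory"  # "exploratory" or "work"
    builder: bool = False      # Builder config collected
    slack: bool = False        # Slack config collected


def apply_event(state: SessionState, event: str) -> SessionState:
    """Apply one trigger event and return the resulting state."""
    if event == "write_attempted":
        # First write: collect builder config, session becomes "work"
        return replace(state, kind="work", builder=True)
    if event == "attention_left":
        # User offline and an event needs attention: collect Slack config
        return replace(state, slack=True)
    if event == "slack_start":
        # /opencode start collects both configs in one form
        return SessionState(kind="work", builder=True, slack=True)
    raise ValueError(f"unknown event: {event}")


def run_path(events: list[str]) -> SessionState:
    """Replay a sequence of events from a freshly created session."""
    state = SessionState()
    for event in events:
        state = apply_event(state, event)
    return state


PATHS = {
    "A": ["write_attempted"],
    "B": ["attention_left"],
    "C": ["write_attempted", "attention_left"],
    "D": ["attention_left", "write_attempted"],
    "E": ["slack_start"],
}
```

Because each event only sets its own configuration, Paths C and D end in the same state regardless of order, which is what makes the two configurations orthogonal.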
Benefits:
- ✅ Minimal friction - Only prompt when actually needed
- ✅ Natural timing - Collection happens at logical trigger points
- ✅ Smart inference - Pre-fill forms with detected values
- ✅ Flexibility - Can collect in either UI depending on user presence
- ✅ Optional bundling - User can provide both configs at once to avoid future interruptions
- ✅ Exploratory sessions stay lightweight - No config needed for read-only work
- ✅ Consistent with project numbering - Builder names use project IDs from .ai/projectlist.md
Workspace Naming Convention:
Builder workspaces use format: .builders/{project-id}-{task-slug}/
Examples:
- .builders/0012-add-rate-limiting/ - Project 0012 (OpenCode Slack Integration)
- .builders/0013-notification-system/ - Project 0013 (Grid Exit Strategy)
This aligns with existing project numbering in .ai/projectlist.md and spec/plan file naming conventions.
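The naming convention can be illustrated with a short helper. This is a sketch only: `builder_workspace_path` is a hypothetical name, and this `slugify` is an assumed stand-in for whatever slug function the Gateway actually uses (it lowercases and collapses non-alphanumeric runs into single hyphens).

```python
import re


def slugify(text: str) -> str:
    """Lowercase and replace runs of non-alphanumeric characters with '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")


def builder_workspace_path(project_id: str, task_title: str) -> str:
    """Build the .builders/{project-id}-{task-slug}/ path for a task."""
    return f".builders/{project_id}-{slugify(task_title)}/"
```

For example, `builder_workspace_path("0012", "Add rate limiting")` yields `.builders/0012-add-rate-limiting/`. The project ID is passed as a string so that leading zeros are preserved.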
Architecture decisions finalized 2026-01-31.
Kubernetes integration architecture added 2026-01-31.
WebSocket integration and state persistence added 2026-02-01.
UI-initiated session configuration added 2026-02-02.