Architecture Decision Document: OpenCode Slack Integration

This document is built collaboratively through step-by-step discovery. Sections are appended as we work through each architectural decision together.

Document Overview

Project: OpenCode Slack Integration
Status: Architecture In Progress
Last Updated: 2026-01-31
Architect: Winston (with Craig)

⚠️ NOTE: For the core session pattern (builder workspaces, lifecycle, message routing) shared across all chat platforms, see:

📄 opencode-session-pattern.md

This document covers Slack-specific architecture, deployment automation, and K8s integration.


Input Documents Loaded

This architecture is informed by:

  • PRD (prd.md) - Product requirements and user workflows
  • Discovery Questions (discovery-questions.md) - Annotated requirements from user
  • Architecture Questions Round 2 (architecture-questions-round2.md) - Technical clarifications
  • API Contract (api-contract.md) - Slack ↔ Backend interface specification
  • Work Package: Backend (work-package-backend.md) - Backend implementation scope
  • Work Package: Slack (work-package-slack.md) - Slack connector implementation scope
  • Workspace Pattern (workspace-pattern.md) - Multi-repo workspace approach (using par)

Project Context Analysis

Requirements Overview

Functional Requirements:

The OpenCode Slack Integration bridges asynchronous Slack-based interaction with OpenCode CLI sessions, supporting AI-assisted development across multiple parallel projects. The system encompasses:

  • Slack Interface Layer (FR-1): Slash command workflow forms for task initiation with rich metadata (category, project, repos, priority)
  • Intelligent Routing (FR-2): Two-dimensional routing decisions combining BMAD agent type (architect/PM/builder/party-mode) with AI model selection (Claude Code API vs Qwen 2.5 Coder 32B local)
  • Workspace Orchestration (FR-3): Integration with par workspace manager to create isolated multi-repo builder environments
  • Session Bridge (FR-4): Bidirectional communication between Slack threads and OpenCode sessions, with dual-UI visibility (Slack + OpenCode web UI)
  • Deployment Automation (FR-5, FR-6): ConfigMap/PVC-based deployment to K8s lab namespaces with automated test execution
  • Async Question Handling (FR-7): Interactive Slack buttons with configurable timeouts and recommended defaults
  • Progress Visibility (FR-8): Milestone-based updates with periodic summaries for long-running work
  • Multi-Project Concurrency (FR-9): Isolated workspaces and sessions enabling parallel work across projects
  • Session Durability (FR-10): Git-backed state persistence supporting work spanning days/weeks

Non-Functional Requirements:

  • Performance (NFR-1): Sub-3-second Slack acknowledgment, <30-second deployment, 1-second question forwarding
  • Reliability (NFR-2): Webhook retry logic, git-backed state (zero loss tolerance), graceful failure recovery
  • Security (NFR-3): Slack signature verification, K8s RBAC least-privilege, namespace isolation, secret protection
  • Scalability (NFR-4): MVP scoped for single user with <10 concurrent sessions
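
For the Slack signature verification named in NFR-3, the check on any HTTP endpoint that receives Slack requests might look like the sketch below. It follows Slack's documented v0 signing scheme (HMAC-SHA256 over `v0:{timestamp}:{body}`) using only the standard library. The function name, the `now` parameter, and the five-minute replay window are illustrative choices, not part of this architecture.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_slack_signature(signing_secret: str, timestamp: str, body: bytes,
                           signature: str, now: Optional[float] = None,
                           max_age_seconds: int = 300) -> bool:
    """Return True if the request matches Slack's v0 signing scheme."""
    current = time.time() if now is None else now
    # Reject stale or malformed timestamps to limit replay attacks.
    try:
        if abs(current - int(timestamp)) > max_age_seconds:
            return False
    except ValueError:
        return False
    base = b"v0:" + timestamp.encode() + b":" + body
    expected = "v0=" + hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

The `timestamp` and `signature` arguments correspond to the `X-Slack-Request-Timestamp` and `X-Slack-Signature` request headers.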

Scale & Complexity:

  • Primary domain: Backend orchestration + Chat platform integration + K8s deployment automation
  • Complexity level: Medium-High
    • Real-time streaming (SSE for deployment progress)
    • Multi-session state management
    • Kubernetes namespace lifecycle orchestration
    • Dual AI model routing and availability management
    • Git-based distributed state synchronization
  • Estimated architectural components: 8-10 major components
    • Slack connector
    • Backend API gateway
    • Session manager
    • OpenCode bridge
    • BMAD router with model selector
    • Workspace manager (integrates with par)
    • Deployment orchestrator
    • Test orchestrator
    • K8s namespace manager
    • Webhook event dispatcher

Technical Constraints & Dependencies

Critical Dependencies:

  • OpenCode CLI: Session management, web UI integration, output parsing capabilities
  • Par (Workspace Manager): Multi-repo workspace creation with worktree isolation
  • Kubernetes Infrastructure: Namespace creation RBAC, ingress controller, cert-manager, PVC provisioning
  • Slack Platform: Custom app installation, webhook endpoints, interactive components, workflow forms
  • Git: State persistence, workspace branching, session resumability

Architectural Constraints:

  • MVP Timeline: 1-2 week implementation window drives technology choices toward proven patterns
  • Single User Scope: Simplified authentication, no multi-tenancy concerns for MVP
  • No Image Builds: ConfigMap/PVC deployment strategy only (fast iteration over production-readiness)
  • Session Persistence Requirement: State must survive gateway restarts, OpenCode crashes, multi-day inactivity
  • Performance Target: <30 second deploy time influences manifest strategy and K8s interaction patterns

Technology Choices Implied by Requirements:

  • Python/FastAPI (based on work packages, aligns with K8s client libraries)
  • Server-Sent Events (SSE) for deployment progress streaming
  • Git-backed YAML for session state
  • Kustomize for K8s manifests (per infrastructure notes)
  • Contract-first development (API contract enables parallel Slack/Backend work)

Cross-Cutting Concerns Identified

1. Session Lifecycle Management

  • Creation: Workspace initialization via task builder:init, OpenCode spawn, state file creation
  • Active state: Message forwarding, output parsing, question detection, progress tracking
  • Persistence: Git commits for state checkpoints, resumability after crashes
  • Cleanup: Workspace removal, namespace deletion (immediate vs deferred), branch cleanup

2. AI Model Selection & Availability

  • Routing Heuristics: Task analysis determines agent type AND model recommendation
    • Architecture/complex reasoning → Claude Code (API)
    • Code generation/implementation → Qwen 2.5 Coder 32B (local)
  • Availability Checking: Local model health checks, API quota validation
  • Fallback Strategy: What happens when recommended model unavailable? (OPEN QUESTION)
  • Cost/Performance Tradeoffs: API costs vs local compute, latency considerations

3. Concurrent Access Control

  • Dual UI Problem: Sessions visible in both Slack and OpenCode web UI
  • Ownership Semantics: Can Slack bot and human both send messages? (OPEN QUESTION)
  • Conflict Resolution: Race conditions when both interfaces active simultaneously
  • Visibility Strategy: Should web UI distinguish Slack-managed sessions? (OPEN QUESTION)

4. Git-Based State Synchronization

  • State File Format: .session-state.yaml schema and versioning
  • Commit Strategy: When to commit state (every milestone? every message? periodic?)
  • Branching Strategy: Builder branches, question timeout branches (OPEN QUESTION)
  • Merge/Cleanup: What happens on session completion? PR? Direct merge? (OPEN QUESTION)

5. Kubernetes Security Boundaries

  • RBAC Design: Gateway service account permissions (namespace creation, manifest apply, ingress creation)
  • Namespace Isolation: Lab namespaces must not access production resources
  • Secret Management: Slack tokens, AI API keys, K8s credentials
  • Network Policies: Ingress-only access to lab deployments

6. Webhook Reliability

  • Retry Logic: Slack webhook failures must not lose events
  • Event Ordering: Progress updates must arrive in sequence
  • Idempotency: Repeated webhook deliveries must not duplicate actions
  • Timeout Handling: Long-running operations must not block webhook responses
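
The idempotency and ordering requirements can be illustrated with a small in-memory sketch. The `event_id` and `seq` fields and the class name are hypothetical, since the event schema is not defined at this point in the document. A real implementation would likely persist seen IDs rather than hold them in memory.

```python
from dataclasses import dataclass, field


@dataclass
class EventDeduplicator:
    """Drops repeated deliveries and releases events in sequence order."""
    seen_ids: set = field(default_factory=set)
    next_seq: int = 1
    pending: dict = field(default_factory=dict)

    def accept(self, event_id: str, seq: int, payload: str) -> list:
        """Return the payloads that are now safe to process, in order."""
        if event_id in self.seen_ids:
            return []  # duplicate delivery: already handled or buffered
        self.seen_ids.add(event_id)
        self.pending[seq] = payload
        ready = []
        # Release the longest contiguous run starting at next_seq.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

An event that arrives early is buffered until the gap before it is filled, so downstream handlers only ever see an in-order stream.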

Architectural Decisions Required

Based on this analysis, the following architectural decisions must be made:

  1. Component Architecture: Monolith vs microservices split between Slack connector and backend orchestrator
  2. Model Selection Algorithm: Heuristics for agent type + model recommendation with confidence scoring
  3. Model Fallback Strategy: Behavior when Claude API unavailable or Qwen local model down
  4. Session Ownership Model: Exclusive Slack bot control vs shared access with web UI
  5. Question Timeout Git Strategy: Branching approach when work continues before user responds
  6. Deployment Promotion Workflow: PR creation vs direct merge vs GitOps sync
  7. Slack Threading Strategy: When to create new threads vs continue in existing thread
  8. State Commit Frequency: Balance between durability and git noise
  9. Namespace Lifecycle Policy: Immediate cleanup vs time-based retention vs manual
  10. OpenCode Bridge Implementation: PTY wrapping, tmux scripting, or session API integration

Architectural Decisions

Decision 1: Component Architecture

Decision: Modular Monolith with Clean Module Boundaries

Rationale:

  • Single deployment artifact reduces operational complexity for MVP
  • Clean module boundaries enable parallel development
  • API contract provides future optionality to split into microservices
  • MVP timeline (1-2 weeks) favors shipping over premature optimization
  • Single user scope means no scaling pressure

Implementation:

  • Single FastAPI application
  • Modules organized by API contract boundaries:
    • connectors/slack/ - Slack Socket Mode integration
    • core/ - Session management, BMAD routing
    • deployment/ - K8s orchestration
    • integrations/ - OpenCode SDK bridge
  • Contract tests validate internal boundaries
  • Can extract to microservices post-MVP if needed

Decision 2: Model Selection Algorithm

Decision: Keyword-Based Heuristic with Learning Loop

Rationale:

  • Simple keyword matching is expected to reach roughly 70-80% accuracy (an estimate to validate against logged overrides; acceptable for MVP)
  • User maintains full control with override capability
  • Logging all decisions builds dataset for future ML model
  • Fast to implement, easy to understand and debug

Algorithm:

def route_task(task_title, task_description):
    # Keyword analysis: one keyword list per candidate agent type
    keywords = {
        "architect": ["design", "architecture", "security", "scale"],
        "pm": ["feature", "user", "workflow", "product"],
        "builder": ["implement", "fix", "refactor", "optimize"],
    }

    # Scoring with complexity boost
    # (estimate_complexity, calculate_scores, calculate_confidence and
    # generate_reasoning are helpers not shown here)
    text = f"{task_title} {task_description}"
    complexity = estimate_complexity(task_description)
    scores = calculate_scores(text, keywords, complexity)  # {agent_type: score}

    # Agent selection: highest-scoring agent wins
    agent_type = max(scores, key=scores.get)

    # Model selection based on agent
    if agent_type in ["architect", "pm"]:
        model = "claude-code"  # Complex reasoning
    else:
        model = "qwen-coder"  # Code generation

    return RoutingDecision(
        agent=agent_type,
        model=model,
        confidence=calculate_confidence(scores),
        reasoning=generate_reasoning(agent_type, scores),
    )
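
The `calculate_scores` and `calculate_confidence` helpers referenced above are not defined in this document. One way they might work, as a simplified sketch without the complexity boost: count whole-word keyword hits per agent and report confidence as the winner's share of all hits. The scoring formula, the `route` function, and the default-to-builder rule are assumptions for illustration, not decided behavior.

```python
import re

AGENT_KEYWORDS = {
    "architect": ["design", "architecture", "security", "scale"],
    "pm": ["feature", "user", "workflow", "product"],
    "builder": ["implement", "fix", "refactor", "optimize"],
}


def calculate_scores(text: str) -> dict:
    """Count whole-word keyword hits per agent type."""
    words = re.findall(r"[a-z]+", text.lower())
    return {agent: sum(words.count(k) for k in kws)
            for agent, kws in AGENT_KEYWORDS.items()}


def calculate_confidence(scores: dict) -> float:
    """Winner's share of all keyword hits; 0.0 when nothing matched."""
    total = sum(scores.values())
    return max(scores.values()) / total if total else 0.0


def route(title: str, description: str) -> tuple:
    """Return (agent, model, confidence) for a task."""
    scores = calculate_scores(f"{title} {description}")
    # Assumption: with no keyword hits, default to the builder agent.
    agent = max(scores, key=scores.get) if any(scores.values()) else "builder"
    model = "claude-code" if agent in ("architect", "pm") else "qwen-coder"
    return agent, model, calculate_confidence(scores)
```

A low confidence value is a natural trigger for asking the user to confirm or override the routing, which also feeds the learning loop.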

Learning Loop:

  • Log every routing decision + user override
  • Build training dataset for future ML-based routing

Decision 3: Model Fallback Strategy

Decision: Fail Fast with Opportunistic Recovery

Strategy:

  1. Health check fails → Post error to Slack with [Retry] [Switch to Alternative] buttons
  2. Background polling checks model health every 30-60 seconds
  3. If model recovers before user responds → Update Slack message, proceed with original
  4. If user responds first → Honor user choice, cancel polling
  5. Polling timeout: 2 minutes maximum

Implementation:

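import asyncio

async def handle_model_unavailable(session_id, recommended_model, fallback_model):
    message_ts = await slack.post_message(
        text=f"⚠️ {recommended_model} currently unavailable.",
        buttons=["Retry", f"Switch to {fallback_model}"]
    )

    # Race: user response vs model recovery (first to finish wins)
    tasks = [
        asyncio.create_task(poll_model_availability(recommended_model, max_duration=120)),
        asyncio.create_task(wait_for_user_response(message_ts)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    # Cancel the loser, e.g. stop polling once the user has responded
    for task in pending:
        task.cancel()

    return done.pop().result().choice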

Benefits:

  • User never blocked
  • System attempts self-healing
  • Clear communication about fallback options
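
The race between model recovery and the user's response can be exercised in isolation. The sketch below uses `asyncio.wait` with stand-in coroutines. The delays, parameters, and return values are placeholders, not the real Slack or health-check integrations.

```python
import asyncio


async def poll_model_availability(recover_after: float, interval: float,
                                  max_duration: float) -> str:
    """Stand-in health poller: reports recovery once recover_after seconds pass."""
    elapsed = 0.0
    while elapsed < max_duration:
        await asyncio.sleep(interval)
        elapsed += interval
        if elapsed >= recover_after:
            return "recovered"
    return "timeout"


async def wait_for_user_response(delay: float, choice: str) -> str:
    """Stand-in for a Slack button click arriving after delay seconds."""
    await asyncio.sleep(delay)
    return choice


async def race(*coros) -> str:
    """Return the first result and cancel whatever is still running."""
    tasks = [asyncio.create_task(c) for c in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Let cancelled tasks finish unwinding before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()
```

What should happen when polling times out before the user responds is not specified above; this sketch simply lets "timeout" win the race, and the caller would need to decide whether to keep waiting for the user.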

Decision 4: Session Ownership Model

Decision: Shared Session with Global Presence-Aware Notification Routing

Architecture:

  • OpenCode session is source of truth
  • Both Slack and Web UI can send messages to same session
  • Global notification preference (not per-session):
    1. Explicit preference (highest priority): /focus slack or /focus web
    2. Automatic presence detection: Web activity → Web notifications, Idle → Slack
    3. Default: Slack (mobile-first)

Notification Routing:

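from datetime import datetime, timedelta

WEB_PRESENCE_WINDOW = timedelta(minutes=10)

class GlobalNotificationRouter:
    def __init__(self):
        self.user_preference = None    # "slack" or "web", set via /focus
        self.last_web_activity = None  # datetime of last web UI activity

    def get_target(self) -> str:
        if self.user_preference:  # Explicit
            return self.user_preference

        if self.last_web_activity and (datetime.now() - self.last_web_activity) < WEB_PRESENCE_WINDOW:
            return "web"  # Auto-detected presence

        return "slack"  # Default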

Escalation Rules:

  • Critical errors → Both channels (override preference)
  • Urgent timeouts → Both channels
  • Major milestones → Respect preference
  • Questions → Respect preference
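
The escalation rules layer on top of the routed target: some event kinds go to both channels regardless of preference. A minimal sketch, where the `EventKind` enum and `channels_for` function are hypothetical names:

```python
from enum import Enum


class EventKind(Enum):
    CRITICAL_ERROR = "critical_error"
    URGENT_TIMEOUT = "urgent_timeout"
    MAJOR_MILESTONE = "major_milestone"
    QUESTION = "question"


# Event kinds that override the user's preference and go to both channels.
ESCALATED = {EventKind.CRITICAL_ERROR, EventKind.URGENT_TIMEOUT}


def channels_for(kind: EventKind, preferred: str) -> list:
    """Return the notification channels for an event given the routed target."""
    if kind in ESCALATED:
        return ["slack", "web"]
    return [preferred]
```

Here `preferred` is whatever the notification router selected ("slack" or "web").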

Decision 5: Question Timeout Git Strategy

Decision: Confidence-Based Speculative Branching on Sub-Branches

Strategy:

  • Never block work
  • Create speculative sub-branch from feature branch
  • Continue work at confidence-appropriate pace
  • Merge or discard based on user response

Branching Pattern:

feature/jwt-auth (base feature branch)
├── tag: question-Q123-asked
└── feature/jwt-auth-Q123-rs256 (speculative)
    └── (work continues here)

Question Classification:

  1. Deferrable (Boolean NFRs): Default “No”, continue without feature
  2. Blocking (Multi-choice, Functional Requirements): Must get answer

Confidence-Based Pace:

  • High (>80%): Full speed on speculative branch
  • Medium (50-80%): Finish current task, pause
  • Low (<50%): Minimal work, escalate quickly

On User Response:

  • Matches recommendation → Squash merge to feature branch
  • Different answer → Reset to tag, create new branch with correct choice
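
The confidence thresholds and naming pattern above translate directly into small pure functions. This is a sketch: the pace labels and function names are hypothetical, and the boundary handling (exactly 80% counts as medium, exactly 50% counts as medium) is one reading of the ranges listed.

```python
def pace_for(confidence: float) -> str:
    """Map recommendation confidence (0.0-1.0) to a work pace."""
    if confidence > 0.80:
        return "full_speed"         # High: full speed on the speculative branch
    if confidence >= 0.50:
        return "finish_then_pause"  # Medium: finish current task, then pause
    return "minimal"                # Low: minimal work, escalate quickly


def speculative_branch(feature_branch: str, question_id: str, choice: str) -> str:
    """Build a speculative branch name following the branching pattern."""
    return f"{feature_branch}-{question_id}-{choice}"


def question_tag(question_id: str) -> str:
    """Build the tag that marks where the question was asked."""
    return f"question-{question_id}-asked"
```

For the example in the branching pattern, `speculative_branch("feature/jwt-auth", "Q123", "rs256")` yields `feature/jwt-auth-Q123-rs256`.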

Decision 6: Deployment Promotion Workflow

Decision: PR Creation with Manual Merge in GitHub Web UI

Workflow:

  1. User clicks “Approve & Promote” in Slack
  2. System pushes feature branch(es) to origin (multi-repo)
  3. System creates PR(s) via GitHub API
  4. PR description includes:
    • Test results summary
    • Lab deployment URL(s)
    • Agent decisions and reasoning
  5. User merges PR manually in GitHub web UI
  6. PR CI tests must pass (standard GitHub workflow)
  7. Builder namespace stays alive until manual cleanup

Multi-Repo Support:

📝 **Pull Requests Ready for Review**
 
**PRs Created:**
1️⃣ **domain-apis** - PR #123 (✅ Tests: 45/45 passing)
2️⃣ **ai-dev** - PR #87 (✅ Tests: 12/12 passing)
 
**Test Deployments:**
🌐 api-service: https://api-service.builder.lab.ctoaas.co
🌐 gateway: https://gateway.builder.lab.ctoaas.co
 
💻 Code Server: https://code.example.com/workspace/...

Decision 7: Slack Threading Strategy

Decision: Single Thread Per Session with Rich Multi-Repo PR Notifications

Pattern:

  • One Slack thread per work session/task
  • All progress, questions, and PR updates in same thread
  • Linear history, full context visible

Multi-Repo PR Handling:

  • Group all PRs for same task in single notification
  • Track CI status per PR independently
  • Update thread when any PR gets review feedback

GitHub Integration:

  • PR review comments → Posted to thread (with repo/PR context)
  • CI status per PR → Individual updates
  • All PRs merged manually in GitHub web UI

Decision 8: State Commit Frequency

Decision: Gateway-Owned State with Periodic Git Commits

Architecture:

  • Single writer: Gateway is sole writer of .session-state.yaml
  • Immediate writes: State written to PVC filesystem on every change
  • Periodic git commits: Only on significant events:
    • Session created
    • Question answered
    • PR created
    • Deployment created
    • Session ended

Implementation:

async def update_state(updates):
    # Always write to file (PVC)
    state = load_yaml(".session-state.yaml")
    state.update(updates)
    write_yaml(".session-state.yaml", state)
    
    # Commit to git on milestones only
    if should_commit(updates):
        git.add(".session-state.yaml")
        git.commit("chore: Session state checkpoint")

Benefits:

  • Fast writes (filesystem)
  • Audit trail (git commits on milestones)
  • Durable (PVC + git backup)
  • Portable (can recreate from git)
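
The `should_commit` check is not defined in this document. A possible shape, assuming each update carries a hypothetical `event` key, is shown below together with an atomic file write, so that other readers of the shared filesystem never observe a half-written state file. The event names are placeholders derived from the milestone list.

```python
import os
import tempfile

# Milestones from the list above; the event names themselves are hypothetical.
COMMIT_EVENTS = {
    "session_created", "question_answered", "pr_created",
    "deployment_created", "session_ended",
}


def should_commit(updates: dict) -> bool:
    """Commit to git only when the update records a milestone event."""
    return updates.get("event") in COMMIT_EVENTS


def write_atomic(path: str, text: str) -> None:
    """Write via a temp file plus rename so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory keeps the rename on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The serialized YAML text would be passed to `write_atomic`; serialization is left out here to keep the sketch dependency-free.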

Decision 9: Namespace Lifecycle Policy

Decision: Namespace Lifecycle Tied to Builder, On-Demand Cleanup

Policy:

  • Namespace created when builder session starts
  • Name: {builder-name}-lab
  • Stays alive as long as builder exists
  • No automatic deletion, no retention policies
  • User explicitly cleans up: /opencode cleanup

Cleanup:

async def cleanup_builder(builder_name):
    # 1. Delete K8s namespace
    await k8s.delete_namespace(f"{builder_name}-lab")
    
    # 2. Remove git worktree
    await git.worktree_remove(f".builders/{builder_name}")
    
    # 3. Mark session ended
    await session_state.finalize(builder_name)

Benefits:

  • User has full control
  • No surprise deletions
  • Namespaces stay alive for testing as long as needed
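
Because the namespace name is derived from the builder name, it is worth validating before calling the Kubernetes API: namespace names must be RFC 1123 DNS labels of at most 63 characters. A minimal sketch, with `lab_namespace` as a hypothetical helper name:

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics and '-', alphanumeric at both ends.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
_SUFFIX = "-lab"


def lab_namespace(builder_name: str) -> str:
    """Derive '{builder-name}-lab' and reject names Kubernetes would refuse."""
    name = f"{builder_name}{_SUFFIX}"
    if len(name) > 63 or not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return name
```

Failing early here gives the user a clear Slack error instead of a rejected API call partway through builder setup.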

Decision 10: OpenCode Bridge Implementation

Decision: Separate K8s Service with TypeScript Bridge Plugin + Python Gateway (Slack Socket Mode)

Architecture:

Two K8s Services:

  1. OpenCode Pod (existing): Runs OpenCode + ttyd
    • Loads TypeScript bridge plugin
    • Plugin hooks permission.ask events
    • Forwards to Gateway via HTTP POST
  2. Gateway Pod (new): Python/FastAPI service
    • Receives events from bridge via internal K8s DNS
    • Posts to Slack via Socket Mode (outbound WebSocket)
    • Manages state, routing, K8s deployments

Network Flow:

OpenCode Plugin → http://gateway:8000 → Gateway
Gateway → Outbound WebSocket → Slack

No Public Ingress:

  • Socket Mode uses outbound connection to Slack
  • Internal K8s DNS for OpenCode ↔ Gateway
  • Only existing ttyd ingress remains

Bridge Plugin (TypeScript):

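import type { Hooks, Plugin } from "@opencode-ai/plugin";

const GatewayBridge: Plugin = async (): Promise<Hooks> => {
  return {
    "permission.ask": async (req, res) => {
      // Forward the permission request to the Gateway and wait for the user's answer
      const response = await fetch("http://gateway:8000/opencode/permission", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(req)
      });
      const answer = await response.json();
      res.status = answer.status;
    }
  };
};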

Gateway Service (Python):

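import os
from fastapi import FastAPI
from slack_bolt.async_app import AsyncApp
from slack_bolt.adapter.socket_mode.async_handler import AsyncSocketModeHandler

# Slack Socket Mode (outbound WebSocket, no public ingress)
slack_app = AsyncApp(token=os.environ["SLACK_BOT_TOKEN"])
handler = AsyncSocketModeHandler(slack_app, os.environ["SLACK_APP_TOKEN"])

# HTTP API for the bridge plugin (a separate object from the Slack app)
api = FastAPI()

# Receive from bridge
@api.post("/opencode/permission")
async def handle_permission(request: dict):
    # Post to Slack, wait for response
    answer = await ask_user_via_slack(request)
    return {"status": answer}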

Shared PVC:

  • Both OpenCode and Gateway mount same PVC
  • Gateway writes .session-state.yaml
  • All see same filesystem

Implementation Strategy

Steel Thread: OpenCode Questions → Slack

Scope:

  • Bridge plugin forwards permission.ask to Gateway
  • Gateway posts question to Slack with buttons
  • User clicks button
  • Gateway returns answer to Bridge
  • OpenCode proceeds

Two Builders:

  1. 0012-slack-gateway: Python Gateway service (Slack Socket Mode)
  2. 0012-bridge-plugin: TypeScript OpenCode bridge plugin

Contract: HTTP POST /opencode/permission interface

Timeline: 1-2 days to get questions flowing


Summary

Project: OpenCode Slack Integration (Project #0012)
Status: Architecture Complete, Ready for Implementation
Methodology: BMAD
Builders: 2 parallel workspaces created

Next Steps:

  1. Implement steel thread (questions → Slack)
  2. Validate OpenCode SDK integration
  3. Expand to full workflow (routing, deployment, testing)

Kubernetes Integration Architecture

Date Added: 2026-01-31
Context: Steel thread deployment to k8s lab environment

Component Deployment Model

Three K8s Components:

  1. LGTM Stack (k8s-lab/components/lgtm)

    • Namespace: lgtm
    • Image: grafana/otel-lgtm:latest (single all-in-one image)
    • Services: Grafana (3000), Loki API (3100), OTLP gRPC/HTTP (4317/4318)
    • Storage: PVC for Grafana/Loki data persistence
    • Ingress: lgtm.lab.ctoaas.co → Grafana UI
    • Purpose: Foundational observability for all ai-dev components
  2. OpenCode Slack Gateway (ai-dev/infrastructure/kustomize/components/opencode-slack-gateway)

    • Namespace: ai-dev
    • Image: Built via Taskfile (uv base image pattern), pushed to ghcr.io
    • Service: ClusterIP gateway.ai-dev.svc.cluster.local:8000
    • Secrets: ClusterExternalSecret referencing central secret store
      • SLACK_BOT_TOKEN
      • SLACK_APP_TOKEN
    • Storage: Mounts existing code-server-storage PVC (shared with codev pod)
    • State Management: Clones ai-dev-state repo to PVC at startup
    • Files: .session-state.yaml written to shared PVC
    • Resources: 256Mi-512Mi memory, 100m-500m CPU (similar to existing gateway component)
  3. Codev Pod Updates (k8s-lab/components/codev)

    • Namespace: code-server (existing)
    • Dockerfile Changes:
      • Bridge plugin copied into build context during CI
      • npm install && npm run build && npm link to install plugin globally
    • Env Var Addition: GATEWAY_URL=http://gateway.ai-dev.svc.cluster.local:8000
    • Storage: Existing code-server-storage PVC (shared with gateway)
    • Plugin Loading: OpenCode automatically loads linked plugin at startup

Shared Storage Architecture

PVC: code-server-storage (existing, ReadWriteMany)

  • Mounted by: codev pod + gateway pod
  • Contents:
    • Workspace repositories
    • ai-dev-state git repo (cloned by gateway)
    • .session-state.yaml files (written by gateway)
  • Rationale: Single source of truth, both components see identical filesystem

Secrets Management

Pattern: ClusterExternalSecrets (established in .ai/steering/secret-management.md)

  • Central secret store syncs to k8s secrets
  • Namespace labels trigger sync to ai-dev namespace
  • No manual secret creation required
  • Gateway deployment references generated secrets

No Gateway↔Plugin Auth: Internal k8s traffic, deferred for MVP

Networking

DNS Resolution: Internal k8s DNS

  • Bridge plugin → http://gateway.ai-dev.svc.cluster.local:8000/opencode/permission
  • No ingress required (gateway uses Slack Socket Mode for outbound WebSocket)

External Access:

  • Codev: Existing ttyd ingress (unchanged)
  • LGTM: New ingress lgtm.lab.ctoaas.co
  • Gateway: No public ingress (Socket Mode only)

Deployment Orchestration

ArgoCD Application: k8s-lab/other-seeds/ai-dev.yaml

  • Source: https://github.com/craigedmunds/ai-dev
  • Path: infrastructure/kustomize/components (direct reference, no overlay)
  • Target: ai-dev namespace
  • Sync: Auto-sync enabled (lab environment)

No Overlays: Lab environment uses components directly (simpler for steel thread)

Build Strategy

Gateway Image:

  • Taskfile at ai-dev/services/gateway/Taskfile.yaml
  • Pattern: Similar to k8s-lab/components/codev (uv base image)
  • Registry: ghcr.io/craigedmunds/opencode-slack-gateway
  • Versioning: VERSION file + -dev suffix for lab builds

Bridge Plugin Integration:

  • Build Process: CI copies ai-dev/plugins/opencode-bridge to k8s-lab/components/codev build context
  • Installation: Dockerfile runs npm install && npm run build && npm link during image build
  • No NPM Registry: Direct source copy (simpler for monorepo-like setup)

State Persistence

Repository: https://github.com/craigedmunds/ai-dev-state

  • Location: Cloned to shared PVC by gateway at startup
  • Files: .session-state.yaml per session
  • Commits: Periodic commits to git on milestones (per Decision 8)
  • Durability: PVC (working copy, survives pod restarts) + git (durable backup)

Observability Integration

LGTM Stack:

  • Gateway logs → Loki (via Docker logging driver or direct integration)
  • OpenCode plugin logs → Loki
  • Grafana dashboards for session tracking, question latency, Slack interactions
  • Tempo for distributed tracing (future: trace question flow across components)

WebSocket Integration Architecture

Date Added: 2026-02-01
Context: Post-spike findings - OpenCode SDK capabilities, state persistence, plugin loading

Architectural Refinement: WebSocket Bidirectional Communication

Decision Context:

After spiking OpenCode SDK capabilities and state management, we identified critical architectural improvements:

  1. OpenCode SDK provides full session management API - No need for CLI shell-out or output parsing
  2. State persistence required - Pod restarts lose OpenCode auth and session history
  3. Plugin loading from PVC - Enables zero-rebuild iteration cycles
  4. HTTP insufficient for real-time streaming - Need bidirectional event flow

Decision 11: Bridge-Gateway Communication Protocol

Decision: WebSocket for Events, OpenCode SDK for Session Management

Refined Architecture:

┌─────────────────────────────────────────────┐
│              GATEWAY (Python)               │
│                                             │
│  • Slack Socket Mode integration           │
│  • State: session_id ↔ thread_ts           │
│  • Event formatting for Slack              │
│  • WebSocket server for Bridge             │
└─────────────────────────────────────────────┘
                    ▲
                    │
              WebSocket (bidirectional)
                    │
                    ▼
┌─────────────────────────────────────────────┐
│     BRIDGE PLUGIN (TypeScript)              │
│                                             │
│  Inbound Commands (Gateway → Plugin):       │
│    • session.create                         │
│    • session.message                        │
│    • session.list                           │
│    • session.abort                          │
│                                             │
│  Outbound Events (Plugin → Gateway):        │
│    • session.created                        │
│    • message.part.streamed                  │
│    • tool.executed                          │
│    • agent.milestone                        │
│    • session.idle/active                    │
│    • permission.asked (via HTTP)            │
│                                             │
│  Uses: @opencode-ai/sdk internally          │
└─────────────────────────────────────────────┘
                    ▲
                    │
           OpenCode Event System
                    │
                    ▼
┌─────────────────────────────────────────────┐
│          OPENCODE CORE                      │
│  • Session management via SDK               │
│  • BMAD agent routing                       │
│  • Message processing                       │
└─────────────────────────────────────────────┘

Rationale:

  • Single integration point: All OpenCode interaction flows through Bridge Plugin
  • Real-time streaming: WebSocket enables immediate event propagation
  • Bidirectional: Gateway can issue commands, Plugin streams events
  • Type safety: TypeScript SDK provides strong contracts
  • Plugin owns OpenCode: Abstracts SDK changes from Gateway

Gateway-to-Bridge Protocol:

Commands (Gateway → Bridge):

{
  type: 'session.create',
  request_id: 'req_123',
  workspace: '/workspace/builder-x',
  task: 'Build authentication',
  agent?: 'builder' | 'architect' | 'pm',
  model?: { provider: 'anthropic', model: 'claude-sonnet-4' }
}

Events (Bridge → Gateway):

{
  type: 'message.part.streamed',
  session_id: 'ses_abc123',
  message_id: 'msg_456',
  content: 'I will implement authentication using...',
  role: 'assistant'
}

Implementation Notes:

  • Plugin establishes WebSocket connection to Gateway on startup
  • Request/response pattern using request_id for correlation
  • Event streaming for real-time updates (no polling)
  • Reconnection logic with exponential backoff

Decision 12: State Persistence Strategy

Decision: Mount OpenCode State Directory to PVC

Problem:

OpenCode stores critical state in:

~/.local/share/opencode/
├── auth.json              # OAuth tokens (Anthropic, etc.)
├── storage/
│   ├── session/           # Session metadata
│   ├── message/           # Message content
│   ├── part/              # File diffs, attachments
│   └── project/           # Project configs

Pod restarts = complete state loss (sessions, auth, history).

Solution:

Mount PVC to OpenCode state directories:

# Codev pod volumeMounts
volumeMounts:
  - name: code-server-storage
    mountPath: /workspace              # Code files (existing)
  - name: code-server-storage
    mountPath: /home/opencode/.local/share/opencode
    subPath: .opencode-data            # NEW: State persistence

Benefits:

  • ✅ Sessions survive pod restarts
  • ✅ Auth tokens persist (no re-login)
  • ✅ Full conversation history retained
  • ✅ Zero code changes required
  • ✅ Works with existing PVC architecture

Directory Structure on PVC:

code-server-storage/
├── repos/                 # Workspace code (existing)
├── ai-dev-state/          # Session state repo (existing)
├── .opencode-data/        # NEW: OpenCode state
│   ├── auth.json
│   └── storage/
│       ├── session/
│       ├── message/
│       └── ...
└── .opencode-plugins/     # NEW: Plugin code (see Decision 13)

Decision 13: Plugin Loading from PVC

Decision: Load Bridge Plugin from PVC Workspace

Problem:

Current approach bakes plugin into Docker image:

  • Every plugin code change = Docker rebuild
  • Slow iteration cycle (build → push → deploy → test)
  • No hot-reload capability

Solution:

Install plugin from PVC-mounted workspace:

Dockerfile (one-time setup):

# Install OpenCode plugin SDK globally
RUN npm install -g @opencode-ai/plugin

Entrypoint script (dynamic plugin loading):

# On pod startup
cd /workspace/.opencode-plugins/opencode-bridge
npm install
npm link
 
cd ~/.config/opencode
npm link @opencode-bridge
 
# Start OpenCode (picks up linked plugin)
opencode web

Benefits:

  • ✅ Edit plugin code → restart pod → new code loads
  • ✅ No Docker rebuild needed
  • ✅ Fast iteration (seconds vs minutes)
  • ✅ Workspace-specific plugin versions possible
  • ✅ Supports plugin development workflow

Plugin Directory on PVC:

/workspace/.opencode-plugins/opencode-bridge/
├── src/
│   ├── plugin.ts
│   ├── v2-client.ts
│   └── handlers/
├── package.json
├── tsconfig.json
└── node_modules/  # Installed at pod startup

Alternative Considered:

Direct symlink without npm link - rejected due to OpenCode’s plugin discovery mechanism expecting npm-style resolution.


Decision 14: OpenCode SDK Session Management

Decision: Use OpenCode V2 SDK for Programmatic Session Control

Discovery:

OpenCode exposes full HTTP API via TypeScript SDK:

import { createOpencodeClient } from '@opencode-ai/sdk/dist/v2/client.js';
 
const client = createOpencodeClient({
  baseURL: 'http://localhost:5400'
});
 
// Create session
const session = await client.session.create({
  directory: '/workspace/builder-project',
  title: 'Slack-initiated task',
  parentID: 'optional-parent-session'
});
 
// Send message
await client.session.prompt({
  sessionID: session.data.id,
  parts: [{ type: 'text', text: 'Build authentication feature' }],
  agent: 'builder',
  model: {
    providerID: 'anthropic',
    modelID: 'claude-sonnet-4'
  }
});
 
// List sessions (visible in OpenCode web UI)
const sessions = await client.session.list({
  directory: '/workspace'
});
 
// Get messages
const messages = await client.session.messages({
  sessionID: session.data.id
});

Architectural Impact:

  • No CLI shell-out needed: Direct SDK calls replace opencode run subprocess
  • Sessions integrate with web UI: Programmatically created sessions appear in OpenCode web interface
  • Agent routing built-in: SDK supports agent: 'builder' parameter
  • Model selection supported: Can specify provider/model per message
  • Message retrieval: Can poll or stream responses via SDK

Bridge Plugin Implementation:

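import WebSocket from 'ws';
import type { Plugin } from '@opencode-ai/plugin';
import { createOpencodeClient } from '@opencode-ai/sdk/dist/v2/client.js';

export const SlackBridgePlugin: Plugin = async (ctx) => {
  const opencode = createOpencodeClient();

  // Assumption: the WebSocket endpoint is derived from GATEWAY_URL; the /ws path is a placeholder
  const gatewayUrl = process.env.GATEWAY_URL ?? 'http://gateway:8000';
  const ws = new WebSocket(gatewayUrl.replace(/^http/, 'ws') + '/ws');

  ws.on('message', async (data) => {
    const msg = JSON.parse(data.toString());

    if (msg.type === 'session.create') {
      const session = await opencode.session.create({
        directory: msg.workspace,
        title: msg.task
      });

      // Start session with initial message
      await opencode.session.prompt({
        sessionID: session.data.id,
        parts: [{ type: 'text', text: msg.task }],
        agent: msg.agent || 'builder'
      });

      // Echo request_id so the Gateway can correlate the response
      ws.send(JSON.stringify({
        type: 'session.created',
        request_id: msg.request_id,
        session_id: session.data.id
      }));
    }
  });

  return {}; // No hooks registered in this excerpt
};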

Replaces: Earlier Decision 10, which assumed CLI-based integration. The SDK approach removes the opencode run subprocess and keeps session creation, prompting, and message retrieval in direct API calls.


Updated Component Responsibilities

Bridge Plugin (TypeScript):

  • ✅ Single source of truth for OpenCode integration
  • ✅ WebSocket client to Gateway
  • ✅ Request/response for session CRUD via OpenCode SDK
  • ✅ Event streaming for real-time updates
  • ✅ Permission handling (HTTP for backwards compatibility)
  • ✅ Loaded from PVC workspace (hot-reload capable)

Gateway (Python FastAPI):

  • ✅ Slack integration (Socket Mode)
  • ✅ State management (session ↔ thread mapping)
  • ✅ Event formatting (OpenCode events → Slack UI)
  • ✅ WebSocket server for Bridge connection
  • ✅ State persistence to PVC-mounted git repo
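
The Gateway's session ↔ thread mapping and its persistence to the PVC could be sketched as follows. This is a minimal illustration, not the Gateway's actual implementation: the SessionThreadMap class, the JSON file layout, and the file name are assumptions, and the commit to the git repo is left out.

```python
import json
from pathlib import Path
from typing import Optional


class SessionThreadMap:
    """Bidirectional session_id <-> Slack thread_ts map backed by a JSON file.

    Hypothetical helper: the class name and file format are illustrative.
    """

    def __init__(self, path: Path) -> None:
        self._path = path
        self._by_session: dict[str, str] = {}
        if path.exists():
            self._by_session = json.loads(path.read_text())
        # The reverse index is derived on load and never persisted.
        self._by_thread = {ts: sid for sid, ts in self._by_session.items()}

    def link(self, session_id: str, thread_ts: str) -> None:
        """Associate a session with a thread, replacing any earlier link."""
        previous = self._by_session.get(session_id)
        if previous is not None:
            self._by_thread.pop(previous, None)
        self._by_session[session_id] = thread_ts
        self._by_thread[thread_ts] = session_id
        self._save()

    def thread_for(self, session_id: str) -> Optional[str]:
        return self._by_session.get(session_id)

    def session_for(self, thread_ts: str) -> Optional[str]:
        return self._by_thread.get(thread_ts)

    def _save(self) -> None:
        # Write to a temp file and rename, so a crash cannot leave a
        # half-written state file on the PVC.
        tmp = self._path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self._by_session, indent=2, sort_keys=True))
        tmp.replace(self._path)
```

Because the map is reloaded from disk in the constructor, a restarted Gateway pod would recover existing links as long as the file lives on the PVC. A real mapping would probably also need the Slack channel ID, since thread_ts is only unique within a channel.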

OpenCode Core:

  • ✅ Session state persisted to PVC
  • ✅ Auth tokens persisted to PVC
  • ✅ Accessible via SDK from Bridge Plugin
  • ✅ Sessions visible in web UI

MVP End-to-End Flow (Updated)

1. Slack: /opencode start "Build auth"
   ↓
2. Gateway: Receives via Socket Mode
   ↓
3. Gateway → Bridge WS: { type: 'session.create', task: 'Build auth' }
   ↓
4. Bridge: client.session.create() via SDK
   ↓
5. Bridge: client.session.prompt() to send initial message
   ↓
6. Bridge → Gateway WS: { type: 'session.created', session_id: 'ses_123' }
   ↓
7. Gateway: Maps session_id ↔ slack_thread_ts
   ↓
8. OpenCode: Processes with BMAD agent
   ↓
9. OpenCode Events: Streamed to Bridge via event system
   ↓
10. Bridge → Gateway WS: { type: 'message.part.streamed', content: '...' }
    ↓
11. Gateway → Slack: Updates thread with agent output
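
Steps 6, 7, 10, and 11 of the flow above could look roughly like this on the Gateway side. It is a sketch under assumptions: BridgeMessageRouter and its method names are hypothetical, the post_to_thread callback stands in for the real Slack client call, and the session_id field on message.part.streamed is assumed (the flow only shows content).

```python
import json
from typing import Callable


class BridgeMessageRouter:
    """Routes Bridge WebSocket messages to Slack threads (illustrative only)."""

    def __init__(self, post_to_thread: Callable[[str, str], None]) -> None:
        self._post = post_to_thread
        self._pending: dict[str, str] = {}  # request_id -> thread_ts
        self._threads: dict[str, str] = {}  # session_id -> thread_ts

    def expect(self, request_id: str, thread_ts: str) -> None:
        """Remember which thread a session.create request was sent for."""
        self._pending[request_id] = thread_ts

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        kind = msg.get("type")

        if kind == "session.created":
            # Step 7: map session_id <-> slack_thread_ts via the request_id.
            thread_ts = self._pending.pop(msg["request_id"], None)
            if thread_ts is not None:
                self._threads[msg["session_id"]] = thread_ts

        elif kind == "message.part.streamed":
            # Step 11: forward agent output to the mapped thread.
            # Assumption: streamed parts carry a session_id field.
            thread_ts = self._threads.get(msg.get("session_id", ""))
            if thread_ts is not None:
                self._post(thread_ts, msg["content"])

        # Unknown message types are ignored in this sketch.
```

Passing the Slack call in as a callback keeps the routing logic testable without a Slack workspace; streamed parts for sessions the Gateway does not know about are dropped rather than posted to the wrong thread.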

State Persistence Throughout:

  • OpenCode state → PVC (.opencode-data)
  • Gateway state (session ↔ thread mapping) → PVC (ai-dev-state repo)
  • Plugin code → PVC (.opencode-plugins)

Decision 15: UI-Initiated Session Configuration Collection

Date Added: 2026-02-02
Context: Session creation flow for sessions started in OpenCode UI (not via Slack)

Decision: Lazy, Orthogonal Configuration Collection with Optional Bundling

Problem:

When users create sessions directly in OpenCode UI, we need two distinct types of configuration:

  1. Builder Configuration (workspace management):
    • Project ID (from .ai/projectlist.md)
    • Workspace name
    • Repositories
    • Category
  2. Slack Configuration (notification routing):
    • Channel
    • Priority
    • Notification level

These configs are needed at different times and can be collected in different UIs depending on user presence.

Architecture:

Two Independent, Lazy-Collected Configurations:

# Session starts with minimal state
session = {
    "id": "ses_123",
    "title": "Add rate limiting to auth API",
    "directory": "/workspace/repos/domain-apis",
    "type": "exploratory",  # Starts as exploratory
    "builder_config": None,  # Collected when needed
    "slack_config": None     # Collected when needed
}

Collection Timing:

  1. Builder Config - Collected when first write operation attempted:

    • Triggered by: mcp_edit, mcp_write, or write-related bash commands
    • Collection UI: OpenCode modal if user present in web UI, Slack form if user in Slack
    • Required fields: Project ID, workspace name, repositories, category
    • Action: Initializes builder workspace via task builder:init BUILDER_NAME={workspace_name} REPOS={repos}
  2. Slack Config - Collected when attention leaves OpenCode:

    • Triggered by: First notification when user not in OpenCode web UI (presence detection)
    • Collection UI: Always Slack (DM or existing thread)
    • Required fields: Channel, priority
    • Action: Creates Slack thread, establishes notification routing
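
The builder-config trigger above depends on recognising a write operation before it runs. One possible detection heuristic is sketched below. The tool name "bash", the args shape ({"command": ...}), the WRITE_COMMANDS list, and the redirect rule are all assumptions chosen for illustration; only mcp_edit and mcp_write come from this document.

```python
import re

# Tool names taken from the trigger list above.
WRITE_TOOLS = frozenset({"mcp_edit", "mcp_write"})

# Hypothetical, deliberately incomplete list of mutating shell commands.
WRITE_COMMANDS = frozenset({"rm", "mv", "cp", "mkdir", "touch", "tee", "chmod"})

# Output redirection to anything other than /dev/null or a file descriptor.
_REDIRECT = re.compile(r">>?\s*(?!/dev/null)(?![>&])\S")

# Separators between commands in a pipeline or command list.
_SEPARATORS = re.compile(r"[|;&]+")


def is_write_operation(tool: str, args: dict) -> bool:
    """Return True if the tool call should trigger builder config collection."""
    if tool in WRITE_TOOLS:
        return True
    if tool != "bash":
        return False

    command = str(args.get("command", ""))
    if _REDIRECT.search(command):
        return True

    # Check the first word of every command in the pipeline or list.
    for segment in _SEPARATORS.split(command):
        words = segment.split()
        if words and words[0] in WRITE_COMMANDS:
            return True
    return False
```

The heuristic errs toward false positives (for example, a quoted ">" inside a grep pattern counts as a redirect). That costs only an earlier prompt, whereas a false negative would let a write happen outside a builder workspace.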

Configuration Collection Implementation:

class SessionConfigManager:
    async def ensure_builder_config(self, session_id):
        """Ensure builder config exists, prompt if needed"""
        session = await get_session(session_id)
        
        if session.builder_config:
            return session.builder_config
        
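        # Infer smart defaults from context
        project_id = infer_project_id(session)  # Match to projectlist.md; None if no match
        slug = slugify(session.title)
        defaults = {
            "project_id": project_id,
            # Without a project match, the user must confirm the name in the form
            "workspace_name": f"{project_id}-{slug}" if project_id else slug,
            "repositories": infer_repos(session.directory),
            "category": infer_category(session)
        }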
        
        # Collect from appropriate UI
        presence = await get_user_presence()
        
        if presence.active_in_opencode_ui:
            # User in OpenCode - prompt there
            config = await prompt_opencode_modal(
                title="Initialize Builder Workspace",
                message="Making changes requires a builder workspace:",
                fields=defaults,
                optional_section={
                    "enable_slack": False,
                    "channel": infer_channel(defaults["category"]),
                    "priority": "medium"
                }
            )
        else:
            # User in Slack or unknown - prompt in Slack
            config = await prompt_slack_form(
                title="🏗️ Initialize Builder Workspace",
                message=f"To make changes to '{session.title}', I need to set up a workspace:",
                fields=defaults,
                optional_section={
                    "enable_slack": True,  # Default YES in Slack
                    "channel": infer_channel(defaults["category"]),
                    "priority": "medium"
                }
            )
        
        # Initialize builder workspace (config is the dict returned by the form)
        workspace_path = await init_builder(
            builder_name=config["workspace_name"],
            repos=config["repositories"]
        )
        
        # Update session
        await update_session(session_id, {
            "builder_config": {**config, "workspace_path": workspace_path},
            "type": "work"
        })
        
        # If user opted in to Slack config, set that up too
        if config.get("enable_slack"):
            await setup_slack_config(session_id, config)
        
        return config
    
    async def ensure_slack_config(self, session_id):
        """Ensure Slack config exists, prompt if needed"""
        session = await get_session(session_id)
        
        if session.slack_config:
            return session.slack_config
        
        # Infer smart defaults (builder config may not exist yet)
        category = session.builder_config["category"] if session.builder_config else None
        channel = infer_channel(category)
        
        # Always collect via Slack (since we're routing there)
        config = await prompt_slack_form(
            target="DM",
            title="📬 Setup Notifications",
            message=f"Where should I post updates for '{session.title}'?",
            fields={
                "channel": channel,
                "priority": "medium",
                "notification_level": "milestones"
            }
        )
        
        # Create thread in chosen channel
        thread_ts = await create_slack_thread(
            channel=config["channel"],
            session=session
        )
        
        # Update session
        await update_session(session_id, {
            "slack_config": {**config, "thread_ts": thread_ts}
        })
        
        return config

Project ID Inference:

def infer_project_id(session):
    """Match session to project in .ai/projectlist.md"""
    project_list = load_project_list()
    
    # Strategy 1: Session title matches project title
    for project in project_list:
        if title_similarity(session.title, project.title) > 0.8:
            return project.id
    
    # Strategy 2: Detected repos match project repos
    detected_repos = detect_git_repos(session.directory)
    for project in project_list:
        if project_uses_repos(project, detected_repos):
            return project.id
    
    # Strategy 3: Category has single active project
    category = infer_category(session)
    matching = [p for p in project_list if p.category == category and p.status == "implementing"]
    if len(matching) == 1:
        return matching[0].id
    
    # Must prompt user
    return None
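
Strategy 1 relies on a title_similarity helper that is not defined in this document. One way to implement it with the Python standard library is sketched below; the use of difflib and the normalisation steps are assumptions, and only the 0.8 threshold comes from the code above.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0.0, 1.0], ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    if not norm_a or not norm_b:
        # An empty title should never match a project.
        return 0.0
    return SequenceMatcher(None, norm_a, norm_b).ratio()
```

SequenceMatcher compares character sequences, so it tolerates small typos but not reordered words. If session titles tend to paraphrase project titles, a token-based comparison may suit the 0.8 threshold better.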

Optional Bundling (Shortcut):

When prompting for one config, offer an optional section to collect the other:

  • In OpenCode Modal: “Optional: Set up Slack notifications now?” (unchecked by default)
  • In Slack Form: “Optional: Set up Slack notifications?” (checked by default since user is already in Slack)

This reduces interruptions: the user can provide both configs at once if they want.
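
A bundled form submission has to be split back into the two independent configs before it is stored. A minimal sketch of that split is shown below. The split_bundled_config name and the flat form dict are assumptions; the field names follow the required fields listed under Collection Timing.

```python
from typing import Optional

# Required fields, as listed under "Collection Timing".
BUILDER_FIELDS = ("project_id", "workspace_name", "repositories", "category")
SLACK_FIELDS = ("channel", "priority")


def split_bundled_config(form: dict) -> tuple[dict, Optional[dict]]:
    """Split one submitted form into (builder_config, slack_config or None)."""
    missing = [name for name in BUILDER_FIELDS if name not in form]
    if missing:
        raise ValueError(f"missing builder fields: {', '.join(missing)}")
    builder = {name: form[name] for name in BUILDER_FIELDS}

    # The optional section only counts if the user ticked the checkbox.
    if not form.get("enable_slack"):
        return builder, None

    missing = [name for name in SLACK_FIELDS if name not in form]
    if missing:
        raise ValueError(f"missing Slack fields: {', '.join(missing)}")
    slack = {name: form[name] for name in SLACK_FIELDS}
    return builder, slack
```

Returning None for the Slack half keeps the two configs orthogonal: a session whose user declined bundling stays in the no_slack state and is prompted later through the normal Slack trigger.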

State Transitions:

Session States:
1. (exploratory, no_builder, no_slack)
   → Reading code, researching, Q&A
   → All work in OpenCode web UI

2. (work, builder_init, no_slack)
   → Making code changes
   → Work in OpenCode web UI
   → Notifications in OpenCode UI

3. (exploratory, no_builder, slack_configured)
   → Reading code, researching
   → User went offline
   → Notifications in Slack

4. (work, builder_init, slack_configured)
   → Making code changes
   → User went offline OR opted in early
   → Notifications in Slack

All Transition Paths:

Path A: UI → Build (online)
  Create session → Attempt write → Prompt builder (OpenCode) → Continue

Path B: UI → Offline → Notify
  Create session → User offline → Event needs attention → Prompt Slack → Thread created

Path C: UI → Build → Offline
  Create session → Attempt write → Prompt builder (OpenCode) → User offline → Prompt Slack

Path D: UI → Offline → Build in Slack
  Create session → User offline → Prompt Slack → Attempt write → Prompt builder (Slack)

Path E: Slack-initiated
  /opencode start → Collect both configs in one form → Builder + Slack ready
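
The four states and five paths above can be read as a small state machine over two independent flags. The sketch below is one way to model it; the event names (write_attempted, offline_notification, slack_start) are hypothetical labels for the triggers described in this decision, and each event stands for a prompt that the user completed.

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class SessionState:
    builder_ready: bool = False
    slack_ready: bool = False

    def label(self) -> tuple[str, str, str]:
        """Render the state as the tuples used in the Session States list."""
        return (
            "work" if self.builder_ready else "exploratory",
            "builder_init" if self.builder_ready else "no_builder",
            "slack_configured" if self.slack_ready else "no_slack",
        )


def apply_event(state: SessionState, event: str) -> SessionState:
    """Apply one trigger; events are idempotent once a config exists."""
    if event == "write_attempted":
        return replace(state, builder_ready=True)
    if event == "offline_notification":
        return replace(state, slack_ready=True)
    if event == "slack_start":
        # Path E: both configs are collected in one form.
        return SessionState(builder_ready=True, slack_ready=True)
    raise ValueError(f"unknown event: {event}")


def run_path(events: list[str]) -> tuple[str, str, str]:
    return reduce(apply_event, events, SessionState()).label()
```

For example, run_path(["write_attempted", "offline_notification"]) (Path C) and run_path(["offline_notification", "write_attempted"]) (Path D) both end in (work, builder_init, slack_configured), which illustrates why the two configs can be collected in either order.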

Benefits:

  1. Minimal friction - Only prompt when actually needed
  2. Natural timing - Collection happens at logical trigger points
  3. Smart inference - Pre-fill forms with detected values
  4. Flexibility - Can collect in either UI depending on user presence
  5. Optional bundling - User can provide both configs at once to avoid future interruptions
  6. Exploratory sessions stay lightweight - No config needed for read-only work
  7. Consistent with project numbering - Builder names use project IDs from .ai/projectlist.md

Workspace Naming Convention:

Builder workspaces use the format: .builders/{project-id}-{task-slug}/

Examples:

  • .builders/0012-add-rate-limiting/ - Project 0012 (OpenCode Slack Integration)
  • .builders/0013-notification-system/ - Project 0013 (Grid Exit Strategy)

This aligns with existing project numbering in .ai/projectlist.md and spec/plan file naming conventions.
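
The naming convention can be expressed as a small helper. This is a sketch: slugify is referenced earlier in this decision but not defined, so the version below (lowercase, non-alphanumeric runs collapsed to hyphens) is an assumption, as is the builder_workspace_path name.

```python
import re


def slugify(text: str) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")


def builder_workspace_path(project_id: str, task: str) -> str:
    """Build a path of the form .builders/{project-id}-{task-slug}/"""
    slug = slugify(task)
    if not slug:
        raise ValueError("task must contain at least one letter or digit")
    return f".builders/{project_id}-{slug}/"
```

With this helper, builder_workspace_path("0012", "Add rate limiting") yields .builders/0012-add-rate-limiting/. Long session titles produce long slugs, so a real implementation may want to truncate or let the user edit the name in the form.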


Architecture decisions finalized 2026-01-31.
Kubernetes integration architecture added 2026-01-31.
WebSocket integration and state persistence added 2026-02-01.
UI-initiated session configuration added 2026-02-02.