Architecture Decision Document: OpenCode Slack Integration

This document is built collaboratively through step-by-step discovery. Sections are appended as we work through each architectural decision together.

Document Overview

Project: OpenCode Slack Integration
Status: Architecture In Progress
Last Updated: 2026-01-31
Architect: Winston (with Craig)

⚠️ NOTE: For the core session pattern (builder workspaces, lifecycle, message routing) shared across all chat platforms, see:

📄 opencode-session-pattern.md

This document covers Slack-specific architecture, deployment automation, and K8s integration.

Input Documents Loaded

This architecture is informed by:

PRD (prd.md) - Product requirements and user workflows
Discovery Questions (discovery-questions.md) - Annotated requirements from user
Architecture Questions Round 2 (architecture-questions-round2.md) - Technical clarifications
API Contract (api-contract.md) - Slack ↔ Backend interface specification
Work Package: Backend (work-package-backend.md) - Backend implementation scope
Work Package: Slack (work-package-slack.md) - Slack connector implementation scope
Workspace Pattern (workspace-pattern.md) - Multi-repo workspace approach (using par)

Project Context Analysis

Requirements Overview

Functional Requirements:

The OpenCode Slack Integration bridges asynchronous Slack-based interaction with OpenCode CLI sessions, supporting AI-assisted development across multiple parallel projects. The system encompasses:

Slack Interface Layer (FR-1): Slash command workflow forms for task initiation with rich metadata (category, project, repos, priority)
Intelligent Routing (FR-2): Two-dimensional routing decisions combining BMAD agent type (architect/PM/builder/party-mode) with AI model selection (Claude Code API vs Qwen 2.5 Coder 32B local)
Workspace Orchestration (FR-3): Integration with par workspace manager to create isolated multi-repo builder environments
Session Bridge (FR-4): Bidirectional communication between Slack threads and OpenCode sessions, with dual-UI visibility (Slack + OpenCode web UI)
Deployment Automation (FR-5, FR-6): ConfigMap/PVC-based deployment to K8s lab namespaces with automated test execution
Async Question Handling (FR-7): Interactive Slack buttons with configurable timeouts and recommended defaults
Progress Visibility (FR-8): Milestone-based updates with periodic summaries for long-running work
Multi-Project Concurrency (FR-9): Isolated workspaces and sessions enabling parallel work across projects
Session Durability (FR-10): Git-backed state persistence supporting work spanning days/weeks

Non-Functional Requirements:

Performance (NFR-1): Sub-3-second Slack acknowledgment, <30-second deployment, 1-second question forwarding
Reliability (NFR-2): Webhook retry logic, git-backed state (zero loss tolerance), graceful failure recovery
Security (NFR-3): Slack signature verification, K8s RBAC least-privilege, namespace isolation, secret protection
Scalability (NFR-4): MVP scoped for single user with <10 concurrent sessions

Scale & Complexity:

Primary domain: Backend orchestration + Chat platform integration + K8s deployment automation
Complexity level: Medium-High
- Real-time streaming (SSE for deployment progress)
- Multi-session state management
- Kubernetes namespace lifecycle orchestration
- Dual AI model routing and availability management
- Git-based distributed state synchronization
Estimated architectural components: 8-10 major components
- Slack connector
- Backend API gateway
- Session manager
- OpenCode bridge
- BMAD router with model selector
- Workspace manager (integrates with par)
- Deployment orchestrator
- Test orchestrator
- K8s namespace manager
- Webhook event dispatcher

Technical Constraints & Dependencies

Critical Dependencies:

OpenCode CLI: Session management, web UI integration, output parsing capabilities
Par (Workspace Manager): Multi-repo workspace creation with worktree isolation
Kubernetes Infrastructure: Namespace creation RBAC, ingress controller, cert-manager, PVC provisioning
Slack Platform: Custom app installation, webhook endpoints, interactive components, workflow forms
Git: State persistence, workspace branching, session resumability

Architectural Constraints:

MVP Timeline: 1-2 week implementation window drives technology choices toward proven patterns
Single User Scope: Simplified authentication, no multi-tenancy concerns for MVP
No Image Builds: ConfigMap/PVC deployment strategy only (fast iteration over production-readiness)
Session Persistence Requirement: State must survive gateway restarts, OpenCode crashes, multi-day inactivity
Performance Target: <30 second deploy time influences manifest strategy and K8s interaction patterns

Technology Choices Implied by Requirements:

Python/FastAPI (based on work packages, aligns with K8s client libraries)
Server-Sent Events (SSE) for deployment progress streaming
Git-backed YAML for session state
Kustomize for K8s manifests (per infrastructure notes)
Contract-first development (API contract enables parallel Slack/Backend work)

Cross-Cutting Concerns Identified

1. Session Lifecycle Management

Creation: Workspace initialization via task builder:init, OpenCode spawn, state file creation
Active state: Message forwarding, output parsing, question detection, progress tracking
Persistence: Git commits for state checkpoints, resumability after crashes
Cleanup: Workspace removal, namespace deletion (immediate vs deferred), branch cleanup

2. AI Model Selection & Availability

Routing Heuristics: Task analysis determines agent type AND model recommendation
- Architecture/complex reasoning → Claude Code (API)
- Code generation/implementation → Qwen 2.5 Coder 32B (local)
Availability Checking: Local model health checks, API quota validation
Fallback Strategy: What happens when recommended model unavailable? (OPEN QUESTION)
Cost/Performance Tradeoffs: API costs vs local compute, latency considerations

3. Concurrent Access Control

Dual UI Problem: Sessions visible in both Slack and OpenCode web UI
Ownership Semantics: Can Slack bot and human both send messages? (OPEN QUESTION)
Conflict Resolution: Race conditions when both interfaces active simultaneously
Visibility Strategy: Should web UI distinguish Slack-managed sessions? (OPEN QUESTION)

4. Git-Based State Synchronization

State File Format: .session-state.yaml schema and versioning
Commit Strategy: When to commit state (every milestone? every message? periodic?)
Branching Strategy: Builder branches, question timeout branches (OPEN QUESTION)
Merge/Cleanup: What happens on session completion? PR? Direct merge? (OPEN QUESTION)

5. Kubernetes Security Boundaries

RBAC Design: Gateway service account permissions (namespace creation, manifest apply, ingress creation)
Namespace Isolation: Lab namespaces must not access production resources
Secret Management: Slack tokens, AI API keys, K8s credentials
Network Policies: Ingress-only access to lab deployments

6. Webhook Reliability

Retry Logic: Slack webhook failures must not lose events
Event Ordering: Progress updates must arrive in sequence
Idempotency: Repeated webhook deliveries must not duplicate actions
Timeout Handling: Long-running operations must not block webhook responses
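
The idempotency requirement can be illustrated with a small de-duplication guard. This is a minimal sketch, not the gateway's actual implementation: the class name `EventDeduplicator`, keying on an event ID, and the 10-minute retention window are all assumptions.

```python
import time
from typing import Callable, Dict


class EventDeduplicator:
    """Remembers recently seen event IDs so a redelivered event is processed once."""

    def __init__(self, ttl_seconds: float = 600.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds  # Assumed retention window
        self._clock = clock      # Injectable so tests can control time
        self._seen: Dict[str, float] = {}

    def first_delivery(self, event_id: str) -> bool:
        """Return True the first time an event ID is seen, False for repeats."""
        now = self._clock()
        # Drop entries older than the retention window
        self._seen = {eid: ts for eid, ts in self._seen.items() if now - ts < self._ttl}
        if event_id in self._seen:
            return False
        self._seen[event_id] = now
        return True


def handle_event(dedup: EventDeduplicator, event_id: str, action: Callable[[], None]) -> bool:
    """Run the action only for the first delivery; report whether it ran."""
    if not dedup.first_delivery(event_id):
        return False
    action()
    return True
```

An in-memory map is enough for the single-gateway MVP, but it is lost on restart. Persisting seen IDs alongside the session state would be one way to close that gap.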

Architectural Decisions Required

Based on this analysis, the following architectural decisions must be made:

Component Architecture: Monolith vs microservices split between Slack connector and backend orchestrator
Model Selection Algorithm: Heuristics for agent type + model recommendation with confidence scoring
Model Fallback Strategy: Behavior when Claude API unavailable or Qwen local model down
Session Ownership Model: Exclusive Slack bot control vs shared access with web UI
Question Timeout Git Strategy: Branching approach when work continues before user responds
Deployment Promotion Workflow: PR creation vs direct merge vs GitOps sync
Slack Threading Strategy: When to create new threads vs continue in existing thread
State Commit Frequency: Balance between durability and git noise
Namespace Lifecycle Policy: Immediate cleanup vs time-based retention vs manual
OpenCode Bridge Implementation: PTY wrapping, tmux scripting, or session API integration

Architectural Decisions

Decision 1: Component Architecture

Decision: Modular Monolith with Clean Module Boundaries

Rationale:

Single deployment artifact reduces operational complexity for MVP
Clean module boundaries enable parallel development
API contract provides future optionality to split into microservices
MVP timeline (1-2 weeks) favors shipping over premature optimization
Single user scope means no scaling pressure

Implementation:

Single FastAPI application
Modules organized by API contract boundaries:
- connectors/slack/ - Slack Socket Mode integration
- core/ - Session management, BMAD routing
- deployment/ - K8s orchestration
- integrations/ - OpenCode SDK bridge
Contract tests validate internal boundaries
Can extract to microservices post-MVP if needed

Decision 2: Model Selection Algorithm

Decision: Keyword-Based Heuristic with Learning Loop

Rationale:

Simple keyword matching is expected to reach roughly 70-80% accuracy (acceptable for MVP)
User maintains full control with override capability
Logging all decisions builds dataset for future ML model
Fast to implement, easy to understand and debug

Algorithm:

def route_task(task_title, task_description):
    # Keyword analysis: one keyword list per candidate agent type
    # (mapping of lists to agent names inferred from the list names)
    keywords = {
        "architect": ["design", "architecture", "security", "scale"],
        "pm": ["feature", "user", "workflow", "product"],
        "builder": ["implement", "fix", "refactor", "optimize"],
    }

    # Scoring with complexity boost
    # (estimate_complexity, calculate_scores, calculate_confidence and
    # generate_reasoning are helpers not shown in this document)
    text = f"{task_title} {task_description}"
    complexity = estimate_complexity(task_description)
    scores = calculate_scores(text, keywords, complexity)  # {agent_type: score}

    # Agent selection: highest-scoring agent wins
    agent_type = max(scores, key=scores.get)

    # Model selection based on agent
    if agent_type in ["architect", "pm"]:
        model = "claude-code"  # Complex reasoning
    else:
        model = "qwen-coder"  # Code generation

    return RoutingDecision(
        agent=agent_type,
        model=model,
        confidence=calculate_confidence(scores),
        reasoning=generate_reasoning(agent_type, scores)
    )
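
The scoring helpers are not defined in this document. One possible shape for them is sketched here. The function names `score_agents` and `confidence`, the mapping of keyword lists to agent names, and the substring matching are assumptions, and the complexity boost is left out for brevity.

```python
from typing import Dict, List

# Keyword lists taken from the algorithm above; the agent names they map to are assumed
AGENT_KEYWORDS: Dict[str, List[str]] = {
    "architect": ["design", "architecture", "security", "scale"],
    "pm": ["feature", "user", "workflow", "product"],
    "builder": ["implement", "fix", "refactor", "optimize"],
}


def score_agents(text: str) -> Dict[str, int]:
    """Count how many of each agent's keywords occur in the task text."""
    lowered = text.lower()
    return {
        agent: sum(1 for keyword in keywords if keyword in lowered)
        for agent, keywords in AGENT_KEYWORDS.items()
    }


def confidence(scores: Dict[str, int]) -> float:
    """Share of all keyword hits won by the top agent (0.0 when nothing matched)."""
    total = sum(scores.values())
    return max(scores.values()) / total if total else 0.0
```

Substring matching is deliberately crude ("fix" also matches "prefix"), which is part of why the user override remains the final authority.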

Learning Loop:

Log every routing decision + user override
Build training dataset for future ML-based routing
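
The learning loop could be as simple as an append-only JSON Lines file. The sketch below is illustrative only: the function name `log_routing_decision`, the record fields, and the file-based storage are assumptions rather than part of the design.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def log_routing_decision(
    log_path: Path,
    task_title: str,
    agent: str,
    model: str,
    confidence: float,
    user_override: Optional[dict] = None,
) -> dict:
    """Append one routing decision (and any user override) as a JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task_title": task_title,
        "agent": agent,
        "model": model,
        "confidence": confidence,
        "user_override": user_override,  # None when the user accepted the recommendation
    }
    with log_path.open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record
```

Rows where `user_override` is set are the interesting ones for a future model, since they mark where the heuristic and the user disagreed.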

Decision 3: Model Fallback Strategy

Decision: Fail Fast with Opportunistic Recovery

Strategy:

Health check fails → Post error to Slack with [Retry] [Switch to Alternative] buttons
Background polling checks model health every 30-60 seconds
If model recovers before user responds → Update Slack message, proceed with original
If user responds first → Honor user choice, cancel polling
Polling timeout: 2 minutes maximum

Implementation:

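import asyncio

async def handle_model_unavailable(session_id, recommended_model, fallback_model):
    message_ts = await slack.post_message(
        text=f"⚠️ {recommended_model} currently unavailable.",
        buttons=["Retry", f"Switch to {fallback_model}"]
    )

    # Race: user response vs model recovery
    tasks = {
        asyncio.create_task(poll_model_availability(recommended_model, max_duration=120)),
        asyncio.create_task(wait_for_user_response(message_ts)),
    }
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    # The loser of the race is no longer needed
    for task in pending:
        task.cancel()

    return done.pop().result().choice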

Benefits:

User never blocked
System attempts self-healing
Clear communication about fallback options
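
The background polling step can be sketched as a loop with a time budget. This is a minimal illustration under assumptions: the name `poll_until_healthy` and the injected `check_health` coroutine are hypothetical, the 30-second default interval is the lower end of the range in the strategy, and the 120-second default is the stated 2-minute polling timeout.

```python
import asyncio
from typing import Awaitable, Callable


async def poll_until_healthy(
    check_health: Callable[[], Awaitable[bool]],
    interval: float = 30.0,
    max_duration: float = 120.0,
) -> bool:
    """Poll until the model reports healthy or the time budget runs out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + max_duration
    while True:
        if await check_health():
            return True  # Model recovered
        remaining = deadline - loop.time()
        if remaining <= 0:
            return False  # Gave up after max_duration
        # Never sleep past the deadline
        await asyncio.sleep(min(interval, remaining))
```

Because the loop only suspends in `check_health` and `asyncio.sleep`, cancelling the task when the user responds first stops it promptly.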

Decision 4: Session Ownership Model

Decision: Shared Session with Global Presence-Aware Notification Routing

Architecture:

OpenCode session is source of truth
Both Slack and Web UI can send messages to same session
Global notification preference (not per-session):
1. Explicit preference (highest priority): /focus slack or /focus web
2. Automatic presence detection: Web activity → Web notifications, Idle → Slack
3. Default: Slack (mobile-first)

Notification Routing:

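from datetime import datetime, timedelta

WEB_PRESENCE_WINDOW = timedelta(minutes=10)

class GlobalNotificationRouter:
    def __init__(self):
        self.user_preference = None    # "slack" or "web", set via /focus
        self.last_web_activity = None  # datetime of the last web UI activity

    def get_target(self) -> str:
        if self.user_preference:  # Explicit
            return self.user_preference

        if self.last_web_activity and datetime.now() - self.last_web_activity < WEB_PRESENCE_WINDOW:
            return "web"  # Auto-detected presence

        return "slack"  # Default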

Escalation Rules:

Critical errors → Both channels (override preference)
Urgent timeouts → Both channels
Major milestones → Respect preference
Questions → Respect preference
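
The escalation rules reduce to a small lookup layered on top of the preferred target. In this sketch the function name `notification_targets` and the event-kind strings are assumptions chosen for illustration.

```python
from typing import List

# Event kinds that go to both channels regardless of preference (names are assumed)
ESCALATED_EVENTS = {"critical_error", "urgent_timeout"}


def notification_targets(event_kind: str, preferred_target: str) -> List[str]:
    """Return the channels that should receive a notification of this kind."""
    if event_kind in ESCALATED_EVENTS:
        return ["slack", "web"]  # Override preference
    return [preferred_target]    # Milestones and questions respect preference
```

Here `preferred_target` would be the value returned by the notification router's `get_target()`.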

Decision 5: Question Timeout Git Strategy

Decision: Confidence-Based Speculative Branching on Sub-Branches

Strategy:

Never block work
Create speculative sub-branch from feature branch
Continue work at confidence-appropriate pace
Merge or discard based on user response

Branching Pattern:

feature/jwt-auth (base feature branch)
├── tag: question-Q123-asked
└── feature/jwt-auth-Q123-rs256 (speculative)
    └── (work continues here)

Question Classification:

Deferrable (Boolean NFRs): Default “No”, continue without feature
Blocking (Multi-choice, Functional Requirements): Must get answer

Confidence-Based Pace:

High (>80%): Full speed on speculative branch
Medium (50-80%): Finish current task, pause
Low (<50%): Minimal work, escalate quickly

On User Response:

Matches recommendation → Squash merge to feature branch
Different answer → Reset to tag, create new branch with correct choice
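
The confidence thresholds and naming pattern above can be written down directly. This is a sketch only: the function names and the pace labels are hypothetical, and confidence is assumed to be expressed as a fraction between 0.0 and 1.0.

```python
def speculation_pace(confidence: float) -> str:
    """Map confidence in the recommended answer (0.0-1.0) to a work pace."""
    if confidence > 0.80:
        return "full_speed"                      # High: full speed on speculative branch
    if confidence >= 0.50:
        return "finish_current_task_then_pause"  # Medium
    return "minimal_work_escalate"               # Low


def speculative_branch_name(feature_branch: str, question_id: str, choice: str) -> str:
    """Build the sub-branch name, e.g. feature/jwt-auth-Q123-rs256."""
    return f"{feature_branch}-{question_id}-{choice}"


def question_tag_name(question_id: str) -> str:
    """Build the tag that marks where the question was asked."""
    return f"question-{question_id}-asked"
```

The tag is what makes the "different answer" path cheap: resetting to it discards the speculative work without touching the base feature branch.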

Decision 6: Deployment Promotion Workflow

Decision: PR Creation with Manual Merge in GitHub Web UI

Workflow:

User clicks “Approve & Promote” in Slack
System pushes feature branch(es) to origin (multi-repo)
System creates PR(s) via GitHub API
PR description includes:
- Test results summary
- Lab deployment URL(s)
- Agent decisions and reasoning
User merges PR manually in GitHub web UI
PR CI tests must pass (standard GitHub workflow)
Builder namespace stays alive until manual cleanup

Multi-Repo Support:

📝 **Pull Requests Ready for Review**
 
**PRs Created:**
1️⃣ **domain-apis** - PR #123 (✅ Tests: 45/45 passing)
2️⃣ **ai-dev** - PR #87 (✅ Tests: 12/12 passing)
 
**Test Deployments:**
🌐 api-service: https://api-service.builder.lab.ctoaas.co
🌐 gateway: https://gateway.builder.lab.ctoaas.co
 
💻 Code Server: https://code.example.com/workspace/...
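
A grouped notification like the sample above could be rendered from per-PR results. The sketch below covers only the "PRs Created" part. The `PullRequestSummary` type, its fields, and the use of plain numbering instead of keycap emoji are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PullRequestSummary:
    repo: str
    number: int
    tests_passed: int
    tests_total: int


def format_pr_notification(prs: List[PullRequestSummary]) -> str:
    """Render one grouped message covering every PR created for the task."""
    lines = ["📝 **Pull Requests Ready for Review**", "", "**PRs Created:**"]
    for index, pr in enumerate(prs, start=1):
        # Mark the PR green only when every test passed
        status = "✅" if pr.tests_passed == pr.tests_total else "❌"
        lines.append(
            f"{index}. **{pr.repo}** - PR #{pr.number} "
            f"({status} Tests: {pr.tests_passed}/{pr.tests_total} passing)"
        )
    return "\n".join(lines)
```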

Decision 7: Slack Threading Strategy

Decision: Single Thread Per Session with Rich Multi-Repo PR Notifications

Pattern:

One Slack thread per work session/task
All progress, questions, and PR updates in same thread
Linear history, full context visible

Multi-Repo PR Handling:

Group all PRs for same task in single notification
Track CI status per PR independently
Update thread when any PR gets review feedback

GitHub Integration:

PR review comments → Posted to thread (with repo/PR context)
CI status per PR → Individual updates
All PRs merged manually in GitHub web UI

Decision 8: State Commit Frequency

Decision: Gateway-Owned State with Periodic Git Commits

Architecture:

Single writer: Gateway is sole writer of .session-state.yaml
Immediate writes: State written to PVC filesystem on every change
Periodic git commits: Only on significant events:
- Session created
- Question answered
- PR created
- Deployment created
- Session ended

Implementation:

async def update_state(updates):
    # Always write to file (PVC)
    state = load_yaml(".session-state.yaml")
    state.update(updates)
    write_yaml(".session-state.yaml", state)
    
    # Commit to git on milestones only
    if should_commit(updates):
        git.add(".session-state.yaml")
        git.commit("chore: Session state checkpoint")

Benefits:

Fast writes (filesystem)
Audit trail (git commits on milestones)
Durable (PVC + git backup)
Portable (can recreate from git)
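
The `should_commit` check is not defined in this document. A minimal sketch follows, assuming each update carries an `event` key naming what happened. Both that key and the event-name strings are hypothetical.

```python
from typing import Any, Dict

# Significant events from the list above that trigger a git commit (string names are assumed)
MILESTONE_EVENTS = {
    "session_created",
    "question_answered",
    "pr_created",
    "deployment_created",
    "session_ended",
}


def should_commit(updates: Dict[str, Any]) -> bool:
    """Commit only when the update records a milestone event."""
    return updates.get("event") in MILESTONE_EVENTS
```

Every other update still lands on the PVC immediately. It just does not produce a commit.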

Decision 9: Namespace Lifecycle Policy

Decision: Namespace Lifecycle Tied to Builder, On-Demand Cleanup

Policy:

Namespace created when builder session starts
Name: {builder-name}-lab
Stays alive as long as builder exists
No automatic deletion, no retention policies
User explicitly cleans up: /opencode cleanup

Cleanup:

async def cleanup_builder(builder_name):
    # 1. Delete K8s namespace
    await k8s.delete_namespace(f"{builder_name}-lab")
    
    # 2. Remove git worktree
    await git.worktree_remove(f".builders/{builder_name}")
    
    # 3. Mark session ended
    await session_state.finalize(builder_name)

Benefits:

User has full control
No surprise deletions
Namespaces stay alive for testing as long as needed

Decision 10: OpenCode Bridge Implementation

Decision: Separate K8s Service with TypeScript Bridge Plugin + Python Gateway (Slack Socket Mode)

Architecture:

Two K8s Services:

OpenCode Pod (existing): Runs OpenCode + ttyd
- Loads TypeScript bridge plugin
- Plugin hooks permission.ask events
- Forwards to Gateway via HTTP POST
Gateway Pod (new): Python/FastAPI service
- Receives events from bridge via internal K8s DNS
- Posts to Slack via Socket Mode (outbound WebSocket)
- Manages state, routing, K8s deployments

Network Flow:

OpenCode Plugin → http://gateway:8000 → Gateway
Gateway → Outbound WebSocket → Slack

No Public Ingress:

Socket Mode uses outbound connection to Slack
Internal K8s DNS for OpenCode ↔ Gateway
Only existing ttyd ingress remains

Bridge Plugin (TypeScript):

const GatewayBridge: Plugin = async (): Promise<Hooks> => {
  return {
    "permission.ask": async (req, res) => {
      // Forward the permission request to the Gateway and wait for the user's answer
      const response = await fetch("http://gateway:8000/opencode/permission", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(req)
      });
      const answer = await response.json();
      res.status = answer.status;
    }
  };
};

Gateway Service (Python):

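import os
from fastapi import FastAPI, Request
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

# Slack Socket Mode
slack_app = App(token=os.environ["SLACK_BOT_TOKEN"])
handler = SocketModeHandler(slack_app, os.environ["SLACK_APP_TOKEN"])

# Receive from bridge (FastAPI app, separate from the Slack Bolt app)
api = FastAPI()

@api.post("/opencode/permission")
async def handle_permission(request: Request):
    # Post to Slack, wait for response
    answer = await ask_user_via_slack(await request.json())
    return {"status": answer}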

Shared PVC:

Both OpenCode and Gateway mount same PVC
Gateway writes .session-state.yaml
All see same filesystem

Implementation Strategy

Steel Thread: OpenCode Questions → Slack

Scope:

Bridge plugin forwards permission.ask to Gateway
Gateway posts question to Slack with buttons
User clicks button
Gateway returns answer to Bridge
OpenCode proceeds

Two Builders:

0012-slack-gateway: Python Gateway service (Slack Socket Mode)
0012-bridge-plugin: TypeScript OpenCode bridge plugin

Contract: HTTP POST /opencode/permission interface

Timeline: 1-2 days to get questions flowing

Summary

Project: OpenCode Slack Integration (Project #0012)
Status: Architecture Complete, Ready for Implementation
Methodology: BMAD
Builders: 2 parallel workspaces created

Next Steps:

Implement steel thread (questions → Slack)
Validate OpenCode SDK integration
Expand to full workflow (routing, deployment, testing)

Kubernetes Integration Architecture

Date Added: 2026-01-31
Context: Steel thread deployment to k8s lab environment

Component Deployment Model

Three K8s Components:

LGTM Stack (k8s-lab/components/lgtm)
- Namespace: lgtm
- Image: grafana/otel-lgtm:latest (single all-in-one image)
- Services: Grafana (3000), Loki API (3100), OTLP gRPC/HTTP (4317/4318)
- Storage: PVC for Grafana/Loki data persistence
- Ingress: lgtm.lab.ctoaas.co → Grafana UI
- Purpose: Foundational observability for all ai-dev components
OpenCode Slack Gateway (ai-dev/infrastructure/kustomize/components/opencode-slack-gateway)
- Namespace: ai-dev
- Image: Built via Taskfile (uv base image pattern), pushed to ghcr.io
- Service: ClusterIP gateway.ai-dev.svc.cluster.local:8000
- Secrets: ClusterExternalSecret referencing central secret store
  - SLACK_BOT_TOKEN
  - SLACK_APP_TOKEN
- Storage: Mounts existing code-server-storage PVC (shared with codev pod)
- State Management: Clones ai-dev-state repo to PVC at startup
- Files: .session-state.yaml written to shared PVC
- Resources: 256Mi-512Mi memory, 100m-500m CPU (similar to existing gateway component)
Codev Pod Updates (k8s-lab/components/codev)
- Namespace: code-server (existing)
- Dockerfile Changes:
  - Bridge plugin copied into build context during CI
  - npm install && npm run build && npm link to install plugin globally
- Env Var Addition: GATEWAY_URL=http://gateway.ai-dev.svc.cluster.local:8000
- Storage: Existing code-server-storage PVC (shared with gateway)
- Plugin Loading: OpenCode automatically loads linked plugin at startup

Shared Storage Architecture

PVC: code-server-storage (existing, ReadWriteMany)

Mounted by: codev pod + gateway pod
Contents:
- Workspace repositories
- ai-dev-state git repo (cloned by gateway)
- .session-state.yaml files (written by gateway)
Rationale: Single source of truth, both components see identical filesystem

Secrets Management

Pattern: ClusterExternalSecrets (established in .ai/steering/secret-management.md)

Central secret store syncs to k8s secrets
Namespace labels trigger sync to ai-dev namespace
No manual secret creation required
Gateway deployment references generated secrets

No Gateway↔Plugin Auth: Internal k8s traffic, deferred for MVP

Networking

DNS Resolution: Internal k8s DNS

Bridge plugin → http://gateway.ai-dev.svc.cluster.local:8000/opencode/permission
No ingress required (gateway uses Slack Socket Mode for outbound WebSocket)

External Access:

Codev: Existing ttyd ingress (unchanged)
LGTM: New ingress lgtm.lab.ctoaas.co
Gateway: No public ingress (Socket Mode only)

Deployment Orchestration

ArgoCD Application: k8s-lab/other-seeds/ai-dev.yaml

Source: https://github.com/craigedmunds/ai-dev
Path: infrastructure/kustomize/components (direct reference, no overlay)
Target: ai-dev namespace
Sync: Auto-sync enabled (lab environment)

No Overlays: Lab environment uses components directly (simpler for steel thread)

Build Strategy

Gateway Image:

Taskfile at ai-dev/services/gateway/Taskfile.yaml
Pattern: Similar to k8s-lab/components/codev (uv base image)
Registry: ghcr.io/craigedmunds/opencode-slack-gateway
Versioning: VERSION file + -dev suffix for lab builds

Bridge Plugin Integration:

Build Process: CI copies ai-dev/plugins/opencode-bridge to k8s-lab/components/codev build context
Installation: Dockerfile runs npm install && npm run build && npm link during image build
No NPM Registry: Direct source copy (simpler for monorepo-like setup)

State Persistence

Repository: https://github.com/craigedmunds/ai-dev-state

Location: Cloned to shared PVC by gateway at startup
Files: .session-state.yaml per session
Commits: Periodic commits to git on milestones (per Decision 8)
Durability: PVC (working copy, survives pod restarts) + git (durable backup)

Observability Integration

LGTM Stack:

Gateway logs → Loki (via Docker logging driver or direct integration)
OpenCode plugin logs → Loki
Grafana dashboards for session tracking, question latency, Slack interactions
Tempo for distributed tracing (future: trace question flow across components)

WebSocket Integration Architecture

Date Added: 2026-02-01
Context: Post-spike findings - OpenCode SDK capabilities, state persistence, plugin loading

Decision Context:

After spiking OpenCode SDK capabilities and state management, we identified critical architectural improvements:

OpenCode SDK provides full session management API - No need for CLI shell-out or output parsing
State persistence required - Pod restarts lose OpenCode auth and session history
Plugin loading from PVC - Enables zero-rebuild iteration cycles
HTTP insufficient for real-time streaming - Need bidirectional event flow

Decision 11: Bridge-Gateway Communication Protocol

Decision: WebSocket for Events, OpenCode SDK for Session Management

Refined Architecture:

┌─────────────────────────────────────────────┐
│              GATEWAY (Python)               │
│                                             │
│  • Slack Socket Mode integration           │
│  • State: session_id ↔ thread_ts           │
│  • Event formatting for Slack              │
│  • WebSocket server for Bridge             │
└─────────────────────────────────────────────┘
                    ▲
                    │
              WebSocket (bidirectional)
                    │
                    ▼
┌─────────────────────────────────────────────┐
│     BRIDGE PLUGIN (TypeScript)              │
│                                             │
│  Inbound Commands (Gateway → Plugin):       │
│    • session.create                         │
│    • session.message                        │
│    • session.list                           │
│    • session.abort                          │
│                                             │
│  Outbound Events (Plugin → Gateway):        │
│    • session.created                        │
│    • message.part.streamed                  │
│    • tool.executed                          │
│    • agent.milestone                        │
│    • session.idle/active                    │
│    • permission.asked (via HTTP)            │
│                                             │
│  Uses: @opencode-ai/sdk internally          │
└─────────────────────────────────────────────┘
                    ▲
                    │
           OpenCode Event System
                    │
                    ▼
┌─────────────────────────────────────────────┐
│          OPENCODE CORE                      │
│  • Session management via SDK               │
│  • BMAD agent routing                       │
│  • Message processing                       │
└─────────────────────────────────────────────┘

Rationale:

Single integration point: All OpenCode interaction flows through Bridge Plugin
Real-time streaming: WebSocket enables immediate event propagation
Bidirectional: Gateway can issue commands, Plugin streams events
Type safety: TypeScript SDK provides strong contracts
Plugin owns OpenCode: Abstracts SDK changes from Gateway

Gateway-to-Bridge Protocol:

Commands (Gateway → Bridge):

{
  type: 'session.create',
  request_id: 'req_123',
  workspace: '/workspace/builder-x',
  task: 'Build authentication',
  agent?: 'builder' | 'architect' | 'pm',
  model?: { provider: 'anthropic', model: 'claude-sonnet-4' }
}

Events (Bridge → Gateway):

{
  type: 'message.part.streamed',
  session_id: 'ses_abc123',
  message_id: 'msg_456',
  content: 'I will implement authentication using...',
  role: 'assistant'
}

Implementation Notes:

Plugin establishes WebSocket connection to Gateway on startup
Request/response pattern using request_id for correlation
Event streaming for real-time updates (no polling)
Reconnection logic with exponential backoff
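
On the Gateway side, the request_id correlation pattern can be sketched with one future per outstanding command. This is an illustration under assumptions: the class name `RequestCorrelator`, the injected `send` coroutine, and the 30-second timeout are hypothetical, and the WebSocket transport itself is left out.

```python
import asyncio
import itertools
import json
from typing import Any, Awaitable, Callable, Dict


class RequestCorrelator:
    """Matches Bridge responses to Gateway commands by request_id."""

    def __init__(self, send: Callable[[str], Awaitable[None]]):
        self._send = send  # Coroutine that writes one text frame to the Bridge
        self._pending: Dict[str, asyncio.Future] = {}
        self._counter = itertools.count(1)

    async def request(self, command: Dict[str, Any], timeout: float = 30.0) -> Dict[str, Any]:
        """Send a command and wait for the message carrying the same request_id."""
        request_id = f"req_{next(self._counter)}"
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        try:
            await self._send(json.dumps({**command, "request_id": request_id}))
            return await asyncio.wait_for(future, timeout)
        finally:
            # Clean up whether the request succeeded, timed out, or was cancelled
            self._pending.pop(request_id, None)

    def on_message(self, raw: str) -> bool:
        """Resolve a pending request; return False for uncorrelated stream events."""
        message = json.loads(raw)
        future = self._pending.get(message.get("request_id"))
        if future is None or future.done():
            return False
        future.set_result(message)
        return True
```

Messages for which `on_message` returns False, such as `message.part.streamed`, would be handed to the event-streaming path instead.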

Decision 12: State Persistence Strategy

Decision: Mount OpenCode State Directory to PVC

Problem:

OpenCode stores critical state in:

~/.local/share/opencode/
├── auth.json              # OAuth tokens (Anthropic, etc.)
├── storage/
│   ├── session/           # Session metadata
│   ├── message/           # Message content
│   ├── part/              # File diffs, attachments
│   └── project/           # Project configs

Pod restarts = complete state loss (sessions, auth, history).

Solution:

Mount PVC to OpenCode state directories:

# Codev pod volumeMounts
volumeMounts:
  - name: code-server-storage
    mountPath: /workspace              # Code files (existing)
  - name: code-server-storage
    mountPath: /home/opencode/.local/share/opencode
    subPath: .opencode-data            # NEW: State persistence

Benefits:

✅ Sessions survive pod restarts
✅ Auth tokens persist (no re-login)
✅ Full conversation history retained
✅ Zero code changes required
✅ Works with existing PVC architecture

Directory Structure on PVC:

code-server-storage/
├── repos/                 # Workspace code (existing)
├── ai-dev-state/          # Session state repo (existing)
├── .opencode-data/        # NEW: OpenCode state
│   ├── auth.json
│   └── storage/
│       ├── session/
│       ├── message/
│       └── ...
└── .opencode-plugins/     # NEW: Plugin code (see Decision 13)

Decision 13: Plugin Loading from PVC

Decision: Load Bridge Plugin from PVC Workspace

Problem:

Current approach bakes plugin into Docker image:

Every plugin code change = Docker rebuild
Slow iteration cycle (build → push → deploy → test)
No hot-reload capability

Solution:

Install plugin from PVC-mounted workspace:

Dockerfile (one-time setup):

# Install OpenCode plugin SDK globally
RUN npm install -g @opencode-ai/plugin

Entrypoint script (dynamic plugin loading):

# On pod startup
cd /workspace/.opencode-plugins/opencode-bridge
npm install
npm link
 
cd ~/.config/opencode
npm link @opencode-bridge
 
# Start OpenCode (picks up linked plugin)
opencode web

Benefits:

✅ Edit plugin code → restart pod → new code loads
✅ No Docker rebuild needed
✅ Fast iteration (seconds vs minutes)
✅ Workspace-specific plugin versions possible
✅ Supports plugin development workflow

Plugin Directory on PVC:

/workspace/.opencode-plugins/opencode-bridge/
├── src/
│   ├── plugin.ts
│   ├── v2-client.ts
│   └── handlers/
├── package.json
├── tsconfig.json
└── node_modules/  # Installed at pod startup

Alternative Considered:

Direct symlink without npm link - rejected due to OpenCode’s plugin discovery mechanism expecting npm-style resolution.

Decision 14: OpenCode SDK Session Management

Decision: Use OpenCode V2 SDK for Programmatic Session Control

Discovery:

OpenCode exposes full HTTP API via TypeScript SDK:

import { createOpencodeClient } from '@opencode-ai/sdk/dist/v2/client.js';
 
const client = createOpencodeClient({
  baseURL: 'http://localhost:5400'
});
 
// Create session
const session = await client.session.create({
  directory: '/workspace/builder-project',
  title: 'Slack-initiated task',
  parentID: 'optional-parent-session'
});
 
// Send message
await client.session.prompt({
  sessionID: session.data.id,
  parts: [{ type: 'text', text: 'Build authentication feature' }],
  agent: 'builder',
  model: {
    providerID: 'anthropic',
    modelID: 'claude-sonnet-4'
  }
});
 
// List sessions (visible in OpenCode web UI)
const sessions = await client.session.list({
  directory: '/workspace'
});
 
// Get messages
const messages = await client.session.messages({
  sessionID: session.data.id
});

Architectural Impact:

No CLI shell-out needed: Direct SDK calls replace opencode run subprocess
Sessions integrate with web UI: Programmatically created sessions appear in OpenCode web interface
Agent routing built-in: SDK supports agent: 'builder' parameter
Model selection supported: Can specify provider/model per message
Message retrieval: Can poll or stream responses via SDK

Bridge Plugin Implementation:

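import WebSocket from 'ws';
import { createOpencodeClient } from '@opencode-ai/sdk/dist/v2/client.js';

export const SlackBridgePlugin: Plugin = async (ctx) => {
  const opencode = createOpencodeClient();

  // Connect to the Gateway on startup (the /ws path is an assumption)
  const gatewayUrl = process.env.GATEWAY_URL ?? 'http://gateway.ai-dev.svc.cluster.local:8000';
  const ws = new WebSocket(gatewayUrl.replace(/^http/, 'ws') + '/ws');

  ws.on('message', async (data) => {
    const msg = JSON.parse(data.toString());

    if (msg.type === 'session.create') {
      const session = await opencode.session.create({
        directory: msg.workspace,
        title: msg.task
      });

      // Start session with initial message
      await opencode.session.prompt({
        sessionID: session.data.id,
        parts: [{ type: 'text', text: msg.task }],
        agent: msg.agent || 'builder'
      });

      // Echo request_id so the Gateway can correlate the response
      ws.send(JSON.stringify({
        type: 'session.created',
        request_id: msg.request_id,
        session_id: session.data.id
      }));
    }
  });

  return {};  // No OpenCode hooks are registered in this excerpt
};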

Replaces: The CLI-based integration assumed in the earlier Decision 10. The SDK approach is cleaner and more maintainable.

Updated Component Responsibilities

Bridge Plugin (TypeScript):

✅ Single source of truth for OpenCode integration
✅ WebSocket client to Gateway
✅ Request/response for session CRUD via OpenCode SDK
✅ Event streaming for real-time updates
✅ Permission handling (HTTP for backwards compatibility)
✅ Loaded from PVC workspace (new code loads on pod restart, no image rebuild)

Gateway (Python FastAPI):

✅ Slack integration (Socket Mode)
✅ State management (session ↔ thread mapping)
✅ Event formatting (OpenCode events → Slack UI)
✅ WebSocket server for Bridge connection
✅ State persistence to PVC-mounted git repo

OpenCode Core:

✅ Session state persisted to PVC
✅ Auth tokens persisted to PVC
✅ Accessible via SDK from Bridge Plugin
✅ Sessions visible in web UI

MVP End-to-End Flow (Updated)

1. Slack: /opencode start "Build auth"
   ↓
2. Gateway: Receives via Socket Mode
   ↓
3. Gateway → Bridge WS: { type: 'session.create', task: 'Build auth' }
   ↓
4. Bridge: client.session.create() via SDK
   ↓
5. Bridge: client.session.prompt() to send initial message
   ↓
6. Bridge → Gateway WS: { type: 'session.created', session_id: 'ses_123' }
   ↓
7. Gateway: Maps session_id ↔ slack_thread_ts
   ↓
8. OpenCode: Processes with BMAD agent
   ↓
9. OpenCode Events: Streamed to Bridge via event system
   ↓
10. Bridge → Gateway WS: { type: 'message.part.streamed', content: '...' }
    ↓
11. Gateway → Slack: Updates thread with agent output
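
Steps 7, 10, and 11 of the flow above can be sketched as a small lookup plus a routing function. This is an illustration only: SessionThreadMap and route_bridge_event are hypothetical names, the map is in-memory rather than persisted to the ai-dev-state repo, and the sketch assumes each streamed event carries a session_id field, which the flow diagram does not show explicitly.

```python
class SessionThreadMap:
    """Bidirectional session_id <-> Slack thread_ts lookup (in-memory sketch)."""

    def __init__(self):
        self._thread_by_session = {}
        self._session_by_thread = {}

    def link(self, session_id, thread_ts):
        """Record the mapping created in step 7."""
        self._thread_by_session[session_id] = thread_ts
        self._session_by_thread[thread_ts] = session_id

    def thread_for(self, session_id):
        return self._thread_by_session.get(session_id)

    def session_for(self, thread_ts):
        return self._session_by_thread.get(thread_ts)


def route_bridge_event(event, mapping):
    """Turn a streamed Bridge event into a Slack post instruction, or None if unroutable."""
    if event.get("type") != "message.part.streamed":
        return None
    # Assumption: the Bridge includes session_id on every streamed event.
    thread_ts = mapping.thread_for(event.get("session_id"))
    if thread_ts is None:
        return None
    return {"thread_ts": thread_ts, "text": event.get("content", "")}
```

The reverse lookup (session_for) covers the other direction, where a reply typed in a Slack thread has to be forwarded to the right OpenCode session.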

State Persistence Throughout:

OpenCode state → PVC (.opencode-data)
Session state → PVC (ai-dev-state repo)
Plugin code → PVC (.opencode-plugins)

Decision 15: UI-Initiated Session Configuration Collection

Date Added: 2026-02-02
Context: Session creation flow for sessions started in OpenCode UI (not via Slack)

Decision: Lazy, Orthogonal Configuration Collection with Optional Bundling

Problem:

When users create sessions directly in OpenCode UI, we need two distinct types of configuration:

Builder Configuration (workspace management):
- Project ID (from .ai/projectlist.md)
- Workspace name
- Repositories
- Category

Slack Configuration (notification routing):
- Channel
- Priority
- Notification level

These configs are needed at different times and can be collected in different UIs depending on user presence.

Architecture:

Two Independent, Lazy-Collected Configurations:

# Session starts with minimal state
session = {
    "id": "ses_123",
    "title": "Add rate limiting to auth API",
    "directory": "/workspace/repos/domain-apis",
    "type": "exploratory",  # Starts as exploratory
    "builder_config": None,  # Collected when needed
    "slack_config": None     # Collected when needed
}

Collection Timing:

Builder Config - Collected when first write operation attempted:
- Triggered by: mcp_edit, mcp_write, or write-related bash commands
- Collection UI: OpenCode modal if user present in web UI, Slack form if user in Slack
- Required fields: Project ID, workspace name, repositories, category
- Action: Initializes builder workspace via task builder:init BUILDER_NAME={workspace_name} REPOS={repos}

Slack Config - Collected when attention leaves OpenCode:
- Triggered by: First notification when user not in OpenCode web UI (presence detection)
- Collection UI: Always Slack (DM or existing thread)
- Required fields: Channel, priority
- Action: Creates Slack thread, establishes notification routing
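
The two triggers above can be expressed as a single check that runs before a tool call or a notification. The sketch below is illustrative and rests on several assumptions: the tool name "bash", the regular expression used to spot write-related shell commands, and the function names are all hypothetical, since the document does not specify how write-related commands are detected.

```python
import re

WRITE_TOOLS = {"mcp_edit", "mcp_write"}

# Assumed heuristic for "write-related bash commands"; the real rule set is not specified.
WRITE_BASH_PATTERN = re.compile(
    r"(?:^|[\s;&|])(?:rm|mv|cp|mkdir|touch|tee)\b"
    r"|\bgit\s+(?:commit|push|merge|rebase)\b"
    r"|>"
)


def is_write_operation(tool_name, command=""):
    """True for the write tools, or for a bash command that looks like a write."""
    if tool_name in WRITE_TOOLS:
        return True
    if tool_name == "bash":
        return bool(WRITE_BASH_PATTERN.search(command))
    return False


def required_prompts(session, tool_name=None, command="",
                     user_in_opencode_ui=True, needs_notification=False):
    """Return the configs that must be collected before proceeding."""
    prompts = []
    if tool_name and is_write_operation(tool_name, command) and not session.get("builder_config"):
        prompts.append("builder_config")
    if needs_notification and not user_in_opencode_ui and not session.get("slack_config"):
        prompts.append("slack_config")
    return prompts
```

A pattern match on shell text is only a heuristic and will produce both false positives (any redirection counts as a write) and false negatives. It is used here to keep the example short.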

Configuration Collection Implementation:

class SessionConfigManager:
    async def ensure_builder_config(self, session_id):
        """Ensure builder config exists, prompt if needed"""
        session = await get_session(session_id)
        
        if session.builder_config:
            return session.builder_config
        
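        # Infer smart defaults from context
        project_id = infer_project_id(session)  # Match to projectlist.md; None if no match
        slug = slugify(session.title)
        defaults = {
            "project_id": project_id,
            # Without a matched project, fall back to the bare slug so the user can correct it
            "workspace_name": f"{project_id}-{slug}" if project_id else slug,
            "repositories": infer_repos(session.directory),
            "category": infer_category(session)
        }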
        
        # Collect from appropriate UI
        presence = await get_user_presence()
        
        if presence.active_in_opencode_ui:
            # User in OpenCode - prompt there
            config = await prompt_opencode_modal(
                title="Initialize Builder Workspace",
                message="Making changes requires a builder workspace:",
                fields=defaults,
                optional_section={
                    "enable_slack": False,
                    "channel": infer_channel(defaults["category"]),
                    "priority": "medium"
                }
            )
        else:
            # User in Slack or unknown - prompt in Slack
            config = await prompt_slack_form(
                title="🏗️ Initialize Builder Workspace",
                message=f"To make changes to '{session.title}', I need to set up a workspace:",
                fields=defaults,
                optional_section={
                    "enable_slack": True,  # Default YES in Slack
                    "channel": infer_channel(defaults["category"]),
                    "priority": "medium"
                }
            )
        
        # Initialize builder workspace (config is a dict, so use key access)
        workspace_path = await init_builder(
            builder_name=config["workspace_name"],
            repos=config["repositories"]
        )
        
        # Update session
        await update_session(session_id, {
            "builder_config": {**config, "workspace_path": workspace_path},
            "type": "work"
        })
        
        # If user opted in to Slack config, set that up too
        if config.get("enable_slack"):
            await setup_slack_config(session_id, config)
        
        return config
    
    async def ensure_slack_config(self, session_id):
        """Ensure Slack config exists, prompt if needed"""
        session = await get_session(session_id)
        
        if session.slack_config:
            return session.slack_config
        
        # Infer smart defaults; the category is only known once builder config exists
        category = session.builder_config.get("category") if session.builder_config else None
        channel = infer_channel(category)
        
        # Always collect via Slack (since we're routing there)
        config = await prompt_slack_form(
            target="DM",
            title="📬 Setup Notifications",
            message=f"Where should I post updates for '{session.title}'?",
            fields={
                "channel": channel,
                "priority": "medium",
                "notification_level": "milestones"
            }
        )
        
        # Create thread in chosen channel
        thread_ts = await create_slack_thread(
            channel=config["channel"],
            session=session
        )
        
        # Update session
        await update_session(session_id, {
            "slack_config": {**config, "thread_ts": thread_ts}
        })
        
        return config

Project ID Inference:

def infer_project_id(session):
    """Match session to project in .ai/projectlist.md"""
    project_list = load_project_list()
    
    # Strategy 1: Session title matches project title
    for project in project_list:
        if title_similarity(session.title, project.title) > 0.8:
            return project.id
    
    # Strategy 2: Detected repos match project repos
    detected_repos = detect_git_repos(session.directory)
    for project in project_list:
        if project_uses_repos(project, detected_repos):
            return project.id
    
    # Strategy 3: Category has single active project
    category = infer_category(session)
    matching = [p for p in project_list if p.category == category and p.status == "implementing"]
    if len(matching) == 1:
        return matching[0].id
    
    # Must prompt user
    return None
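
Strategy 1 depends on a title_similarity helper that the document does not define. One possible sketch, assuming a plain character-level ratio from the standard library is good enough for short titles (the normalization choices are assumptions, and the 0.8 threshold comes from the code above):

```python
from difflib import SequenceMatcher


def title_similarity(a, b):
    """Return a 0.0-1.0 similarity ratio, ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    if not norm_a or not norm_b:
        return 0.0
    return SequenceMatcher(None, norm_a, norm_b).ratio()
```

A character-level ratio rewards near-identical titles and penalizes paraphrases, so a session titled differently from its project will usually fall through to Strategy 2 or 3. A token-based or embedding-based measure would be a reasonable swap if that proves too strict.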

Optional Bundling (Shortcut):

When prompting for one config, offer an optional section to collect the other:

In OpenCode Modal: “Optional: Setup Slack notifications now?” (unchecked by default)
In Slack Form: “Optional: Setup Slack notifications?” (checked by default since user is already in Slack)

This reduces interruptions: the user can provide both configs in a single prompt instead of being asked again later.

State Transitions:

Session States:
1. (exploratory, no_builder, no_slack)
   → Reading code, researching, Q&A
   → All work in OpenCode web UI

2. (work, builder_init, no_slack)
   → Making code changes
   → Work in OpenCode web UI
   → Notifications in OpenCode UI

3. (exploratory, no_builder, slack_configured)
   → Reading code, researching
   → User went offline
   → Notifications in Slack

4. (work, builder_init, slack_configured)
   → Making code changes
   → User went offline OR opted in early
   → Notifications in Slack

All Transition Paths:

Path A: UI → Build (online)
  Create session → Attempt write → Prompt builder (OpenCode) → Continue

Path B: UI → Offline → Notify
  Create session → User offline → Event needs attention → Prompt Slack → Thread created

Path C: UI → Build → Offline
  Create session → Attempt write → Prompt builder (OpenCode) → User offline → Prompt Slack

Path D: UI → Offline → Build in Slack
  Create session → User offline → Prompt Slack → Attempt write → Prompt builder (Slack)

Path E: Slack-initiated
  /opencode start → Collect both configs in one form → Builder + Slack ready
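
Because the session type changes to "work" at the same moment builder config is stored, the four states listed above can be derived from which configs are present rather than tracked separately. The sketch below illustrates that idea. The function names and the sample config values are hypothetical, and sessions are modeled as plain dicts.

```python
def session_state(session):
    """Derive the (type, builder, slack) tuple from which configs are present."""
    has_builder = session.get("builder_config") is not None
    has_slack = session.get("slack_config") is not None
    return (
        "work" if has_builder else "exploratory",
        "builder_init" if has_builder else "no_builder",
        "slack_configured" if has_slack else "no_slack",
    )


def collect(session, kind, config):
    """Return a copy of the session with one config collected."""
    if kind not in ("builder_config", "slack_config"):
        raise ValueError(f"unknown config kind: {kind}")
    updated = dict(session)
    updated[kind] = config
    return updated


# Path C: UI -> Build -> Offline (state 1, then 2, then 4)
s = {"id": "ses_123", "builder_config": None, "slack_config": None}
path_c = [session_state(s)]
s = collect(s, "builder_config", {"workspace_name": "0012-add-rate-limiting"})
path_c.append(session_state(s))
s = collect(s, "slack_config", {"channel": "#dev"})
path_c.append(session_state(s))
```

Path D visits the same end state through state 3 by collecting slack_config first. Since the two configs are independent, every path is just an ordering of the same two collect() calls.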

Benefits:

✅ Minimal friction - Only prompt when actually needed
✅ Natural timing - Collection happens at logical trigger points
✅ Smart inference - Pre-fill forms with detected values
✅ Flexibility - Can collect in either UI depending on user presence
✅ Optional bundling - User can provide both configs at once to avoid future interruptions
✅ Exploratory sessions stay lightweight - No config needed for read-only work
✅ Consistent with project numbering - Builder names use project IDs from .ai/projectlist.md

Workspace Naming Convention:

Builder workspaces use format: .builders/{project-id}-{task-slug}/

Examples:

.builders/0012-add-rate-limiting/ - Project 0012 (OpenCode Slack Integration)
.builders/0013-notification-system/ - Project 0013 (Grid Exit Strategy)

This aligns with existing project numbering in .ai/projectlist.md and spec/plan file naming conventions.
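
A sketch of how the naming convention could be applied in code. The slugify rules here (lowercase, runs of non-alphanumeric characters collapsed to a single hyphen) are an assumption, as is the choice to reject empty slugs. The document specifies only the resulting format.

```python
import re


def slugify(text):
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def builder_workspace_path(project_id, task_title):
    """Build the .builders/{project-id}-{task-slug}/ path for a task."""
    slug = slugify(task_title)
    if not slug:
        raise ValueError("task title must contain at least one letter or digit")
    return f".builders/{project_id}-{slug}/"
```

For example, builder_workspace_path("0012", "Add rate limiting") yields .builders/0012-add-rate-limiting/, matching the first example above.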

Architecture decisions finalized 2026-01-31.
Kubernetes integration architecture added 2026-01-31.
WebSocket integration and state persistence added 2026-02-01.
UI-initiated session configuration added 2026-02-02.
