Code Review Follow-ups - OpenCode Slack Integration

PR: #1 - OpenCode Slack Integration - Gateway + Bridge Plugin
Reviewed: 2026-02-01
Reviewer: Dev Agent (Amelia)
Status: PR Approved - Follow-ups tracked below


Summary

PR #1 implements a high-quality OpenCode Slack Integration with:

  • ✅ 97% test coverage (target: 80%)
  • ✅ Comprehensive error handling
  • ✅ Clean architecture with proper separation of concerns
  • ✅ Complete documentation (README, TESTING, inline comments)
  • ✅ CI/CD workflows for both components

Issues Found: 3 Medium, 2 Low
Critical Issues: 0


🟡 Medium Priority Issues

M1: Race condition in double-click handling

File: services/gateway/services/slack_app.py:64-66

Problem:
Uses pop() to guard against double-click processing, but leaves a TOCTOU (time-of-check-to-time-of-use) gap: if two clicks arrive nearly simultaneously, both handler tasks could obtain the future before the first pop() completes and process the click twice.

Current Code:

# Line 64-66
future = self._response_futures.pop(question_id, None)
if future is None:
    logger.warning(f"Question {question_id} already answered or doesn't exist")
    return

Impact:

  • Race window: ~1-10ms between clicks
  • Could result in duplicate responses or future resolution errors
  • Low probability but possible under high load or mobile network conditions

Suggested Fix:

import asyncio
 
# Add lock to class __init__
self._futures_lock = asyncio.Lock()
 
# In _handle_permission_response:
async with self._futures_lock:
    future = self._response_futures.pop(question_id, None)
    if future is None:
        logger.warning(f"Question {question_id} already answered")
        return
    # Continue with future resolution
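
Expanded into a self-contained sketch (class and method names here are illustrative, not taken from the PR), the locked-pop pattern guarantees at most one caller wins the future:

```python
import asyncio

class ResponseRegistry:
    """Futures keyed by question_id; removal is serialized behind a lock."""

    def __init__(self) -> None:
        self._futures: dict[str, asyncio.Future] = {}
        self._lock = asyncio.Lock()

    def register(self, question_id: str) -> asyncio.Future:
        future = asyncio.get_running_loop().create_future()
        self._futures[question_id] = future
        return future

    async def resolve(self, question_id: str, decision: str) -> bool:
        # Only the first caller can pop the future; every later caller gets
        # None and bails out, so duplicate clicks cannot resolve it twice.
        async with self._lock:
            future = self._futures.pop(question_id, None)
        if future is None or future.done():
            return False
        future.set_result(decision)
        return True
```

Two simultaneous clicks then reduce to two `resolve()` calls, exactly one of which returns True.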

Priority: Medium
Effort: 30 minutes


M2: No request validation in permission endpoint

File: services/gateway/api/opencode.py:36-42

Problem:
No validation that session_id is actually an active OpenCode session. Bridge plugin could send arbitrary session IDs, causing spam to Slack for non-existent sessions.

Current Code:

@router.post("/opencode/permission", response_model=PermissionResponse)
async def handle_permission_request(request: PermissionRequest):
    # No session_id validation
    question = await question_store.add(...)

Impact:

  • Potential for spam/abuse if bridge is misconfigured
  • Confusing Slack messages for invalid sessions
  • No way to distinguish legitimate vs spurious requests

Suggested Fix:

# Add basic validation
if not request.session_id or len(request.session_id) < 8:
    raise HTTPException(
        status_code=400,
        detail="Invalid session_id format"
    )
 
# Optional: Validate against active OpenCode sessions
# (requires OpenCode API integration)

Alternative: Document that session validation is OpenCode’s responsibility, and add monitoring to detect spurious-request patterns.
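
If the basic-validation route is taken, the format check could live in one helper so the endpoint (and any future endpoints) agree on it. The allowed character set and length bounds below are assumptions for illustration, not a documented OpenCode session-ID format:

```python
import re

# Assumed format: 8-64 URL-safe characters; adjust once the real format is confirmed
SESSION_ID_RE = re.compile(r"^[A-Za-z0-9_-]{8,64}$")

def is_valid_session_id(session_id: str) -> bool:
    """Cheap syntactic check; does not prove the session is actually active."""
    return bool(session_id) and SESSION_ID_RE.fullmatch(session_id) is not None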

Priority: Medium
Effort: 1 hour (basic validation) or 4 hours (full session verification)


M3: Hardcoded timeout values inconsistent

Files:

  • plugins/opencode-bridge/src/handlers/permission.ts:47 - 310000ms (5min 10sec)
  • services/gateway/api/opencode.py:62 - 300.0s (5min)
  • services/gateway/services/slack_app.py:115 - 300.0s default

Problem:
Bridge has 10-second buffer hardcoded. If Gateway timeout changes, Bridge breaks with timeout errors. Tight coupling via magic numbers.

Current Code:

// Bridge plugin
timeout: 310000, // 5 minutes + 10 seconds buffer

# Gateway
decision = await slack_app.slack_gateway.ask_permission(
    question=question,
    timeout=300.0  # 5 minutes
)

Impact:

  • Maintenance burden: must update 3 files to change timeout
  • Risk of desync causing mysterious timeout errors
  • No flexibility per-session or per-permission-type

Suggested Fix:

Option A: Environment Variables

// Bridge
const GATEWAY_TIMEOUT = parseInt(process.env.GATEWAY_TIMEOUT || '300000', 10);
const REQUEST_TIMEOUT = GATEWAY_TIMEOUT + 10000;

# Gateway
from services.gateway.config.settings import get_settings
timeout = get_settings().permission_timeout  # from env

Option B: Dynamic from Gateway

# Gateway returns timeout in response headers
response.headers["X-Timeout-Seconds"] = "300"
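
Whichever option is chosen, the Gateway side reduces to reading one environment variable with the current value as the default. A sketch, not the PR’s actual settings module; the variable name is an assumption:

```python
import os
from functools import lru_cache

@lru_cache(maxsize=None)
def get_permission_timeout() -> float:
    """Single source of truth for the permission timeout, in seconds."""
    # PERMISSION_TIMEOUT_SECONDS is an assumed variable name; the default
    # mirrors today's hardcoded 300s. The bridge would read the same value
    # and add its own network buffer on top, eliminating the 310000 literal.
    return float(os.environ.get("PERMISSION_TIMEOUT_SECONDS", "300"))
```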

Priority: Medium
Effort: 2 hours


🟢 Low Priority Issues

L1: Missing structured logging in plugin

File: plugins/opencode-bridge/src/plugin.ts:16-21

Issue:
Uses synchronous file-based logging (appendFileSync), which blocks the event loop on every log call and makes debugging harder in production environments.

Current Approach:

function log(message: string, ...args: any[]) {
  const timestamp = new Date().toISOString();
  const logMessage = `${timestamp} ${message} ${args.map(a => JSON.stringify(a)).join(' ')}\n`;
  appendFileSync(LOG_FILE, logMessage);
}

Recommendation:
Consider using pino or similar for structured JSON logs:

import pino from 'pino';
const logger = pino({ name: 'opencode-bridge' });
 
logger.info({ sessionId, permissionId }, 'Permission request received');

Benefits:

  • Easier log aggregation (ELK, CloudWatch, etc.)
  • Better filtering and search
  • Correlation IDs for tracing

Priority: Low
Effort: 2 hours


L2: No metrics/observability

Files: Gateway and Plugin (all components)

Issue:
No counters for operational visibility:

  • Permission requests sent/received
  • Timeouts vs successful responses
  • Allow vs Deny rates
  • Request latency (p50, p95, p99)

Recommendation:
Add basic Prometheus metrics or StatsD:

# Gateway
from prometheus_client import Counter, Histogram
 
permission_requests = Counter('permission_requests_total', 'Total permission requests', ['decision'])
permission_latency = Histogram('permission_latency_seconds', 'Permission request latency')

// Bridge
import StatsD from 'hot-shots';
const stats = new StatsD();
 
stats.increment('permission.request');
stats.timing('permission.latency', duration);
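
Until a backend is picked, the decision counts and p50/p95/p99 latencies listed above can be prototyped with a stdlib-only recorder (names are illustrative):

```python
import math
import time
from collections import Counter

class PermissionMetrics:
    """Stdlib stand-in for the Prometheus/StatsD counters sketched above."""

    def __init__(self) -> None:
        self.decisions: Counter = Counter()  # allow / deny / timeout counts
        self.latencies: list[float] = []     # seconds per completed request

    def record(self, decision: str, started_at: float) -> None:
        self.decisions[decision] += 1
        self.latencies.append(time.monotonic() - started_at)

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile over recorded latencies, p in (0, 100]."""
        ordered = sorted(self.latencies)
        rank = math.ceil(p / 100 * len(ordered))
        return ordered[rank - 1]
```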

Priority: Low (but important for production)
Effort: 4 hours


✅ What’s Great

This implementation demonstrates excellent engineering:

  1. Test Coverage: 97% - Well above 80% threshold, comprehensive edge cases
  2. Clean Architecture: V2 SDK isolated in v2-client.ts - brilliant refactoring for testability
  3. Error Handling: Comprehensive try/catch with proper fail-closed security posture
  4. Type Safety: Strong TypeScript types, Pydantic models validate at runtime
  5. Documentation: README, TESTING.md, inline comments all present and helpful
  6. CI/CD: GitHub Actions workflows test both components independently
  7. Production Ready: Environment variable configuration, graceful degradation

Next Steps

Immediate (before production):

  • M1: Fix race condition (30 min)
  • M2: Add request validation (1 hour)
  • M3: Make timeouts configurable (2 hours)

Post-MVP (nice to have):

  • L1: Structured logging (2 hours)
  • L2: Metrics/observability (4 hours)

Total effort for Medium issues: ~3.5 hours


Review Outcome

Status: APPROVED WITH FOLLOW-UPS

PR #1 is production-ready with the understanding that M1-M3 should be addressed before high-load scenarios. The implementation is solid, well-tested, and follows best practices.

The Medium issues are primarily about production hardening and operational excellence - they don’t block the initial deployment to a controlled lab environment.

Recommended merge strategy:

  1. Merge PR #1 to main
  2. Create follow-up PR addressing M1-M3
  3. Add L1-L2 to backlog for future sprint

Reviewed by: Dev Agent (Amelia)
Date: 2026-02-01
Coverage: 97.4% statements, 100% branches, 97.33% lines, 100% functions