Code Review Follow-ups - OpenCode Slack Integration

PR: #1 - OpenCode Slack Integration - Gateway + Bridge Plugin
Reviewed: 2026-02-01
Reviewer: Dev Agent (Amelia)
Status: PR Approved - Follow-ups tracked below


Summary

PR #1 implements a high-quality OpenCode Slack Integration with:

  • ✅ 97% test coverage (target: 80%)
  • ✅ Comprehensive error handling
  • ✅ Clean architecture with proper separation of concerns
  • ✅ Complete documentation (README, TESTING, inline comments)
  • ✅ CI/CD workflows for both components

Issues Found: 3 Medium, 2 Low
Critical Issues: 0


🟡 Medium Priority Issues

M1: Race condition in double-click handling

File: services/gateway/services/slack_app.py:64-66

Problem:
Uses pop() to guard against double-click processing, but leaves a TOCTOU (time-of-check-to-time-of-use) gap: if two clicks arrive nearly simultaneously, both handler tasks could obtain the future before the first pop() completes and process the click twice.

Current Code:

# Line 64-66
future = self._response_futures.pop(question_id, None)
if future is None:
    logger.warning(f"Question {question_id} already answered or doesn't exist")
    return

Impact:

  • Race window: ~1-10ms between clicks
  • Could result in duplicate responses or future resolution errors
  • Low probability but possible under high load or mobile network conditions

Suggested Fix:

import asyncio
 
# Add lock to class __init__
self._futures_lock = asyncio.Lock()
 
# In _handle_permission_response:
async with self._futures_lock:
    future = self._response_futures.pop(question_id, None)
    if future is None:
        logger.warning(f"Question {question_id} already answered")
        return
    # Continue with future resolution
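
Expanded into a self-contained sketch (class and method names here are illustrative, not taken from the PR), the locked-pop pattern guarantees at most one caller wins the future:

```python
import asyncio

class ResponseRegistry:
    """Futures keyed by question_id; removal is serialized behind a lock."""

    def __init__(self) -> None:
        self._futures: dict[str, asyncio.Future] = {}
        self._lock = asyncio.Lock()

    def register(self, question_id: str) -> asyncio.Future:
        future = asyncio.get_running_loop().create_future()
        self._futures[question_id] = future
        return future

    async def resolve(self, question_id: str, decision: str) -> bool:
        # Only the first caller can pop the future; every later caller gets
        # None and bails out, so duplicate clicks cannot resolve it twice.
        async with self._lock:
            future = self._futures.pop(question_id, None)
        if future is None or future.done():
            return False
        future.set_result(decision)
        return True
```

Two simultaneous clicks then reduce to two `resolve()` calls, exactly one of which returns True.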

Priority: Medium
Effort: 30 minutes


M2: No request validation in permission endpoint

File: services/gateway/api/opencode.py:36-42

Problem:
No validation that session_id is actually an active OpenCode session. Bridge plugin could send arbitrary session IDs, causing spam to Slack for non-existent sessions.

Current Code:

@router.post("/opencode/permission", response_model=PermissionResponse)
async def handle_permission_request(request: PermissionRequest):
    # No session_id validation
    question = await question_store.add(...)

Impact:

  • Potential for spam/abuse if bridge is misconfigured
  • Confusing Slack messages for invalid sessions
  • No way to distinguish legitimate vs spurious requests

Suggested Fix:

# Add basic validation
if not request.session_id or len(request.session_id) < 8:
    raise HTTPException(
        status_code=400,
        detail="Invalid session_id format"
    )
 
# Optional: Validate against active OpenCode sessions
# (requires OpenCode API integration)

Alternative: Document that session validation is OpenCode’s responsibility, and add monitoring to detect spurious-request patterns.
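
If the basic-validation route is taken, the format check could live in one helper so the endpoint (and any future endpoints) agree on it. The allowed character set and length bounds below are assumptions for illustration, not a documented OpenCode session-ID format:

```python
import re

# Assumed format: 8-64 URL-safe characters; adjust once the real format is confirmed
SESSION_ID_RE = re.compile(r"^[A-Za-z0-9_-]{8,64}$")

def is_valid_session_id(session_id: str) -> bool:
    """Cheap syntactic check; does not prove the session is actually active."""
    return bool(session_id) and SESSION_ID_RE.fullmatch(session_id) is not None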

Priority: Medium
Effort: 1 hour (basic validation) or 4 hours (full session verification)


M3: Hardcoded timeout values inconsistent

Files:

  • plugins/opencode-bridge/src/handlers/permission.ts:47 - 310000ms (5min 10sec)
  • services/gateway/api/opencode.py:62 - 300.0s (5min)
  • services/gateway/services/slack_app.py:115 - 300.0s default

Problem:
Bridge has 10-second buffer hardcoded. If Gateway timeout changes, Bridge breaks with timeout errors. Tight coupling via magic numbers.

Current Code:

// Bridge plugin
timeout: 310000, // 5 minutes + 10 seconds buffer

# Gateway
decision = await slack_app.slack_gateway.ask_permission(
    question=question,
    timeout=300.0  # 5 minutes
)

Impact:

  • Maintenance burden: must update 3 files to change timeout
  • Risk of desync causing mysterious timeout errors
  • No flexibility per-session or per-permission-type

Suggested Fix:

Option A: Environment Variables

// Bridge
const GATEWAY_TIMEOUT = parseInt(process.env.GATEWAY_TIMEOUT || '300000', 10);
const REQUEST_TIMEOUT = GATEWAY_TIMEOUT + 10000;

# Gateway
from services.gateway.config.settings import get_settings
timeout = get_settings().permission_timeout  # from env

Option B: Dynamic from Gateway

# Gateway returns timeout in response headers
response.headers["X-Timeout-Seconds"] = "300"
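
Whichever option is chosen, the Gateway side reduces to reading one environment variable with the current value as the default. A sketch, not the PR’s actual settings module; the variable name is an assumption:

```python
import os
from functools import lru_cache

@lru_cache(maxsize=None)
def get_permission_timeout() -> float:
    """Single source of truth for the permission timeout, in seconds."""
    # PERMISSION_TIMEOUT_SECONDS is an assumed variable name; the default
    # mirrors today's hardcoded 300s. The bridge would read the same value
    # and add its own network buffer on top, eliminating the 310000 literal.
    return float(os.environ.get("PERMISSION_TIMEOUT_SECONDS", "300"))
```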

Priority: Medium
Effort: 2 hours


🟢 Low Priority Issues

L1: Missing structured logging in plugin

File: plugins/opencode-bridge/src/plugin.ts:16-21

Issue:
Uses synchronous file-based logging (appendFileSync), which blocks the event loop on every log call and makes debugging harder in production environments.

Current Approach:

function log(message: string, ...args: any[]) {
  const timestamp = new Date().toISOString();
  const logMessage = `${timestamp} ${message} ${args.map(a => JSON.stringify(a)).join(' ')}\n`;
  appendFileSync(LOG_FILE, logMessage);
}

Recommendation:
Consider using pino or similar for structured JSON logs:

import pino from 'pino';
const logger = pino({ name: 'opencode-bridge' });
 
logger.info({ sessionId, permissionId }, 'Permission request received');

Benefits:

  • Easier log aggregation (ELK, CloudWatch, etc.)
  • Better filtering and search
  • Correlation IDs for tracing

Priority: Low
Effort: 2 hours


L2: No metrics/observability

Files: Gateway and Plugin (all components)

Issue:
No counters for operational visibility:

  • Permission requests sent/received
  • Timeouts vs successful responses
  • Allow vs Deny rates
  • Request latency (p50, p95, p99)

Recommendation:
Add basic Prometheus metrics or StatsD:

# Gateway
from prometheus_client import Counter, Histogram
 
permission_requests = Counter('permission_requests_total', 'Total permission requests', ['decision'])
permission_latency = Histogram('permission_latency_seconds', 'Permission request latency')

// Bridge
import StatsD from 'hot-shots';
const stats = new StatsD();
 
stats.increment('permission.request');
stats.timing('permission.latency', duration);
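
Until a backend is picked, the decision counts and p50/p95/p99 latencies listed above can be prototyped with a stdlib-only recorder (names are illustrative):

```python
import math
import time
from collections import Counter

class PermissionMetrics:
    """Stdlib stand-in for the Prometheus/StatsD counters sketched above."""

    def __init__(self) -> None:
        self.decisions: Counter = Counter()  # allow / deny / timeout counts
        self.latencies: list[float] = []     # seconds per completed request

    def record(self, decision: str, started_at: float) -> None:
        self.decisions[decision] += 1
        self.latencies.append(time.monotonic() - started_at)

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile over recorded latencies, p in (0, 100]."""
        ordered = sorted(self.latencies)
        rank = math.ceil(p / 100 * len(ordered))
        return ordered[rank - 1]
```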

Priority: Low (but important for production)
Effort: 4 hours


✅ What’s Great

This implementation demonstrates excellent engineering:

  1. Test Coverage: 97% - Well above 80% threshold, comprehensive edge cases
  2. Clean Architecture: V2 SDK isolated in v2-client.ts - brilliant refactoring for testability
  3. Error Handling: Comprehensive try/catch with proper fail-closed security posture
  4. Type Safety: Strong TypeScript types, Pydantic models validate at runtime
  5. Documentation: README, TESTING.md, inline comments all present and helpful
  6. CI/CD: GitHub Actions workflows test both components independently
  7. Production Ready: Environment variable configuration, graceful degradation

Next Steps

Immediate (before production):

  • M1: Fix race condition (30 min)
  • M2: Add request validation (1 hour)
  • M3: Make timeouts configurable (2 hours)

Post-MVP (nice to have):

  • L1: Structured logging (2 hours)
  • L2: Metrics/observability (4 hours)

Total effort for Medium issues: ~3.5 hours


Review Outcome

Status: APPROVED WITH FOLLOW-UPS

PR #1 is production-ready with the understanding that M1-M3 should be addressed before high-load scenarios. The implementation is solid, well-tested, and follows best practices.

The Medium issues are primarily about production hardening and operational excellence - they don’t block the initial deployment to a controlled lab environment.

Recommended merge strategy:

  1. Merge PR #1 to main
  2. Create follow-up PR addressing M1-M3
  3. Add L1-L2 to backlog for future sprint

Reviewed by: Dev Agent (Amelia)
Date: 2026-02-01
Coverage: 97.4% statements, 100% branches, 97.33% lines, 100% functions