Code Review Follow-ups - OpenCode Slack Integration
PR: #1 - OpenCode Slack Integration - Gateway + Bridge Plugin
Reviewed: 2026-02-01
Reviewer: Dev Agent (Amelia)
Status: PR Approved - Follow-ups tracked below
Summary
PR #1 implements a high-quality OpenCode Slack Integration with:
- ✅ 97% test coverage (target: 80%)
- ✅ Comprehensive error handling
- ✅ Clean architecture with proper separation of concerns
- ✅ Complete documentation (README, TESTING, inline comments)
- ✅ CI/CD workflows for both components
Issues Found: 3 Medium, 2 Low
Critical Issues: 0
🟡 Medium Priority Issues
M1: Race condition in double-click handling
File: services/gateway/services/slack_app.py:64-66
Problem:
Uses pop() to prevent double-click processing, but has TOCTOU (time-of-check-time-of-use) gap. If two clicks arrive simultaneously before first pop() completes, both threads could get the future and process the click.
Current Code:
```python
# Line 64-66
future = self._response_futures.pop(question_id, None)
if future is None:
    logger.warning(f"Question {question_id} already answered or doesn't exist")
    return
```
Impact:
- Race window: ~1-10ms between clicks
- Could result in duplicate responses or future resolution errors
- Low probability but possible under high load or mobile network conditions
Suggested Fix:
```python
import asyncio

# Add lock to class __init__
self._futures_lock = asyncio.Lock()

# In _handle_permission_response:
async with self._futures_lock:
    future = self._response_futures.pop(question_id, None)
    if future is None:
        logger.warning(f"Question {question_id} already answered")
        return
# Continue with future resolution
```
Priority: Medium
Effort: 30 minutes
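The locking pattern in the suggested fix can be sketched as a self-contained demo. The class and method names below are illustrative, not the actual `slack_app.py` implementation; the point is that wrapping the `pop()` in a lock guarantees at most one caller claims the future, even when two "clicks" race:

```python
import asyncio

# Illustrative sketch of the lock-protected claim pattern (names are
# hypothetical, not the real gateway code).
class FutureStore:
    def __init__(self) -> None:
        self._futures: dict[str, asyncio.Future] = {}
        self._lock = asyncio.Lock()

    async def claim(self, question_id: str):
        # Atomically remove the future so at most one caller wins.
        async with self._lock:
            return self._futures.pop(question_id, None)

async def demo() -> int:
    store = FutureStore()
    loop = asyncio.get_running_loop()
    store._futures["q1"] = loop.create_future()
    # Two concurrent "clicks" race to claim the same future.
    results = await asyncio.gather(store.claim("q1"), store.claim("q1"))
    return sum(1 for r in results if r is not None)

print(asyncio.run(demo()))  # 1 — exactly one click wins
```

Note that a bare `dict.pop()` with no `await` between check and use is already atomic within a single event loop; the lock matters once the handler awaits between claiming and resolving the future, which is where the review's TOCTOU window lives.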
M2: No request validation in permission endpoint
File: services/gateway/api/opencode.py:36-42
Problem:
No validation that session_id is actually an active OpenCode session. Bridge plugin could send arbitrary session IDs, causing spam to Slack for non-existent sessions.
Current Code:
```python
@router.post("/opencode/permission", response_model=PermissionResponse)
async def handle_permission_request(request: PermissionRequest):
    # No session_id validation
    question = await question_store.add(...)
```
Impact:
- Potential for spam/abuse if bridge is misconfigured
- Confusing Slack messages for invalid sessions
- No way to distinguish legitimate vs spurious requests
Suggested Fix:
```python
# Add basic validation
if not request.session_id or len(request.session_id) < 8:
    raise HTTPException(
        status_code=400,
        detail="Invalid session_id format",
    )

# Optional: validate against active OpenCode sessions
# (requires OpenCode API integration)
```
Alternative: Document that session validation is OpenCode's responsibility, and add monitoring to detect abuse patterns.
Priority: Medium
Effort: 1 hour (basic validation) or 4 hours (full session verification)
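Since the gateway already uses Pydantic models (see the Type Safety note below), an alternative to the inline check is to move the format validation into the request model itself, so FastAPI rejects malformed requests with a 422 before the handler runs. This sketch assumes Pydantic v2 and mirrors only the `session_id` field; the real `PermissionRequest` has more fields:

```python
from pydantic import BaseModel, field_validator

# Hypothetical sketch: only session_id is shown, other fields omitted.
class PermissionRequest(BaseModel):
    session_id: str

    @field_validator("session_id")
    @classmethod
    def session_id_format(cls, v: str) -> str:
        # Same rule as the inline check: reject empty or too-short IDs.
        if len(v) < 8:
            raise ValueError("session_id must be at least 8 characters")
        return v
```

The trade-off is error shape: model-level validation returns FastAPI's standard 422 validation body rather than the custom 400 above.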
M3: Inconsistent hardcoded timeout values
Files:
- plugins/opencode-bridge/src/handlers/permission.ts:47 - 310000 ms (5 min 10 s)
- services/gateway/api/opencode.py:62 - 300.0 s (5 min)
- services/gateway/services/slack_app.py:115 - 300.0 s default
Problem:
Bridge has 10-second buffer hardcoded. If Gateway timeout changes, Bridge breaks with timeout errors. Tight coupling via magic numbers.
Current Code:
```typescript
// Bridge plugin
timeout: 310000, // 5 minutes + 10 seconds buffer
```
```python
# Gateway
decision = await slack_app.slack_gateway.ask_permission(
    question=question,
    timeout=300.0,  # 5 minutes
)
```
Impact:
- Maintenance burden: must update 3 files to change timeout
- Risk of desync causing mysterious timeout errors
- No flexibility per-session or per-permission-type
Suggested Fix:
Option A: Environment Variables
```typescript
// Bridge
const GATEWAY_TIMEOUT = parseInt(process.env.GATEWAY_TIMEOUT || '300000');
const REQUEST_TIMEOUT = GATEWAY_TIMEOUT + 10000;
```
```python
# Gateway
from services.gateway.config.settings import get_settings

timeout = get_settings().permission_timeout  # from env
```
Option B: Dynamic from Gateway
```python
# Gateway returns timeout in response headers
response.headers["X-Timeout-Seconds"] = "300"
```
Priority: Medium
Effort: 2 hours
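Option B can be sketched end to end: the Gateway advertises its timeout in a response header, and the client derives its own request timeout from it instead of hardcoding 310000. Names and values here are illustrative; the real timeout would come from the Gateway's settings:

```python
# Hypothetical sketch of Option B: server advertises the timeout,
# client parses it and adds a small buffer.
GATEWAY_TIMEOUT_SECONDS = 300   # would come from get_settings() in practice
CLIENT_BUFFER_SECONDS = 10

def response_headers(timeout_seconds: int = GATEWAY_TIMEOUT_SECONDS) -> dict[str, str]:
    # Server side: attach the advertised timeout as a header.
    return {"X-Timeout-Seconds": str(timeout_seconds)}

def client_timeout(headers: dict[str, str],
                   buffer_seconds: int = CLIENT_BUFFER_SECONDS) -> int:
    # Client side: parse the advertised value and add a safety buffer,
    # replacing the hardcoded 310-second constant in the bridge.
    return int(headers["X-Timeout-Seconds"]) + buffer_seconds

print(client_timeout(response_headers()))  # 310
```

One caveat with Option B: the header only arrives with a response, so the bridge still needs a fallback timeout for the first request (or a separate config/health endpoint that exposes it up front).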
🟢 Low Priority Issues
L1: Missing structured logging in plugin
File: plugins/opencode-bridge/src/plugin.ts:16-21
Issue:
Uses file-based logging (appendFileSync) instead of structured logging. Makes debugging harder in production environments.
Current Approach:
```typescript
function log(message: string, ...args: any[]) {
  const timestamp = new Date().toISOString();
  const logMessage = `${timestamp} ${message} ${args.map(a => JSON.stringify(a)).join(' ')}\n`;
  appendFileSync(LOG_FILE, logMessage);
}
```
Recommendation:
Consider using pino or similar for structured JSON logs:
```typescript
import pino from 'pino';

const logger = pino({ name: 'opencode-bridge' });
logger.info({ sessionId, permissionId }, 'Permission request received');
```
Benefits:
- Easier log aggregation (ELK, CloudWatch, etc.)
- Better filtering and search
- Correlation IDs for tracing
Priority: Low
Effort: 2 hours
L2: No metrics/observability
Files: Gateway and Plugin (all components)
Issue:
No counters for operational visibility:
- Permission requests sent/received
- Timeouts vs successful responses
- Allow vs Deny rates
- Request latency (p50, p95, p99)
Recommendation:
Add basic Prometheus metrics or StatsD:
```python
# Gateway
from prometheus_client import Counter, Histogram

permission_requests = Counter('permission_requests_total', 'Total permission requests', ['decision'])
permission_latency = Histogram('permission_latency_seconds', 'Permission request latency')
```
```typescript
// Bridge
import StatsD from 'hot-shots';

const stats = new StatsD();
stats.increment('permission.request');
stats.timing('permission.latency', duration);
```
Priority: Low (but important for production)
Effort: 4 hours
✅ What’s Great
This implementation demonstrates excellent engineering:
- Test Coverage: 97% - Well above 80% threshold, comprehensive edge cases
- Clean Architecture: V2 SDK isolated in `v2-client.ts` - brilliant refactoring for testability
- Error Handling: Comprehensive try/catch with proper fail-closed security posture
- Type Safety: Strong TypeScript types, Pydantic models validate at runtime
- Documentation: README, TESTING.md, inline comments all present and helpful
- CI/CD: GitHub Actions workflows test both components independently
- Production Ready: Environment variable configuration, graceful degradation
Next Steps
Immediate (before production):
- M1: Fix race condition (30 min)
- M2: Add request validation (1 hour)
- M3: Make timeouts configurable (2 hours)
Post-MVP (nice to have):
- L1: Structured logging (2 hours)
- L2: Metrics/observability (4 hours)
Total effort for Medium issues: ~3.5 hours
Review Outcome
Status: ✅ APPROVED WITH FOLLOW-UPS
PR #1 is production-ready with the understanding that M1-M3 should be addressed before high-load scenarios. The implementation is solid, well-tested, and follows best practices.
The Medium issues are primarily about production hardening and operational excellence - they don’t block the initial deployment to a controlled lab environment.
Recommended merge strategy:
- Merge PR #1 to main
- Create follow-up PR addressing M1-M3
- Add L1-L2 to backlog for future sprint
Reviewed by: Dev Agent (Amelia)
Date: 2026-02-01
Coverage: 97.4% statements, 100% branches, 97.33% lines, 100% functions