Market-Making System: Deep Review
Date: 2026-01-31 Status: Notification MVP Complete, Exit State Engine Pending
Executive Summary
The market-making system has a solid foundation with hourly regime detection and basic notifications working. The core architecture is sound, but the Grid Exit Strategy (the main value proposition) is only partially implemented.
Current Status
- ✅ Working: Regime detection, notifications for disabled grids
- ⚠️ Needs Fix: Notification logic still has issues (see below)
- ❌ Missing: Complete Exit State Engine implementation
System Architecture Overview
Components in Production
┌─────────────────────────────────────────────────────────────┐
│ METRICS COLLECTION │
│ (CronJob: Every hour at :01) │
│ │
│ 1. Fetch OHLCV from KuCoin API │
│ 2. Run Regime Engine → Classify market │
│ 3. Analyze Grid Configuration │
│ 4. Calculate Risk Assessment │
│ 5. Store to Git (market-maker-data repo) │
│ 6. Send Notifications via Pushover │
└─────────────────────────────────────────────────────────────┘
Data Flow
KuCoin API
↓
Regime Engine (src/regime/engine.py)
├→ Classifier (4 regimes: RANGE_OK, RANGE_WEAK, TRANSITION, TREND)
├→ Range Analysis (Bollinger Bands, support/resistance)
├→ Mean Reversion Metrics (OU process, Z-scores)
├→ Volatility Metrics (ATR, expansion ratios)
└→ Trend Detection (swing highs/lows, EMA crossovers)
↓
Risk Assessment (src/metrics/history.py)
├→ Grid Configuration Analysis
├→ Position Health (price vs grid bounds)
├→ Stop-Loss Risk
└→ Recommendations Generation
↓
Git Storage (market-maker-data repo)
↓
Notification System (send_regime_notifications.py)
└→ Pushover API
What’s Working ✅
1. Hourly Regime Detection
Location: metrics-service/src/regime/
Capabilities:
- Fetches 1h and 4h OHLCV data from KuCoin
- Classifies into 4 regimes: RANGE_OK, RANGE_WEAK, TRANSITION, TREND
- Calculates 15+ metrics per run including:
- Bollinger Band analysis
- Mean reversion strength (OU half-life, Z-scores)
- Volatility metrics (ATR, expansion ratios)
- Trend indicators (swing structure, EMA crossovers)
- Range confidence scores
Output: Stored to market-maker-data/metrics/YYYY/MM/DD/HH_ETH-USDT.yaml
2. Grid Configuration Management
Location: metrics-service/src/grid/
Current State:
- Grid
eth-v3configured but DISABLED - Price range: 307,000 - 318,500 USDT
- 6 grid levels, evenly spaced
- Stop-loss at 299,000 USDT
- Capital allocated: 0 (grid not active)
3. Git-Backed Storage
Location: market-maker-data repo
Benefits:
- Full audit trail of all regime analyses
- Rate limiting history persists across pod restarts
- Can replay/backtest strategies from historical data
- No database required
4. Pushover Notifications
Location: send_regime_notifications.py
Features:
- Sends alerts when regime state changes
- Rate limiting (4h minimum between same-state notifications)
- Different messages for DISABLED vs ACTIVE grids
- Contains regime verdict, confidence, recommendations
What’s Broken/Incomplete ⚠️
1. Notification Recommendations Logic
Problem: Even with our recent fix, there are still issues.
Current Behavior:
# When grid is DISABLED and regime is RANGE_WEAK:
Notification says:
Title: "⏳ ENTRY OPPORTUNITY: ETH Grid"
Recommendations:
- "Low regime confidence (0.40) - wait for stronger signal"
- "Weak ranging conditions - monitor before entry"Issues:
- The
generate_entry_recommendations()function we added is good BUT - It’s not actually being called because of the way we structured the code
- The risk assessment in
history.pystill generates recommendations that leak through
Root Cause:
Line 283 in send_regime_notifications.py:
recommendations = regime.get("risk_assessment", {}).get("recommendations", [])This pulls recommendations from the risk assessment BEFORE our entry-specific logic runs. So we’re still getting mixed messages.
Fix Needed:
# Should be:
if grid_status == "DISABLED":
recommendations = generate_entry_recommendations(verdict, confidence, regime)
else:
recommendations = regime.get("risk_assessment", {}).get("recommendations", [])2. Exit State Engine - NOT IMPLEMENTED
What’s Missing: The entire exit state classification from the requirements doc.
Required States (from new-instructions.md):
NORMAL- Grid safe to operateWARNING- Risk increasing; prepare to exit (2+ warning triggers)LATEST_ACCEPTABLE_EXIT- Grid assumptions failing (soft stop)MANDATORY_EXIT- MUST stop immediately (hard stop)
Current State: ❌ None of this exists
Where It Should Live:
- New module:
src/exit_strategy/state_engine.py - Consumes: Regime analysis + Grid analysis
- Produces: Exit state classification
- Used by: Notification system
Trigger Logic Needed:
| Exit State | Triggers |
|---|---|
| MANDATORY_EXIT | • Regime = TREND • Range invalidated • 2 consecutive closes outside bounds • Stop-loss breached |
| LATEST_ACCEPTABLE_EXIT | • TRANSITION persists ≥2 4h bars OR ≥4 1h bars • OU half-life ≥ 2× baseline • Volatility expansion > 1.25× |
| WARNING | 2+ of: • TRANSITION probability ≥ 40% • Confidence declining • Efficiency Ratio rising • Mean reversion slowing |
| NORMAL | None of the above |
3. Position Risk Quantification - PARTIALLY IMPLEMENTED
What Exists:
- Grid position health (ABOVE_GRID, IN_GRID, BELOW_GRID)
- Stop-loss proximity tracking
- Basic risk level (LOW, MEDIUM, HIGH)
What’s Missing:
- Actual PnL calculation from KuCoin API
- Capital-at-risk quantification
- Profit give-back estimates
- Real-time position monitoring (currently only checks config, not actual orders)
Why It Matters: The requirements doc says notifications should include:
- “Expected profit give-back if delayed”
- “Capital at risk if delayed”
We can’t provide this without real position data.
4. Evaluation Cadence - WRONG
Current: Hourly (at :01 past the hour) Required: Every 15 minutes
Impact:
- Could miss rapid regime transitions
- 45-minute blind spot between evaluations
- Doesn’t meet “MANDATORY_EXIT → Immediately” requirement
Fix: Change CronJob from 1 * * * * to 1,16,31,46 * * * *
5. Audit Logging - MISSING
Required (from requirements):
- Log all exit signals
- Log operator responses (or lack thereof)
- Track response time vs recommended window
- Enable retrospective analysis of signal quality
Current State: ❌ None of this exists
Code Quality Assessment
Strengths 💪
-
Well-Structured Modules
- Clean separation: regime / grid / metrics / exchanges
- Interface abstractions for testability
- Type hints throughout
-
Regime Engine is Solid
- Multiple complementary indicators
- 4h structural confirmation
- Confidence scoring
- Handles edge cases (low liquidity, data gaps)
-
Configuration Management
- Grid configs are declarative YAML
- Environment-based credential management
- External secrets integration
-
Git Storage Pattern
- Elegant solution for state persistence
- Built-in audit trail
- No database operational overhead
Weaknesses 🐛
-
No Tests
- Zero unit tests in
metrics-service/tests/ - No integration tests
- Can’t refactor with confidence
- Zero unit tests in
-
Mixed Responsibilities
history.pydoes BOTH metrics collection AND risk assessment- Should be split into separate concerns
-
Hard-Coded Logic
- Magic numbers throughout (e.g.,
0.40confidence threshold) - Should be configurable
- Magic numbers throughout (e.g.,
-
Error Handling
- Limited retry logic for API failures
- No circuit breakers
- Git push failures could lose metrics
-
Notification Logic Scattered
- Some in
send_regime_notifications.py - Some in
generate_entry_recommendations() - Some leaking from
history.pyrisk assessment - Needs consolidation
- Some in
Current Metrics Output Sample
Latest: 2026-01-31 07:00 UTC
regime_analysis:
verdict: RANGE_WEAK
confidence: 0.40
primary_regime: RANGE
secondary_signal: WEAKENING
grid_analysis:
risk_metrics:
position_health: BELOW_GRID # Price: 268,011 vs Range: 307K-318.5K
stop_loss_risk: SAFE # Stop at 299K, well below current price
capital_utilization: 0.0 # Grid disabled
risk_assessment:
risk_level: LOW
recommendations:
- "Monitor mean reversion degradation"
- "TRANSITION probability rising (40%)"
account_summary:
total_usdt: 769.02
available_usdt: 769.02
locked_usdt: 0.0 # No active positionsObservation: Price is 38K USDT below the configured grid range. This grid was probably profitable before and got stopped out. Now the system is waiting for re-entry.
Gap Analysis
Critical Path to MVP
To fulfill the requirements in new-instructions.md, we need:
| Component | Status | Effort | Priority |
|---|---|---|---|
| Fix notification recommendations | ⚠️ Partial | 1h | P0 |
| Exit State Engine | ❌ Missing | 8-12h | P0 |
| Mandatory Exit triggers | ❌ Missing | 4h | P0 |
| Latest Acceptable Exit triggers | ❌ Missing | 4h | P0 |
| Warning triggers | ❌ Missing | 2h | P1 |
| Position Risk quantification | ⚠️ Partial | 6h | P1 |
| 15-min evaluation cadence | ❌ Missing | 0.5h | P1 |
| Audit logging | ❌ Missing | 3h | P2 |
| KPI tracking | ❌ Missing | 4h | P2 |
| TOTAL | 32-37h |
Non-MVP Enhancements
- Multi-symbol support (currently ETH/USDT only)
- Backtesting framework
- Performance dashboard
- Automatic grid parameter tuning
- Multi-exchange support (currently KuCoin only)
Architectural Recommendations
1. Implement Exit State Engine First
Why: This is the core value proposition. Everything else is supporting infrastructure.
Approach:
# src/exit_strategy/state_engine.py
class ExitStateEngine:
def evaluate(self, regime: Dict, grid: Dict) -> ExitState:
"""
Main entry point. Returns one of:
- NORMAL
- WARNING
- LATEST_ACCEPTABLE_EXIT
- MANDATORY_EXIT
"""
if self._check_mandatory_exit(regime, grid):
return ExitState.MANDATORY_EXIT
elif self._check_latest_acceptable_exit(regime, grid):
return ExitState.LATEST_ACCEPTABLE_EXIT
elif self._check_warning(regime, grid):
return ExitState.WARNING
else:
return ExitState.NORMAL2. Refactor Notification System
Current: Monolithic script with mixed concerns Proposed:
send_regime_notifications.py (orchestrator)
├→ src/exit_strategy/state_engine.py (exit state classification)
├→ src/exit_strategy/message_builder.py (notification content)
└→ src/exit_strategy/pushover_client.py (delivery)
3. Add Position Tracker
Purpose: Get real position data from exchange API, not just config
# src/grid/position_tracker.py
class PositionTracker:
def get_active_positions(self, grid_id: str) -> List[Position]:
"""Fetch actual open orders from KuCoin API"""
def calculate_pnl(self, positions: List[Position]) -> PnLSummary:
"""Calculate unrealized PnL"""
def estimate_profit_giveback(self, exit_price: float) -> float:
"""If we exit now, how much profit do we give back?"""4. Separate Risk Assessment from Metrics Collection
Current: metrics/history.py does both
Proposed:
src/
metrics/
collector.py # Fetch & store metrics
risk/
assessor.py # Analyze metrics → risk level
exit_strategy/
state_engine.py # Risk + regime → exit state
Testing Strategy
Phase 1: Unit Tests
- Exit state trigger logic
- Notification message building
- Position PnL calculations
Phase 2: Integration Tests
- Full regime analysis → exit state flow
- Notification delivery (use test Pushover credentials)
- Git storage round-trip
Phase 3: Backtesting
- Replay historical metrics
- Evaluate exit signal quality
- Measure would-be profit preservation
Deployment Considerations
Current Deployment
- Image:
ghcr.io/craigedmunds/market-making/metrics-service:0.1.15 - Schedule: CronJob at
:01every hour - Namespace:
market-making - Secrets: KuCoin API keys via ExternalSecrets
- Storage: Git-backed (market-maker-data repo)
Changes Needed for MVP
- Update CronJob schedule to 15-min intervals
- Add audit log storage (new Git directory or separate repo)
- Consider separate CronJob for exit evaluation vs metrics collection
- Metrics: Hourly (heavy API usage)
- Exit evaluation: Every 15min (reads from Git)
Risk Assessment
Technical Risks
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| False MANDATORY_EXIT signals | High | Medium | Require 2+ confirming indicators |
| Missed regime transitions | High | Low | 15-min cadence + multi-timeframe |
| API rate limiting | Medium | Low | Cache data, backoff strategy |
| Git push failures | Medium | Low | Retry logic, local backup |
Operational Risks
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Operator misses notification | High | Medium | Multi-channel delivery |
| Notification fatigue | Medium | High | Smart rate limiting by exit state |
| Grid stopped unnecessarily | Medium | Medium | Backtesting + tuning |
Next Steps
Immediate (This Session)
- ✅ Fix notification recommendation logic (1h)
- Implement basic Exit State Engine (4h)
- Add MANDATORY_EXIT triggers (2h)
- Update notification messages per exit state (1h)
Short-term (Next Session)
- Add LATEST_ACCEPTABLE_EXIT triggers
- Add WARNING triggers
- Implement audit logging
- Change to 15-min cadence
- Add unit tests
Medium-term (Future)
- Position Risk quantification with real API data
- KPI tracking framework
- Backtesting capability
- Performance dashboard
Conclusion
The market-making system has a strong technical foundation but is only 40% complete toward the MVP requirements:
- ✅ Regime detection works well
- ✅ Grid configuration management solid
- ✅ Git storage pattern elegant
- ⚠️ Notifications work but have bugs
- ❌ Exit State Engine missing (core value)
- ❌ Position tracking incomplete
- ❌ Audit logging missing
Estimated time to MVP: 32-37 hours of focused development.
Recommendation: Implement Exit State Engine immediately. This is the differentiator that makes the system valuable.