Market-Making System: Deep Review

Date: 2026-01-31
Status: Notification MVP Complete, Exit State Engine Pending

Executive Summary

The market-making system has a solid foundation with hourly regime detection and basic notifications working. The core architecture is sound, but the Grid Exit Strategy (the main value proposition) is only partially implemented.

Current Status

  • Working: Regime detection, notifications for disabled grids
  • ⚠️ Needs Fix: Notification logic still has issues (see below)
  • Missing: Complete Exit State Engine implementation

System Architecture Overview

Components in Production

┌─────────────────────────────────────────────────────────────┐
│                    METRICS COLLECTION                        │
│  (CronJob: Every hour at :01)                               │
│                                                              │
│  1. Fetch OHLCV from KuCoin API                             │
│  2. Run Regime Engine → Classify market                     │
│  3. Analyze Grid Configuration                              │
│  4. Calculate Risk Assessment                               │
│  5. Store to Git (market-maker-data repo)                   │
│  6. Send Notifications via Pushover                         │
└─────────────────────────────────────────────────────────────┘

Data Flow

KuCoin API
    ↓
Regime Engine (src/regime/engine.py)
    ├→ Classifier (4 regimes: RANGE_OK, RANGE_WEAK, TRANSITION, TREND)
    ├→ Range Analysis (Bollinger Bands, support/resistance)
    ├→ Mean Reversion Metrics (OU process, Z-scores)
    ├→ Volatility Metrics (ATR, expansion ratios)
    └→ Trend Detection (swing highs/lows, EMA crossovers)
    ↓
Risk Assessment (src/metrics/history.py)
    ├→ Grid Configuration Analysis
    ├→ Position Health (price vs grid bounds)
    ├→ Stop-Loss Risk
    └→ Recommendations Generation
    ↓
Git Storage (market-maker-data repo)
    ↓
Notification System (send_regime_notifications.py)
    └→ Pushover API

What’s Working ✅

1. Hourly Regime Detection

Location: metrics-service/src/regime/

Capabilities:

  • Fetches 1h and 4h OHLCV data from KuCoin
  • Classifies into 4 regimes: RANGE_OK, RANGE_WEAK, TRANSITION, TREND
  • Calculates 15+ metrics per run including:
    • Bollinger Band analysis
    • Mean reversion strength (OU half-life, Z-scores)
    • Volatility metrics (ATR, expansion ratios)
    • Trend indicators (swing structure, EMA crossovers)
    • Range confidence scores

Output: Stored to market-maker-data/metrics/YYYY/MM/DD/HH_ETH-USDT.yaml
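Here is a minimal sketch of how a run timestamp could map onto that path layout (relative to the market-maker-data repo root). It assumes HH is the UTC hour of the run and the symbol is written as ETH-USDT. The helper name `metrics_path` is hypothetical and is not taken from the codebase:

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def metrics_path(run_at: datetime, symbol: str = "ETH-USDT") -> PurePosixPath:
    """Build metrics/YYYY/MM/DD/HH_<symbol>.yaml for a timezone-aware run time."""
    # Normalise to UTC so runs from any pod time zone land in the same file.
    ts = run_at.astimezone(timezone.utc)
    return PurePosixPath(
        "metrics", f"{ts:%Y}", f"{ts:%m}", f"{ts:%d}", f"{ts:%H}_{symbol}.yaml"
    )


print(metrics_path(datetime(2026, 1, 31, 7, 1, tzinfo=timezone.utc)))
# metrics/2026/01/31/07_ETH-USDT.yaml
```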

2. Grid Configuration Management

Location: metrics-service/src/grid/

Current State:

  • Grid eth-v3 configured but DISABLED
  • Price range: 307,000 - 318,500 USDT
  • 6 grid levels, evenly spaced
  • Stop-loss at 299,000 USDT
  • Capital allocated: 0 (grid not active)
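For reference, this is what "evenly spaced" works out to if the 6 levels include both range bounds. That is an assumption: the config may instead define 6 intervals. The helper below is illustrative only:

```python
def grid_levels(lower: float, upper: float, levels: int) -> list[float]:
    """Return `levels` evenly spaced prices from lower to upper, inclusive."""
    if levels < 2 or upper <= lower:
        raise ValueError("need at least 2 levels and upper > lower")
    step = (upper - lower) / (levels - 1)
    return [lower + i * step for i in range(levels)]


print(grid_levels(307_000, 318_500, 6))
# [307000.0, 309300.0, 311600.0, 313900.0, 316200.0, 318500.0]
```

Under this reading, adjacent levels are 2,300 USDT apart.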

3. Git-Backed Storage

Location: market-maker-data repo

Benefits:

  • Full audit trail of all regime analyses
  • Rate limiting history persists across pod restarts
  • Can replay/backtest strategies from historical data
  • No database required

4. Pushover Notifications

Location: send_regime_notifications.py

Features:

  • Sends alerts when regime state changes
  • Rate limiting (4h minimum between same-state notifications)
  • Different messages for DISABLED vs ACTIVE grids
  • Contains regime verdict, confidence, recommendations
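The rate-limiting rule can be written as a pure function, which also makes it easy to unit-test. This sketch assumes the last send time per state is read from the Git-stored history. `should_notify` and its arguments are hypothetical names:

```python
from datetime import datetime, timedelta
from typing import Mapping, Optional

MIN_REPEAT_INTERVAL = timedelta(hours=4)


def should_notify(
    state: str,
    previous_state: Optional[str],
    last_sent: Mapping[str, datetime],
    now: datetime,
) -> bool:
    """Always notify on a state change; repeat the same state at most every 4h."""
    if state != previous_state:
        return True
    sent_at = last_sent.get(state)
    return sent_at is None or now - sent_at >= MIN_REPEAT_INTERVAL
```

The timer is keyed by state rather than shared globally, so a transition (for example RANGE_OK to RANGE_WEAK) is never suppressed by a recent notification about a different state.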

What’s Broken/Incomplete ⚠️

1. Notification Recommendations Logic

Problem: Even after the recent fix, entry notifications still send mixed messages.

Current Behavior:

# When grid is DISABLED and regime is RANGE_WEAK:
Notification says:
  Title: "⏳ ENTRY OPPORTUNITY: ETH Grid"
  Recommendations:
    - "Low regime confidence (0.40) - wait for stronger signal"
    - "Weak ranging conditions - monitor before entry"

Issues:

  1. The generate_entry_recommendations() function we added is sound, but it is never called because of how the code is structured.
  2. The risk assessment in history.py still generates recommendations, and these leak through into the notification.

Root Cause: Line 283 in send_regime_notifications.py:

recommendations = regime.get("risk_assessment", {}).get("recommendations", [])

This line pulls recommendations from the risk assessment unconditionally, before the entry-specific logic runs, so DISABLED-grid notifications still mix risk-assessment advice with entry guidance.

Fix Needed:

# Should be:
if grid_status == "DISABLED":
    recommendations = generate_entry_recommendations(verdict, confidence, regime)
else:
    recommendations = regime.get("risk_assessment", {}).get("recommendations", [])
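One way to stop this fix from regressing is to move the branch into a small function and inject the entry-recommendation generator. A unit test can then confirm that risk-assessment recommendations never reach a DISABLED-grid notification. `select_recommendations`, the `entry_recs` parameter and the fake generator are hypothetical. In production, the real `generate_entry_recommendations` would be passed in:

```python
from typing import Any, Callable, Dict, List

# Signature assumed from the call in the fix: (verdict, confidence, regime).
EntryRecs = Callable[[str, float, Dict[str, Any]], List[str]]


def select_recommendations(
    grid_status: str,
    verdict: str,
    confidence: float,
    regime: Dict[str, Any],
    entry_recs: EntryRecs,
) -> List[str]:
    """Entry guidance for DISABLED grids; risk-assessment advice otherwise."""
    if grid_status == "DISABLED":
        return entry_recs(verdict, confidence, regime)
    return regime.get("risk_assessment", {}).get("recommendations", [])


def fake_entry_recs(verdict: str, confidence: float, regime: Dict[str, Any]) -> List[str]:
    # Stand-in for generate_entry_recommendations() in a unit test.
    return [f"{verdict} at {confidence:.2f}: wait for a stronger signal"]


regime = {"risk_assessment": {"recommendations": ["Monitor mean reversion degradation"]}}
assert select_recommendations("DISABLED", "RANGE_WEAK", 0.40, regime, fake_entry_recs) == [
    "RANGE_WEAK at 0.40: wait for a stronger signal"
]
assert select_recommendations("ACTIVE", "RANGE_WEAK", 0.40, regime, fake_entry_recs) == [
    "Monitor mean reversion degradation"
]
```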

2. Exit State Engine - NOT IMPLEMENTED

What’s Missing: The entire exit state classification from the requirements doc.

Required States (from new-instructions.md):

  • NORMAL - Grid safe to operate
  • WARNING - Risk increasing; prepare to exit (2+ warning triggers)
  • LATEST_ACCEPTABLE_EXIT - Grid assumptions failing (soft stop)
  • MANDATORY_EXIT - MUST stop immediately (hard stop)

Current State: ❌ None of this exists

Where It Should Live:

  • New module: src/exit_strategy/state_engine.py
  • Consumes: Regime analysis + Grid analysis
  • Produces: Exit state classification
  • Used by: Notification system

Trigger Logic Needed:

Exit State               Triggers
MANDATORY_EXIT           • Regime = TREND
                         • Range invalidated
                         • 2 consecutive closes outside bounds
                         • Stop-loss breached
LATEST_ACCEPTABLE_EXIT   • TRANSITION persists for ≥2 4h bars or ≥4 1h bars
                         • OU half-life ≥ 2× baseline
                         • Volatility expansion > 1.25×
WARNING                  2+ of:
                         • TRANSITION probability ≥ 40%
                         • Confidence declining
                         • Efficiency Ratio rising
                         • Mean reversion slowing
NORMAL                   None of the above
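The table can be translated into a severity-ordered classifier. This is a sketch under several assumptions. `ExitSignals` and its field names are hypothetical and assume the regime and grid analyses have already been flattened into these values. Each MANDATORY_EXIT and LATEST_ACCEPTABLE_EXIT trigger is treated as sufficient on its own. The table does not say this, and the risk section below suggests requiring 2+ confirming indicators for MANDATORY_EXIT. "OU half-life" and "volatility expansion" are expressed as ratios to their baselines:

```python
from dataclasses import dataclass, replace
from enum import Enum


class ExitState(str, Enum):
    NORMAL = "NORMAL"
    WARNING = "WARNING"
    LATEST_ACCEPTABLE_EXIT = "LATEST_ACCEPTABLE_EXIT"
    MANDATORY_EXIT = "MANDATORY_EXIT"


@dataclass(frozen=True)
class ExitSignals:
    """Hypothetical flattened inputs; the field names are illustrative only."""
    regime: str                          # RANGE_OK, RANGE_WEAK, TRANSITION or TREND
    range_invalidated: bool = False
    consecutive_closes_outside: int = 0  # consecutive closes outside grid bounds
    stop_loss_breached: bool = False
    transition_bars_4h: int = 0          # consecutive 4h bars classified TRANSITION
    transition_bars_1h: int = 0          # consecutive 1h bars classified TRANSITION
    ou_half_life_ratio: float = 1.0      # current OU half-life / baseline
    volatility_expansion: float = 1.0    # current volatility / baseline
    transition_probability: float = 0.0  # 0.0 to 1.0
    confidence_declining: bool = False
    efficiency_ratio_rising: bool = False
    mean_reversion_slowing: bool = False


def classify(s: ExitSignals) -> ExitState:
    # Most severe first. Assumption: any one hard trigger is enough.
    if (
        s.regime == "TREND"
        or s.range_invalidated
        or s.consecutive_closes_outside >= 2
        or s.stop_loss_breached
    ):
        return ExitState.MANDATORY_EXIT
    if (
        s.transition_bars_4h >= 2
        or s.transition_bars_1h >= 4
        or s.ou_half_life_ratio >= 2.0
        or s.volatility_expansion > 1.25
    ):
        return ExitState.LATEST_ACCEPTABLE_EXIT
    # WARNING needs at least two of the four soft triggers.
    soft_triggers = [
        s.transition_probability >= 0.40,
        s.confidence_declining,
        s.efficiency_ratio_rising,
        s.mean_reversion_slowing,
    ]
    if sum(soft_triggers) >= 2:
        return ExitState.WARNING
    return ExitState.NORMAL


calm = ExitSignals(regime="RANGE_WEAK")
print(classify(calm).value)  # NORMAL
print(classify(replace(calm, transition_probability=0.40, confidence_declining=True)).value)  # WARNING
print(classify(replace(calm, volatility_expansion=1.3)).value)  # LATEST_ACCEPTABLE_EXIT
print(classify(replace(calm, regime="TREND")).value)  # MANDATORY_EXIT
```

Keeping the classifier a pure function of its inputs means every row of the table can be covered by a one-line unit test.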

3. Position Risk Quantification - PARTIALLY IMPLEMENTED

What Exists:

  • Grid position health (ABOVE_GRID, IN_GRID, BELOW_GRID)
  • Stop-loss proximity tracking
  • Basic risk level (LOW, MEDIUM, HIGH)
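The existing health checks come down to comparing price against the grid bounds and the stop. Below is a sketch with hypothetical helper names; treating the bounds themselves as IN_GRID is an assumption. Applied to the values in the metrics sample later in this review (price 268,011, range 307,000-318,500, stop 299,000), it also shows that price already sits below the stop:

```python
def position_health(price: float, lower: float, upper: float) -> str:
    """Classify price relative to the grid range (bounds count as IN_GRID)."""
    if price < lower:
        return "BELOW_GRID"
    if price > upper:
        return "ABOVE_GRID"
    return "IN_GRID"


def stop_distance_pct(price: float, stop_loss: float) -> float:
    """Percent price can still fall before hitting the stop; negative = breached."""
    return (price - stop_loss) / price * 100


price = 268_011
print(position_health(price, 307_000, 318_500))     # BELOW_GRID
print(round(stop_distance_pct(price, 299_000), 1))  # -11.6
```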

What’s Missing:

  • Actual PnL calculation from KuCoin API
  • Capital-at-risk quantification
  • Profit give-back estimates
  • Real-time position monitoring (currently only checks config, not actual orders)

Why It Matters: The requirements doc says notifications should include:

  • “Expected profit give-back if delayed”
  • “Capital at risk if delayed”

We can’t provide this without real position data.

4. Evaluation Cadence - WRONG

Current: Hourly (at :01 past the hour)
Required: Every 15 minutes

Impact:

  • Could miss rapid regime transitions
  • Worst-case detection delay of 60 minutes, 45 minutes longer than the 15-minute requirement allows
  • Doesn’t meet “MANDATORY_EXIT → Immediately” requirement

Fix: Change CronJob from 1 * * * * to 1,16,31,46 * * * *
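The blind-spot arithmetic can be checked directly. With runs at fixed minute offsets each hour, the worst-case wait before a change is seen equals the largest gap between consecutive runs, wrapping around the hour. `max_gap_minutes` is an illustrative helper:

```python
def max_gap_minutes(run_minutes: list[int]) -> int:
    """Largest gap between consecutive runs for a cron minute field, wrapping at 60."""
    runs = sorted(set(run_minutes))
    # Pair each run with the next; the last run wraps to the first run of the next hour.
    gaps = [(b - a) % 60 or 60 for a, b in zip(runs, runs[1:] + runs[:1])]
    return max(gaps)


print(max_gap_minutes([1]))              # 60  (current: 1 * * * *)
print(max_gap_minutes([1, 16, 31, 46]))  # 15  (proposed: 1,16,31,46 * * * *)
```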

5. Audit Logging - MISSING

Required (from requirements):

  • Log all exit signals
  • Log operator responses (or lack thereof)
  • Track response time vs recommended window
  • Enable retrospective analysis of signal quality

Current State: ❌ None of this exists
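Here is a sketch of what one audit entry could hold, so that response time can later be compared with the recommended window. The field names, the ID format, the JSON serialisation and the idea of appending these records to the Git repo are all assumptions, not an existing schema:

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ExitSignalAudit:
    signal_id: str
    exit_state: str                              # e.g. "MANDATORY_EXIT"
    emitted_at: datetime
    recommended_window: timedelta                # how quickly the operator should act
    acknowledged_at: Optional[datetime] = None   # None = no operator response yet

    def response_time(self) -> Optional[timedelta]:
        if self.acknowledged_at is None:
            return None
        return self.acknowledged_at - self.emitted_at

    def within_window(self) -> Optional[bool]:
        rt = self.response_time()
        return None if rt is None else rt <= self.recommended_window

    def to_json(self) -> str:
        record = asdict(self)
        record["emitted_at"] = self.emitted_at.isoformat()
        record["recommended_window"] = self.recommended_window.total_seconds()
        record["acknowledged_at"] = (
            self.acknowledged_at.isoformat() if self.acknowledged_at else None
        )
        record["within_window"] = self.within_window()
        return json.dumps(record, sort_keys=True)


audit = ExitSignalAudit(
    signal_id="eth-v3-2026-01-31T07:01",  # illustrative ID format
    exit_state="WARNING",
    emitted_at=datetime(2026, 1, 31, 7, 1),
    recommended_window=timedelta(hours=1),
    acknowledged_at=datetime(2026, 1, 31, 7, 40),
)
print(audit.response_time())  # 0:39:00
print(audit.within_window())  # True
```

Recording "no response" explicitly as `None`, rather than omitting the field, keeps unanswered signals visible in retrospective analysis.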


Code Quality Assessment

Strengths 💪

  1. Well-Structured Modules

    • Clean separation: regime / grid / metrics / exchanges
    • Interface abstractions for testability
    • Type hints throughout
  2. Regime Engine is Solid

    • Multiple complementary indicators
    • 4h structural confirmation
    • Confidence scoring
    • Handles edge cases (low liquidity, data gaps)
  3. Configuration Management

    • Grid configs are declarative YAML
    • Environment-based credential management
    • External secrets integration
  4. Git Storage Pattern

    • Elegant solution for state persistence
    • Built-in audit trail
    • No database operational overhead

Weaknesses 🐛

  1. No Tests

    • Zero unit tests in metrics-service/tests/
    • No integration tests
    • Can’t refactor with confidence
  2. Mixed Responsibilities

    • history.py does BOTH metrics collection AND risk assessment
    • Should be split into separate concerns
  3. Hard-Coded Logic

    • Magic numbers throughout (e.g., 0.40 confidence threshold)
    • Should be configurable
  4. Error Handling

    • Limited retry logic for API failures
    • No circuit breakers
    • Git push failures could lose metrics
  5. Notification Logic Scattered

    • Some in send_regime_notifications.py
    • Some in generate_entry_recommendations()
    • Some leaking from history.py risk assessment
    • Needs consolidation
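As an example of moving magic numbers into configuration, a frozen dataclass could hold defaults that mirror the thresholds quoted in this review. Values can then be overridden from a mapping, such as a section of the grid's YAML config after loading. `ExitThresholds` and its field names are hypothetical:

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class ExitThresholds:
    min_entry_confidence: float = 0.40
    transition_probability_warning: float = 0.40
    volatility_expansion_exit: float = 1.25
    ou_half_life_multiple_exit: float = 2.0
    closes_outside_for_mandatory: int = 2

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "ExitThresholds":
        """Build from config, rejecting unknown keys so typos fail loudly."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown threshold keys: {sorted(unknown)}")
        return cls(**raw)


tuned = ExitThresholds.from_mapping({"volatility_expansion_exit": 1.3})
print(tuned.volatility_expansion_exit, tuned.min_entry_confidence)  # 1.3 0.4
```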

Current Metrics Output Sample

Latest: 2026-01-31 07:00 UTC

regime_analysis:
  verdict: RANGE_WEAK
  confidence: 0.40
  primary_regime: RANGE
  secondary_signal: WEAKENING
  
grid_analysis:
  risk_metrics:
    position_health: BELOW_GRID  # Price: 268,011 vs Range: 307K-318.5K
    stop_loss_risk: SAFE         # Price (268K) is already below the 299K stop; no live position to protect
    capital_utilization: 0.0     # Grid disabled
    
risk_assessment:
  risk_level: LOW
  recommendations:
    - "Monitor mean reversion degradation"
    - "TRANSITION probability rising (40%)"
    
account_summary:
  total_usdt: 769.02
  available_usdt: 769.02
  locked_usdt: 0.0              # No active positions

Observation: Price (268,011) is about 39K USDT below the bottom of the configured grid range (307,000) and about 31K below the 299,000 stop-loss. This grid was probably profitable before and got stopped out; the system is now waiting for re-entry.


Gap Analysis

Critical Path to MVP

To fulfill the requirements in new-instructions.md, we need:

Component                          Status       Effort   Priority
Fix notification recommendations   ⚠️ Partial   1h       P0
Exit State Engine                  ❌ Missing   8-12h    P0
Mandatory Exit triggers            ❌ Missing   4h       P0
Latest Acceptable Exit triggers    ❌ Missing   4h       P0
Warning triggers                   ❌ Missing   2h       P1
Position Risk quantification       ⚠️ Partial   6h       P1
15-min evaluation cadence          ❌ Missing   0.5h     P1
Audit logging                      ❌ Missing   3h       P2
KPI tracking                       ❌ Missing   4h       P2
TOTAL                                           32-37h

Non-MVP Enhancements

  • Multi-symbol support (currently ETH/USDT only)
  • Backtesting framework
  • Performance dashboard
  • Automatic grid parameter tuning
  • Multi-exchange support (currently KuCoin only)

Architectural Recommendations

1. Implement Exit State Engine First

Why: This is the core value proposition. Everything else is supporting infrastructure.

Approach:

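# src/exit_strategy/state_engine.py
from enum import Enum
from typing import Any, Dict


class ExitState(str, Enum):
    NORMAL = "NORMAL"
    WARNING = "WARNING"
    LATEST_ACCEPTABLE_EXIT = "LATEST_ACCEPTABLE_EXIT"
    MANDATORY_EXIT = "MANDATORY_EXIT"


class ExitStateEngine:
    def evaluate(self, regime: Dict[str, Any], grid: Dict[str, Any]) -> ExitState:
        """
        Main entry point. Checks the most severe state first and returns one of:
        - MANDATORY_EXIT
        - LATEST_ACCEPTABLE_EXIT
        - WARNING
        - NORMAL
        """
        if self._check_mandatory_exit(regime, grid):
            return ExitState.MANDATORY_EXIT
        elif self._check_latest_acceptable_exit(regime, grid):
            return ExitState.LATEST_ACCEPTABLE_EXIT
        elif self._check_warning(regime, grid):
            return ExitState.WARNING
        else:
            return ExitState.NORMAL

    # Trigger checks, to be implemented per the trigger table above.
    def _check_mandatory_exit(self, regime: Dict[str, Any], grid: Dict[str, Any]) -> bool:
        raise NotImplementedError

    def _check_latest_acceptable_exit(self, regime: Dict[str, Any], grid: Dict[str, Any]) -> bool:
        raise NotImplementedError

    def _check_warning(self, regime: Dict[str, Any], grid: Dict[str, Any]) -> bool:
        raise NotImplementedError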

2. Refactor Notification System

Current: Monolithic script with mixed concerns
Proposed:

send_regime_notifications.py  (orchestrator)
    ├→ src/exit_strategy/state_engine.py  (exit state classification)
    ├→ src/exit_strategy/message_builder.py  (notification content)
    └→ src/exit_strategy/pushover_client.py  (delivery)
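To illustrate the message_builder / pushover_client split, the builder could turn an exit state into a Pushover payload without sending anything. The state-to-priority mapping and the retry/expire values are assumptions chosen for illustration. The payload keys (`token`, `user`, `title`, `message`, `priority`, plus `retry` and `expire` for emergency priority 2) follow the Pushover Messages API, which the client would POST to https://api.pushover.net/1/messages.json:

```python
from typing import Dict, List, Union

# Assumed mapping: louder delivery as the exit state escalates.
PRIORITY_BY_STATE = {
    "NORMAL": -1,                 # quiet notification
    "WARNING": 0,                 # normal priority
    "LATEST_ACCEPTABLE_EXIT": 1,  # high priority, bypasses quiet hours
    "MANDATORY_EXIT": 2,          # emergency: repeats until acknowledged or expired
}


def build_payload(
    token: str, user: str, symbol: str, exit_state: str, reasons: List[str]
) -> Dict[str, Union[str, int]]:
    payload: Dict[str, Union[str, int]] = {
        "token": token,
        "user": user,
        "title": f"{exit_state.replace('_', ' ')}: {symbol} grid",
        "message": "\n".join(f"- {r}" for r in reasons) or "No triggers fired",
        "priority": PRIORITY_BY_STATE[exit_state],
    }
    if payload["priority"] == 2:
        # Emergency priority requires retry (seconds between repeats, min 30) and expire.
        payload["retry"] = 60
        payload["expire"] = 3600
    return payload
```

Keeping payload construction separate from delivery means message content can be unit-tested without network access or real Pushover credentials.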

3. Add Position Tracker

Purpose: Get real position data from exchange API, not just config

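# src/grid/position_tracker.py
from __future__ import annotations

from typing import List

# Position and PnLSummary are placeholder types, still to be defined.


class PositionTracker:
    def get_active_positions(self, grid_id: str) -> List[Position]:
        """Fetch actual open orders from KuCoin API"""
        raise NotImplementedError

    def calculate_pnl(self, positions: List[Position]) -> PnLSummary:
        """Calculate unrealized PnL"""
        raise NotImplementedError

    def estimate_profit_giveback(self, exit_price: float) -> float:
        """If we exit now, how much profit do we give back?"""
        raise NotImplementedError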

4. Separate Risk Assessment from Metrics Collection

Current: metrics/history.py does both
Proposed:

src/
  metrics/
    collector.py           # Fetch & store metrics
  risk/
    assessor.py           # Analyze metrics → risk level
  exit_strategy/
    state_engine.py       # Risk + regime → exit state
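Here is a sketch of how the three layers could be wired so that each one can be tested in isolation, using `typing.Protocol` interfaces. The protocol, method and function names are hypothetical, not existing code:

```python
from typing import Any, Dict, Protocol


class MetricsCollector(Protocol):
    def collect(self, symbol: str) -> Dict[str, Any]: ...


class RiskAssessor(Protocol):
    def assess(self, metrics: Dict[str, Any]) -> Dict[str, Any]: ...


class ExitStateClassifier(Protocol):
    def evaluate(self, regime: Dict[str, Any], risk: Dict[str, Any]) -> str: ...


def run_cycle(
    symbol: str,
    collector: MetricsCollector,
    assessor: RiskAssessor,
    engine: ExitStateClassifier,
) -> str:
    """One evaluation pass: metrics -> risk level -> exit state."""
    metrics = collector.collect(symbol)
    risk = assessor.assess(metrics)
    # The engine sees only the regime section and the risk result, not raw OHLCV.
    return engine.evaluate(metrics.get("regime_analysis", {}), risk)
```

Protocols are structural, so tests can pass simple fake objects with matching methods without touching KuCoin or Git.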

Testing Strategy

Phase 1: Unit Tests

  • Exit state trigger logic
  • Notification message building
  • Position PnL calculations

Phase 2: Integration Tests

  • Full regime analysis → exit state flow
  • Notification delivery (use test Pushover credentials)
  • Git storage round-trip

Phase 3: Backtesting

  • Replay historical metrics
  • Evaluate exit signal quality
  • Measure would-be profit preservation

Deployment Considerations

Current Deployment

  • Image: ghcr.io/craigedmunds/market-making/metrics-service:0.1.15
  • Schedule: CronJob at :01 every hour
  • Namespace: market-making
  • Secrets: KuCoin API keys via ExternalSecrets
  • Storage: Git-backed (market-maker-data repo)

Changes Needed for MVP

  1. Update CronJob schedule to 15-min intervals
  2. Add audit log storage (new Git directory or separate repo)
  3. Consider separate CronJob for exit evaluation vs metrics collection
    • Metrics: Hourly (heavy API usage)
    • Exit evaluation: Every 15min (reads from Git)

Risk Assessment

Technical Risks

Risk                            Impact   Likelihood   Mitigation
False MANDATORY_EXIT signals    High     Medium       Require 2+ confirming indicators
Missed regime transitions       High     Low          15-min cadence + multi-timeframe
API rate limiting               Medium   Low          Cache data, backoff strategy
Git push failures               Medium   Low          Retry logic, local backup
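For the "retry logic" and "backoff strategy" mitigations, a generic helper with capped exponential backoff and full jitter could wrap both KuCoin calls and the Git push. This is a sketch: `retry_with_backoff` is hypothetical, and the caller decides which exceptions count as retryable:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on `retryable` errors with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Full jitter: wait a random time up to base * 2^attempt, capped.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Injecting `sleep` lets tests exercise the retry path without actually waiting.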

Operational Risks

Risk                            Impact   Likelihood   Mitigation
Operator misses notification    High     Medium       Multi-channel delivery
Notification fatigue            Medium   High         Smart rate limiting by exit state
Grid stopped unnecessarily      Medium   Medium       Backtesting + tuning

Next Steps

Immediate (This Session)

  1. ✅ Fix notification recommendation logic (1h)
  2. Implement basic Exit State Engine (4h)
  3. Add MANDATORY_EXIT triggers (2h)
  4. Update notification messages per exit state (1h)

Short-term (Next Session)

  1. Add LATEST_ACCEPTABLE_EXIT triggers
  2. Add WARNING triggers
  3. Implement audit logging
  4. Change to 15-min cadence
  5. Add unit tests

Medium-term (Future)

  1. Position Risk quantification with real API data
  2. KPI tracking framework
  3. Backtesting capability
  4. Performance dashboard

Conclusion

The market-making system has a strong technical foundation but, by rough estimate, is only about 40% of the way to meeting the MVP requirements:

  • ✅ Regime detection works well
  • ✅ Grid configuration management solid
  • ✅ Git storage pattern elegant
  • ⚠️ Notifications work but have bugs
  • ❌ Exit State Engine missing (core value)
  • ❌ Position tracking incomplete
  • ❌ Audit logging missing

Estimated time to MVP: 32-37 hours of focused development.

Recommendation: Implement Exit State Engine immediately. This is the differentiator that makes the system valuable.