Market-Making System: Deep Review

Date: 2026-01-31
Status: Notification MVP Complete, Exit State Engine Pending

Executive Summary

The market-making system has a solid foundation with hourly regime detection and basic notifications working. The core architecture is sound, but the Grid Exit Strategy (the main value proposition) is only partially implemented.

Current Status

✅ Working: Regime detection, notifications for disabled grids
⚠️ Needs Fix: Notification logic still has issues (see below)
❌ Missing: Complete Exit State Engine implementation

System Architecture Overview

Components in Production

┌─────────────────────────────────────────────────────────────┐
│                    METRICS COLLECTION                        │
│  (CronJob: Every hour at :01)                               │
│                                                              │
│  1. Fetch OHLCV from KuCoin API                             │
│  2. Run Regime Engine → Classify market                     │
│  3. Analyze Grid Configuration                              │
│  4. Calculate Risk Assessment                               │
│  5. Store to Git (market-maker-data repo)                   │
│  6. Send Notifications via Pushover                         │
└─────────────────────────────────────────────────────────────┘

Data Flow

KuCoin API
    ↓
Regime Engine (src/regime/engine.py)
    ├→ Classifier (4 regimes: RANGE_OK, RANGE_WEAK, TRANSITION, TREND)
    ├→ Range Analysis (Bollinger Bands, support/resistance)
    ├→ Mean Reversion Metrics (OU process, Z-scores)
    ├→ Volatility Metrics (ATR, expansion ratios)
    └→ Trend Detection (swing highs/lows, EMA crossovers)
    ↓
Risk Assessment (src/metrics/history.py)
    ├→ Grid Configuration Analysis
    ├→ Position Health (price vs grid bounds)
    ├→ Stop-Loss Risk
    └→ Recommendations Generation
    ↓
Git Storage (market-maker-data repo)
    ↓
Notification System (send_regime_notifications.py)
    └→ Pushover API

What’s Working ✅

1. Hourly Regime Detection

Location: metrics-service/src/regime/

Capabilities:

Fetches 1h and 4h OHLCV data from KuCoin
Classifies into 4 regimes: RANGE_OK, RANGE_WEAK, TRANSITION, TREND
Calculates 15+ metrics per run including:
- Bollinger Band analysis
- Mean reversion strength (OU half-life, Z-scores)
- Volatility metrics (ATR, expansion ratios)
- Trend indicators (swing structure, EMA crossovers)
- Range confidence scores

Output: Stored to market-maker-data/metrics/YYYY/MM/DD/HH_ETH-USDT.yaml
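
The output layout above can be sketched as a small path builder. This is only an illustration: the function name `metrics_path` is hypothetical, and it assumes the `HH` component is the UTC hour (the sample output later in this review is stamped in UTC).

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def metrics_path(ts: datetime, symbol: str = "ETH-USDT") -> PurePosixPath:
    """Build the repo-relative path of one hourly metrics file.

    Assumption: `ts` is timezone-aware and files are keyed by UTC hour.
    """
    ts = ts.astimezone(timezone.utc)
    return (
        PurePosixPath("metrics")
        / f"{ts:%Y}"
        / f"{ts:%m}"
        / f"{ts:%d}"
        / f"{ts:%H}_{symbol}.yaml"
    )


print(metrics_path(datetime(2026, 1, 31, 7, 0, tzinfo=timezone.utc)))
# metrics/2026/01/31/07_ETH-USDT.yaml
```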

2. Grid Configuration Management

Location: metrics-service/src/grid/

Current State:

Grid eth-v3 configured but DISABLED
Price range: 307,000 - 318,500 USDT
6 grid levels, evenly spaced
Stop-loss at 299,000 USDT
Capital allocated: 0 (grid not active)
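
For reference, the level prices implied by this configuration can be derived as follows. This is a sketch under one assumption not stated in the config summary: both range bounds are themselves grid levels, so 6 levels span 5 equal intervals. The helper name `grid_levels` is hypothetical.

```python
def grid_levels(lower: float, upper: float, count: int) -> list[float]:
    """Return `count` evenly spaced price levels from lower to upper, inclusive."""
    if count < 2 or upper <= lower:
        raise ValueError("need at least 2 levels and upper > lower")
    step = (upper - lower) / (count - 1)
    return [lower + i * step for i in range(count)]


print(grid_levels(307_000, 318_500, 6))
# [307000.0, 309300.0, 311600.0, 313900.0, 316200.0, 318500.0]
```

Under that assumption the spacing is 2,300 USDT per level. If the bounds are not levels themselves, the spacing would differ.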

3. Git-Backed Storage

Location: market-maker-data repo

Benefits:

Full audit trail of all regime analyses
Rate limiting history persists across pod restarts
Can replay/backtest strategies from historical data
No database required

4. Pushover Notifications

Location: send_regime_notifications.py

Features:

Sends alerts when regime state changes
Rate limiting (4h minimum between same-state notifications)
Different messages for DISABLED vs ACTIVE grids
Contains regime verdict, confidence, recommendations
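
The rate-limiting rule can be sketched as below. This is not the code in send_regime_notifications.py; `should_notify` and its parameters are hypothetical, and it assumes a state change always bypasses the 4h limit.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_INTERVAL = timedelta(hours=4)  # minimum gap between same-state notifications


def should_notify(
    state: str,
    now: datetime,
    last_state: Optional[str],
    last_sent: Optional[datetime],
) -> bool:
    """Decide whether to send a notification for `state` at time `now`."""
    if last_state is None or last_sent is None:
        return True  # nothing has been sent yet
    if state != last_state:
        return True  # assumption: a state change is never rate limited
    return now - last_sent >= MIN_INTERVAL


t0 = datetime(2026, 1, 31, 7, 0)
print(should_notify("RANGE_WEAK", t0 + timedelta(hours=1), "RANGE_WEAK", t0))  # False
print(should_notify("TREND", t0 + timedelta(hours=1), "RANGE_WEAK", t0))       # True
```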

What’s Broken/Incomplete ⚠️

1. Notification Recommendations Logic

Problem: Even after the recent fix, notifications for DISABLED grids can still carry recommendations generated by the risk assessment.

Current Behavior:

# When grid is DISABLED and regime is RANGE_WEAK:
Notification says:
  Title: "⏳ ENTRY OPPORTUNITY: ETH Grid"
  Recommendations:
    - "Low regime confidence (0.40) - wait for stronger signal"
    - "Weak ranging conditions - monitor before entry"

Issues:

The generate_entry_recommendations() function we added is sound, but it is never actually called because of how the code is structured
The risk assessment in history.py still generates recommendations that leak through

Root Cause: Line 283 in send_regime_notifications.py:

recommendations = regime.get("risk_assessment", {}).get("recommendations", [])

This pulls recommendations from the risk assessment BEFORE our entry-specific logic runs. So we’re still getting mixed messages.

Fix Needed:

# Should be:
if grid_status == "DISABLED":
    recommendations = generate_entry_recommendations(verdict, confidence, regime)
else:
    recommendations = regime.get("risk_assessment", {}).get("recommendations", [])

2. Exit State Engine - NOT IMPLEMENTED

What’s Missing: The entire exit state classification from the requirements doc.

Required States (from new-instructions.md):

NORMAL - Grid safe to operate
WARNING - Risk increasing; prepare to exit (2+ warning triggers)
LATEST_ACCEPTABLE_EXIT - Grid assumptions failing (soft stop)
MANDATORY_EXIT - MUST stop immediately (hard stop)

Current State: ❌ None of this exists

Where It Should Live:

New module: src/exit_strategy/state_engine.py
Consumes: Regime analysis + Grid analysis
Produces: Exit state classification
Used by: Notification system

Trigger Logic Needed:

Exit State	Triggers
MANDATORY_EXIT	• Regime = TREND • Range invalidated • 2 consecutive closes outside bounds • Stop-loss breached
LATEST_ACCEPTABLE_EXIT	• TRANSITION persists ≥2 4h bars OR ≥4 1h bars • OU half-life ≥ 2× baseline • Volatility expansion > 1.25×
WARNING	2+ of: • TRANSITION probability ≥ 40% • Confidence declining • Efficiency Ratio rising • Mean reversion slowing
NORMAL	None of the above
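
A few of the triggers in the table reduce to simple checks over recent history. The sketch below shows three of them: TRANSITION persistence, consecutive closes outside the bounds, and the "2+ of" WARNING rule. All function names and input shapes are hypothetical, since the state engine does not exist yet.

```python
from typing import Mapping, Sequence


def trailing_run(values: Sequence[str], target: str) -> int:
    """Count how many of the most recent values equal `target` consecutively."""
    run = 0
    for value in reversed(values):
        if value != target:
            break
        run += 1
    return run


def transition_persists(verdicts_1h: Sequence[str], verdicts_4h: Sequence[str]) -> bool:
    """TRANSITION for at least 2 consecutive 4h bars or 4 consecutive 1h bars."""
    return (
        trailing_run(verdicts_4h, "TRANSITION") >= 2
        or trailing_run(verdicts_1h, "TRANSITION") >= 4
    )


def closes_outside_bounds(
    closes: Sequence[float], lower: float, upper: float, n: int = 2
) -> bool:
    """True if the last `n` closes are all outside [lower, upper]."""
    recent = closes[-n:]
    return len(recent) == n and all(c < lower or c > upper for c in recent)


def is_warning(conditions: Mapping[str, bool]) -> bool:
    """WARNING fires when at least two warning conditions hold."""
    return sum(bool(v) for v in conditions.values()) >= 2


print(transition_persists(["TRANSITION"] * 4, ["RANGE_OK"]))       # True
print(closes_outside_bounds([310_000, 306_000, 305_500], 307_000, 318_500))  # True
print(is_warning({"transition_prob_ge_40": True, "confidence_declining": False}))  # False
```

Keeping each trigger as a pure function of recent history would make the unit tests in Phase 1 straightforward to write.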

3. Position Risk Quantification - PARTIALLY IMPLEMENTED

What Exists:

Grid position health (ABOVE_GRID, IN_GRID, BELOW_GRID)
Stop-loss proximity tracking
Basic risk level (LOW, MEDIUM, HIGH)
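
The existing health classification and stop-loss proximity amount to comparisons like the following. This is a sketch, not the code in history.py: the function names are hypothetical, and it assumes a price exactly on a bound counts as IN_GRID.

```python
def position_health(price: float, lower: float, upper: float) -> str:
    """Classify price relative to the grid bounds."""
    if price > upper:
        return "ABOVE_GRID"
    if price < lower:
        return "BELOW_GRID"
    return "IN_GRID"  # assumption: bounds are inclusive


def stop_loss_distance_pct(price: float, stop_loss: float) -> float:
    """Distance from price down to the stop, as a percentage of price.

    A negative result means the price is already below the stop-loss.
    """
    return (price - stop_loss) / price * 100


# Values from the eth-v3 config and the latest metrics sample in this review.
print(position_health(268_011, 307_000, 318_500))  # BELOW_GRID
print(stop_loss_distance_pct(268_011, 299_000) < 0)  # True
```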

What’s Missing:

Actual PnL calculation from KuCoin API
Capital-at-risk quantification
Profit give-back estimates
Real-time position monitoring (currently only checks config, not actual orders)

Why It Matters: The requirements doc says notifications should include:

“Expected profit give-back if delayed”
“Capital at risk if delayed”

We can’t provide this without real position data.

4. Evaluation Cadence - WRONG

Current: Hourly (at :01 past the hour)
Required: Every 15 minutes

Impact:

Could miss rapid regime transitions
Up to 60 minutes between evaluations, 45 minutes longer than the required 15-minute interval
Doesn’t meet “MANDATORY_EXIT → Immediately” requirement

Fix: Change CronJob from 1 * * * * to 1,16,31,46 * * * *
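
As a quick sanity check on the proposed schedule, the gaps between firing minutes can be computed, including the wrap into the next hour. The helper `minute_gaps` is purely illustrative and only models the minute field of the cron expression.

```python
def minute_gaps(minutes: list[int]) -> list[int]:
    """Gaps (in minutes) between consecutive firings of an hourly-repeating schedule."""
    ms = sorted(minutes)
    # `% 60` handles the wrap to the next hour; a single entry yields a 60-minute gap.
    return [(ms[(i + 1) % len(ms)] - m) % 60 or 60 for i, m in enumerate(ms)]


print(minute_gaps([1]))               # [60]
print(minute_gaps([1, 16, 31, 46]))   # [15, 15, 15, 15]
```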

5. Audit Logging - MISSING

Required (from requirements):

Log all exit signals
Log operator responses (or lack thereof)
Track response time vs recommended window
Enable retrospective analysis of signal quality

Current State: ❌ None of this exists
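
One possible shape for an audit record is sketched below. Every field name here is an assumption, as is the choice of JSON; the requirements only say what must be logged, not how.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ExitSignalAudit:
    """One record per exit signal (all field names are hypothetical)."""
    exit_state: str
    issued_at: datetime
    recommended_window: timedelta
    responded_at: Optional[datetime] = None  # None means no operator response yet

    def response_time(self) -> Optional[timedelta]:
        if self.responded_at is None:
            return None
        return self.responded_at - self.issued_at

    def within_window(self) -> Optional[bool]:
        elapsed = self.response_time()
        return None if elapsed is None else elapsed <= self.recommended_window

    def to_json(self) -> str:
        elapsed = self.response_time()
        return json.dumps({
            "exit_state": self.exit_state,
            "issued_at": self.issued_at.isoformat(),
            "recommended_window_s": self.recommended_window.total_seconds(),
            "responded_at": self.responded_at.isoformat() if self.responded_at else None,
            "response_time_s": elapsed.total_seconds() if elapsed is not None else None,
            "within_window": self.within_window(),
        })


# Made-up example values, for illustration only.
issued = datetime(2026, 1, 31, 7, 1, tzinfo=timezone.utc)
record = ExitSignalAudit("WARNING", issued, timedelta(hours=4),
                         responded_at=issued + timedelta(minutes=30))
print(record.within_window())  # True
```

Records like this could be appended to a file in the market-maker-data repo, which would reuse the existing Git audit trail.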

Code Quality Assessment

Strengths 💪

Well-Structured Modules
- Clean separation: regime / grid / metrics / exchanges
- Interface abstractions for testability
- Type hints throughout
Regime Engine is Solid
- Multiple complementary indicators
- 4h structural confirmation
- Confidence scoring
- Handles edge cases (low liquidity, data gaps)
Configuration Management
- Grid configs are declarative YAML
- Environment-based credential management
- External secrets integration
Git Storage Pattern
- Elegant solution for state persistence
- Built-in audit trail
- No database operational overhead

Weaknesses 🐛

No Tests
- Zero unit tests in metrics-service/tests/
- No integration tests
- Can’t refactor with confidence
Mixed Responsibilities
- history.py does BOTH metrics collection AND risk assessment
- Should be split into separate concerns
Hard-Coded Logic
- Magic numbers throughout (e.g., 0.40 confidence threshold)
- Should be configurable
Error Handling
- Limited retry logic for API failures
- No circuit breakers
- Git push failures could lose metrics
Notification Logic Scattered
- Some in send_regime_notifications.py
- Some in generate_entry_recommendations()
- Some leaking from history.py risk assessment
- Needs consolidation

Current Metrics Output Sample

Latest: 2026-01-31 07:00 UTC

regime_analysis:
  verdict: RANGE_WEAK
  confidence: 0.40
  primary_regime: RANGE
  secondary_signal: WEAKENING
  
grid_analysis:
  risk_metrics:
    position_health: BELOW_GRID  # Price: 268,011 vs Range: 307K-318.5K
    stop_loss_risk: SAFE         # Stop at 299K; current price (268,011) is already below it
    capital_utilization: 0.0     # Grid disabled
    
risk_assessment:
  risk_level: LOW
  recommendations:
    - "Monitor mean reversion degradation"
    - "TRANSITION probability rising (40%)"
    
account_summary:
  total_usdt: 769.02
  available_usdt: 769.02
  locked_usdt: 0.0              # No active positions

Observation: Price is about 39K USDT below the lower bound of the configured grid range (307,000 - 268,011 = 38,989) and also below the 299,000 stop-loss. This grid was probably profitable before and got stopped out. Now the system is waiting for re-entry.

Gap Analysis

Critical Path to MVP

To fulfill the requirements in new-instructions.md, we need:

Component	Status	Effort	Priority
Fix notification recommendations	⚠️ Partial	1h	P0
Exit State Engine	❌ Missing	8-12h	P0
Mandatory Exit triggers	❌ Missing	4h	P0
Latest Acceptable Exit triggers	❌ Missing	4h	P0
Warning triggers	❌ Missing	2h	P1
Position Risk quantification	⚠️ Partial	6h	P1
15-min evaluation cadence	❌ Missing	0.5h	P1
Audit logging	❌ Missing	3h	P2
KPI tracking	❌ Missing	4h	P2
TOTAL		32-37h

Non-MVP Enhancements

Multi-symbol support (currently ETH/USDT only)
Backtesting framework
Performance dashboard
Automatic grid parameter tuning
Multi-exchange support (currently KuCoin only)

Architectural Recommendations

1. Implement Exit State Engine First

Why: This is the core value proposition. Everything else is supporting infrastructure.

Approach:

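# src/exit_strategy/state_engine.py

from enum import Enum
from typing import Dict


class ExitState(Enum):
    NORMAL = "NORMAL"
    WARNING = "WARNING"
    LATEST_ACCEPTABLE_EXIT = "LATEST_ACCEPTABLE_EXIT"
    MANDATORY_EXIT = "MANDATORY_EXIT"


class ExitStateEngine:
    def evaluate(self, regime: Dict, grid: Dict) -> ExitState:
        """
        Main entry point. Checks run from most to least severe,
        so the first matching state wins. Returns one of:
        - NORMAL
        - WARNING
        - LATEST_ACCEPTABLE_EXIT
        - MANDATORY_EXIT
        """
        # The _check_* helpers are not implemented yet; each takes the
        # regime and grid analyses and returns a bool.
        if self._check_mandatory_exit(regime, grid):
            return ExitState.MANDATORY_EXIT
        if self._check_latest_acceptable_exit(regime, grid):
            return ExitState.LATEST_ACCEPTABLE_EXIT
        if self._check_warning(regime, grid):
            return ExitState.WARNING
        return ExitState.NORMAL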

2. Refactor Notification System

Current: Monolithic script with mixed concerns
Proposed:

send_regime_notifications.py  (orchestrator)
    ├→ src/exit_strategy/state_engine.py  (exit state classification)
    ├→ src/exit_strategy/message_builder.py  (notification content)
    └→ src/exit_strategy/pushover_client.py  (delivery)

3. Add Position Tracker

Purpose: Get real position data from exchange API, not just config

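# src/grid/position_tracker.py

from typing import List


class PositionTracker:
    # Position and PnLSummary are data types still to be defined,
    # so they are referenced as forward-reference strings here.
    def get_active_positions(self, grid_id: str) -> List["Position"]:
        """Fetch actual open orders from KuCoin API"""
        raise NotImplementedError

    def calculate_pnl(self, positions: List["Position"]) -> "PnLSummary":
        """Calculate unrealized PnL"""
        raise NotImplementedError

    def estimate_profit_giveback(self, exit_price: float) -> float:
        """If we exit now, how much profit do we give back?"""
        raise NotImplementedError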

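The PnL and give-back calculations could look roughly like this once real position data is available. This is a sketch under stated assumptions: positions are long-only spot lots, fees are ignored, and "profit give-back" is taken to mean the drop from the best unrealized PnL seen so far, which the requirements doc may define differently. `OpenLot`, `unrealized_pnl`, and `profit_giveback` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OpenLot:
    """A filled buy that has not been sold yet (hypothetical type)."""
    entry_price: float
    size: float  # base-asset quantity, long only


def unrealized_pnl(lots: List[OpenLot], mark_price: float) -> float:
    """Sum of (mark - entry) * size over all open lots, ignoring fees."""
    return sum((mark_price - lot.entry_price) * lot.size for lot in lots)


def profit_giveback(peak_pnl: float, current_pnl: float) -> float:
    """Profit surrendered relative to the best unrealized PnL seen so far."""
    return max(0.0, peak_pnl - current_pnl)


# Made-up numbers, for illustration only.
lots = [OpenLot(entry_price=100.0, size=2.0), OpenLot(entry_price=90.0, size=1.0)]
print(unrealized_pnl(lots, mark_price=95.0))                 # -5.0
print(profit_giveback(peak_pnl=12.0, current_pnl=-5.0))      # 17.0
```
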
4. Separate Risk Assessment from Metrics Collection

Current: metrics/history.py does both Proposed:

src/
  metrics/
    collector.py           # Fetch & store metrics
  risk/
    assessor.py           # Analyze metrics → risk level
  exit_strategy/
    state_engine.py       # Risk + regime → exit state

Testing Strategy

Phase 1: Unit Tests

Exit state trigger logic
Notification message building
Position PnL calculations

Phase 2: Integration Tests

Full regime analysis → exit state flow
Notification delivery (use test Pushover credentials)
Git storage round-trip

Phase 3: Backtesting

Replay historical metrics
Evaluate exit signal quality
Measure would-be profit preservation

Deployment Considerations

Current Deployment

Image: ghcr.io/craigedmunds/market-making/metrics-service:0.1.15
Schedule: CronJob at :01 every hour
Namespace: market-making
Secrets: KuCoin API keys via ExternalSecrets
Storage: Git-backed (market-maker-data repo)

Changes Needed for MVP

Update CronJob schedule to 15-min intervals
Add audit log storage (new Git directory or separate repo)
Consider separate CronJob for exit evaluation vs metrics collection
- Metrics: Hourly (heavy API usage)
- Exit evaluation: Every 15min (reads from Git)

Risk Assessment

Technical Risks

Risk	Impact	Likelihood	Mitigation
False MANDATORY_EXIT signals	High	Medium	Require 2+ confirming indicators
Missed regime transitions	High	Low	15-min cadence + multi-timeframe
API rate limiting	Medium	Low	Cache data, backoff strategy
Git push failures	Medium	Low	Retry logic, local backup
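
The retry and backoff mitigations in the table above could share one small helper. This is a generic sketch, not existing project code: `retry_with_backoff` is a hypothetical name, and a real version would probably catch narrower exception types than `Exception`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `op`, retrying on failure with delays of base_delay * 2**attempt."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")
```

The `sleep` parameter is injectable so that tests can record the delays instead of actually waiting. For Git pushes, the wrapped operation could be the push call itself, with a local backup written if the final attempt still fails.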

Operational Risks

Risk	Impact	Likelihood	Mitigation
Operator misses notification	High	Medium	Multi-channel delivery
Notification fatigue	Medium	High	Smart rate limiting by exit state
Grid stopped unnecessarily	Medium	Medium	Backtesting + tuning

Next Steps

Immediate (This Session)

⚠️ Fix notification recommendation logic (1h): first fix applied, root cause described above still open
Implement basic Exit State Engine (4h)
Add MANDATORY_EXIT triggers (2h)
Update notification messages per exit state (1h)

Short-term (Next Session)

Add LATEST_ACCEPTABLE_EXIT triggers
Add WARNING triggers
Implement audit logging
Change to 15-min cadence
Add unit tests

Medium-term (Future)

Position Risk quantification with real API data
KPI tracking framework
Backtesting capability
Performance dashboard

Conclusion

The market-making system has a strong technical foundation but is only 40% complete toward the MVP requirements:

✅ Regime detection works well
✅ Grid configuration management solid
✅ Git storage pattern elegant
⚠️ Notifications work but have bugs
❌ Exit State Engine missing (core value)
❌ Position tracking incomplete
❌ Audit logging missing

Estimated time to MVP: 32-37 hours of focused development.

Recommendation: Implement Exit State Engine immediately. This is the differentiator that makes the system valuable.
