Market Maker Hourly Output Review - February 2026

Review Date: 2026-02-12
Reviewer: AI Assistant
Scope: .builders/0013-market-maker-mvp hourly metrics and notifications
Status: Critical Issues Found

Executive Summary

The market-maker-mvp hourly run produces static, unreliable notifications for five reasons:

  1. Hardcoded fallback values used when data is unavailable
  2. Fake historical data created by repeating current values
  3. Missing historical data loading despite infrastructure existing
  4. Integer multiplication bugs causing 100x to 10,000,000x value inflation
  5. Static thresholds that don’t adapt to market conditions

Critical Issues Identified

1. Integer Multiplication Without Reverse Conversion

Severity: CRITICAL
Location:

  • repos/market-making/metrics-service/src/grid/configuration_manager.py:270-278
  • repos/market-making/metrics-service/src/regime/engine.py:2709-2736

Issue: Values are multiplied by 100 or 10,000,000 and converted to integers for storage, but never divided back when read:

"price_range": {
    "upper_bound": int(config.price_range.upper_bound * 100),  # 3185.0 → 318500
    "lower_bound": int(config.price_range.lower_bound * 100),  # 3070.0 → 307000
},
"amount_per_grid": int(config.grid_structure.amount_per_grid * 10000000),  # 0.0382 → 382378

Impact:

  • All price values inflated 100x
  • Grid amounts inflated 10,000,000x
  • Profit percentages show as 56-59 instead of 0.56-0.59
  • Buy/sell recommendations based on inflated values are wrong

Fix Required:

  • Remove integer conversion (YAML supports floats)
  • OR add reverse conversion when reading values
  • Add unit tests for round-trip conversion
  • Regenerate all metrics with correct values
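If the second fix option is chosen, the scaling must be symmetric: every multiplier used on write needs a matching divisor on read. A minimal sketch of that approach (helper names are hypothetical, not from the codebase):

```python
# Symmetric scale/unscale helpers so any value written with a multiplier
# is divided back on read. Constants mirror the factors in issue 1.
PRICE_SCALE = 100
AMOUNT_SCALE = 10_000_000

def encode_price(value: float) -> int:
    """Store a price as a scaled integer (3185.0 -> 318500)."""
    return int(round(value * PRICE_SCALE))

def decode_price(stored: int) -> float:
    """Reverse the scaling on read (318500 -> 3185.0)."""
    return stored / PRICE_SCALE

def encode_amount(value: float) -> int:
    """Store a grid amount as a scaled integer (0.0382 -> 382000)."""
    return int(round(value * AMOUNT_SCALE))

def decode_amount(stored: int) -> float:
    """Reverse the scaling on read."""
    return stored / AMOUNT_SCALE
```

Removing the integer conversion entirely (the first option) is simpler, since YAML serializes floats natively; the helpers above only matter if integer storage must be kept.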

2. Fake Historical Data

Severity: HIGH
Location: repos/market-making/metrics-service/src/regime/engine.py:286-294

Issue: Historical arrays created by repeating current values instead of loading actual history:

adx_history = [adx] * 10  # TODO Phase 2: Load from previous YAML files
atr_history = [atr] * 100  # TODO Phase 2: Load from previous YAML files
bb_bandwidth_history = [bb_bandwidth] * 10  # TODO Phase 2: Load from previous YAML files

Impact:

  • Trend detection algorithms see flat lines (no actual trends)
  • Volatility expansion checks always return 1.0 (no expansion/contraction)
  • Regime persistence checks are meaningless
  • Time consistency analysis cannot detect regime flips

Fix Required:

  • Verify historical YAML files exist (CONFIRMED: they exist)
  • Verify MetricsHistoryLoader exists (CONFIRMED: exists in exit_strategy/history_loader.py)
  • Create RegimeHistoryLoader wrapper class
  • Replace fake arrays with actual historical data loading
  • Add fallback logic for when insufficient history available
  • Add ADX to YAML output (currently not stored)
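A hedged sketch of the proposed RegimeHistoryLoader: walk backwards hour by hour over the market-maker-data layout (`metrics/YYYY/MM/DD/HH_SYMBOL.yaml`) and collect one field per file. The reader callable is injected so the real class can plug in a YAML parser; the method and field names are assumptions based on the structure shown later in this document.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

class RegimeHistoryLoader:
    def __init__(self, read_file: Callable[[str], Optional[dict]]):
        # read_file returns the parsed YAML dict for a path, or None if missing
        self.read_file = read_file

    def _path(self, symbol: str, ts: datetime) -> str:
        # Matches the market-maker-data layout: metrics/YYYY/MM/DD/HH_SYMBOL.yaml
        return ts.strftime(f"metrics/%Y/%m/%d/%H_{symbol}.yaml")

    def load_atr_history(self, symbol: str, hours: int, now: datetime) -> list:
        """Oldest-first list of atr_1h values; missing hours are skipped."""
        values = []
        for back in range(hours, 0, -1):
            doc = self.read_file(self._path(symbol, now - timedelta(hours=back)))
            if doc is None:
                continue  # tolerate gaps; caller validates minimum length
            vm = doc["analysis"]["regime_analysis"]["volatility_metrics"]
            values.append(vm["atr_1h"])
        return values
```

The caller would check `len(values)` against the minimum history requirement and fall back (with a logged warning) only when genuinely short, rather than fabricating flat arrays.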

3. Hardcoded Fallback Values

Severity: HIGH
Location: repos/market-making/metrics-service/src/regime/engine.py:286-294

Issue: When data is unavailable, arbitrary constants are used:

adx = detailed.get("adx", {}).get("current", 25.0)
efficiency_ratio = detailed.get("efficiency_ratio", {}).get("current", 0.4)
ou_half_life = detailed.get("ou_process", {}).get("halflife_hours", 24.0)
atr = 1500.0  # Will be replaced when we store ATR in detailed_analysis
bb_bandwidth = detailed.get("bollinger", {}).get("bandwidth", 0.02)

Impact:

  • Same analysis produced regardless of actual market conditions
  • ADX=25 always triggers “neutral trend” regime
  • ATR=1500 becomes baseline for all volatility comparisons
  • Notifications remain static when using fallback values

Fix Required:

  • Ensure ADX calculation always succeeds (don’t use fallback)
  • Load ATR from historical YAML (already stored as volatility_metrics.atr_1h)
  • Calculate efficiency_ratio and OU halflife from actual data
  • Remove hardcoded defaults or make them symbol-specific
  • Add data quality checks before regime analysis
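The data quality check could take this shape: record which inputs were actually present and refuse to classify (or log loudly) when any are missing, instead of silently analyzing constants like `adx=25.0`. The field map and function name below are illustrative, not from the codebase:

```python
# Map of regime inputs to their (section, key) location in detailed_analysis,
# mirroring the fallback-laden lookups shown in issue 3.
REQUIRED_FIELDS = {
    "adx": ("adx", "current"),
    "efficiency_ratio": ("efficiency_ratio", "current"),
    "ou_half_life": ("ou_process", "halflife_hours"),
    "bb_bandwidth": ("bollinger", "bandwidth"),
}

def check_inputs(detailed: dict) -> tuple:
    """Return (present values, list of missing field names) -- no defaults."""
    values, missing = {}, []
    for name, (section, key) in REQUIRED_FIELDS.items():
        value = detailed.get(section, {}).get(key)
        if value is None:
            missing.append(name)
        else:
            values[name] = value
    return values, missing
```

With `check_inputs({"adx": {"current": 31.2}})` only `adx` comes back as present; the other three are reported missing, so the caller can skip the regime analysis rather than run it on fabricated defaults.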

4. Static Classification Thresholds

Severity: MEDIUM
Location: repos/market-making/metrics-service/src/regime/classifier.py:49-69

Issue: Fixed percentile thresholds don’t adapt to symbol characteristics:

min_range_quality: float = 30.0
trend_enter_threshold: float = 70.0
trend_exit_threshold: float = 50.0
vol_change_threshold: float = 70.0

Impact:

  • Same thresholds for ETH at 4000 as at any other price level
  • Same thresholds for volatile altcoins and stable assets
  • No accounting for symbol-specific volatility patterns

Fix Required:

  • Calculate adaptive thresholds from recent history
  • Use rolling percentiles instead of fixed values
  • Make thresholds symbol-specific
  • Backtest threshold values for accuracy
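A sketch of the rolling-percentile idea: derive the trend enter/exit cutoffs from the symbol's own recent score history instead of fixed constants. The window and percentile choices here are assumptions; only the 70/50 split mirrors the current fixed values:

```python
def percentile(sorted_vals: list, pct: float) -> float:
    """Linear-interpolated percentile of a pre-sorted list (pct in 0..100)."""
    if not sorted_vals:
        raise ValueError("no history available")
    k = (len(sorted_vals) - 1) * pct / 100.0
    lo, hi = int(k), min(int(k) + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def adaptive_thresholds(score_history: list) -> dict:
    """Symbol-specific cutoffs from that symbol's recent score distribution."""
    vals = sorted(score_history)
    return {
        "trend_enter_threshold": percentile(vals, 70.0),
        "trend_exit_threshold": percentile(vals, 50.0),
    }
```

On a uniform 0-100 history this reproduces the current fixed values exactly, but a calmer symbol whose scores cluster low yields proportionally lower cutoffs, which is the adaptation the fix list asks for.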

5. Missing Grid-Aware Recommendations

Severity: MEDIUM
Location: repos/market-making/metrics-service/src/notifications/send_regime_notifications.py:91-232

Issue: Notification recommendations don’t reference actual grid configuration:

if confidence < 0.5:
    recs.append("Low regime confidence - wait for stronger signal")
# ... but doesn't check:
# - Current price vs grid bounds
# - Current position size
# - Grid spacing
# - Historical grid performance

Impact:

  • Recommendations not actionable
  • Doesn’t warn when price approaches grid boundaries
  • Doesn’t consider if profitable levels are being hit
  • No awareness of actual trading activity

Fix Required:

  • Load grid configuration in notification generator
  • Add price position within grid bounds
  • Check if price approaching grid edges
  • Reference filled orders and profitable levels
  • Recommend grid adjustments when needed
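The price-position check the notifications lack could be as small as this: normalize the current price within the configured bounds and flag proximity to an edge. The 10% edge margin is an assumption, not a value from the codebase:

```python
def grid_position(price: float, lower: float, upper: float,
                  edge_margin: float = 0.10) -> dict:
    """Return the price's normalized position in [0, 1] plus edge warnings."""
    span = upper - lower
    pos = (price - lower) / span
    return {
        "position": pos,
        "near_lower_edge": 0.0 <= pos <= edge_margin,
        "near_upper_edge": 1.0 - edge_margin <= pos <= 1.0,
        "outside_grid": pos < 0.0 or pos > 1.0,
    }
```

Using the (correctly scaled) bounds from issue 1, `grid_position(3175.0, 3070.0, 3185.0)` puts the price at roughly 0.91 of the range and sets `near_upper_edge`, so the notification can warn before the price escapes the grid rather than stay silent.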

6. Missing Baseline Storage

Severity: MEDIUM
Location: repos/market-making/metrics-service/src/exit_strategy/evaluator.py:63-104

Issue: Baseline metrics not stored, preventing comparison to entry conditions:

baseline_atr: Optional[float] = None  # Should be calculated from history
baseline_halflife: Optional[float] = None  # Should be calculated from history

Good News: baseline_atr is already stored in YAML metrics files at analysis.regime_analysis.volatility_metrics.baseline_atr

Fix Required:

  • Confirm baseline_atr storage location (CONFIRMED)
  • Load baseline_atr from historical YAML when available
  • Add baseline_halflife to stored metrics
  • Calculate baselines from entry conditions when grid starts
  • Persist baselines in system_config_v1.yaml

7. Arbitrary Confidence Calculations

Severity: LOW
Location: repos/market-making/metrics-service/src/regime/classifier.py:203-232

Issue: Confidence formulas use hardcoded divisors and caps:

confidence = 0.7 + min(0.2, scores.mean_rev_score / 500.0)

Impact:

  • Confidence not calibrated to actual prediction accuracy
  • No backtesting to validate confidence levels
  • Users can’t trust confidence scores

Fix Required:

  • Backtest regime predictions against actual outcomes
  • Calculate confidence from historical accuracy
  • Add confidence calibration based on market conditions
  • Track prediction accuracy over time
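Calibration can start with a simple bucketing pass over logged predictions: group past regime calls by reported confidence and compare each bucket against realized accuracy. The function name and 0.1 bucket width are assumptions:

```python
def calibration_table(predictions: list, bucket_width: float = 0.1) -> dict:
    """predictions: (reported_confidence, was_correct) pairs.
    Returns {bucket_floor: empirical accuracy} for each populated bucket."""
    buckets = {}
    for conf, correct in predictions:
        floor = round(int(conf / bucket_width) * bucket_width, 10)
        hits, total = buckets.get(floor, (0, 0))
        buckets[floor] = (hits + int(correct), total + 1)
    return {b: hits / total for b, (hits, total) in buckets.items()}
```

If the 0.8-0.9 bucket resolves correctly only two times in three, the formula `0.7 + min(0.2, scores.mean_rev_score / 500.0)` is over-reporting, and the divisor/cap can be adjusted until reported confidence tracks the empirical rate.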

Investigation Results

Historical Data Infrastructure (GOOD NEWS)

What Exists:

  • ✅ Historical YAML files: market-maker-data/metrics/YYYY/MM/DD/HH_SYMBOL.yaml
  • ✅ MetricsHistoryLoader class in exit_strategy/history_loader.py
  • ✅ ATR stored as volatility_metrics.atr_1h
  • ✅ Baseline ATR stored as volatility_metrics.baseline_atr
  • ✅ Regime verdicts stored in each hourly file
  • ✅ Bollinger Band bandwidth stored

What’s Missing:

  • ❌ ADX values not saved to YAML (calculated but discarded)
  • ❌ Code to load historical arrays (infrastructure exists but not used)
  • ❌ Baseline halflife not stored

Data Format Analysis

YAML Metrics Structure:

analysis:
  regime_analysis:
    verdict: RANGE_OK
    confidence: 0.864
    volatility_metrics:
      atr_1h: 1525              # ✅ Available
      baseline_atr: 1600        # ✅ Available
      volatility_state: STABLE
      volatility_expansion_ratio: 0.95
      bb_bandwidth: 0.0215      # ✅ Available (if stored)
    mean_reversion:
      half_life_hours: 18.5     # ✅ Available for baseline_halflife

Implementation Task List

Phase 1: Fix Critical Bugs (Est: 4-6 hours)

Task 1.1: Fix Integer Multiplication Bug

Priority: CRITICAL
Est: 2 hours

  • Remove integer conversion in configuration_manager.py:270-278
  • Remove integer conversion in engine.py:2709-2736
  • Add unit tests for YAML round-trip conversion
  • Verify all numeric fields use proper float serialization
  • Regenerate metrics to validate fix
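The round-trip test could look like the sketch below. `json` stands in for the YAML layer here (both serialize Python floats natively), and the field names mirror the snippet in issue 1; the serializer function is hypothetical:

```python
import json

def serialize_config(upper: float, lower: float, amount: float) -> str:
    # Post-fix behavior: floats written as-is, no int() scaling on write
    return json.dumps({
        "price_range": {"upper_bound": upper, "lower_bound": lower},
        "amount_per_grid": amount,
    })

def test_round_trip():
    doc = json.loads(serialize_config(3185.0, 3070.0, 0.0382))
    assert doc["price_range"]["upper_bound"] == 3185.0   # not 318500
    assert doc["price_range"]["lower_bound"] == 3070.0   # not 307000
    assert doc["amount_per_grid"] == 0.0382              # not 382000
```

The real test should serialize through the same YAML writer used by configuration_manager.py so it fails if the `int()` scaling is ever reintroduced.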

Task 1.2: Create Historical Data Loader

Priority: HIGH
Est: 2 hours

  • Create src/regime/historical_loader.py
  • Implement RegimeHistoryLoader class:
    • load_atr_history(symbol, hours) - Load ATR from YAML
    • load_adx_history(symbol, hours) - Load ADX from YAML (add storage first)
    • load_bb_bandwidth_history(symbol, hours) - Load BB bandwidth
    • load_regime_history(symbol, hours) - Load past regime verdicts
  • Add caching with TTL
  • Add fallback logic for insufficient history
  • Write unit tests

Task 1.3: Replace Fake Historical Arrays

Priority: HIGH
Est: 2 hours

  • Update engine.py:286-299 to use RegimeHistoryLoader
  • Replace adx_history = [adx] * 10 with actual loading
  • Replace atr_history = [atr] * 100 with actual loading
  • Replace bb_bandwidth_history = [bb_bandwidth] * 10 with actual loading
  • Add minimum history length validation
  • Add logging when falling back to short history

Phase 2: Store Missing Metrics (Est: 2-3 hours)

Task 2.1: Add ADX to YAML Output

Priority: HIGH
Est: 1 hour

  • Update feature_calculation.py to store ADX in detailed_analysis
  • Ensure ADX calculation never fails (handle edge cases)
  • Add ADX to YAML serialization
  • Verify ADX appears in generated YAML files

Task 2.2: Add Baseline Halflife Storage

Priority: MEDIUM
Est: 1 hour

  • Add baseline_halflife to volatility_metrics in YAML
  • Calculate baseline from entry conditions
  • Store in system_config_v1.yaml for persistence
  • Load baseline when evaluating exit conditions

Task 2.3: Store Comprehensive Metrics

Priority: LOW
Est: 1 hour

  • Add trend_score to YAML output
  • Add mean_reversion_score to YAML output
  • Add vol_level_score to YAML output
  • Enables future historical analysis of scoring changes

Phase 3: Improve Recommendations (Est: 3-4 hours)

Task 3.1: Add Grid-Aware Logic

Priority: MEDIUM
Est: 2 hours

  • Load grid configuration in send_regime_notifications.py
  • Calculate current price position within grid bounds
  • Add warnings when price approaches grid edges
  • Reference filled orders and profitable levels
  • Recommend grid adjustments based on regime changes

Task 3.2: Implement Adaptive Thresholds

Priority: MEDIUM
Est: 2 hours

  • Calculate rolling percentiles from recent history (30-90 days)
  • Replace fixed thresholds with adaptive ones
  • Make thresholds symbol-specific
  • Store threshold calculations in metrics for transparency

Phase 4: Validate and Test (Est: 2-3 hours)

Task 4.1: Integration Testing

Priority: HIGH
Est: 1 hour

  • Test historical data loading with real YAML files
  • Verify ATR history loads correctly
  • Test fallback behavior with insufficient history
  • Verify no more fake arrays in production

Task 4.2: Data Quality Checks

Priority: MEDIUM
Est: 1 hour

  • Add pre-flight checks before regime analysis
  • Validate minimum history requirements met
  • Log warnings when using fallback values
  • Add health check endpoint for data quality

Task 4.3: Regenerate Historical Data

Priority: HIGH
Est: 1 hour

  • Backfill ADX values for past 30 days
  • Regenerate any metrics with integer conversion bugs
  • Verify all YAML files have correct format
  • Run validation across entire dataset

Phase 5: Monitoring and Calibration (Est: 2-3 hours)

Task 5.1: Confidence Calibration

Priority: LOW
Est: 2 hours

  • Collect regime predictions and actual outcomes
  • Calculate prediction accuracy by confidence level
  • Adjust confidence formulas based on backtesting
  • Add confidence calibration to metrics output

Task 5.2: Add Metrics Dashboard

Priority: LOW
Est: 1 hour

  • Track prediction accuracy over time
  • Monitor data quality metrics
  • Alert when using fallback values
  • Display historical loading success rate

Total Estimated Effort

  • Phase 1 (Critical): 4-6 hours
  • Phase 2 (High Priority): 2-3 hours
  • Phase 3 (Medium Priority): 3-4 hours
  • Phase 4 (Testing): 2-3 hours
  • Phase 5 (Polish): 2-3 hours

Total: 13-19 hours

Recommended Sequence:

  1. Fix integer multiplication bug (blocks everything)
  2. Create historical loader and replace fake arrays
  3. Add missing metrics to YAML
  4. Integration testing
  5. Improve recommendations
  6. Monitoring and calibration

Success Criteria

Immediate (Post Phase 1-2)

  • No integer multiplication bugs in stored values
  • Historical arrays load from actual YAML files
  • ADX values stored in YAML output
  • ATR history shows actual volatility changes over time
  • Regime verdicts reflect real market transitions

Medium-term (Post Phase 3-4)

  • Notifications reference actual grid configuration
  • Recommendations consider current price position
  • Adaptive thresholds respond to market conditions
  • Data quality checks prevent bad analysis
  • Integration tests validate all historical loading

Long-term (Post Phase 5)

  • Confidence scores calibrated to actual accuracy
  • Prediction accuracy tracked over time
  • Monitoring dashboard shows data health
  • Users trust notification recommendations

Next Steps

  1. Review this document with stakeholders
  2. Prioritize phases based on business impact
  3. Start with Phase 1 (critical bug fixes)
  4. Implement in branches for safe testing
  5. Deploy incrementally with validation at each phase

References

  • Investigation: Task session ses_3ae0e2c43ffeaEGE5PWqjpAz1N
  • Code Review: Task session ses_3ae0fe258ffeWWkhz1LLUayw9g
  • Integer Bug Review: Task session ses_3ae31ebfcffeVs8Js8IEg9hV8j
  • Repository: .builders/0013-market-maker-mvp/repos/market-making
  • Data Repository: .builders/0013-market-maker-mvp/repos/market-maker-data