Market Maker Hourly Output Review - February 2026

Review Date: 2026-02-12
Reviewer: AI Assistant
Scope: .builders/0013-market-maker-mvp hourly metrics and notifications
Status: Critical Issues Found

Executive Summary

The market-maker-mvp hourly run produces static, unreliable notifications for five reasons:

  1. Hardcoded fallback values used when data is unavailable
  2. Fake historical data created by repeating current values
  3. Missing historical data loading despite infrastructure existing
  4. Integer multiplication bugs causing 100x to 10,000,000x value inflation
  5. Static thresholds that don’t adapt to market conditions

Critical Issues Identified

1. Integer Multiplication Without Reverse Conversion

Severity: CRITICAL
Location:

  • repos/market-making/metrics-service/src/grid/configuration_manager.py:270-278
  • repos/market-making/metrics-service/src/regime/engine.py:2709-2736

Issue: Values are multiplied by 100 or 10,000,000 and converted to integers for storage, but never divided back when read:

"price_range": {
    "upper_bound": int(config.price_range.upper_bound * 100),  # 3185.0 → 318500
    "lower_bound": int(config.price_range.lower_bound * 100),  # 3070.0 → 307000
},
"amount_per_grid": int(config.grid_structure.amount_per_grid * 10000000),  # 0.0382 → 382378

Impact:

  • All price values inflated 100x
  • Grid amounts inflated 10,000,000x
  • Profit percentages show as 56-59 instead of 0.56-0.59
  • Buy/sell recommendations based on inflated values are wrong

Fix Required:

  • Remove integer conversion (YAML supports floats)
  • OR add reverse conversion when reading values
  • Add unit tests for round-trip conversion
  • Regenerate all metrics with correct values
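If the second fix option is chosen, the scaling must be symmetric: every multiplier used on write needs a matching divisor on read. A minimal sketch of that approach (helper names are hypothetical, not from the codebase):

```python
# Symmetric scale/unscale helpers so any value written with a multiplier
# is divided back on read. Constants mirror the factors in issue 1.
PRICE_SCALE = 100
AMOUNT_SCALE = 10_000_000

def encode_price(value: float) -> int:
    """Store a price as a scaled integer (3185.0 -> 318500)."""
    return int(round(value * PRICE_SCALE))

def decode_price(stored: int) -> float:
    """Reverse the scaling on read (318500 -> 3185.0)."""
    return stored / PRICE_SCALE

def encode_amount(value: float) -> int:
    """Store a grid amount as a scaled integer (0.0382 -> 382000)."""
    return int(round(value * AMOUNT_SCALE))

def decode_amount(stored: int) -> float:
    """Reverse the scaling on read."""
    return stored / AMOUNT_SCALE
```

Removing the integer conversion entirely (the first option) is simpler, since YAML serializes floats natively; the helpers above only matter if integer storage must be kept.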

2. Fake Historical Data

Severity: HIGH
Location: repos/market-making/metrics-service/src/regime/engine.py:286-294

Issue: Historical arrays created by repeating current values instead of loading actual history:

adx_history = [adx] * 10  # TODO Phase 2: Load from previous YAML files
atr_history = [atr] * 100  # TODO Phase 2: Load from previous YAML files
bb_bandwidth_history = [bb_bandwidth] * 10  # TODO Phase 2: Load from previous YAML files

Impact:

  • Trend detection algorithms see flat lines (no actual trends)
  • Volatility expansion checks always return 1.0 (no expansion/contraction)
  • Regime persistence checks are meaningless
  • Time consistency analysis cannot detect regime flips

Fix Required:

  • Verify historical YAML files exist (CONFIRMED: they exist)
  • Verify MetricsHistoryLoader exists (CONFIRMED: exists in exit_strategy/history_loader.py)
  • Create RegimeHistoryLoader wrapper class
  • Replace fake arrays with actual historical data loading
  • Add fallback logic for when insufficient history available
  • Add ADX to YAML output (currently not stored)
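A hedged sketch of the proposed RegimeHistoryLoader: walk backwards hour by hour over the market-maker-data layout (`metrics/YYYY/MM/DD/HH_SYMBOL.yaml`) and collect one field per file. The reader callable is injected so the real class can plug in a YAML parser; the method and field names are assumptions based on the structure shown later in this document.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

class RegimeHistoryLoader:
    def __init__(self, read_file: Callable[[str], Optional[dict]]):
        # read_file returns the parsed YAML dict for a path, or None if missing
        self.read_file = read_file

    def _path(self, symbol: str, ts: datetime) -> str:
        # Matches the market-maker-data layout: metrics/YYYY/MM/DD/HH_SYMBOL.yaml
        return ts.strftime(f"metrics/%Y/%m/%d/%H_{symbol}.yaml")

    def load_atr_history(self, symbol: str, hours: int, now: datetime) -> list:
        """Oldest-first list of atr_1h values; missing hours are skipped."""
        values = []
        for back in range(hours, 0, -1):
            doc = self.read_file(self._path(symbol, now - timedelta(hours=back)))
            if doc is None:
                continue  # tolerate gaps; caller validates minimum length
            vm = doc["analysis"]["regime_analysis"]["volatility_metrics"]
            values.append(vm["atr_1h"])
        return values
```

The caller would check `len(values)` against the minimum history requirement and fall back (with a logged warning) only when genuinely short, rather than fabricating flat arrays.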

3. Hardcoded Fallback Values

Severity: HIGH
Location: repos/market-making/metrics-service/src/regime/engine.py:286-294

Issue: When data is unavailable, arbitrary constants are used:

adx = detailed.get("adx", {}).get("current", 25.0)
efficiency_ratio = detailed.get("efficiency_ratio", {}).get("current", 0.4)
ou_half_life = detailed.get("ou_process", {}).get("halflife_hours", 24.0)
atr = 1500.0  # Will be replaced when we store ATR in detailed_analysis
bb_bandwidth = detailed.get("bollinger", {}).get("bandwidth", 0.02)

Impact:

  • Same analysis produced regardless of actual market conditions
  • ADX=25 always triggers “neutral trend” regime
  • ATR=1500 becomes baseline for all volatility comparisons
  • Notifications remain static when using fallback values

Fix Required:

  • Ensure ADX calculation always succeeds (don’t use fallback)
  • Load ATR from historical YAML (already stored as volatility_metrics.atr_1h)
  • Calculate efficiency_ratio and OU halflife from actual data
  • Remove hardcoded defaults or make them symbol-specific
  • Add data quality checks before regime analysis
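The data quality check could take this shape: record which inputs were actually present and refuse to classify (or log loudly) when any are missing, instead of silently analyzing constants like `adx=25.0`. The field map and function name below are illustrative, not from the codebase:

```python
# Map of regime inputs to their (section, key) location in detailed_analysis,
# mirroring the fallback-laden lookups shown in issue 3.
REQUIRED_FIELDS = {
    "adx": ("adx", "current"),
    "efficiency_ratio": ("efficiency_ratio", "current"),
    "ou_half_life": ("ou_process", "halflife_hours"),
    "bb_bandwidth": ("bollinger", "bandwidth"),
}

def check_inputs(detailed: dict) -> tuple:
    """Return (present values, list of missing field names) -- no defaults."""
    values, missing = {}, []
    for name, (section, key) in REQUIRED_FIELDS.items():
        value = detailed.get(section, {}).get(key)
        if value is None:
            missing.append(name)
        else:
            values[name] = value
    return values, missing
```

With `check_inputs({"adx": {"current": 31.2}})` only `adx` comes back as present; the other three are reported missing, so the caller can skip the regime analysis rather than run it on fabricated defaults.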

4. Static Classification Thresholds

Severity: MEDIUM
Location: repos/market-making/metrics-service/src/regime/classifier.py:49-69

Issue: Fixed percentile thresholds don’t adapt to symbol characteristics:

min_range_quality: float = 30.0
trend_enter_threshold: float = 70.0
trend_exit_threshold: float = 50.0
vol_change_threshold: float = 70.0

Impact:

  • Same thresholds for ETH at 4000 as at any other price level
  • Same thresholds for volatile altcoins and stable assets
  • No accounting for symbol-specific volatility patterns

Fix Required:

  • Calculate adaptive thresholds from recent history
  • Use rolling percentiles instead of fixed values
  • Make thresholds symbol-specific
  • Backtest threshold values for accuracy
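A sketch of the rolling-percentile idea: derive the trend enter/exit cutoffs from the symbol's own recent score history instead of fixed constants. The window and percentile choices here are assumptions; only the 70/50 split mirrors the current fixed values:

```python
def percentile(sorted_vals: list, pct: float) -> float:
    """Linear-interpolated percentile of a pre-sorted list (pct in 0..100)."""
    if not sorted_vals:
        raise ValueError("no history available")
    k = (len(sorted_vals) - 1) * pct / 100.0
    lo, hi = int(k), min(int(k) + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def adaptive_thresholds(score_history: list) -> dict:
    """Symbol-specific cutoffs from that symbol's recent score distribution."""
    vals = sorted(score_history)
    return {
        "trend_enter_threshold": percentile(vals, 70.0),
        "trend_exit_threshold": percentile(vals, 50.0),
    }
```

On a uniform 0-100 history this reproduces the current fixed values exactly, but a calmer symbol whose scores cluster low yields proportionally lower cutoffs, which is the adaptation the fix list asks for.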

5. Missing Grid-Aware Recommendations

Severity: MEDIUM
Location: repos/market-making/metrics-service/src/notifications/send_regime_notifications.py:91-232

Issue: Notification recommendations don’t reference actual grid configuration:

if confidence < 0.5:
    recs.append("Low regime confidence - wait for stronger signal")
# ... but doesn't check:
# - Current price vs grid bounds
# - Current position size
# - Grid spacing
# - Historical grid performance

Impact:

  • Recommendations not actionable
  • Doesn’t warn when price approaches grid boundaries
  • Doesn’t consider if profitable levels are being hit
  • No awareness of actual trading activity

Fix Required:

  • Load grid configuration in notification generator
  • Add price position within grid bounds
  • Check if price approaching grid edges
  • Reference filled orders and profitable levels
  • Recommend grid adjustments when needed
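The price-position check the notifications lack could be as small as this: normalize the current price within the configured bounds and flag proximity to an edge. The 10% edge margin is an assumption, not a value from the codebase:

```python
def grid_position(price: float, lower: float, upper: float,
                  edge_margin: float = 0.10) -> dict:
    """Return the price's normalized position in [0, 1] plus edge warnings."""
    span = upper - lower
    pos = (price - lower) / span
    return {
        "position": pos,
        "near_lower_edge": 0.0 <= pos <= edge_margin,
        "near_upper_edge": 1.0 - edge_margin <= pos <= 1.0,
        "outside_grid": pos < 0.0 or pos > 1.0,
    }
```

Using the (correctly scaled) bounds from issue 1, `grid_position(3175.0, 3070.0, 3185.0)` puts the price at roughly 0.91 of the range and sets `near_upper_edge`, so the notification can warn before the price escapes the grid rather than stay silent.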

6. Missing Baseline Storage

Severity: MEDIUM
Location: repos/market-making/metrics-service/src/exit_strategy/evaluator.py:63-104

Issue: Baseline metrics not stored, preventing comparison to entry conditions:

baseline_atr: Optional[float] = None  # Should be calculated from history
baseline_halflife: Optional[float] = None  # Should be calculated from history

Good News: baseline_atr is already stored in YAML metrics files at analysis.regime_analysis.volatility_metrics.baseline_atr

Fix Required:

  • Confirm baseline_atr storage location (CONFIRMED)
  • Load baseline_atr from historical YAML when available
  • Add baseline_halflife to stored metrics
  • Calculate baselines from entry conditions when grid starts
  • Persist baselines in system_config_v1.yaml

7. Arbitrary Confidence Calculations

Severity: LOW
Location: repos/market-making/metrics-service/src/regime/classifier.py:203-232

Issue: Confidence formulas use hardcoded divisors and caps:

confidence = 0.7 + min(0.2, scores.mean_rev_score / 500.0)

Impact:

  • Confidence not calibrated to actual prediction accuracy
  • No backtesting to validate confidence levels
  • Users can’t trust confidence scores

Fix Required:

  • Backtest regime predictions against actual outcomes
  • Calculate confidence from historical accuracy
  • Add confidence calibration based on market conditions
  • Track prediction accuracy over time
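Calibration can start with a simple bucketing pass over logged predictions: group past regime calls by reported confidence and compare each bucket against realized accuracy. The function name and 0.1 bucket width are assumptions:

```python
def calibration_table(predictions: list, bucket_width: float = 0.1) -> dict:
    """predictions: (reported_confidence, was_correct) pairs.
    Returns {bucket_floor: empirical accuracy} for each populated bucket."""
    buckets = {}
    for conf, correct in predictions:
        floor = round(int(conf / bucket_width) * bucket_width, 10)
        hits, total = buckets.get(floor, (0, 0))
        buckets[floor] = (hits + int(correct), total + 1)
    return {b: hits / total for b, (hits, total) in buckets.items()}
```

If the 0.8-0.9 bucket resolves correctly only two times in three, the formula `0.7 + min(0.2, scores.mean_rev_score / 500.0)` is over-reporting, and the divisor/cap can be adjusted until reported confidence tracks the empirical rate.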

Investigation Results

Historical Data Infrastructure (GOOD NEWS)

What Exists:

  • ✅ Historical YAML files: market-maker-data/metrics/YYYY/MM/DD/HH_SYMBOL.yaml
  • ✅ MetricsHistoryLoader class in exit_strategy/history_loader.py
  • ✅ ATR stored as volatility_metrics.atr_1h
  • ✅ Baseline ATR stored as volatility_metrics.baseline_atr
  • ✅ Regime verdicts stored in each hourly file
  • ✅ Bollinger Band bandwidth stored

What’s Missing:

  • ❌ ADX values not saved to YAML (calculated but discarded)
  • ❌ Code to load historical arrays (infrastructure exists but not used)
  • ❌ Baseline halflife not stored

Data Format Analysis

YAML Metrics Structure:

analysis:
  regime_analysis:
    verdict: RANGE_OK
    confidence: 0.864
    volatility_metrics:
      atr_1h: 1525              # ✅ Available
      baseline_atr: 1600        # ✅ Available
      volatility_state: STABLE
      volatility_expansion_ratio: 0.95
      bb_bandwidth: 0.0215      # ✅ Available (if stored)
    mean_reversion:
      half_life_hours: 18.5     # ✅ Available for baseline_halflife

Implementation Task List

Phase 1: Fix Critical Bugs (Est: 4-6 hours)

Task 1.1: Fix Integer Multiplication Bug

Priority: CRITICAL
Est: 2 hours

  • Remove integer conversion in configuration_manager.py:270-278
  • Remove integer conversion in engine.py:2709-2736
  • Add unit tests for YAML round-trip conversion
  • Verify all numeric fields use proper float serialization
  • Regenerate metrics to validate fix
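The round-trip test could look like the sketch below. `json` stands in for the YAML layer here (both serialize Python floats natively), and the field names mirror the snippet in issue 1; the serializer function is hypothetical:

```python
import json

def serialize_config(upper: float, lower: float, amount: float) -> str:
    # Post-fix behavior: floats written as-is, no int() scaling on write
    return json.dumps({
        "price_range": {"upper_bound": upper, "lower_bound": lower},
        "amount_per_grid": amount,
    })

def test_round_trip():
    doc = json.loads(serialize_config(3185.0, 3070.0, 0.0382))
    assert doc["price_range"]["upper_bound"] == 3185.0   # not 318500
    assert doc["price_range"]["lower_bound"] == 3070.0   # not 307000
    assert doc["amount_per_grid"] == 0.0382              # not 382000
```

The real test should serialize through the same YAML writer used by configuration_manager.py so it fails if the `int()` scaling is ever reintroduced.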

Task 1.2: Create Historical Data Loader

Priority: HIGH
Est: 2 hours

  • Create src/regime/historical_loader.py
  • Implement RegimeHistoryLoader class:
    • load_atr_history(symbol, hours) - Load ATR from YAML
    • load_adx_history(symbol, hours) - Load ADX from YAML (add storage first)
    • load_bb_bandwidth_history(symbol, hours) - Load BB bandwidth
    • load_regime_history(symbol, hours) - Load past regime verdicts
  • Add caching with TTL
  • Add fallback logic for insufficient history
  • Write unit tests

Task 1.3: Replace Fake Historical Arrays

Priority: HIGH
Est: 2 hours

  • Update engine.py:286-299 to use RegimeHistoryLoader
  • Replace adx_history = [adx] * 10 with actual loading
  • Replace atr_history = [atr] * 100 with actual loading
  • Replace bb_bandwidth_history = [bb_bandwidth] * 10 with actual loading
  • Add minimum history length validation
  • Add logging when falling back to short history

Phase 2: Store Missing Metrics (Est: 2-3 hours)

Task 2.1: Add ADX to YAML Output

Priority: HIGH
Est: 1 hour

  • Update feature_calculation.py to store ADX in detailed_analysis
  • Ensure ADX calculation never fails (handle edge cases)
  • Add ADX to YAML serialization
  • Verify ADX appears in generated YAML files

Task 2.2: Add Baseline Halflife Storage

Priority: MEDIUM
Est: 1 hour

  • Add baseline_halflife to volatility_metrics in YAML
  • Calculate baseline from entry conditions
  • Store in system_config_v1.yaml for persistence
  • Load baseline when evaluating exit conditions

Task 2.3: Store Comprehensive Metrics

Priority: LOW
Est: 1 hour

  • Add trend_score to YAML output
  • Add mean_reversion_score to YAML output
  • Add vol_level_score to YAML output
  • Enables future historical analysis of scoring changes

Phase 3: Improve Recommendations (Est: 3-4 hours)

Task 3.1: Add Grid-Aware Logic

Priority: MEDIUM
Est: 2 hours

  • Load grid configuration in send_regime_notifications.py
  • Calculate current price position within grid bounds
  • Add warnings when price approaches grid edges
  • Reference filled orders and profitable levels
  • Recommend grid adjustments based on regime changes

Task 3.2: Implement Adaptive Thresholds

Priority: MEDIUM
Est: 2 hours

  • Calculate rolling percentiles from recent history (30-90 days)
  • Replace fixed thresholds with adaptive ones
  • Make thresholds symbol-specific
  • Store threshold calculations in metrics for transparency

Phase 4: Validate and Test (Est: 2-3 hours)

Task 4.1: Integration Testing

Priority: HIGH
Est: 1 hour

  • Test historical data loading with real YAML files
  • Verify ATR history loads correctly
  • Test fallback behavior with insufficient history
  • Verify no more fake arrays in production

Task 4.2: Data Quality Checks

Priority: MEDIUM
Est: 1 hour

  • Add pre-flight checks before regime analysis
  • Validate minimum history requirements met
  • Log warnings when using fallback values
  • Add health check endpoint for data quality

Task 4.3: Regenerate Historical Data

Priority: HIGH
Est: 1 hour

  • Backfill ADX values for past 30 days
  • Regenerate any metrics with integer conversion bugs
  • Verify all YAML files have correct format
  • Run validation across entire dataset

Phase 5: Monitoring and Calibration (Est: 2-3 hours)

Task 5.1: Confidence Calibration

Priority: LOW
Est: 2 hours

  • Collect regime predictions and actual outcomes
  • Calculate prediction accuracy by confidence level
  • Adjust confidence formulas based on backtesting
  • Add confidence calibration to metrics output

Task 5.2: Add Metrics Dashboard

Priority: LOW
Est: 1 hour

  • Track prediction accuracy over time
  • Monitor data quality metrics
  • Alert when using fallback values
  • Display historical loading success rate

Total Estimated Effort

  • Phase 1 (Critical): 4-6 hours
  • Phase 2 (High Priority): 2-3 hours
  • Phase 3 (Medium Priority): 3-4 hours
  • Phase 4 (Testing): 2-3 hours
  • Phase 5 (Polish): 2-3 hours

Total: 13-19 hours

Recommended Sequence:

  1. Fix integer multiplication bug (blocks everything)
  2. Create historical loader and replace fake arrays
  3. Add missing metrics to YAML
  4. Integration testing
  5. Improve recommendations
  6. Monitoring and calibration

Success Criteria

Immediate (Post Phase 1-2)

  • No integer multiplication bugs in stored values
  • Historical arrays load from actual YAML files
  • ADX values stored in YAML output
  • ATR history shows actual volatility changes over time
  • Regime verdicts reflect real market transitions

Medium-term (Post Phase 3-4)

  • Notifications reference actual grid configuration
  • Recommendations consider current price position
  • Adaptive thresholds respond to market conditions
  • Data quality checks prevent bad analysis
  • Integration tests validate all historical loading

Long-term (Post Phase 5)

  • Confidence scores calibrated to actual accuracy
  • Prediction accuracy tracked over time
  • Monitoring dashboard shows data health
  • Users trust notification recommendations

Next Steps

  1. Review this document with stakeholders
  2. Prioritize phases based on business impact
  3. Start with Phase 1 (critical bug fixes)
  4. Implement in branches for safe testing
  5. Deploy incrementally with validation at each phase

References

  • Investigation: Task session ses_3ae0e2c43ffeaEGE5PWqjpAz1N
  • Code Review: Task session ses_3ae0fe258ffeWWkhz1LLUayw9g
  • Integer Bug Review: Task session ses_3ae31ebfcffeVs8Js8IEg9hV8j
  • Repository: .builders/0013-market-maker-mvp/repos/market-making
  • Data Repository: .builders/0013-market-maker-mvp/repos/market-maker-data