Market Maker Hourly Output Review - February 2026
Review Date: 2026-02-12
Reviewer: AI Assistant
Scope: .builders/0013-market-maker-mvp hourly metrics and notifications
Status: Critical Issues Found
Executive Summary
The hourly market-maker-mvp produces static, unreliable notifications due to:
- Hardcoded fallback values used when data is unavailable
- Fake historical data created by repeating current values
- Missing historical data loading despite infrastructure existing
- Integer multiplication bugs causing 100x to 10,000,000x value inflation
- Static thresholds that don’t adapt to market conditions
Critical Issues Identified
1. Integer Multiplication Without Reverse Conversion
Severity: CRITICAL
Location:
repos/market-making/metrics-service/src/grid/configuration_manager.py:270-278
repos/market-making/metrics-service/src/regime/engine.py:2709-2736
Issue: Values are multiplied by 100 or 10,000,000 and converted to integers for storage, but never divided back when read:
"price_range": {
"upper_bound": int(config.price_range.upper_bound * 100), # 3185.0 → 318500
"lower_bound": int(config.price_range.lower_bound * 100), # 3070.0 → 307000
},
"amount_per_grid": int(config.grid_structure.amount_per_grid * 10000000), # 0.0382 → 382378
Impact:
- All price values inflated 100x
- Grid amounts inflated 10,000,000x
- Profit percentages show as 56-59 instead of 0.56-0.59
- Buy/sell recommendations based on inflated values are wrong
Fix Required:
- Remove integer conversion (YAML supports floats)
- OR add reverse conversion when reading values
- Add unit tests for round-trip conversion
- Regenerate all metrics with correct values
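If the scaled-integer storage is kept rather than removed, the round trip could look like the minimal sketch below. The helper names `to_scaled_int` and `from_scaled_int` are hypothetical and do not exist in the repository; only the two scale factors come from the snippet above. The sketch also rounds instead of truncating, because `int()` on a scaled float can silently drop a unit.

```python
PRICE_SCALE = 100            # factor used for price_range bounds in the snippet above
AMOUNT_SCALE = 10_000_000    # factor used for amount_per_grid in the snippet above


def to_scaled_int(value: float, scale: int) -> int:
    """Encode a float as a scaled integer, rounding to the nearest unit.

    Plain int() truncates: int(0.29 * 100) evaluates to 28, not 29.
    """
    return round(value * scale)


def from_scaled_int(stored: int, scale: int) -> float:
    """Decode a scaled integer back into its original unit (the missing step)."""
    return stored / scale


if __name__ == "__main__":
    stored = to_scaled_int(3185.0, PRICE_SCALE)
    print(stored, from_scaled_int(stored, PRICE_SCALE))
```

A round-trip unit test would then assert that `from_scaled_int(to_scaled_int(x, s), s)` matches `x` within the precision implied by the scale.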
2. Fake Historical Data
Severity: HIGH
Location: repos/market-making/metrics-service/src/regime/engine.py:286-294
Issue: Historical arrays created by repeating current values instead of loading actual history:
adx_history = [adx] * 10 # TODO Phase 2: Load from previous YAML files
atr_history = [atr] * 100 # TODO Phase 2: Load from previous YAML files
bb_bandwidth_history = [bb_bandwidth] * 10 # TODO Phase 2: Load from previous YAML files
Impact:
- Trend detection algorithms see flat lines (no actual trends)
- Volatility expansion checks always return 1.0 (no expansion/contraction)
- Regime persistence checks are meaningless
- Time consistency analysis cannot detect regime flips
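The effect of a repeated-value history can be illustrated with a small sketch. The exact formulas in engine.py are not shown in this review, so `expansion_ratio` below is an assumed definition (current ATR divided by the mean of the history), used only to show why a flat array carries no information.

```python
from statistics import mean, pstdev


def expansion_ratio(current_atr: float, atr_history: list) -> float:
    """Assumed definition: current ATR relative to the mean of its history."""
    return current_atr / mean(atr_history)


atr = 1500.0
fake_history = [atr] * 100                          # what the engine builds today
varied_history = [1400.0, 1450.0, 1500.0, 1600.0]   # illustrative values only

# A flat history always yields a ratio of exactly 1.0 and zero dispersion,
# so expansion, contraction, and trend checks can never fire.
flat_ratio = expansion_ratio(atr, fake_history)
flat_spread = pstdev(fake_history)

# With a history that actually varies, the same call can move away from 1.0.
varied_ratio = expansion_ratio(1600.0, varied_history)
```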
Fix Required:
- Verify historical YAML files exist (CONFIRMED: they exist)
- Verify MetricsHistoryLoader exists (CONFIRMED: exists in exit_strategy/history_loader.py)
- Create RegimeHistoryLoader wrapper class
- Replace fake arrays with actual historical data loading
- Add fallback logic for when insufficient history available
- Add ADX to YAML output (currently not stored)
3. Hardcoded Fallback Values
Severity: HIGH
Location: repos/market-making/metrics-service/src/regime/engine.py:286-294
Issue: When data is unavailable, arbitrary constants are used:
adx = detailed.get("adx", {}).get("current", 25.0)
efficiency_ratio = detailed.get("efficiency_ratio", {}).get("current", 0.4)
ou_half_life = detailed.get("ou_process", {}).get("halflife_hours", 24.0)
atr = 1500.0 # Will be replaced when we store ATR in detailed_analysis
bb_bandwidth = detailed.get("bollinger", {}).get("bandwidth", 0.02)
Impact:
- Same analysis produced regardless of actual market conditions
- ADX=25 always triggers “neutral trend” regime
- ATR=1500 becomes baseline for all volatility comparisons
- Notifications remain static when using fallback values
Fix Required:
- Ensure ADX calculation always succeeds (don’t use fallback)
- Load ATR from historical YAML (already stored as volatility_metrics.atr_1h)
- Calculate efficiency_ratio and OU halflife from actual data
- Remove hardcoded defaults or make them symbol-specific
- Add data quality checks before regime analysis
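One way to replace silent defaults with an explicit data quality check is sketched below. The key paths are the ones read in the snippet above; the function name `extract_inputs` and the idea of returning a list of missing fields are assumptions for illustration, not existing code.

```python
# Key paths taken from the detailed.get(...) calls shown in this issue.
REQUIRED_FIELDS = {
    "adx": ("adx", "current"),
    "efficiency_ratio": ("efficiency_ratio", "current"),
    "ou_half_life": ("ou_process", "halflife_hours"),
    "bb_bandwidth": ("bollinger", "bandwidth"),
}


def extract_inputs(detailed: dict):
    """Return (values, missing) instead of substituting hardcoded constants."""
    values, missing = {}, []
    for name, path in REQUIRED_FIELDS.items():
        node = detailed
        for key in path:
            # Walk one level down; anything that is not a dict ends the walk.
            node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            missing.append(name)
        else:
            values[name] = float(node)
    return values, missing


values, missing = extract_inputs({"adx": {"current": 31.2}, "bollinger": {}})
```

The caller can then skip or flag the regime analysis when `missing` is non-empty, rather than producing the same verdict from constants.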
4. Static Classification Thresholds
Severity: MEDIUM
Location: repos/market-making/metrics-service/src/regime/classifier.py:49-69
Issue: Fixed percentile thresholds don’t adapt to symbol characteristics:
min_range_quality: float = 30.0
trend_enter_threshold: float = 70.0
trend_exit_threshold: float = 50.0
vol_change_threshold: float = 70.0
Impact:
- Same thresholds for ETH at 4000
- Same thresholds for volatile altcoins and stable assets
- No accounting for symbol-specific volatility patterns
Fix Required:
- Calculate adaptive thresholds from recent history
- Use rolling percentiles instead of fixed values
- Make thresholds symbol-specific
- Backtest threshold values for accuracy
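A rolling-percentile threshold could be derived roughly as follows. This is a sketch under assumptions: `percentile` uses plain linear interpolation, and `adaptive_threshold`, its `window` parameter, and the fallback behaviour are hypothetical names and choices, not taken from classifier.py.

```python
def percentile(values, pct: float) -> float:
    """Linear-interpolation percentile of a non-empty sequence (pct in 0..100)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def adaptive_threshold(history, pct: float, window: int, fallback: float) -> float:
    """Use the pct-th percentile of the last `window` values for one symbol.

    Falls back to the fixed threshold when there is not enough history yet.
    """
    recent = list(history)[-window:]
    if len(recent) < window:
        return fallback
    return percentile(recent, pct)
```

For example, `adaptive_threshold(scores, 70, window, 70.0)` keeps today's fixed `trend_enter_threshold` until a full window of per-symbol scores is available.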
5. Missing Grid-Aware Recommendations
Severity: MEDIUM
Location: repos/market-making/metrics-service/src/notifications/send_regime_notifications.py:91-232
Issue: Notification recommendations don’t reference actual grid configuration:
if confidence < 0.5:
recs.append("Low regime confidence - wait for stronger signal")
# ... but doesn't check:
# - Current price vs grid bounds
# - Current position size
# - Grid spacing
# - Historical grid performance
Impact:
- Recommendations not actionable
- Doesn’t warn when price approaches grid boundaries
- Doesn’t consider if profitable levels are being hit
- No awareness of actual trading activity
Fix Required:
- Load grid configuration in notification generator
- Add price position within grid bounds
- Check if price approaching grid edges
- Reference filled orders and profitable levels
- Recommend grid adjustments when needed
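The price-position and edge checks could be sketched as below. The bounds in the usage line are the example values from Issue 1; `grid_position`, `grid_recommendations`, the `edge_fraction` parameter, and the message wording are hypothetical.

```python
def grid_position(price: float, lower: float, upper: float) -> float:
    """Position of price inside the grid: 0.0 at lower bound, 1.0 at upper bound."""
    if upper <= lower:
        raise ValueError("upper bound must be greater than lower bound")
    return (price - lower) / (upper - lower)


def grid_recommendations(price: float, lower: float, upper: float,
                         edge_fraction: float = 0.1) -> list:
    """Return grid-aware warnings; edge_fraction is an assumed tuning knob."""
    position = grid_position(price, lower, upper)
    recs = []
    if position < 0.0 or position > 1.0:
        recs.append("Price is outside the grid bounds - review grid placement")
    elif position <= edge_fraction:
        recs.append("Price is approaching the lower grid bound")
    elif position >= 1.0 - edge_fraction:
        recs.append("Price is approaching the upper grid bound")
    return recs


recs = grid_recommendations(3180.0, lower=3070.0, upper=3185.0)
```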
6. Missing Baseline Storage
Severity: MEDIUM
Location: repos/market-making/metrics-service/src/exit_strategy/evaluator.py:63-104
Issue: Baseline metrics not stored, preventing comparison to entry conditions:
baseline_atr: Optional[float] = None # Should be calculated from history
baseline_halflife: Optional[float] = None # Should be calculated from history
Good News: baseline_atr is already stored in YAML metrics files at:
analysis.regime_analysis.volatility_metrics.baseline_atr
Fix Required:
- Confirm baseline_atr storage location (CONFIRMED)
- Load baseline_atr from historical YAML when available
- Add baseline_halflife to stored metrics
- Calculate baselines from entry conditions when grid starts
- Persist baselines in system_config_v1.yaml
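Loading the stored baseline could be as simple as the sketch below. It assumes the hourly files have already been parsed into dictionaries (for example by a YAML loader) and are ordered oldest to newest; `get_by_path` and `load_baseline_atr` are hypothetical helpers, and only the dotted path comes from this review.

```python
from typing import Any, Optional

BASELINE_ATR_PATH = "analysis.regime_analysis.volatility_metrics.baseline_atr"


def get_by_path(document: Any, dotted_path: str) -> Optional[Any]:
    """Follow a dotted key path through nested dicts; None if any key is absent."""
    node = document
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def load_baseline_atr(history_docs: list) -> Optional[float]:
    """Return the most recent stored baseline_atr, or None when none is stored."""
    for doc in reversed(history_docs):
        value = get_by_path(doc, BASELINE_ATR_PATH)
        if value is not None:
            return float(value)
    return None
```

Returning `None` rather than a constant keeps the "no baseline available" case visible to the exit-strategy evaluator.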
7. Arbitrary Confidence Calculations
Severity: LOW
Location: repos/market-making/metrics-service/src/regime/classifier.py:203-232
Issue: Confidence formulas use hardcoded divisors and caps:
confidence = 0.7 + min(0.2, scores.mean_rev_score / 500.0)
Impact:
- Confidence not calibrated to actual prediction accuracy
- No backtesting to validate confidence levels
- Users can’t trust confidence scores
Fix Required:
- Backtest regime predictions against actual outcomes
- Calculate confidence from historical accuracy
- Add confidence calibration based on market conditions
- Track prediction accuracy over time
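For non-negative scores, the formula above can only produce values between 0.7 and 0.9 and saturates once mean_rev_score reaches 100, so it cannot reflect how often a verdict turned out right. A calibration pass could start from something like the sketch below, which groups past predictions by confidence and reports the observed hit rate. The record format (confidence, was_correct) and the function name are assumptions; how "correct" is judged is left open.

```python
from collections import defaultdict


def accuracy_by_confidence(records) -> dict:
    """Group (confidence, was_correct) pairs into 10-point buckets.

    Returns {bucket_lower_bound_percent: observed_accuracy}, e.g. {80: 0.75}.
    Integer percent is used for bucketing to avoid float division surprises.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for confidence, was_correct in records:
        bucket = min(round(confidence * 100) // 10, 9) * 10
        totals[bucket] += 1
        hits[bucket] += 1 if was_correct else 0
    return {bucket: hits[bucket] / totals[bucket] for bucket in sorted(totals)}


sample = [(0.72, True), (0.75, False), (0.88, True), (0.86, True)]
table = accuracy_by_confidence(sample)
```

If the 80 bucket shows an observed accuracy far from 0.8, the stated confidence is miscalibrated for that range.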
Investigation Results
Historical Data Infrastructure (GOOD NEWS)
What Exists:
- ✅ Historical YAML files: market-maker-data/metrics/YYYY/MM/DD/HH_SYMBOL.yaml
- ✅ MetricsHistoryLoader class in exit_strategy/history_loader.py
- ✅ ATR stored as volatility_metrics.atr_1h
- ✅ Baseline ATR stored as volatility_metrics.baseline_atr
- ✅ Regime verdicts stored in each hourly file
- ✅ Bollinger Band bandwidth stored
What’s Missing:
- ❌ ADX values not saved to YAML (calculated but discarded)
- ❌ Code to load historical arrays (infrastructure exists but not used)
- ❌ Baseline halflife not stored
Data Format Analysis
YAML Metrics Structure:
analysis:
  regime_analysis:
    verdict: RANGE_OK
    confidence: 0.864
    volatility_metrics:
      atr_1h: 1525 # ✅ Available
      baseline_atr: 1600 # ✅ Available
      volatility_state: STABLE
      volatility_expansion_ratio: 0.95
      bb_bandwidth: 0.0215 # ✅ Available (if stored)
    mean_reversion:
      half_life_hours: 18.5 # ✅ Available for baseline_halflife
Implementation Task List
Phase 1: Fix Critical Bugs (Est: 4-6 hours)
Task 1.1: Fix Integer Multiplication Bug
Priority: CRITICAL
Est: 2 hours
- Remove integer conversion in configuration_manager.py:270-278
- Remove integer conversion in engine.py:2709-2736
- Add unit tests for YAML round-trip conversion
- Verify all numeric fields use proper float serialization
- Regenerate metrics to validate fix
Task 1.2: Create Historical Data Loader
Priority: HIGH
Est: 2 hours
- Create src/regime/historical_loader.py
- Implement RegimeHistoryLoader class:
  - load_atr_history(symbol, hours) - Load ATR from YAML
  - load_adx_history(symbol, hours) - Load ADX from YAML (add storage first)
  - load_bb_bandwidth_history(symbol, hours) - Load BB bandwidth
  - load_regime_history(symbol, hours) - Load past regime verdicts
- Add caching with TTL
- Add fallback logic for insufficient history
- Write unit tests
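A minimal sketch of the loader described in this task follows. It is not the implementation: the reader is injected as a callable so the example has no YAML dependency, the optional `end` argument and the UTC default are assumptions, the symbol string format is a guess, and the TTL cache is omitted. Only the path layout and the ATR key path come from this review.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Callable, List, Optional

# Key path taken from the YAML structure documented in this review.
ATR_KEYS = ("analysis", "regime_analysis", "volatility_metrics", "atr_1h")


def hourly_paths(root: Path, symbol: str, end: datetime, hours: int) -> List[Path]:
    """Paths of the last `hours` hourly files up to `end`, oldest first."""
    paths = []
    for offset in range(hours - 1, -1, -1):
        ts = end - timedelta(hours=offset)
        # Layout: <root>/YYYY/MM/DD/HH_SYMBOL.yaml
        paths.append(root / f"{ts:%Y}" / f"{ts:%m}" / f"{ts:%d}" / f"{ts:%H}_{symbol}.yaml")
    return paths


class RegimeHistoryLoader:
    """Hypothetical wrapper; read_yaml returns a parsed dict, or None if missing."""

    def __init__(self, root: Path, read_yaml: Callable[[Path], Optional[dict]]):
        self._root = root
        self._read_yaml = read_yaml

    def load_atr_history(self, symbol: str, hours: int,
                         end: Optional[datetime] = None) -> List[float]:
        end = end or datetime.now(timezone.utc)  # assumption: files are stamped in UTC
        history = []
        for path in hourly_paths(self._root, symbol, end, hours):
            node = self._read_yaml(path)
            for key in ATR_KEYS:
                node = node.get(key) if isinstance(node, dict) else None
            if node is not None:
                history.append(float(node))  # missing hours are skipped, not padded
        return history
```

The other three methods in the task list would follow the same pattern with a different key path, once ADX is actually written to the YAML files.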
Task 1.3: Replace Fake Historical Arrays
Priority: HIGH
Est: 2 hours
- Update engine.py:286-299 to use RegimeHistoryLoader
- Replace adx_history = [adx] * 10 with actual loading
- Replace atr_history = [atr] * 100 with actual loading
- Replace bb_bandwidth_history = [bb_bandwidth] * 10 with actual loading
- Add minimum history length validation
- Add logging when falling back to short history
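The validation and logging steps could be combined in one helper, sketched below. `resolve_history` and its return shape are assumptions; the point is that a short history is passed through as-is and flagged, never padded with repeated values.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_history(loaded: list, current: float, min_length: int, name: str):
    """Return (history, is_sufficient) without fabricating data points.

    When fewer than min_length values were loaded, the short history is kept
    and a warning is logged so downstream checks can be skipped or discounted.
    """
    if len(loaded) >= min_length:
        return list(loaded), True
    logger.warning(
        "%s history too short: %d of %d required points; using short history",
        name, len(loaded), min_length,
    )
    # With no history at all, the current reading is the only real data point.
    return (list(loaded) if loaded else [current]), False


history, sufficient = resolve_history([1490.0, 1525.0], 1525.0, min_length=100, name="atr")
```

Callers can use the `sufficient` flag to skip trend and persistence checks instead of running them on too little data.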
Phase 2: Store Missing Metrics (Est: 2-3 hours)
Task 2.1: Add ADX to YAML Output
Priority: HIGH
Est: 1 hour
- Update feature_calculation.py to store ADX in detailed_analysis
- Ensure ADX calculation never fails (handle edge cases)
- Add ADX to YAML serialization
- Verify ADX appears in generated YAML files
Task 2.2: Add Baseline Halflife Storage
Priority: MEDIUM
Est: 1 hour
- Add baseline_halflife to volatility_metrics in YAML
- Calculate baseline from entry conditions
- Store in system_config_v1.yaml for persistence
- Load baseline when evaluating exit conditions
Task 2.3: Store Comprehensive Metrics
Priority: LOW
Est: 1 hour
- Add trend_score to YAML output
- Add mean_reversion_score to YAML output
- Add vol_level_score to YAML output
- Enables future historical analysis of scoring changes
Phase 3: Improve Recommendations (Est: 3-4 hours)
Task 3.1: Add Grid-Aware Logic
Priority: MEDIUM
Est: 2 hours
- Load grid configuration in send_regime_notifications.py
- Calculate current price position within grid bounds
- Add warnings when price approaches grid edges
- Reference filled orders and profitable levels
- Recommend grid adjustments based on regime changes
Task 3.2: Implement Adaptive Thresholds
Priority: MEDIUM
Est: 2 hours
- Calculate rolling percentiles from recent history (30-90 days)
- Replace fixed thresholds with adaptive ones
- Make thresholds symbol-specific
- Store threshold calculations in metrics for transparency
Phase 4: Validate and Test (Est: 2-3 hours)
Task 4.1: Integration Testing
Priority: HIGH
Est: 1 hour
- Test historical data loading with real YAML files
- Verify ATR history loads correctly
- Test fallback behavior with insufficient history
- Verify no more fake arrays in production
Task 4.2: Data Quality Checks
Priority: MEDIUM
Est: 1 hour
- Add pre-flight checks before regime analysis
- Validate minimum history requirements met
- Log warnings when using fallback values
- Add health check endpoint for data quality
Task 4.3: Regenerate Historical Data
Priority: HIGH
Est: 1 hour
- Backfill ADX values for past 30 days
- Regenerate any metrics with integer conversion bugs
- Verify all YAML files have correct format
- Run validation across entire dataset
Phase 5: Monitoring and Calibration (Est: 2-3 hours)
Task 5.1: Confidence Calibration
Priority: LOW
Est: 2 hours
- Collect regime predictions and actual outcomes
- Calculate prediction accuracy by confidence level
- Adjust confidence formulas based on backtesting
- Add confidence calibration to metrics output
Task 5.2: Add Metrics Dashboard
Priority: LOW
Est: 1 hour
- Track prediction accuracy over time
- Monitor data quality metrics
- Alert when using fallback values
- Display historical loading success rate
Total Estimated Effort
- Phase 1 (Critical): 4-6 hours
- Phase 2 (High Priority): 2-3 hours
- Phase 3 (Medium Priority): 3-4 hours
- Phase 4 (Testing): 2-3 hours
- Phase 5 (Polish): 2-3 hours
Total: 13-19 hours
Recommended Sequence:
- Fix integer multiplication bug (blocks everything)
- Create historical loader and replace fake arrays
- Add missing metrics to YAML
- Integration testing
- Improve recommendations
- Monitoring and calibration
Success Criteria
Immediate (Post Phase 1-2)
- No integer multiplication bugs in stored values
- Historical arrays load from actual YAML files
- ADX values stored in YAML output
- ATR history shows actual volatility changes over time
- Regime verdicts reflect real market transitions
Medium-term (Post Phase 3-4)
- Notifications reference actual grid configuration
- Recommendations consider current price position
- Adaptive thresholds respond to market conditions
- Data quality checks prevent bad analysis
- Integration tests validate all historical loading
Long-term (Post Phase 5)
- Confidence scores calibrated to actual accuracy
- Prediction accuracy tracked over time
- Monitoring dashboard shows data health
- Users trust notification recommendations
Next Steps
- Review this document with stakeholders
- Prioritize phases based on business impact
- Start with Phase 1 (critical bug fixes)
- Implement in branches for safe testing
- Deploy incrementally with validation at each phase
References
- Investigation: Task session ses_3ae0e2c43ffeaEGE5PWqjpAz1N
- Code Review: Task session ses_3ae0fe258ffeWWkhz1LLUayw9g
- Integer Bug Review: Task session ses_3ae31ebfcffeVs8Js8IEg9hV8j
- Repository: .builders/0013-market-maker-mvp/repos/market-making
- Data Repository: .builders/0013-market-maker-mvp/repos/market-maker-data