Phase 1: Data Quality - Detailed Implementation Plan
Duration: 2-3 weeks (40-60 hours)
Priority: P0 - CRITICAL BLOCKER
Goal: Replace all hardcoded dummy values with real metric calculations
Week 1: Metric Calculations (20-30 hours)
Day 1-2: ADX (Average Directional Index)
Objective: Implement ADX calculation with validation
Tasks:
- Create module (2h)

  ```bash
  mkdir -p repos/market-making/metrics-service/src/regime/metrics
  touch repos/market-making/metrics-service/src/regime/metrics/__init__.py
  ```
- Implement ADX (4h)
  - File: `src/regime/metrics/adx.py`
  - Function signature:

    ```python
    def calculate_adx(
        high: np.ndarray,
        low: np.ndarray,
        close: np.ndarray,
        period: int = 14,
    ) -> Tuple[float, List[float]]:
        """
        Calculate ADX and ADX history

        Returns: (current_adx, adx_history)
        """
    ```
Implementation steps:
- Calculate True Range (TR)
- Calculate +DM and -DM (Directional Movement)
- Smooth TR, +DM, -DM with Wilder’s smoothing
- Calculate +DI and -DI
- Calculate DX = |+DI - -DI| / (+DI + -DI) × 100
- Smooth DX to get ADX
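The six steps above can be sketched as follows. This is a minimal, unoptimized illustration (function name and the `safe_div` guard are my additions, not from the plan); the validated implementation belongs in `adx.py` and should be checked against TA-Lib, which seeds Wilder's smoothing slightly differently:

```python
import numpy as np
from typing import List, Tuple

def calculate_adx_sketch(
    high: np.ndarray, low: np.ndarray, close: np.ndarray, period: int = 14
) -> Tuple[float, List[float]]:
    """Wilder-smoothed ADX sketch. Needs more than 2*period bars of data."""
    assert len(close) > 2 * period, "insufficient data"

    def safe_div(num, den):
        # Avoid divide-by-zero on flat price segments (zero-range bars)
        out = np.zeros_like(num, dtype=float)
        np.divide(num, den, out=out, where=den > 0)
        return out

    # Step 1: True Range
    prev_close = close[:-1]
    tr = np.maximum.reduce([
        high[1:] - low[1:],
        np.abs(high[1:] - prev_close),
        np.abs(low[1:] - prev_close),
    ])

    # Step 2: +DM and -DM (Directional Movement)
    up, down = high[1:] - high[:-1], low[:-1] - low[1:]
    plus_dm = np.where((up > down) & (up > 0), up, 0.0)
    minus_dm = np.where((down > up) & (down > 0), down, 0.0)

    # Step 3: Wilder's smoothing (running smoothed sum)
    def wilder(x):
        out = np.empty(len(x) - period + 1)
        out[0] = x[:period].sum()
        for i in range(1, len(out)):
            out[i] = out[i - 1] - out[i - 1] / period + x[period + i - 1]
        return out

    # Steps 4-5: +DI, -DI, then DX = |+DI - -DI| / (+DI + -DI) * 100
    atr = wilder(tr)
    plus_di = 100 * safe_div(wilder(plus_dm), atr)
    minus_di = 100 * safe_div(wilder(minus_dm), atr)
    dx = 100 * safe_div(np.abs(plus_di - minus_di), plus_di + minus_di)

    # Step 6: smooth DX into ADX (seeded with the mean of the first `period` DX values)
    adx = [dx[:period].mean()]
    for d in dx[period:]:
        adx.append((adx[-1] * (period - 1) + d) / period)
    return float(adx[-1]), [float(a) for a in adx]
```

On a strong synthetic uptrend this converges to ADX near 100, which is a cheap sanity check before comparing against TradingView.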
- Unit Tests (3h)
  - File: `tests/regime/metrics/test_adx.py`
  - Test cases:
- Trending market (ADX > 40)
- Ranging market (ADX < 20)
- Edge case: insufficient data (< period bars)
- Edge case: flat price (zero movement)
- Validation: compare with TA-Lib or TradingView
- Validation (2h)
- Load real ETH/USDT data from last month
- Calculate ADX
- Compare with TradingView ADX indicator
- Ensure values within 5% tolerance
Deliverable: ✅ ADX calculation working and validated
Day 3-4: Efficiency Ratio
Objective: Implement Perry Kaufman’s Efficiency Ratio
Tasks:
- Implement ER (2h)
  - File: `src/regime/metrics/efficiency_ratio.py`
  - Function:

    ```python
    def calculate_efficiency_ratio(
        prices: np.ndarray,
        period: int = 10,
    ) -> float:
        """
        Calculate Efficiency Ratio

        ER = |Price[0] - Price[n]| / Σ|Price[i] - Price[i-1]|

        Returns: Value in [0, 1], higher = more trending
        """
    ```
- Unit Tests (2h)
  - File: `tests/regime/metrics/test_efficiency_ratio.py`
  - Test cases:
- Strong trend (ER > 0.8)
- Weak trend / ranging (ER < 0.3)
- Edge case: zero net change (ER = 0)
- Edge case: perfect trend (ER = 1.0)
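Given the formula, ER is only a few lines; here is a minimal sketch (function name is illustrative, and treating zero path length as ER = 0 is my convention, matching the zero-net-change test case):

```python
import numpy as np

def efficiency_ratio_sketch(prices: np.ndarray, period: int = 10) -> float:
    """ER = net change over `period` bars / sum of absolute bar-to-bar changes."""
    window = prices[-(period + 1):]          # `period` changes need period+1 prices
    net = abs(window[-1] - window[0])
    path = np.abs(np.diff(window)).sum()
    return float(net / path) if path > 0 else 0.0  # flat price -> 0 by convention
```

A monotone series yields ER = 1.0 (perfect trend); a zigzag with no net progress yields ER near 0.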
- Validation (1h)
- Calculate on known trending period (Dec 2024 ETH breakout)
- Calculate on known ranging period (Sep-Nov 2024)
- Verify ER high during trend, low during range
Deliverable: ✅ Efficiency Ratio working and tested
Day 5: Lag-1 Autocorrelation
Objective: Implement autocorrelation for mean reversion detection
Tasks:
- Implement Autocorrelation (1.5h)
  - File: `src/regime/metrics/autocorrelation.py`
  - Function:

    ```python
    def calculate_lag1_autocorr(prices: np.ndarray) -> float:
        """
        Calculate lag-1 autocorrelation (Pearson)

        Returns: Value in [-1, 1], negative = mean reverting
        """
    ```
- Unit Tests (1.5h)
  - File: `tests/regime/metrics/test_autocorrelation.py`
  - Test cases:
- Mean reverting series (negative correlation)
- Trending series (positive correlation)
- Random walk (near zero)
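Following the signature in the plan, the Pearson lag-1 autocorrelation is a one-liner over the shifted series; a minimal sketch (name illustrative):

```python
import numpy as np

def lag1_autocorr_sketch(prices: np.ndarray) -> float:
    """Pearson correlation between the series and itself shifted by one bar."""
    x, y = prices[:-1], prices[1:]
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is lag-1 autocorr
    return float(np.corrcoef(x, y)[0, 1])
```

A strictly alternating series gives -1 (extreme mean reversion); a monotone trend gives +1, which lines up with the test cases above.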
- Validation (1h)
  - Validate against pandas: `series.autocorr(lag=1)`
  - Test on synthetic mean-reverting data
Deliverable: ✅ Autocorrelation working
Week 2: Advanced Metrics + Integration (20-25 hours)
Day 6-7: Ornstein-Uhlenbeck Half-Life
Objective: Implement the most complex metric: OU process half-life estimation
Tasks:
- Implement OU Half-Life (4h)
  - File: `src/regime/metrics/ou_process.py`
  - Functions:

    ```python
    def fit_ar1_model(prices: np.ndarray) -> float:
        """
        Fit AR(1): x[t] = ϕ × x[t-1] + ε

        Returns: AR coefficient ϕ
        """

    def calculate_ou_halflife(prices: np.ndarray) -> Optional[float]:
        """
        Calculate OU half-life from AR(1) coefficient

        Formula: -log(2) / log(ϕ)

        Returns: Half-life in bars, or None if non-stationary
        """
    ```
- Stationarity Check (2h)
- Add ADF (Augmented Dickey-Fuller) test
- Reject non-stationary series (|ϕ| ≥ 1)
- Return None if non-stationary
- Unit Tests (3h)
  - File: `tests/regime/metrics/test_ou_process.py`
  - Test cases:
- Fast mean reversion (half-life < 10 bars)
- Slow mean reversion (half-life > 50 bars)
- Non-stationary rejection (trending series)
- Perfect mean reversion (synthetic data)
- Validation (2h)
- Test on synthetic OU process with known half-life
- Validate against statsmodels AR implementation
- Test on real ranging ETH data
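A minimal sketch of the two functions, using a plain OLS AR(1) fit on the demeaned series (the plan's statsmodels-based fit plus ADF stationarity check would replace this; names are illustrative). Rejecting ϕ outside (0, 1) approximates the non-stationarity guard:

```python
import numpy as np
from typing import Optional

def fit_ar1_sketch(prices: np.ndarray) -> float:
    """OLS fit of x[t] = phi * x[t-1] + eps on the demeaned series."""
    x = prices - prices.mean()
    lagged, current = x[:-1], x[1:]
    return float(np.dot(lagged, current) / np.dot(lagged, lagged))

def ou_halflife_sketch(prices: np.ndarray) -> Optional[float]:
    """Half-life = -ln(2) / ln(phi); None when phi implies no mean reversion."""
    phi = fit_ar1_sketch(prices)
    if not (0.0 < phi < 1.0):  # phi >= 1: non-stationary; phi <= 0: degenerate
        return None
    return float(-np.log(2.0) / np.log(phi))
```

On a synthetic OU/AR(1) path with ϕ = 0.9, the theoretical half-life is -ln(2)/ln(0.9) ≈ 6.6 bars, so a long sample should recover a value in that neighborhood.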
Deliverable: ✅ OU half-life calculation working
Day 8: Normalized Slope + BB Bandwidth
Objective: Implement the final two metrics
Tasks:
- Normalized Slope (1.5h)
  - File: `src/regime/metrics/slope.py`
  - Function:

    ```python
    def calculate_normalized_slope(
        prices: np.ndarray,
        atr: float,
        lookback: int = 10,
    ) -> float:
        """
        Price slope normalized by ATR

        Formula: (Price[0] - Price[n]) / (ATR × n)
        """
    ```
- Bollinger Band Bandwidth (1.5h)
  - File: `src/regime/metrics/bollinger.py`
  - Function:

    ```python
    def calculate_bb_bandwidth(
        prices: np.ndarray,
        period: int = 20,
        num_std: float = 2.0,
    ) -> Tuple[float, List[float]]:
        """
        BB bandwidth = (upper - lower) / middle

        Returns: (current_bandwidth, bandwidth_history)
        """
    ```
- Unit Tests (2h)
  - File: `tests/regime/metrics/test_slope.py` (3 cases)
  - File: `tests/regime/metrics/test_bollinger.py` (3 cases)
- Validation (1h)
- Compare with TradingView Bollinger Bands
- Validate slope calculation manually
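Both metrics are short enough to sketch directly. One plausible reading of the formulas, interpreting Price[0] - Price[n] as the latest price minus the price n bars ago (names illustrative; note some implementations use sample rather than population standard deviation for the bands):

```python
import numpy as np
from typing import List, Tuple

def normalized_slope_sketch(prices: np.ndarray, atr: float, lookback: int = 10) -> float:
    """Net move over `lookback` bars, expressed in ATR units per bar."""
    return float((prices[-1] - prices[-1 - lookback]) / (atr * lookback))

def bb_bandwidth_sketch(
    prices: np.ndarray, period: int = 20, num_std: float = 2.0
) -> Tuple[float, List[float]]:
    """Rolling (upper - lower) / middle for each full window."""
    history: List[float] = []
    for i in range(period, len(prices) + 1):
        window = prices[i - period:i]
        middle = window.mean()
        band = num_std * window.std()            # population std (numpy default)
        history.append(float(2.0 * band / middle))  # upper - lower = 2 * num_std * std
    return history[-1], history
```

A flat series yields zero bandwidth and zero slope; a 1-point-per-bar trend with ATR = 1 yields a normalized slope of 1.0, which is easy to verify by hand.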
Deliverable: ✅ All 6 metrics implemented
Day 9-10: Integration with Regime Engine
Objective: Wire up metrics to regime engine, remove TODOs
Tasks:
- Enhance Regime Classifier (4h)
  - File: `src/regime/classifier.py`
  - Add metric calculations to regime analysis
  - Store in `detailed_analysis` dict:

    ```python
    detailed_analysis = {
        'adx': {'current': adx_current, 'history': adx_history},
        'efficiency_ratio': er_value,
        'autocorrelation': {'lag1': autocorr_value},
        'ou_process': {'half_life_hours': ou_halflife},
        'slope': {'normalized': norm_slope},
        'bollinger': {'bandwidth': bb_width, 'bandwidth_history': bb_history},
    }
    ```
- Modify Regime Engine (3h)
  - File: `src/regime/engine.py`
  - Lines 268-280: Extract real values

    ```python
    # OLD (REMOVE):
    adx = 25.0  # TODO

    # NEW:
    detailed = regime_state.detailed_analysis
    adx = detailed.get('adx', {}).get('current')
    if adx is None:
        raise ValueError("ADX not calculated in regime analysis")
    ```
  - Repeat for all 10 hardcoded values
  - Remove duplicate code at lines 349-359
- Error Handling (2h)
- Fail fast if any metric missing
- Log which metrics were calculated
- Add debug output for validation
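The fail-fast rule could be centralized in a small helper rather than repeating the `get`-and-raise pattern ten times; a sketch with a hypothetical `require_metric` name:

```python
from typing import Any, Dict

def require_metric(detailed: Dict[str, Any], *path: str) -> Any:
    """Fail fast: raise if a metric is missing instead of silently using a dummy."""
    node: Any = detailed
    for key in path:
        if not isinstance(node, dict) or key not in node:
            raise ValueError(f"metric missing from regime analysis: {'.'.join(path)}")
        node = node[key]
    if node is None:
        raise ValueError(f"metric is None: {'.'.join(path)}")
    return node

# Usage: adx = require_metric(detailed, 'adx', 'current')
```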
- Testing (3h)
- Run regime engine on last 7 days of data
- Verify no errors
- Inspect output YAML files
- Confirm real values present
Deliverable: ✅ All TODOs removed, real metrics in YAMLs
Week 3: Validation + Quality Assurance (15-20 hours)
Day 11-12: Data Validation
Objective: Ensure metrics YAMLs are always valid
Tasks:
- Schema Validator (4h)
  - File: `src/regime/validation/schema_validator.py`
  - Define schema:

    ```python
    METRICS_SCHEMA = {
        'adx': {'type': float, 'range': [0, 100], 'required': True},
        'efficiency_ratio': {'type': float, 'range': [0, 1], 'required': True},
        # ... all metrics ...
    }
    ```
  - Implement validator:

    ```python
    def validate_metrics(metrics: Dict) -> Tuple[bool, List[str]]:
        """
        Validate metrics against schema

        Returns: (is_valid, list_of_errors)
        """
    ```
- Integration (2h)
- Add validation before Git commit
- Fail metrics collection if validation fails
- Log validation errors
- Unit Tests (2h)
- Test with valid metrics (should pass)
- Test with invalid values (should fail)
- Test with missing metrics (should fail)
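A minimal sketch of the validator against the schema shape shown above, covering exactly the three test cases (valid, out-of-range, missing); the two schema entries are placeholders for the full metric set:

```python
from typing import Any, Dict, List, Tuple

METRICS_SCHEMA: Dict[str, Dict[str, Any]] = {
    'adx': {'type': float, 'range': [0, 100], 'required': True},
    'efficiency_ratio': {'type': float, 'range': [0, 1], 'required': True},
}

def validate_metrics_sketch(metrics: Dict[str, Any]) -> Tuple[bool, List[str]]:
    """Check presence, type, and range of each metric named in the schema."""
    errors: List[str] = []
    for name, rule in METRICS_SCHEMA.items():
        if name not in metrics:
            if rule['required']:
                errors.append(f"missing required metric: {name}")
            continue
        value = metrics[name]
        if not isinstance(value, rule['type']):
            errors.append(f"{name}: expected {rule['type'].__name__}, got {type(value).__name__}")
            continue
        lo, hi = rule['range']
        if not lo <= value <= hi:
            errors.append(f"{name}: {value} outside [{lo}, {hi}]")
    return (not errors, errors)
```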
Deliverable: ✅ Schema validation working
Day 13: Data Quality Dashboard
Objective: Visual confirmation of data quality
Tasks:
- Dashboard Implementation (4h)
  - File: `src/regime/quality/dashboard.py`
  - Features:
- Load last 30 days of metrics YAMLs
- Check each metric for:
- Dummy value detection (e.g., ADX always 25.0)
- Valid ranges
- Anomalies (sudden jumps)
- Generate HTML report
- Anomaly Detection (2h)
- Detect stuck values (ADX = 25.0 for > 10 consecutive hours)
- Detect out-of-range values
- Flag suspicious patterns
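Stuck-value detection is straightforward; a sketch (function name and threshold parameter are illustrative, with the default matching the "> 10 consecutive hours" rule for hourly samples):

```python
from typing import List, Sequence

def detect_stuck_values(values: Sequence[float], max_repeats: int = 10) -> List[float]:
    """Return values that repeat for more than `max_repeats` consecutive samples."""
    run, flagged = 1, []
    for i in range(1, len(values)):
        run = run + 1 if values[i] == values[i - 1] else 1
        if run == max_repeats + 1:     # flag each offending run exactly once
            flagged.append(values[i])
    return flagged
```

Run against 30 days of hourly ADX readings, a hardcoded `25.0` dummy would show up immediately.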
- Visualization (2h)
- Plot metric trends over 30 days
- Highlight anomalies in red
- Show ✅ for real data, ⚠️ for suspicious
Deliverable: ✅ Quality dashboard showing data health
Day 14: Final Testing & Documentation
Objective: Ensure everything works, document for future
Tasks:
- Integration Testing (3h)
- Run full metrics collection end-to-end
- Collect 1 hour of real data
- Verify YAML output
- Check Git commit
- Code Review (2h)
- Self-review all code
- Check for TODOs (should be zero)
- Ensure consistent style
- Add docstrings
- Documentation (2h)
- Update README
- Document each metric calculation
- Add usage examples
- Create troubleshooting guide
- Deployment (2h)
- Merge to main branch
- Deploy to test environment
- Run for 24 hours
- Monitor for errors
Deliverable: ✅ Phase 1 complete, ready for Phase 2
Daily Checklist Template
Use this for tracking progress:
```markdown
## Day X - [Date]

### Goals
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3

### Actual Completed
- [x] Task 1 - 2h (notes)
- [ ] Task 2 - Started, 50% done

### Blockers
- Issue with XYZ, need to research

### Learnings
- Discovered that TA-Lib uses different smoothing for ADX

### Tomorrow
- [ ] Complete Task 2
- [ ] Start Task 4
```

Testing Strategy
Unit Test Requirements
For each metric calculation:
- ✅ Normal case (typical market data)
- ✅ Trending case (strong directional move)
- ✅ Ranging case (sideways movement)
- ✅ Edge case: insufficient data
- ✅ Edge case: extreme values
Coverage Target: ≥ 90%
Validation Strategy
Compare against known implementations:
- TA-Lib (if available)
- TradingView indicators
- pandas-ta
- Manual calculations (Excel/Google Sheets)
Acceptable Tolerance: ±5% for smoothed indicators (ADX), ±1% for simple calculations (ER)
Integration Testing
End-to-End Flow:
- Load real OHLCV data
- Calculate regime
- Calculate all metrics
- Generate metrics YAML
- Validate schema
- Commit to Git
- Verify file contents
Success Criteria Checklist
Phase 1 is COMPLETE when:
- All 6 metric modules implemented:
  - `src/regime/metrics/adx.py`
  - `src/regime/metrics/efficiency_ratio.py`
  - `src/regime/metrics/autocorrelation.py`
  - `src/regime/metrics/ou_process.py`
  - `src/regime/metrics/slope.py`
  - `src/regime/metrics/bollinger.py`
- All 22+ unit tests passing:
- ADX (5 tests)
- Efficiency Ratio (4 tests)
- Autocorrelation (3 tests)
- OU Half-Life (4 tests)
- Normalized Slope (3 tests)
- Bollinger Bandwidth (3 tests)
- Regime engine updated:
  - Lines 268-280 in `engine.py` - TODOs removed
  - Lines 349-359 in `engine.py` - TODOs removed
  - Real metrics extracted from `detailed_analysis`
- Data validation:
- Schema validator implemented
- Validation runs before Git commit
- Invalid data rejected
- Quality assurance:
- Quality dashboard implemented
- Dashboard shows 100% real data
- No anomalies detected in last 24h
- Production ready:
- Code reviewed
- Documentation complete
- Deployed to test environment
- 24h of real metrics collected successfully
- User acceptance:
- Craig reviews metrics YAMLs
- Craig confirms: “I trust the data now”
Then proceed to Phase 2!
Risk Mitigation
Risk: Metric calculations are incorrect
Mitigation:
- Validate against 3+ sources (TA-Lib, TradingView, manual)
- Use known test cases with verified outputs
- Code review by experienced developer
Risk: OU half-life calculation too complex
Mitigation:
- Use statsmodels for AR(1) fitting (proven library)
- Add comprehensive logging for debugging
- Accept partial failure (return None if can’t calculate)
Risk: Integration breaks existing functionality
Mitigation:
- Feature branch for all changes
- Integration tests before merge
- Can rollback if issues found
Risk: Performance degradation
Mitigation:
- Benchmark metric calculations (target: < 1s for all 6)
- Use numpy for vectorized operations
- Cache intermediate results
Next Steps After Phase 1
Once Phase 1 complete:
- Celebrate 🎉 - Major milestone!
- Review - What went well, what didn’t
- Plan Phase 2 - Grid Exit Strategy implementation
- Start Phase 2 Day 1 - Implement MANDATORY_EXIT triggers
Last Updated: 2026-01-31
Status: Ready to Execute