0004 - Regime Management: Snapshot Architecture

Summary

This specification completes the snapshot-based regime detection architecture for the market-making system. The core change separates data collection (creating comprehensive market snapshots) from regime analysis (consuming those snapshots), enabling better testability, historical analysis, and architectural clarity.

Problem Statement

The current regime detection system has tight coupling between data collection and regime analysis. This creates several issues:

  1. Testability: Cannot easily test regime analysis in isolation without mocking exchange APIs
  2. Historical Analysis: Cannot run regime detection on past data without re-fetching from exchanges
  3. Separation of Concerns: Metrics collection performs both data gathering and analysis in a single pass
  4. Validation Coverage: Multiple checkpoint tests remain incomplete (grid configuration, enhanced metrics integration, enabled status, restart gates, snapshot architecture)

The system needs to complete the transition to a snapshot-based architecture where:

  • Data collection creates self-contained snapshot files with all data needed for regime analysis
  • Regime analysis operates solely on pre-collected snapshot files
  • Historical backtesting uses the same regime analysis code as live operation

Goals

  1. Complete snapshot file reader (Task 12.1): Enable regime analysis to read from pre-collected YAML snapshot files containing minute-level price data
  2. Refactor regime engine for snapshots (Task 12.2): Remove direct API dependencies from regime classification, using snapshot data instead
  3. Implement snapshot-based market data service (Task 12.3): Create a market data layer that reads from snapshots rather than live APIs
  4. Update recommendation generator (Task 12.4): Ensure recommendations can be generated from historical snapshot data
  5. Add snapshot validation (Task 12.5): Implement integrity checks ensuring snapshots contain complete data (60 minute-level prices per hour)
  6. Separate metrics collection from analysis (Task 13.1): Focus metrics collector purely on creating comprehensive snapshots
  7. Create standalone regime analysis service (Task 13.2): Extract regime analysis into a service that consumes snapshot files
  8. Update dashboard for snapshots (Task 13.3): Modify dashboard generation to use snapshot-based regime results
  9. Update backtesting system (Task 13.4): Enable backtesting to run regime analysis on historical snapshot files
  10. Validate checkpoints (Tasks 4.6, 7.2, 7.6, 10, 14): Complete all outstanding checkpoint test validations

Non-Goals

  • Adding new regime classification algorithms (the classification logic is already implemented)
  • Changing the YAML snapshot file format (the format is established)
  • Adding new exchange integrations
  • Modifying the grid trading execution logic
  • Implementing real-time streaming (system uses hourly batch collection)

Technical Approach

Phase 1: Snapshot File Infrastructure (Tasks 12.1, 12.5)

Create snapshot file reader and validation:

# Snapshot file reader for regime analysis (interface sketch; method bodies omitted)
from datetime import datetime
from typing import List

class SnapshotReader:
    def load_snapshot(self, market: str, timestamp: datetime) -> "MarketSnapshot": ...
    def load_range(self, market: str, start: datetime, end: datetime) -> List["MarketSnapshot"]: ...
    def validate_snapshot(self, snapshot: "MarketSnapshot") -> "ValidationResult": ...

Validation checks:

  • 60 minute-level prices per hour
  • Required fields present (market_summary, grid_config, regime_analysis)
  • Data consistency across related snapshots
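
The first two validation checks above could be sketched roughly as follows. This is a minimal illustration rather than the project's validator: it assumes the snapshot has already been parsed from YAML into a plain dict, and `minute_prices` is a hypothetical key name for the minute-level price array. The required field names are taken from the list above. The cross-snapshot consistency check is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Field names taken from the validation list in this spec.
REQUIRED_FIELDS = ("market_summary", "grid_config", "regime_analysis")
PRICES_PER_HOUR = 60  # one minute-level price per minute of the hour


@dataclass
class ValidationResult:
    errors: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_snapshot(snapshot: Dict[str, Any]) -> ValidationResult:
    """Check required fields and minute-level completeness of one hourly snapshot."""
    result = ValidationResult()
    for name in REQUIRED_FIELDS:
        if name not in snapshot:
            result.errors.append(f"missing required field: {name}")
    # ASSUMPTION: 'minute_prices' is a hypothetical key; the real layout may differ.
    prices = snapshot.get("minute_prices") or []
    if len(prices) != PRICES_PER_HOUR:
        result.errors.append(
            f"expected {PRICES_PER_HOUR} minute-level prices, found {len(prices)}"
        )
    return result
```

Collecting every error instead of failing on the first one makes backfill easier, because a single pass reports everything that is wrong with a file.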

Phase 2: Regime Engine Refactor (Tasks 12.2, 12.3)

Modify regime engine to accept snapshot data instead of calling APIs:

# Current: Engine calls exchange APIs directly
result = regime_engine.analyze(symbol, exchange_client)
 
# Target: Engine receives pre-collected snapshot data
snapshot = snapshot_reader.load_snapshot(symbol, timestamp)
result = regime_engine.analyze_snapshot(snapshot)
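
One way to read the target call pattern is as plain data injection: the engine receives a snapshot object and never holds an exchange client. The sketch below is illustrative only. The `MarketSnapshot` fields, the `placeholder_classify` function, and the "RANGE"/"TREND" labels are assumptions standing in for the existing classification logic, which this specification does not change.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class MarketSnapshot:
    # ASSUMPTION: a reduced, hypothetical view of the real snapshot contents.
    market: str
    minute_prices: Sequence[float]


def placeholder_classify(prices: Sequence[float]) -> str:
    """Stand-in for the existing classifier; NOT the real regime logic."""
    spread = (max(prices) - min(prices)) / prices[0]
    return "RANGE" if spread < 0.01 else "TREND"


class RegimeEngine:
    def __init__(self, classify: Callable[[Sequence[float]], str]):
        self._classify = classify

    def analyze_snapshot(self, snapshot: MarketSnapshot) -> str:
        # The snapshot is the only input, so no exchange client is reachable here.
        return self._classify(snapshot.minute_prices)
```

Because `analyze_snapshot` depends only on its argument, the same call works for a snapshot collected a minute ago and for one loaded from an archive.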

Create snapshot-based market data service:

  • Replace live API calls with snapshot file access
  • Add caching layer for frequently accessed data
  • Support time-range queries across multiple snapshot files
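
The three points above (file access instead of API calls, caching, and time-range queries) could fit together as in the following sketch. The class name, the injected `loader` callable, and the one-file-per-market-per-hour layout are assumptions made for illustration. The cache is an unbounded dict, which a real service would probably cap.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

Snapshot = Dict[str, object]
# ASSUMPTION: the loader reads one hourly snapshot file for a market.
Loader = Callable[[str, datetime], Snapshot]


def _floor_to_hour(ts: datetime) -> datetime:
    return ts.replace(minute=0, second=0, microsecond=0)


class SnapshotMarketDataService:
    def __init__(self, loader: Loader):
        self._loader = loader
        self._cache: Dict[Tuple[str, datetime], Snapshot] = {}

    def get(self, market: str, ts: datetime) -> Snapshot:
        """Return the hourly snapshot covering ts, loading it at most once."""
        key = (market, _floor_to_hour(ts))
        if key not in self._cache:
            self._cache[key] = self._loader(*key)
        return self._cache[key]

    def get_range(self, market: str, start: datetime, end: datetime) -> List[Snapshot]:
        """Return snapshots for each hour from floor(start) up to, but excluding, end."""
        hour = _floor_to_hour(start)
        snapshots = []
        while hour < end:
            snapshots.append(self.get(market, hour))
            hour += timedelta(hours=1)
        return snapshots
```

Injecting the loader keeps the service testable with an in-memory fake, which matches the testability goal of this specification.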

Phase 3: Service Separation (Tasks 13.1, 13.2)

Split metrics collector responsibilities:

Metrics Collector (data collection only):

  • Fetch market data from exchange APIs
  • Collect grid status and configuration
  • Create comprehensive snapshot YAML files
  • No regime analysis logic

Regime Analysis Service (analysis only):

  • Read snapshot files
  • Perform regime classification
  • Generate recommendations
  • Support historical and current analysis
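
The boundary between the two components can be pictured as a file on disk: the collector writes it and the analysis service reads it, and neither calls the other. In the sketch below, the function names, the file naming scheme, and the `minute_prices` key are hypothetical. JSON is used in place of the established YAML format only so that the example runs on the standard library.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def collect_snapshot(
    fetch_prices: Callable[[str], List[float]], market: str, out_dir: Path
) -> Path:
    """Collector side: fetch data and write a snapshot; no analysis happens here."""
    snapshot = {"market": market, "minute_prices": fetch_prices(market)}
    path = out_dir / f"{market.replace('/', '-')}.json"
    path.write_text(json.dumps(snapshot))
    return path


def analyze_snapshot_file(
    path: Path, classify: Callable[[List[float]], str]
) -> Dict[str, str]:
    """Analysis side: reads the file only and has no access to the exchange."""
    snapshot = json.loads(path.read_text())
    return {"market": snapshot["market"], "regime": classify(snapshot["minute_prices"])}


# Demo with a fake exchange fetcher and a toy classifier (both illustrative).
with tempfile.TemporaryDirectory() as tmp:
    written = collect_snapshot(lambda market: [1.0, 2.0], "ETH/USDT", Path(tmp))
    demo_result = analyze_snapshot_file(
        written, lambda prices: "UP" if prices[-1] > prices[0] else "DOWN"
    )
```

Only `collect_snapshot` takes a fetcher argument, so removing exchange access from the analysis side is visible in the function signatures themselves.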

Phase 4: Dependent System Updates (Tasks 13.3, 13.4, 12.4)

Update systems that depend on regime analysis:

  • Dashboard reads regime results from snapshot files
  • Backtesting runs regime analysis on historical snapshots
  • Recommendation generator works with snapshot-based regime detection
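
For the backtesting item, the key property is that the historical path and the live path call the same analysis function. A minimal sketch under assumed names follows: `run_backtest`, `analyze_live`, the `timestamp` and `minute_prices` keys, and the toy classifier are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List, Tuple

Snapshot = Dict[str, object]
Analyze = Callable[[Snapshot], str]


def analyze_live(latest: Snapshot, analyze: Analyze) -> str:
    """Live path: analyze the most recently collected snapshot."""
    return analyze(latest)


def run_backtest(history: Iterable[Snapshot], analyze: Analyze) -> List[Tuple[datetime, str]]:
    """Historical path: apply the same analyze function in timestamp order."""
    ordered = sorted(history, key=lambda snap: snap["timestamp"])
    return [(snap["timestamp"], analyze(snap)) for snap in ordered]


def toy_analyze(snap: Snapshot) -> str:
    # Stand-in classifier, NOT the real regime logic.
    prices = snap["minute_prices"]
    return "UP" if prices[-1] > prices[0] else "DOWN"


# Demo: two hourly snapshots supplied out of order.
start = datetime(2024, 1, 1)
history = [
    {"timestamp": start + timedelta(hours=1), "minute_prices": [2.0, 1.0]},
    {"timestamp": start, "minute_prices": [1.0, 2.0]},
]
timeline = run_backtest(history, toy_analyze)
```

Since both paths delegate to one callable, a change to the classifier cannot affect live detection without also affecting backtests.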

Phase 5: Checkpoint Validation (Tasks 4.6, 7.2, 7.6, 10, 14)

Complete all checkpoint test validations:

Task  Checkpoint                               Status       Remaining Work
4.6   Grid configuration management tests      Partial      Property tests 39-48
7.2   Integration tests for enhanced metrics   Partial      n8n webhook, per-market file tests
7.6   Enabled status awareness tests           Partial      Property tests 49-53
10    Restart gates tests                      Partial      Property tests 54-74
14    Snapshot-based architecture tests        Not started  Full test suite

Success Criteria

  1. Snapshot Independence: Regime analysis produces identical results whether run on live data or snapshot files
  2. Complete Validation: All checkpoint tests pass (4.6, 7.2, 7.6, 10, 14)
  3. Historical Analysis: Can run regime detection on any historical period with existing snapshots
  4. Data Integrity: Snapshot validation catches incomplete or corrupted data before analysis
  5. Service Separation: Clear boundary between data collection and regime analysis components
  6. Test Coverage: Property tests 77-78 implemented for snapshot-based regime detection
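
Criterion 1 (Snapshot Independence) lends itself to a property-style check: generate price series, pass each through the snapshot serialization round trip, and confirm the classifier gives the same answer for both. The sketch below uses only the standard library. The function names, the `minute_prices` key, and the JSON round trip (standing in for the YAML format) are assumptions, and the real property tests may be structured quite differently.

```python
import json
import random
from typing import Callable, List, Optional, Sequence


def to_snapshot_and_back(prices: Sequence[float]) -> List[float]:
    """Serialize prices the way a snapshot file might store them, then read them back."""
    return json.loads(json.dumps({"minute_prices": list(prices)}))["minute_prices"]


def find_divergence(
    classify: Callable[[Sequence[float]], str], trials: int = 100, seed: int = 0
) -> Optional[List[float]]:
    """Return the first series whose live and snapshot results differ, else None."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        live = [100.0 + rng.uniform(-5.0, 5.0) for _ in range(60)]
        if classify(live) != classify(to_snapshot_and_back(live)):
            return live
    return None
```

Returning the offending series rather than a boolean gives a concrete counterexample to debug if serialization ever loses precision.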

Dependencies

  • Existing Implementation: Tasks 1-11 are largely complete, providing the foundation
  • Snapshot File Format: Established YAML format with minute-level price arrays
  • Regime Classification Engine: Fully implemented (range discovery, feature calculation, score aggregation)
  • Grid Restart Gates: Implemented but tests incomplete

Risks

Risk                                   Impact  Mitigation
Snapshot/API result divergence         High    Property tests comparing snapshot vs live results
Performance with large snapshot files  Medium  Implement caching layer, lazy loading
Historical data gaps                   Medium  Validation catches missing data, backfill tooling exists
Regression in live system              High    Run parallel comparison before cutover
Test suite execution time              Low     Property tests are marked optional with *

Appendix: H-Priority Task Details

Checkpoint Tests (Partial)

Task 4.6 - Grid Configuration Management

  • Property tests 39-48 not yet written
  • Tests probationary grid parameters, validation criteria, quick stop triggers
  • Tests configuration version consistency, metrics completeness

Task 7.2 - Enhanced Metrics Integration

  • Integration test for collection-to-dashboard workflow incomplete
  • n8n webhook integration testing needed
  • Per-market file creation verification needed

Task 7.6 - Enabled Status Awareness

  • Property tests 49-53 not yet written
  • Tests enabled status consideration, disabled grid recommendations
  • Tests grid repositioning recommendations

Task 10 - Restart Gates

  • Property tests 54-74 not yet written
  • Tests gate state initialization, condition evaluation, blocking behavior
  • Tests gate progression, regime transitions, fresh grid parameters

Snapshot Architecture Tasks (Partial)

Task 12.1 - Snapshot File Reader

  • Skeleton exists but needs completion
  • Needs minute-level price data parsing
  • Needs multi-market snapshot loading

Task 12.2 - Regime Engine Refactor

  • Engine structure exists but still has API dependencies
  • Needs snapshot data injection pattern
  • Needs removal of direct exchange calls

Task 12.3 - Snapshot Market Data Service

  • Not yet implemented
  • Needs caching layer
  • Needs time-range query support

Task 12.4 - Recommendation Generator Update

  • Works with current regime engine
  • Needs snapshot-based regime analysis integration
  • Needs grid comparison from snapshot data

Task 12.5 - Snapshot Validation

  • Basic structure exists
  • Needs 60-minute completeness check
  • Needs cross-file consistency validation

Service Separation Tasks (Not Started)

Task 13.1 - Metrics Collector Update

  • Remove regime analysis from collection
  • Focus on comprehensive snapshot creation
  • Ensure all regime analysis inputs captured

Task 13.2 - Regime Analysis Service

  • Extract into standalone service
  • Add historical period analysis API
  • Support batch analysis for backtesting

Task 13.3 - Dashboard Update

  • Modify to read snapshot-based regime results
  • Update chart generation for snapshot data
  • Support historical regime visualization

Task 13.4 - Backtesting Update

  • Use snapshot-based regime detection
  • Run analysis on historical snapshot files
  • Ensure consistency with live detection

Task 14 - Architecture Checkpoint

  • Full test suite for snapshot architecture
  • Verify data collection/analysis separation
  • Confirm regime detection works with snapshots