0004 - Regime Management: Snapshot Architecture
Summary
This specification defines the remaining work to complete the snapshot-based regime detection architecture for the market-making system. The core change separates data collection (creating comprehensive market snapshots) from regime analysis (consuming those snapshots), which improves testability, enables historical analysis, and clarifies the architecture.
Problem Statement
The current regime detection system has tight coupling between data collection and regime analysis. This creates several issues:
- Testability: Cannot easily test regime analysis in isolation without mocking exchange APIs
- Historical Analysis: Cannot run regime detection on past data without re-fetching from exchanges
- Separation of Concerns: Metrics collection performs both data gathering and analysis in a single pass
- Validation Coverage: Multiple checkpoint tests remain incomplete (grid configuration, enabled status, restart gates, snapshot architecture)
The system needs to complete the transition to a snapshot-based architecture where:
- Data collection creates self-contained snapshot files with all data needed for regime analysis
- Regime analysis operates solely on pre-collected snapshot files
- Historical backtesting uses the same regime analysis code as live operation
Goals
- Complete snapshot file reader (Task 12.1): Enable regime analysis to read from pre-collected YAML snapshot files containing minute-level price data
- Refactor regime engine for snapshots (Task 12.2): Remove direct API dependencies from regime classification, using snapshot data instead
- Implement snapshot-based market data service (Task 12.3): Create a market data layer that reads from snapshots rather than live APIs
- Update recommendation generator (Task 12.4): Ensure recommendations can be generated from historical snapshot data
- Add snapshot validation (Task 12.5): Implement integrity checks ensuring snapshots contain complete data (60 minute-level prices per hour)
- Separate metrics collection from analysis (Task 13.1): Focus metrics collector purely on creating comprehensive snapshots
- Create standalone regime analysis service (Task 13.2): Extract regime analysis into a service that consumes snapshot files
- Update dashboard for snapshots (Task 13.3): Modify dashboard generation to use snapshot-based regime results
- Update backtesting system (Task 13.4): Enable backtesting to run regime analysis on historical snapshot files
- Validate checkpoints (Tasks 4.6, 7.2, 7.6, 10, 14): Complete all outstanding checkpoint test validations
Non-Goals
- Adding new regime classification algorithms (the classification logic is already implemented)
- Changing the YAML snapshot file format (the format is established)
- Adding new exchange integrations
- Modifying the grid trading execution logic
- Implementing real-time streaming (system uses hourly batch collection)
Technical Approach
Phase 1: Snapshot File Infrastructure (Tasks 12.1, 12.5)
Create snapshot file reader and validation:
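
    from __future__ import annotations  # lets the not-yet-defined types below be used as annotations
    from datetime import datetime
    from typing import List

    # Snapshot file reader for regime analysis (interface only; bodies elided)
    class SnapshotReader:
        def load_snapshot(self, market: str, timestamp: datetime) -> MarketSnapshot: ...
        def load_range(self, market: str, start: datetime, end: datetime) -> List[MarketSnapshot]: ...
        def validate_snapshot(self, snapshot: MarketSnapshot) -> ValidationResult: ...

Validation checks: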
- 60 minute-level prices per hour
- Required fields present (market_summary, grid_config, regime_analysis)
- Data consistency across related snapshots
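The first two checks above can be sketched as follows. This is a minimal illustration, not the project's validator: it assumes the snapshot has already been loaded from YAML into a plain dict, and the `minute_prices` key and the `ValidationResult` shape are hypothetical. Only the three required field names come from this specification.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Field names come from the validation checks above; the dict layout is an assumption.
REQUIRED_FIELDS = ("market_summary", "grid_config", "regime_analysis")
EXPECTED_PRICES_PER_HOUR = 60


@dataclass
class ValidationResult:
    """Hypothetical result type: a list of human-readable problems."""
    errors: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def validate_snapshot(snapshot: Dict[str, Any]) -> ValidationResult:
    """Check required fields and minute-price completeness for one hourly snapshot."""
    result = ValidationResult()
    for name in REQUIRED_FIELDS:
        if name not in snapshot:
            result.errors.append(f"missing required field: {name}")
    prices = snapshot.get("minute_prices")  # hypothetical key name
    if not isinstance(prices, list):
        result.errors.append("minute_prices is missing or not a list")
    elif len(prices) != EXPECTED_PRICES_PER_HOUR:
        result.errors.append(
            f"expected {EXPECTED_PRICES_PER_HOUR} minute prices, got {len(prices)}"
        )
    return result
```

Collecting every problem instead of stopping at the first one keeps the report useful when a snapshot is damaged in several ways at once.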
Phase 2: Regime Engine Refactor (Tasks 12.2, 12.3)
Modify regime engine to accept snapshot data instead of calling APIs:

    # Current: Engine calls exchange APIs directly
    result = regime_engine.analyze(symbol, exchange_client)

    # Target: Engine receives pre-collected snapshot data
    snapshot = snapshot_reader.load_snapshot(symbol, timestamp)
    result = regime_engine.analyze_snapshot(snapshot)

Create snapshot-based market data service:
- Replace live API calls with snapshot file access
- Add caching layer for frequently accessed data
- Support time-range queries across multiple snapshot files
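One possible shape for such a service is sketched below. It assumes each snapshot covers one clock hour and that a loader callable returns that hour's 60 minute prices in order; the class name, the loader signature, and the unbounded in-memory cache are all illustrative choices rather than the planned design.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

# Hypothetical loader signature: (market, hour_start) -> that hour's minute prices.
Loader = Callable[[str, datetime], List[float]]


class SnapshotMarketData:
    """Serves minute prices from hourly snapshots, caching each loaded hour."""

    def __init__(self, loader: Loader) -> None:
        self._loader = loader
        self._cache: Dict[Tuple[str, datetime], List[float]] = {}

    def _hour(self, market: str, hour_start: datetime) -> List[float]:
        key = (market, hour_start)
        if key not in self._cache:
            # Each snapshot is read at most once per (market, hour).
            self._cache[key] = self._loader(market, hour_start)
        return self._cache[key]

    def prices(self, market: str, start: datetime, end: datetime) -> List[float]:
        """Return minute prices for [start, end), stitched across hourly snapshots."""
        out: List[float] = []
        hour = start.replace(minute=0, second=0, microsecond=0)
        while hour < end:
            for i, price in enumerate(self._hour(market, hour)):
                ts = hour + timedelta(minutes=i)
                if start <= ts < end:
                    out.append(price)
            hour += timedelta(hours=1)
        return out
```

Injecting the loader keeps file access out of the service itself, so tests can supply synthetic data and the cache can be verified by counting loader calls.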
Phase 3: Service Separation (Tasks 13.1, 13.2)
Split metrics collector responsibilities:
Metrics Collector (data collection only):
- Fetch market data from exchange APIs
- Collect grid status and configuration
- Create comprehensive snapshot YAML files
- No regime analysis logic
Regime Analysis Service (analysis only):
- Read snapshot files
- Perform regime classification
- Generate recommendations
- Support historical and current analysis
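The split described above can be illustrated with a small sketch in which the snapshot store is the only thing the two services share. All names here (`Snapshot`, `SnapshotStore`, `InMemoryStore`, and the injected `fetch_prices` and `classify` callables) are hypothetical stand-ins; the real system stores YAML files and uses the existing classification engine.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical in-memory form of one snapshot file."""
    market: str
    minute_prices: List[float]


class SnapshotStore(Protocol):
    """The only surface shared by the two services."""
    def write(self, snapshot: Snapshot) -> None: ...
    def read_all(self, market: str) -> List[Snapshot]: ...


class InMemoryStore:
    """Stand-in for the YAML files on disk, for illustration only."""
    def __init__(self) -> None:
        self._data: Dict[str, List[Snapshot]] = {}

    def write(self, snapshot: Snapshot) -> None:
        self._data.setdefault(snapshot.market, []).append(snapshot)

    def read_all(self, market: str) -> List[Snapshot]:
        return list(self._data.get(market, []))


class MetricsCollector:
    """Collection only: fetches data and writes snapshots, no regime logic."""
    def __init__(self, store: SnapshotStore, fetch_prices) -> None:
        self._store = store
        self._fetch_prices = fetch_prices  # stands in for the exchange API call

    def collect(self, market: str) -> None:
        self._store.write(Snapshot(market, self._fetch_prices(market)))


class RegimeAnalysisService:
    """Analysis only: reads snapshots, never touches the exchange."""
    def __init__(self, store: SnapshotStore, classify) -> None:
        self._store = store
        self._classify = classify  # stands in for the existing classification engine

    def analyze(self, market: str) -> List[str]:
        return [self._classify(s) for s in self._store.read_all(market)]
```

Because `RegimeAnalysisService` never receives an exchange client, it cannot reintroduce a live API dependency, and the same code path serves both current and historical analysis.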
Phase 4: Dependent System Updates (Tasks 13.3, 13.4, 12.4)
Update systems that depend on regime analysis:
- Dashboard reads regime results from snapshot files
- Backtesting runs regime analysis on historical snapshots
- Recommendation generator works with snapshot-based regime detection
Phase 5: Checkpoint Validation (Tasks 4.6, 7.2, 7.6, 10, 14)
Complete all checkpoint test validations:
| Task | Checkpoint | Status | Remaining Work |
|---|---|---|---|
| 4.6 | Grid configuration management tests | Partial | Property tests 39-48 |
| 7.2 | Integration tests for enhanced metrics | Partial | n8n webhook, per-market file tests |
| 7.6 | Enabled status awareness tests | Partial | Property tests 49-53 |
| 10 | Restart gates tests | Partial | Property tests 54-74 |
| 14 | Snapshot-based architecture tests | Not started | Full test suite |
Success Criteria
- Snapshot Independence: Regime analysis produces identical results whether run on live data or snapshot files
- Complete Validation: All checkpoint tests pass (4.6, 7.2, 7.6, 10, 14)
- Historical Analysis: Can run regime detection on any historical period with existing snapshots
- Data Integrity: Snapshot validation catches incomplete or corrupted data before analysis
- Service Separation: Clear boundary between data collection and regime analysis components
- Test Coverage: Property tests 77-78 implemented for snapshot-based regime detection
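The Snapshot Independence criterion could be checked with a randomized comparison along these lines. This is a sketch under stated assumptions, not the planned property tests: it uses the standard library's `random` module instead of a property-testing framework, a JSON round trip as a stand-in for writing and re-reading a YAML snapshot, and a hypothetical `minute_prices` key.

```python
import json
import random
from typing import Callable, List, Optional

Prices = List[float]


def json_roundtrip(prices: Prices) -> Prices:
    """Stand-in for writing a snapshot and reading it back (JSON here, not YAML)."""
    return json.loads(json.dumps({"minute_prices": prices}))["minute_prices"]


def find_divergence(
    classify: Callable[[Prices], object],
    roundtrip: Callable[[Prices], Prices] = json_roundtrip,
    trials: int = 200,
    seed: int = 0,
) -> Optional[Prices]:
    """Return the first random series where the two paths disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        prices = [rng.uniform(90.0, 110.0) for _ in range(60)]
        # "Live" path sees the raw prices; "snapshot" path sees the serialized copy.
        if classify(prices) != classify(roundtrip(prices)):
            return prices
    return None
```

A serialization step that loses precision (for example, rounding prices when writing the file) would show up here as a non-`None` result, which is the kind of divergence the criterion is meant to rule out.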
Dependencies
- Existing Implementation: Tasks 1-11 are largely complete, providing the foundation
- Snapshot File Format: Established YAML format with minute-level price arrays
- Regime Classification Engine: Fully implemented (range discovery, feature calculation, score aggregation)
- Grid Restart Gates: Implemented but tests incomplete
Risks
| Risk | Impact | Mitigation |
|---|---|---|
| Snapshot/API result divergence | High | Property tests comparing snapshot vs live results |
| Performance with large snapshot files | Medium | Implement caching layer, lazy loading |
| Historical data gaps | Medium | Validation catches missing data, backfill tooling exists |
| Regression in live system | High | Run parallel comparison before cutover |
| Test suite execution time | Low | Property tests are marked optional with * |
Appendix: H-Priority Task Details
Checkpoint Tests (Partial)
Task 4.6 - Grid Configuration Management
- Property tests 39-48 not yet written
- Tests probationary grid parameters, validation criteria, quick stop triggers
- Tests configuration version consistency, metrics completeness
Task 7.2 - Enhanced Metrics Integration
- Integration test for collection-to-dashboard workflow incomplete
- n8n webhook integration testing needed
- Per-market file creation verification needed
Task 7.6 - Enabled Status Awareness
- Property tests 49-53 not yet written
- Tests enabled status consideration, disabled grid recommendations
- Tests grid repositioning recommendations
Task 10 - Restart Gates
- Property tests 54-74 not yet written
- Tests gate state initialization, condition evaluation, blocking behavior
- Tests gate progression, regime transitions, fresh grid parameters
Snapshot Architecture Tasks (Partial)
Task 12.1 - Snapshot File Reader
- Skeleton exists but needs completion
- Needs minute-level price data parsing
- Needs multi-market snapshot loading
Task 12.2 - Regime Engine Refactor
- Engine structure exists but still has API dependencies
- Needs snapshot data injection pattern
- Needs removal of direct exchange calls
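One way to approach the injection pattern is to make `analyze_snapshot` the only method that contains analysis logic and reduce the legacy `analyze` entry point to a thin adapter during migration. In the sketch below, the `MarketSnapshot` fields, the `fetch_minute_prices` client method, and the threshold rule with its labels are all hypothetical placeholders; only the two method names come from this specification.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class MarketSnapshot:
    """Hypothetical minimal snapshot: just what this sketch needs."""
    symbol: str
    minute_prices: List[float]


class ExchangeClient(Protocol):
    """Hypothetical client interface; the real client's methods may differ."""
    def fetch_minute_prices(self, symbol: str) -> List[float]: ...


class RegimeEngine:
    def analyze_snapshot(self, snapshot: MarketSnapshot) -> str:
        """Pure analysis: depends only on the snapshot contents."""
        first, last = snapshot.minute_prices[0], snapshot.minute_prices[-1]
        # Placeholder rule standing in for the real classification logic.
        return "trending" if abs(last - first) / first > 0.01 else "ranging"

    def analyze(self, symbol: str, exchange_client: ExchangeClient) -> str:
        """Legacy entry point kept as a thin adapter during migration."""
        prices = exchange_client.fetch_minute_prices(symbol)
        return self.analyze_snapshot(MarketSnapshot(symbol, prices))
```

With this arrangement both entry points share one code path, so the legacy method can be deleted once no caller passes an exchange client.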
Task 12.3 - Snapshot Market Data Service
- Not yet implemented
- Needs caching layer
- Needs time-range query support
Task 12.4 - Recommendation Generator Update
- Works with current regime engine
- Needs snapshot-based regime analysis integration
- Needs grid comparison from snapshot data
Task 12.5 - Snapshot Validation
- Basic structure exists
- Needs completeness check (60 minute-level prices per hour)
- Needs cross-file consistency validation
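This specification does not define what cross-file consistency covers. One plausible reading, given hourly batch collection, is that the snapshots for a market should form a gap-free hourly sequence. The helper below sketches that single check; the function name and the assumption that each snapshot is identified by its hour-start timestamp are illustrative.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def find_gaps(hour_starts: List[datetime]) -> List[Tuple[datetime, datetime]]:
    """Return (previous, next) pairs that are not exactly one hour apart.

    Missing hours and duplicate timestamps are both reported.
    """
    ordered = sorted(hour_starts)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if b - a != timedelta(hours=1)
    ]
```

An empty result means the sequence is contiguous; each reported pair brackets a range that would need backfilling before analysis.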
Service Separation Tasks (Not Started)
Task 13.1 - Metrics Collector Update
- Remove regime analysis from collection
- Focus on comprehensive snapshot creation
- Ensure all regime analysis inputs captured
Task 13.2 - Regime Analysis Service
- Extract into standalone service
- Add historical period analysis API
- Support batch analysis for backtesting
Task 13.3 - Dashboard Update
- Modify to read snapshot-based regime results
- Update chart generation for snapshot data
- Support historical regime visualization
Task 13.4 - Backtesting Update
- Use snapshot-based regime detection
- Run analysis on historical snapshot files
- Ensure consistency with live detection
Task 14 - Architecture Checkpoint
- Full test suite for snapshot architecture
- Verify data collection/analysis separation
- Confirm regime detection works with snapshots