Product Requirements Document - Grid Exit Strategy - Phases 2-5
Author: Craig Date: 2026-02-01
Success Criteria
User Success (You as Grid Trader)
Decision Confidence:
- You can articulate WHY you entered and exited every position using audit trail evidence
- You have at least 30 minutes of warning between exit state transitions in 90%+ of cases
- You can answer “why didn’t you exit here?” for any historical moment using immutable decision records
Capital Protection:
- Zero stop-loss breaches during normal market conditions (excluding “world-defining moments”)
- System provides WARNING state at least 1 hour (target: 1-2 hours) before LATEST_ACCEPTABLE_EXIT
- At least 2 hours between LATEST_ACCEPTABLE_EXIT and MANDATORY_EXIT states
- No catastrophic exits (defined as: hitting exchange stop-loss instead of graceful exit)
Operational Clarity:
- Exit state transitions are clear and actionable (you know what WARNING/LATEST_ACCEPTABLE/MANDATORY mean in real-time)
- System evaluates regime hourly with consistent decision logic
- Restart gates prevent premature re-entry after trend stops
Business Success (Capital Scaling & Investor Readiness)
Capital Scaling Milestone:
- Double capital stake from £1K to £2K within Phase 2-5 validation period (2-4 weeks live operation)
- System proven ready to support £10K capital allocation (risk calculations, position sizing, audit trails all scale)
Investor Credibility:
- Complete immutable audit trail in Git showing every decision with timestamps
- Backtesting results demonstrate exit strategy would have prevented historical drawdowns
- Ability to generate “decision quality” reports showing regime classification accuracy vs outcomes
- Clean separation of “recommendation quality” (was regime correct?) vs “action quality” (did I follow the recommendation?)
Exit Quality Metrics:
- KPI framework operational and tracking exit quality (how early did we exit vs when we should have?)
- Historical analysis showing system identified trend breakouts before significant capital loss
- Documented evidence of false-positive rate (stopped grids that stayed range-bound)
Technical Success
Phase 2 Complete:
- Three-gate restart logic implemented and tested (Directional Energy Decay → Mean Reversion Return → Tradable Volatility)
- Exit state transitions functional (WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT)
- State transition tracking in decision records
- Historical data loading supports gate evaluation
Phase 3 Complete:
- KuCoin position tracker integrated and returning accurate position data
- Capital risk calculator quantifying exposure in real-time
- Enhanced notifications include risk metrics (current exposure, distance to stop-loss, time in exit state)
Phase 4 Complete:
- 100% test coverage for new exit logic (matching Phase 1 quality: 60+ tests, all passing)
- Backtesting framework operational and validated against 3-6 months of historical data
- CI/CD integration preventing regression
- Documented test scenarios covering edge cases (volatility spikes, gap moves, data failures)
Phase 5 Complete:
- Hourly evaluation cadence operational with monitoring
- Audit logging captures all state transitions with context (regime metrics, confidence scores, gate status)
- KPI tracking framework operational
- Documentation complete for investor presentation
Measurable Outcomes
Completion Criteria (Phases 2-5 “Done”):
- ✅ All code implemented with 100% test pass rate
- ✅ Backtested against historical trend breakouts (3-6 months data)
- ✅ Validated with £1K live capital for 2-4 weeks
- ✅ Capital doubled to £2K during validation period
- ✅ Zero stop-loss breaches during validation period (excluding black swan events)
- ✅ Audit trail complete and investor-ready
- ✅ System ready to scale to £10K capital allocation
3-Month Success (Post Phase 2-5):
- Operating at £10K capital with same exit quality metrics
- Clean track record of exit decisions with measurable outcomes
- Investor presentation materials complete with backtesting evidence
12-Month Vision:
- £100K+ capital with external investment
- Exit strategy proven across multiple market regimes
- Published track record of regime classification accuracy
- Multi-symbol support (beyond single grid)
Product Scope
MVP - Minimum Viable Product (Phases 2-5)
Core Exit Strategy:
- Exit state machine (WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT)
- Three-gate restart logic preventing premature grid restart
- Position risk quantification from KuCoin API
- Enhanced notifications with risk context
Quality & Validation:
- Comprehensive test coverage (60+ tests, 100% pass)
- Backtesting framework with 3-6 months historical validation
- CI/CD integration
Operational Foundation:
- Hourly evaluation cadence with monitoring
- Complete audit logging in Git
- KPI tracking framework
- Static HTML dashboards with Chart.js
Explicitly Out of Scope for MVP:
- Multi-symbol concurrent grids (single ETH-USDT only)
- Automated grid creation (human approval required)
- 15-minute evaluation cadence (hourly sufficient based on research)
- Advanced real-time dashboards
- Performance optimization beyond functional requirements
Post-MVP Growth Path
Detailed roadmap documented in Project Scoping section below, including:
- Phase 6 (3-month): Capital scaling to £10K with proven system
- Phase 7 (6-month): Investor preparation and multi-symbol validation
- Phase 8 (12-month+): Enhanced automation, ML refinements, multi-exchange support
User Journeys
Journey 1: Craig - Active Grid Trader (Exit Protection)
Situation: It’s Tuesday morning, 9:15 AM. Craig has an active ETH-USDT grid running with £1,200 capital deployed. The grid has been harvesting profitable oscillations for 3 days in a clean range between 3,200. His phone buzzes with a Pushover notification: “⚠️ WARNING - ETH regime transitioning. Confidence 0.68 → 0.54. Review recommended.”
Opening Scene - Warning Detection:
Craig opens the notification link on his phone. The decision interface shows:
- Current regime: TRANSITION (was RANGE_OK 15 minutes ago)
- Confidence: 0.54 (dropped from 0.68)
- Exit state: WARNING
- Key metrics: ADX rising (25 → 32), Bollinger Bandwidth expanding (0.034 → 0.041)
- Gate status: Gate 1 (Directional Energy Decay) FAILING - TrendScore crossed 35 threshold
- Time in WARNING: 15 minutes
- Estimated time to LATEST_ACCEPTABLE_EXIT: 1-2 hours
Craig thinks: “This is exactly what Phase 1 was built for - early warning before things get ugly.”
Rising Action - Monitoring Escalation:
45 minutes later, another notification: “🔶 LATEST_ACCEPTABLE_EXIT - ETH trend strengthening. ADX 38, efficiency ratio 0.72. Exit recommended within 2 hours.”
Craig checks the decision record:
- Regime: TRANSITION → TREND (confirmed for 3/5 bars)
- ADX: 38 and rising
- Efficiency Ratio: 0.72 (directional persistence strong)
- Exit state: LATEST_ACCEPTABLE_EXIT
- Current position: Grid is net long 0.8 ETH (market moving up, sold into strength)
- Distance to stop-loss: $280 (still 8.7% buffer)
- Audit trail shows: “WARNING triggered at 09:15, LATEST_ACCEPTABLE_EXIT at 10:00”
Craig has a decision to make: Exit now with graceful unwinding, or wait and risk MANDATORY_EXIT?
Climax - Decisive Action:
Craig decides to exit. He manually stops the grid in KuCoin (3 clicks: Stop Grid → Keep Assets → Confirm). Within 2 minutes, the grid is stopped. Current PnL: +£47 profit on this grid session.
He updates the decision record via the web interface:
- Action taken: STOP_GRID
- Reason: “Trend confirmed, ADX rising through 35, efficiency ratio shows directional persistence”
- Outcome: Graceful exit with profit intact
The system records:
- Exit state progression: WARNING (09:15) → LATEST_ACCEPTABLE_EXIT (10:00) → USER_STOPPED (10:45)
- Total warning time: 90 minutes
- Stop-loss distance at exit: 8.7% (never threatened)
- Grid cooldown: 60 minutes before restart eligibility
Resolution - Post-Exit Validation:
Two hours later, ETH has moved to 3,280. The system would have marked this as “catastrophic exit avoided.”
24 hours later, the system performs automatic evaluation:
- Regime classification: CORRECT (remained TREND for 18 hours)
- Exit timing: OPTIMAL (exited 90 minutes into trend, avoided 8% adverse move)
- Warning lead time: 90 minutes (met success criteria: 30+ min warning)
- KPI recorded: Exit quality score 9/10 (early exit, preserved capital, clean audit trail)
Craig’s new reality:
- Capital preserved with profit
- Clean audit trail showing “system warned → I exited → trend confirmed”
- Confidence in system’s protective capability
- Restart gates now active: waiting for directional energy decay before re-entry
Journey 2: Craig - Historical Decision Reviewer (Investor Preparation)
Situation: It’s Friday evening, 3 months into validation. Craig is preparing materials for his personal decision to scale from £1K to £10K. He needs to demonstrate to himself that the exit system actually works before committing serious capital.
Opening Scene:
Craig opens the market-maker-data Git repository containing 3 months of immutable decision records. He runs the analysis script:
task analyze-exit-quality --period 2026-01-01 to 2026-03-31
Rising Action:
The KPI dashboard generates:
- Total exit events: 12
- SLAR (Stop-Loss Avoidance Rate): 100% (12/12 exits before stop-loss)
- PRR (Profit Retention Ratio): 82% average (preserved £394 of £480 potential profit)
- TTDR (True Transition Detection): 83% (10/12 regime breaks correctly identified)
- FER (False Exit Rate): 17% (2/12 exits where range resumed after)
- Average warning time: 95 minutes (exceeds 30-minute minimum)
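The percentages in this list follow directly from the counts. A minimal sketch of the arithmetic, assuming each KPI is the simple ratio implied by its parenthetical note (the actual KPI definitions are not given here, and `ExitStats` and `kpi_summary` are hypothetical names):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitStats:
    """Counts for one review period (field names are illustrative)."""
    total_exits: int
    exits_before_stop_loss: int
    true_transitions: int
    profit_preserved: float
    profit_potential: float


def kpi_summary(s: ExitStats) -> dict[str, int]:
    """Return the four ratios as whole-number percentages."""
    if s.total_exits <= 0 or s.profit_potential <= 0:
        raise ValueError("need at least one exit and positive potential profit")
    # Assumption: every exit that was not a true transition counts as a false exit.
    false_exits = s.total_exits - s.true_transitions
    return {
        "SLAR": round(100 * s.exits_before_stop_loss / s.total_exits),
        "PRR": round(100 * s.profit_preserved / s.profit_potential),
        "TTDR": round(100 * s.true_transitions / s.total_exits),
        "FER": round(100 * false_exits / s.total_exits),
    }


# Figures from Journey 2: 12 exits, 10 true transitions, £394 of £480 preserved.
journey_2 = ExitStats(12, 12, 10, 394.0, 480.0)
summary = kpi_summary(journey_2)
```

With the Journey 2 counts this reproduces the dashboard values: SLAR 100, PRR 82, TTDR 83, FER 17.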
Climax:
Craig reviews the 2 false positives in detail:
- Exit #3 (Feb 12): WARNING → stopped grid → range resumed after 6 hours. Lost £18 in potential profit but preserved £67 existing profit. Restart gates prevented immediate re-entry; missed 2 days of ranging.
- Exit #7 (Mar 5): Similar pattern - cautious exit, range continued.
The audit trail shows his reasoning at the time: “ADX rising, efficiency ratio climbing, better safe than sorry.” Looking back, the system was correctly identifying volatility expansion, even though the regime ultimately held.
Resolution:
Craig’s conclusion: “2 false exits cost me £45 in missed profit. But the 10 true exits saved me from an estimated £620 in stop-loss hits. Net benefit: £575. More importantly - I can articulate WHY every decision was made, and the false positives were defensible given the data available.”
He updates his personal scaling decision document: “Exit system validated. Ready for £10K capital.”
Journey 3: External Investor - Track Record Evaluation
Situation: 18 months later. Craig is meeting with Sarah, an angel investor considering deploying £100K into his systematic grid trading fund. She’s reviewing his track record before committing capital.
Opening Scene:
Sarah receives access to Craig’s investor presentation repository. She’s evaluating whether this is “real systematic trading” or “lucky gambling with post-hoc justification.”
Rising Action:
Sarah reviews the evidence:
Immutable Decision Records (Git):
- Every recommendation timestamped and committed before action
- No retroactive editing (Git history proves it)
- Clear separation: “What did the system recommend?” vs “What did Craig do?”
Exit Quality Metrics (18 months):
- 87 total exit events
- SLAR: 97% (3 stop-loss hits during black swan events)
- PRR: 79% (preserved majority of range-trading profits)
- Monthly capital growth: 4.2% average (compounded)
Failure Analysis:
- Craig documents the 3 stop-loss hits:
  - May 2026: Exchange outage prevented manual exit (system correctly identified MANDATORY_EXIT, Craig couldn’t execute)
  - Aug 2026: “Ignored LATEST_ACCEPTABLE_EXIT recommendation - my mistake, learned lesson”
  - Nov 2026: Flash crash exceeded all historical volatility bounds (unpredictable)
Climax:
Sarah asks the critical question: “How do I know you didn’t just get lucky? What happens when regimes behave differently?”
Craig shows her the backtesting framework:
- Exit logic backtested against 3 years of historical data
- Would have avoided 23/27 major drawdown periods
- The 4 missed signals all occurred in low-liquidity Asian hours (now monitored)
Resolution:
Sarah’s conclusion: “This isn’t perfect, but it’s systematic, transparent, and learns from failures. The audit trail gives me confidence that capital is protected by process, not luck. I’m in.”
Journey 4: System Administrator - Deployment & Monitoring
Situation: Craig needs to deploy the Phase 2 restart gates logic to production after completing testing.
Opening Scene:
Craig (wearing his DevOps hat) reviews the deployment checklist:
- All tests passing (62 tests, 100% coverage)
- Backtesting complete
- Configuration updated with new gate thresholds
- Docker image built and pushed to registry
Rising Action:
He deploys using the standard workflow:
task deploy-metrics-service --env production
kubectl apply -f k8s/metrics-service/deployment.yaml
The ArgoCD pipeline automatically:
- Validates configuration schema
- Runs smoke tests against production API
- Brings up new pods alongside the old ones and switches traffic once they are healthy (blue-green deployment)
- Monitors error rates and latency
Climax:
15 minutes after deployment, Craig receives a Slack alert: “Metrics service error rate: 0.2% (Gate evaluation failing for BTC-USDT)”
He checks the logs:
ERROR: Gate 2 evaluation failed - insufficient historical data for OU half-life calculation
Symbol: BTC-USDT, Required: 240 bars, Available: 187 bars
Resolution:
Craig realizes BTC-USDT is a newly added symbol without enough historical data. He updates the configuration to delay gate evaluation until sufficient data is collected:
grids:
  - id: btc-grid-1
    symbol: BTC-USDT
    gate_evaluation_delay: 48h # Wait for data collection
Redeploys. Error rate returns to 0%. System is stable.
The incident is logged in the decision record system: “Deployment incident - insufficient data for new symbol gate evaluation. Resolution: delay gate evaluation. Prevention: add data sufficiency check to deployment validation.”
Journey 5: Kubernetes CronJob - Scheduled Evaluation
Situation: The regime evaluation system runs as a Kubernetes CronJob, executing every hour on its own with no external orchestration.
Opening Scene:
Every hour, Kubernetes triggers the metrics-service cronjob pod:
# k8s/metrics-service/cronjob.yaml
schedule: "0 * * * *"
command: ["task", "evaluate-regime"]
Rising Action:
The cronjob pod executes:
- Reads configuration from environment variables (overriding environment.yaml defaults)
- Fetches latest market data from KuCoin API
- Calculates all 6 regime metrics (ADX, efficiency ratio, autocorrelation, OU half-life, slope, Bollinger bandwidth)
- Evaluates three restart gates (if grid is stopped)
- Classifies regime and determines exit state
- Creates decision record and commits to Git
- Sends notifications via configured channels (Pushover directly, or webhook to n8n if available)
Climax:
The evaluation detects a regime transition:
- Regime: RANGE_OK → TRANSITION
- Exit state: NORMAL → WARNING
- Decision record created: decisions/2026-02-01/dec-eth-091500.yaml
The cronjob attempts to commit to Git repository:
git add decisions/2026-02-01/dec-eth-091500.yaml
git commit -m "[ETH-USDT] WARNING state detected - regime TRANSITION"
git push origin main
Potential Issue:
Git push fails (network timeout). The cronjob implements retry logic:
- Attempt 1: Failed (timeout)
- Attempt 2 (30s delay): Failed
- Attempt 3 (60s delay): Success
Decision record committed. Audit trail intact.
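The retry schedule above (three attempts, waiting 30s and then 60s) can be sketched as follows. This is an illustration under assumptions, not the project's implementation: `push_with_retry` and `git_push` are hypothetical names, and the push callable and sleep function are injectable so the schedule can be exercised without a network.

```python
import subprocess
import time
from typing import Callable, Sequence


def push_with_retry(
    push: Callable[[], bool],
    delays: Sequence[float] = (30.0, 60.0),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call push() until it succeeds; return the attempt number, or 0 if all fail.

    delays[i] is the wait before attempt i + 2, so the default gives three attempts.
    """
    for attempt in range(1, len(delays) + 2):
        if push():
            return attempt
        if attempt <= len(delays):
            sleep(delays[attempt - 1])
    return 0


def git_push() -> bool:
    """Run `git push origin main`; True on exit code 0. A timeout counts as failure."""
    try:
        result = subprocess.run(
            ["git", "push", "origin", "main"], capture_output=True, timeout=60
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

A caller would use `push_with_retry(git_push)` and treat a return value of 0 as the "log locally and continue" case.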
Resolution:
Notification sent via Pushover API (direct integration, no n8n dependency):
POST https://api.pushover.net/1/messages.json
{
"token": "...",
"user": "...",
"message": "⚠️ WARNING - ETH regime transitioning. Confidence 0.68 → 0.54",
"priority": 1,
"url": "https://regime-dashboard/decisions/dec-eth-091500"
}
Craig receives notification on phone. System continues evaluating hourly.
Operational Notes:
- Cronjob pod uses same Taskfile commands available locally: task evaluate-regime
- Configuration via environment variables: MARKET_MAKER_DATA_REPOSITORY_BASE_PATH=/data/market-maker-data
- Logs streamed to stdout, captured by Kubernetes logging
- Pod exits cleanly after each evaluation (stateless execution)
- Next evaluation triggered by Kubernetes scheduler in 1 hour
Journey Requirements Summary
These five journeys reveal the following capability requirements:
From Journey 1 (Active Trading):
- Hourly regime evaluation with exit state classification
- Exit state machine (NORMAL → WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT)
- Three-gate restart logic
- Push notifications with context
- Manual action recording
- Cooldown enforcement
From Journey 2 (Historical Review):
- KPI analysis framework (SLAR, PRR, TTDR, FER, ERT)
- Git-backed immutable decision records
- Analysis tooling (scripts, dashboards)
- Time-period filtering
- False positive/negative identification
- Audit trail completeness
From Journey 3 (Investor Evaluation):
- Investor-grade reporting
- Backtesting framework (3+ years historical data)
- Failure analysis documentation
- Separation of recommendation vs action
- Track record visualization
- Credibility evidence (immutability, transparency)
From Journey 4 (DevOps):
- Production deployment workflow
- Blue-green deployment support
- Error monitoring and alerting
- Configuration validation
- Data sufficiency checks
- Incident logging
- Rollback capability
From Journey 5 (Kubernetes CronJob):
- Kubernetes CronJob deployment support
- Taskfile-based execution (local simulation possible)
- Environment variable configuration override system
- Git commit retry logic with backoff
- Direct Pushover API integration (no n8n dependency initially)
- Stateless execution (each run independent)
- Kubernetes logging integration
- Graceful error handling and exit codes
- Configuration validation on startup
Optional n8n Integration (Growth Feature):
- Webhook endpoint for manual triggering
- n8n workflow orchestration for advanced notification routing
- Multi-channel notification distribution (Email, Slack, SMS via n8n)
Domain-Specific Requirements
Project Classification:
- Domain: Fintech - Algorithmic Trading
- Complexity: High
- Context: Brownfield (adding Phases 2-5 exit strategy to existing regime management system)
Compliance & Regulatory
Current Scope (Phases 2-5 - Personal Capital Trading):
- Personal capital trading (£1K-£10K scale) - no regulatory oversight required
- Regulatory compliance deferred to post-Phase 5 (external capital threshold)
- Git commit history provides sufficient audit integrity without independent verification or cryptographic signing
- Assumption: Personal capital trading does not trigger FCA algorithmic trading requirements (see RAIA log A006)
Out of Scope for MVP:
- FCA registration or compliance
- MiFID II algorithmic trading requirements
- External investor regulatory framework
- Legal review scheduled before £100K external capital raise
Security Architecture
API Security:
- KuCoin API keys with IP whitelist required + no-withdrawal permissions enforced
- Threat model: Prevent unauthorized trading and capital extraction
- Kubernetes secrets for sensitive configuration (not in code/config files)
Data Protection:
- Decision records repository: Private Git repository (market-maker-data)
- Access control: Restricted to operator only during validation phase
- Data in transit: Pushover notifications encrypted, HTTPS for all API calls
- GDPR: Personal trading data only (no third-party PII)
Decision Interface:
- Authentication: Not required for MVP (local K8s cluster + VPN access)
- Network isolation: Accessible only within VPN perimeter
- Future enhancement: OAuth2 ingress mechanism available for public exposure post-MVP
- Hosting: Private Kubernetes cluster (not public-facing)
Technical Constraints
Evaluation Cadence:
- MVP (Phases 2-5): 1-hour evaluation cycle (schedule: "0 * * * *")
- Rationale: Research indicates 12-24 hour warning window for regime transitions (see RAIA log A001, A004)
- Future enhancement: Adaptive cadence (state-based evaluation frequency) if validation shows need for faster response
- Action: Validate assumption via backtesting in Phase 4 (see RAIA Action 1)
Exchange Integration - KuCoin:
- Grid management limitation: KuCoin spot grids cannot be managed via API (manual stop/start via UI only)
- Human-in-loop requirement: System generates recommendations, human executes in KuCoin UI
- Data dependencies: Market data (OHLCV), account balance, position tracking all via KuCoin API
- Rate limits: 1-hour evaluation cycle well within KuCoin API rate limits
- API call volume: Reduced overhead compared to 15-minute cadence
Configuration Management:
- Schema validation: Configuration validated on startup with retry logic
- Deployment safety: Invalid configuration keeps previous deployment running (blue-green deployment)
- Environment overrides: Support environment variable overrides for Kubernetes deployment flexibility
- Validation checks: Pre-deployment validation catches configuration errors before production rollout
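One way the environment override and startup validation could fit together is sketched below. The rule that a key is overridden by the upper-cased environment variable of the same name is an assumption (it matches MARKET_MAKER_DATA_REPOSITORY_BASE_PATH but is not stated in this document), and `load_config` and `validate` are hypothetical names.

```python
import os
from typing import Iterable, Mapping, Optional


def load_config(
    defaults: Mapping[str, str], env: Optional[Mapping[str, str]] = None
) -> dict[str, str]:
    """Return defaults with each key overridden by its upper-cased env variable."""
    source = os.environ if env is None else env
    return {key: source.get(key.upper(), value) for key, value in defaults.items()}


def validate(config: Mapping[str, str], required: Iterable[str]) -> None:
    """Raise ValueError listing required keys that are missing or empty."""
    missing = [key for key in required if not config.get(key)]
    if missing:
        raise ValueError(f"invalid configuration, missing: {', '.join(missing)}")


config = load_config(
    {"market_maker_data_repository_base_path": "./data", "symbol": "ETH-USDT"},
    env={"MARKET_MAKER_DATA_REPOSITORY_BASE_PATH": "/data/market-maker-data"},
)
validate(config, required=["market_maker_data_repository_base_path", "symbol"])
```

Failing fast in `validate` is what lets a blue-green rollout keep the previous version serving when the new pod never becomes healthy.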
Data Availability:
- Historical data requirements: Sufficient data needed for gate evaluation (240+ bars for OU half-life)
- Data sufficiency checks: Validate sufficient data exists before enabling gate evaluation for new symbols
- Backfill support: Tools to collect historical data for new symbols before production use
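A data sufficiency check can be very small. The sketch below uses the 240-bar requirement for OU half-life stated above; the `Sufficiency` type and `check_sufficiency` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

MIN_BARS_OU_HALF_LIFE = 240  # from the historical data requirement above


@dataclass(frozen=True)
class Sufficiency:
    symbol: str
    required: int
    available: int

    @property
    def ok(self) -> bool:
        """True when enough bars exist to evaluate the gate."""
        return self.available >= self.required

    @property
    def shortfall(self) -> int:
        """Number of bars still to collect (0 when sufficient)."""
        return max(0, self.required - self.available)


def check_sufficiency(
    symbol: str, available_bars: int, required: int = MIN_BARS_OU_HALF_LIFE
) -> Sufficiency:
    return Sufficiency(symbol, required, available_bars)


# The BTC-USDT incident from Journey 4: 187 bars available, 240 required.
btc = check_sufficiency("BTC-USDT", 187)
```

Run as part of deployment validation, a failing check would block gate evaluation for the symbol instead of surfacing as a runtime error.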
Resilience & Failure Handling
Exchange Outage (Acceptable Risk):
- Scenario: KuCoin unavailable during MANDATORY_EXIT state
- Mitigation: Document as known limitation (see RAIA R002)
- Rationale: Manual execution dependency means system cannot auto-execute anyway
- Monitoring: Track exchange availability incidents for future multi-exchange planning (see RAIA Action 2)
Market Data Feed Failure (Retry with Backoff):
- Scenario: KuCoin market data API fails during evaluation cycle
- Mitigation: Retry 2-3 times with exponential backoff before declaring failure
- Failure handling: Log error, skip current cycle, attempt next cycle in 1 hour
- Alert threshold: After N consecutive failures, send “DATA UNAVAILABLE - MANUAL MONITORING REQUIRED” alert
- Rationale: Transient API issues shouldn’t trigger false alarms, but prolonged outage needs operator awareness
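The retry and alert behaviour described above could look roughly like this. It is a sketch under assumptions: three attempts and a 1-second base delay are illustrative choices within the stated "2-3 times", `threshold` stands in for the unspecified N, and both names are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Try fetch() up to `retries` times, doubling the wait after each failure.

    Returns None when every attempt raises, so the caller can skip this cycle.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return None


class FailureCounter:
    """Tracks consecutive failed cycles across evaluations."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.consecutive = 0

    def record(self, success: bool) -> bool:
        """Update the streak; return True on the cycle where the alert should fire."""
        self.consecutive = 0 if success else self.consecutive + 1
        return self.consecutive == self.threshold
```

Because `record` returns True only when the streak first reaches the threshold, a prolonged outage produces one "DATA UNAVAILABLE" alert rather than one per hour. Since each CronJob run is stateless, the counter would need to be persisted between runs, for example alongside the metrics history.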
Git Commit Failure (Acceptable Risk with Logging):
- Scenario: Decision record created but Git push fails
- Mitigation: Log failure locally, continue operation (see RAIA R003)
- Rationale: Notification still delivered (Pushover), operator can act; audit gap is non-critical for validation phase
- Future enhancement: Retry queue for failed commits (post-Phase 5)
Configuration Errors (Validation with Rollback):
- Scenario: Invalid configuration deployed to production
- Mitigation:
- Pre-deployment: Schema validation in deployment pipeline
- Startup validation: Validate configuration on pod startup, retry with backoff if validation fails
- Deployment safety: Blue-green deployment keeps previous version running if new version fails validation
- Rationale: Configuration errors are preventable and should never reach production
Notification Delivery Failure:
- Scenario: Pushover API unavailable or rate-limited
- Mitigation: Log failure, attempt retry on next evaluation cycle
- Monitoring: Track notification delivery success rate
- Rationale: Missing single notification is acceptable if subsequent cycle succeeds
Integration Requirements
KuCoin Exchange API:
- Market data: OHLCV data at multiple timeframes (1m, 15m, 1h, 4h)
- Account data: Balance queries for capital allocation calculations
- Position tracking: Current grid status, order fills, PnL tracking
- Authentication: API key + secret + passphrase with IP whitelist
- Error handling: Graceful degradation on API failures, retry logic for transient errors
Git Repository (market-maker-data):
- Decision records: Immutable YAML files, one per recommendation
- Metrics history: Hourly snapshots of system state
- Commit strategy: Atomic commits with descriptive messages including symbol and state
- Push failures: Log and continue (acceptable gap in audit trail during outages)
- Access control: Private repository, SSH key authentication from Kubernetes pods
Pushover Notifications:
- Direct API integration: No n8n dependency for MVP
- Priority levels: NORMAL, WARNING, LATEST_ACCEPTABLE_EXIT, MANDATORY_EXIT map to Pushover priority
- Rate limiting: Prevent notification spam (max 1 notification per state transition)
- Delivery tracking: Log notification attempts and responses
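A sketch of the state-to-priority mapping and the one-notification-per-transition rule follows. Only WARNING mapping to priority 1 appears in this document (the Journey 5 payload); the other priority values are assumptions, as is the function name. Pushover's emergency priority (2) requires `retry` and `expire` parameters, so the sketch adds illustrative values for them. The `token` and `user` fields are left out and would be added from secrets before sending.

```python
from typing import Optional

# Only WARNING -> 1 comes from the source; the other values are assumptions.
PRIORITY = {
    "NORMAL": 0,
    "WARNING": 1,
    "LATEST_ACCEPTABLE_EXIT": 1,
    "MANDATORY_EXIT": 2,
}


def build_notification(prev_state: str, new_state: str, message: str) -> Optional[dict]:
    """Return a Pushover payload for a state transition, or None if nothing changed."""
    if new_state == prev_state:
        return None  # at most one notification per state transition
    payload = {"message": message, "priority": PRIORITY[new_state]}
    if payload["priority"] == 2:
        # Emergency priority needs retry/expire in seconds (values illustrative).
        payload.update({"retry": 60, "expire": 3600})
    return payload
```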
Optional n8n Integration (Post-MVP):
- Webhook triggers: Manual evaluation triggering
- Multi-channel notifications: Email, Slack, SMS routing
- Workflow orchestration: Complex notification logic
Risk Mitigations
Domain-Specific Risks:
Fast Regime Transitions:
- Risk: Regime may transition faster than 1-hour evaluation cycle can detect (see RAIA R001)
- Mitigation:
- Backtesting to validate 12-24 hour warning window assumption (see RAIA Action 1)
- Monitor near-miss scenarios during validation
- Prepared to implement 15-minute cadence if needed
- Trigger: If >20% of regime transitions provide <2 hour warning window
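The trigger above is a simple proportion test over observed warning windows. A minimal sketch, with a hypothetical function name and the two thresholds taken from the trigger as stated:

```python
def cadence_upgrade_needed(
    warning_hours: list[float],
    min_window_hours: float = 2.0,
    max_short_fraction: float = 0.20,
) -> bool:
    """True when more than 20% of transitions gave under 2 hours of warning."""
    if not warning_hours:
        return False  # no transitions observed yet, nothing to conclude
    short = sum(1 for hours in warning_hours if hours < min_window_hours)
    return short / len(warning_hours) > max_short_fraction
```

Note the strict inequality: exactly 20% short windows does not fire the trigger, matching the ">20%" wording.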
Exchange Outage During Critical Exit:
- Risk: Cannot execute manual exit when KuCoin is unavailable (see RAIA R002)
- Mitigation: Accept as known limitation (manual execution dependency)
- Future: Multi-exchange diversification (post-Phase 5)
- Monitoring: Track incidents during validation (see RAIA Action 2)
API Rate Limiting:
- Risk: Excessive API calls trigger rate limits, blocking market data access
- Mitigation:
- 1-hour evaluation cycle well within KuCoin rate limits
- Retry logic with exponential backoff prevents rapid retry storms
- Monitor API usage to stay under limits
Data Staleness:
- Risk: Stale market data leads to incorrect regime classification (see RAIA R006)
- Mitigation:
- Timestamp all market data fetches
- Retry logic ensures fresh data attempts before failure
- Alert operator if data age exceeds acceptable threshold
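The staleness check reduces to comparing a fetch timestamp against a maximum age. In the sketch below the 5-minute default is a placeholder, since the acceptable threshold is not specified here, and `is_stale` is a hypothetical name. Both timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def is_stale(
    fetched_at: datetime,
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # placeholder threshold
) -> bool:
    """True when the market data is older than max_age at evaluation time."""
    return now - fetched_at > max_age


evaluation_time = datetime(2026, 2, 1, 9, 15, tzinfo=timezone.utc)
fresh = is_stale(evaluation_time - timedelta(minutes=2), evaluation_time)
stale = is_stale(evaluation_time - timedelta(minutes=6), evaluation_time)
```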
Capital Loss from False Positives:
- Risk: Excessive false exits erode capital through missed ranging periods (see RAIA R004)
- Mitigation:
- Three-gate restart logic prevents premature re-entry
- Backtesting validates false positive rate <30% (see RAIA A005, Action 3)
- KPI tracking measures false exit impact
Regulatory Change:
- Risk: Crypto regulations change, grid trading becomes restricted (see RAIA R005)
- Mitigation: Monitor regulatory landscape, prepared to halt operations if needed
- Legal review: Scheduled before external capital raise (see RAIA Action 4)
Crypto Trading Domain Specifics
24/7 Market Operations:
- Implication: No market close, regime can shift anytime (overnight, weekends)
- Mitigation: 1-hour evaluation cycle runs continuously via Kubernetes CronJob
- Monitoring: System uptime monitoring, alert on CronJob failures
High Volatility Environment:
- Implication: Crypto moves faster than traditional markets, tighter response windows
- Mitigation: Gate thresholds calibrated for crypto volatility patterns (not traditional asset volatility)
- Validation: Backtesting with crypto-specific volatility scenarios (Phase 4)
Single Exchange Dependency (KuCoin):
- Risk: Exchange-specific outages, API changes, or policy changes affect operations (see RAIA I002)
- Mitigation: Accept as validation phase limitation
- Future: Multi-exchange architecture (post-Phase 5)
Grid Trading Mechanics:
- KuCoin limitation: Spot grids not manageable via API (manual UI interaction required) (see RAIA I001)
- Implication: System is decision support only, not automated execution
- Benefit: Human-in-loop preserves control, reduces regulatory complexity
Assumptions & Actions
Critical Assumptions Requiring Validation:
- A001: Regime transitions provide 12-24 hour warning windows → Validate in Phase 4 backtesting
- A004: 1-hour evaluation cadence sufficient for capital protection → Monitor during Phases 2-5
- A005: False positive rate <30% is acceptable → Measure via KPI framework
- A006: Personal capital trading exempt from FCA regulation → Legal review before £100K
Key Actions:
- Action 1: Validate 1-hour cadence assumption via backtesting (Phase 4, Due: 2026-04-01)
- Action 3: Measure false positive rate via KPI framework (Phase 4-5, Due: 2026-04-15)
- Action 5: Return to domain requirements after validation data available (Due: 2026-05-01)
- Action 6: Quarterly RAIA review (Next: 2026-05-01)
Full RAIA Log: See .ai/projects/market-making/RAIA.md for complete Risks, Assumptions, Issues, and Actions tracking.
Innovation & Novel Patterns
Detected Innovation Areas
1. Tiered Exit Urgency Model
Innovation: Progressive exit states with explicit time windows for human decision-making, replacing binary stop-loss logic.
Differentiator: Traditional grid trading uses binary stop-losses (triggered or not triggered). This system implements a tiered urgency model:
- WARNING: Early signal (2+ warning conditions met), 4-hour notification rate limit, provides 1-2 hour buffer to LATEST_ACCEPTABLE_EXIT
- LATEST_ACCEPTABLE_EXIT: Regime assumptions failing, 2-hour notification rate limit, recommended exit window of 4-8 hours
- MANDATORY_EXIT: Confirmed regime break, 1-hour notification rate limit, immediate exit recommended
Why This Matters: Provides graduated response time appropriate to signal strength. Users aren’t forced to choose between “no alert” or “emergency exit” - there are intermediate states that allow thoughtful decision-making while preserving capital protection.
Novel Aspect: Explicit modeling of decision urgency as progressive states with corresponding time buffers, rather than treating all exit signals as equivalent.
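The tiers and their notification rate limits can be modeled as an ordered enum plus a lookup table. This is a sketch, not the project's implementation: the rate limits come from the tier descriptions above, while the rule that any transition into an alerting state notifies immediately is an assumption, and `ExitState` and `should_notify` are hypothetical names.

```python
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class ExitState(IntEnum):
    """Ordered by urgency so states can be compared directly."""
    NORMAL = 0
    WARNING = 1
    LATEST_ACCEPTABLE_EXIT = 2
    MANDATORY_EXIT = 3


# Rate limits from the tier descriptions; NORMAL sends nothing.
NOTIFY_INTERVAL = {
    ExitState.WARNING: timedelta(hours=4),
    ExitState.LATEST_ACCEPTABLE_EXIT: timedelta(hours=2),
    ExitState.MANDATORY_EXIT: timedelta(hours=1),
}


def should_notify(
    prev: ExitState, state: ExitState, last_sent: Optional[datetime], now: datetime
) -> bool:
    """Decide whether this evaluation should send a notification."""
    if state is ExitState.NORMAL:
        return False
    if state != prev or last_sent is None:
        return True  # assumption: a transition always notifies immediately
    return now - last_sent >= NOTIFY_INTERVAL[state]
```

With these intervals, a grid that sits in WARNING is re-announced every 4 hours, while MANDATORY_EXIT repeats hourly until the operator acts.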
2. Sequential Three-Gate Restart Logic
Innovation: Post-exit restart requires sequential validation through three gates (not parallel checks), preventing premature re-entry during trend continuations.
Gate Structure:
- Gate 1 (Directional Energy Decay): Must pass FIRST - validates trend strength has subsided (ADX falling, TrendScore low, no persistent directional swings)
- Gate 2 (Mean Reversion Return): Evaluated ONLY after Gate 1 passes - validates mean-reverting behavior has returned (negative autocorrelation, short OU half-life, price oscillations reverting)
- Gate 3 (Tradable Volatility): Evaluated ONLY after Gate 2 passes - validates volatility is in tradable range (not too low, not expanding)
Differentiator: Traditional trading systems use simple cooldown periods (“don’t trade for N hours after stop”). This implements sequential validation - you can’t evaluate mean reversion until directional energy has decayed, you can’t evaluate volatility until mean reversion is confirmed.
Why This Matters: Prevents “stop-restart churn” where a grid is exited during a trend, then immediately re-entered before the trend fully resolves, leading to multiple stop-losses.
Novel Aspect: Sequential gating architecture (Gate N+1 only evaluated if Gate N passes) creates a forced progression through stability checks.
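Sequential gating amounts to short-circuit evaluation over an ordered list. In the sketch below the gate order follows the structure above, but every threshold and metric key is a placeholder assumption, and `evaluate_gates` is a hypothetical name.

```python
from typing import Callable, Optional, Sequence

Gate = tuple[str, Callable[[dict], bool]]

# Placeholder thresholds; the real gate conditions are not specified here.
GATES: list[Gate] = [
    ("directional_energy_decay", lambda m: m["adx"] < 25),
    ("mean_reversion_return", lambda m: m["autocorr_lag1"] < 0),
    ("tradable_volatility", lambda m: 0.02 <= m["bb_bandwidth"] <= 0.05),
]


def evaluate_gates(
    metrics: dict, gates: Sequence[Gate] = GATES
) -> tuple[bool, Optional[str], list[str]]:
    """Evaluate gates in order, stopping at the first failure.

    Returns (restart_allowed, first_failed_gate, gates_evaluated).
    """
    evaluated: list[str] = []
    for name, check in gates:
        evaluated.append(name)
        if not check(metrics):
            return False, name, evaluated
    return True, None, evaluated


# Still trending: Gate 1 fails, so Gates 2 and 3 are never evaluated.
trending = evaluate_gates({"adx": 38})
ranging = evaluate_gates({"adx": 18, "autocorr_lag1": -0.2, "bb_bandwidth": 0.03})
```

The `trending` call supplies only `adx`, which shows the sequencing: later gates never read their metrics until the earlier gate has passed.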
3. Multi-Metric Regime Consensus with 2+ Condition Triggering
Innovation: WARNING state requires 2+ warning conditions to trigger (not single condition), using consensus across 6 regime metrics.
Metrics Used:
- ADX (trend strength)
- Efficiency Ratio (directional persistence)
- Lag-1 Autocorrelation (mean reversion detection)
- OU Half-Life (mean reversion speed)
- Normalized Slope (directional bias)
- Bollinger Bandwidth (volatility regime)
Consensus Logic:
- Single warning condition = NORMAL state (no alert)
- 2+ warning conditions = WARNING state (alert sent)
- This prevents false alarms from single noisy indicators
Differentiator: Most technical analysis uses individual indicators or simple “AND” logic. This implements a voting mechanism - regime classification emerges from consensus, and WARNING requires multiple independent signals.
Why This Matters: Reduces false positive rate while maintaining sensitivity to genuine regime transitions. A single spike in ADX doesn’t trigger an alert, but ADX rising + confidence declining + efficiency ratio increasing = legitimate warning.
Novel Aspect: The explicit 2+ condition requirement to trigger WARNING, preventing single-indicator noise from generating actionable alerts.
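A minimal sketch of the 2+ condition rule, assuming each warning condition has already been reduced to a boolean. The function name and condition keys are illustrative, and the sketch only separates NORMAL from WARNING; the more urgent states have their own triggers.

```python
from typing import Dict, List, Tuple

MIN_WARNING_CONDITIONS = 2  # the "2+" rule


def evaluate_warning(
    conditions: Dict[str, bool],
    min_conditions: int = MIN_WARNING_CONDITIONS,
) -> Tuple[str, List[str]]:
    """Return the state plus the names of the conditions that fired."""
    active = sorted(name for name, fired in conditions.items() if fired)
    state = "WARNING" if len(active) >= min_conditions else "NORMAL"
    return state, active


single = evaluate_warning(
    {"adx_rising": True, "confidence_declining": False, "efficiency_ratio_rising": False}
)
several = evaluate_warning(
    {"adx_rising": True, "confidence_declining": True, "efficiency_ratio_rising": True}
)
print(single)   # ('NORMAL', ['adx_rising'])
print(several)  # ('WARNING', ['adx_rising', 'confidence_declining', 'efficiency_ratio_rising'])
```

Returning the list of active conditions alongside the state keeps the reasoning available for the decision record and the notification text.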
4. Asymmetric Automation Philosophy
Innovation: System can automatically reduce risk (send alerts), but NEVER automatically deploys capital.
Design Principle:
- Auto-Alert, Manual-Execute: System generates exit recommendations 24/7, but human must execute in KuCoin UI
- Asymmetric Authority: System can escalate warnings (NORMAL → WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT) but cannot create grids or deploy capital without explicit approval
- Human-in-Loop by Design: Not an afterthought or “manual override” - it’s the core architecture
Differentiator: Most trading systems are either fully automated (system trades without human input) or fully manual (human monitors 24/7). This explicitly separates monitoring (automated) from execution (manual).
Why This Matters:
- Regulatory: Simpler compliance (no automated trading license needed)
- Risk: Capital deployment requires human judgment, reducing catastrophic automation failures
- Control: Operator maintains final authority while benefiting from 24/7 monitoring
Novel Aspect: The explicit articulation and implementation of “asymmetric automation” as a design philosophy, not just “we’ll add automation later.”
5. Investor-First Audit Trail Architecture
Innovation: Git-backed immutable decision records designed for investor scrutiny from day one (not added later).
Architecture:
- Every recommendation committed to Git BEFORE notification sent
- State transitions logged with timestamps, metrics, and reasoning
- Separation of “system recommendation” vs “user action” tracked independently
- No database, no retroactive editing - immutable audit trail via version control
Differentiator: Most trading systems add logging as an afterthought. This makes audit credibility a first-class design requirement, shaping the entire data architecture.
Why This Matters:
- Investor Credibility: Can answer “why didn’t you exit here?” for any historical moment
- Performance Analysis: Separate tracking of recommendation quality (was the system right?) vs action quality (did the operator follow advice?)
- Scaling Enabler: Clean audit trail is prerequisite for external capital (£100K+)
Novel Aspect: Using Git version control as the primary data store specifically for investor-credible audit trails, rather than traditional database logging.
Market Context & Competitive Landscape
Existing Approaches to Grid Exit:
- Manual Monitoring: Trader watches markets 24/7, decides when to exit grids
  - Limitation: Doesn’t scale, requires constant attention, subject to emotion/fatigue
- Simple Stop-Loss: Set stop-loss at X% below grid range, exit when hit
  - Limitation: Binary decision, often triggers at maximum loss, no early warning
- Trailing Stops: Stop-loss moves with price, locks in some profit
  - Limitation: Still binary, no regime awareness, can trigger during normal volatility
- Automated Trading Bots: Fully automated grid management with various exit rules
  - Limitation: Black-box decision-making, no human judgment, regulatory complexity
How This Differs:
This system combines:
- Regime structure analysis (not just price levels)
- Tiered urgency (not binary triggers)
- Multi-metric consensus (not single indicators)
- Human-in-loop (not fully automated)
- Sequential restart validation (not simple cooldowns)
- Investor-grade audit trails (not just operator logs)
Positioning: Structured decision support for systematic grid traders who want to scale capital while maintaining human judgment and building credible track records.
Validation Approach
Critical Questions to Answer:
Q1: Does tiered exit urgency preserve more capital than binary stop-losses?
- Validation Method: Backtesting (Phase 4) - compare tiered exit vs simple stop-loss on 3-6 months historical data
- Success Metric: 75%+ profit retention ratio (preserve majority of range-trading profits)
- Measure: Average exit timing (how early do we exit vs when stop-loss would have hit?)
Q2: Does the 2+ condition WARNING logic reduce false positives without missing real transitions?
- Validation Method: Track False Exit Rate (FER) during validation phase
- Success Metric: FER <30% (see RAIA A005)
- Measure: Exits where range resumed after stop vs exits where trend confirmed
Q3: Do sequential restart gates prevent stop-restart churn?
- Validation Method: Track re-entry timing after exits, measure stop-loss hits on restarted grids
- Success Metric: <10% of restarted grids hit stop-loss within 24 hours
- Measure: Time between exit and successful re-entry, profitability of restarted grids
Q4: Does 1-hour evaluation cadence provide sufficient warning time?
- Validation Method: Backtesting to measure actual regime transition warning windows (see RAIA A001, A004)
- Success Metric: ≥80% of transitions provide >2 hour warning window
- Measure: Time from WARNING to MANDATORY_EXIT in historical data
- Fallback: If <80%, implement 15-minute cadence or adaptive evaluation frequency
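The Q4 success metric can be computed from backtest output along the following lines. This is a sketch that assumes each historical transition is available as a (WARNING timestamp, MANDATORY_EXIT timestamp) pair; the function names and sample data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

Transition = Tuple[datetime, datetime]  # (WARNING time, MANDATORY_EXIT time)


def warning_window_share(
    transitions: Iterable[Transition],
    minimum: timedelta = timedelta(hours=2),
) -> float:
    """Fraction of transitions whose warning window exceeds `minimum`."""
    windows = [mandatory - warning for warning, mandatory in transitions]
    if not windows:
        raise ValueError("no transitions to evaluate")
    return sum(window > minimum for window in windows) / len(windows)


def needs_faster_cadence(share: float, target: float = 0.80) -> bool:
    """Fallback rule: move to a faster cadence if the share is below target."""
    return share < target


# Made-up sample: five transitions with windows of 6h, 3h, 1h, 12h and 2.5h.
start = datetime(2026, 1, 1, 0, 0)
sample = [
    (start, start + timedelta(hours=hours)) for hours in (6, 3, 1, 12, 2.5)
]
share = warning_window_share(sample)
print(share, needs_faster_cadence(share))  # 0.8 False
```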
Q5: Does multi-metric consensus improve regime classification accuracy?
- Validation Method: Compare 6-metric consensus vs individual metrics
- Success Metric: Higher True Transition Detection Rate (TTDR) with consensus vs single indicators
- Measure: Regime classification accuracy in backtesting (correctly identified RANGE vs TREND)
Risk Mitigation
Innovation Risk 1: Excessive Complexity
- Risk: Tiered states, sequential gates, and multi-metric consensus add complexity that may not improve outcomes vs simpler approaches
- Mitigation: Backtesting comparison against simpler baselines (binary stop-loss, single indicator, no gates)
- Fallback: If complex approach doesn’t outperform, simplify to best-performing baseline
- Validation Trigger: If backtesting shows <10% improvement vs simple stop-loss, revisit whether the added complexity is justified
Innovation Risk 2: False Positive Rate Too High
- Risk: 2+ condition WARNING logic may still generate too many false exits (FER >30%)
- Mitigation: Tunable thresholds via YAML config, conservative/aggressive presets available
- Fallback: Increase WARNING requirement to 3+ conditions, or tighten individual condition thresholds
- Validation Trigger: Track FER in Phase 4, adjust thresholds if >30%
Innovation Risk 3: 1-Hour Cadence Insufficient
- Risk: Regime transitions may occur faster than 1-hour evaluation can detect (see RAIA R001)
- Mitigation: Backtesting measures actual warning windows in historical data
- Fallback: Implement 15-minute cadence or adaptive evaluation (NORMAL: 1h, WARNING: 15min, LATEST_ACCEPTABLE: 5min)
- Validation Trigger: If >20% of transitions provide <2 hour warning, implement faster cadence
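The adaptive-evaluation fallback could be expressed as a state-to-interval lookup. The sketch below assumes the job is woken at the finest interval and skips the evaluation when it is not yet due; the interval for MANDATORY_EXIT is an assumption, since only the other three states are given an interval above.

```python
from datetime import datetime, timedelta, timezone

CADENCE_BY_STATE = {
    "NORMAL": timedelta(hours=1),
    "WARNING": timedelta(minutes=15),
    "LATEST_ACCEPTABLE_EXIT": timedelta(minutes=5),
    # Assumption: no interval is specified for MANDATORY_EXIT,
    # so this sketch reuses the fastest one.
    "MANDATORY_EXIT": timedelta(minutes=5),
}


def is_due(state: str, last_evaluated: datetime, now: datetime) -> bool:
    """True when the interval for the current exit state has elapsed."""
    return now - last_evaluated >= CADENCE_BY_STATE[state]


last_run = datetime(2026, 2, 1, 12, 0, tzinfo=timezone.utc)
now = last_run + timedelta(minutes=20)
print(is_due("NORMAL", last_run, now))   # False
print(is_due("WARNING", last_run, now))  # True
```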
Innovation Risk 4: Sequential Gates Too Restrictive
- Risk: Three-gate restart logic prevents timely re-entry, causing excessive opportunity cost
- Mitigation: Track time-to-restart and profitability of missed ranging periods
- Fallback: Parallel gate evaluation (all gates checked simultaneously) or reduce to 2 gates
- Validation Trigger: If average time-to-restart >48 hours and missed profit >20% of preserved capital
Innovation Risk 5: Human-in-Loop Execution Delay
- Risk: Manual execution introduces delay that negates early warning benefits
- Mitigation: Measure Exit Reaction Time (ERT) - time from alert to actual exit
- Fallback: If ERT consistently >30 minutes, consider API-based grid management (if KuCoin adds support) or multi-exchange architecture
- Validation Trigger: Track ERT in operational phase, identify if manual execution is bottleneck
Innovation Risk 6: Audit Trail Overhead
- Risk: Git commits for every decision create operational friction or repository bloat
- Mitigation: Lightweight JSON/YAML files, daily aggregation, automated cleanup for old data
- Fallback: Database logging with Git export for investor presentation
- Validation Trigger: If Git operations slow evaluation >500ms or repo size >1GB, reconsider architecture
Backend Decision Support System - Specific Requirements
Project-Type Overview
This is a batch processing system with Git-based persistence, not a web API. The system runs as a Kubernetes CronJob executing Python modules directly with file-based output to a Git repository mounted on a Persistent Volume Claim (PVC).
Architecture:
KuCoin API → Python Evaluation → Git Commit (PVC) → Static Dashboard Generation → Git Push
Key Characteristics:
- No HTTP API endpoints, no REST services, no client-server architecture
- Scheduled Python execution (hourly via Kubernetes CronJob)
- Git repository on PVC for persistence and retry capability
- Static HTML dashboards with Chart.js visualizations generated every hour
- Stateless job execution with all state loaded from/saved to Git
Data Pipeline
Processing Flow:
- Data Acquisition: Fetch OHLCV from KuCoin API, load recent metrics from Git PVC
- Regime Analysis: Calculate 6 metrics, classify regime, calculate confidence
- Exit State Evaluation: Evaluate WARNING/LATEST_ACCEPTABLE_EXIT/MANDATORY_EXIT conditions
- Gate Evaluation: If grid stopped, evaluate three sequential gates
- State Transition Tracking: Log state changes with rate limiting
- Decision Record Creation: Create immutable decision records
- Dashboard Generation: Generate HTML/JavaScript dashboard with Chart.js
- Data Persistence: Commit all files to Git (on PVC), push to remote with retry
No additional data transformation or aggregation stages for MVP - pipeline is complete as described.
Data Schemas: See SCHEMA.md for complete schema definitions (metrics, exit states, decision records, configuration).
Static Dashboard Generation
Execution: Dashboards generated as part of the same CronJob (not separate process)
Frequency: Every hour (regenerated with each evaluation)
Format: HTML with JavaScript charts (Chart.js library)
Structure: One dashboard HTML file per hour with embedded data for that evaluation period
- File naming: dashboards/{symbol}/{YYYY-MM-DD}-{HH}.html
- Self-contained: Data embedded in HTML (no external API calls)
- Viewable via: file:// protocol locally, or simple HTTP server, or Git hosting
Visualizations (Essential - support recommendations/decisions):
- Current regime classification and confidence
- Exit state (NORMAL/WARNING/LATEST_ACCEPTABLE_EXIT/MANDATORY_EXIT)
- All 6 metrics with current values and trends
- Gate evaluation status (if grid stopped)
- Recent state transition history
- Decision recommendation (if actionable)
Technology Stack:
- Chart.js for interactive visualizations
- HTML5/CSS3 for layout
- Embedded JSON data in <script> tags
- No server-side rendering needed
Error Handling & Resilience
KuCoin API Failures
Strategy: Retry 2-3 times with exponential backoff, then skip cycle
Acceptable for MVP: Yes
Persistent Failure Alerting: Yes - if API fails for multiple consecutive cycles (threshold: 3+ consecutive failures), send alert notification
Monitoring: Track API response times, error rates, success/failure counts → send to Grafana Loki for observability
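A sketch of the retry-then-skip strategy and the consecutive-failure alert. The `fetch` callable stands in for whatever function wraps the KuCoin request, and the one-second base delay is an assumption; a real client would catch the HTTP library's specific exceptions rather than `Exception`.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch up to max_attempts times; return None to skip the cycle."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            # Exponential backoff: base_delay, 2x, 4x, ... (no wait after the last try)
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None


def update_failure_streak(
    previous_streak: int, cycle_succeeded: bool, alert_threshold: int = 3
) -> Tuple[int, bool]:
    """Return (new streak, whether to send a persistent-failure alert)."""
    streak = 0 if cycle_succeeded else previous_streak + 1
    return streak, streak >= alert_threshold


print(update_failure_streak(2, cycle_succeeded=False))  # (3, True)
print(update_failure_streak(5, cycle_succeeded=True))   # (0, False)
```

Because each CronJob run is stateless, the streak counter would need to be persisted between runs, for example in a small file on the PVC.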
Git Push Failures
Strategy: Log locally on PVC, continue operation (acceptable gap in audit trail for MVP)
PVC Design: Yes - CronJob should use PVC so Git repo persists between runs and doesn’t require full clone each time
Retry on Subsequent Cycles: Yes - if push failed previously, retry push on next cycle before committing new data
Implementation:
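# Pseudo-code: retry any previously failed push before committing new data
if has_unpushed_commits():
    try:
        git.push()
        logger.info("Pushed previously failed commits")
    except Exception:
        # Leave the commits on the PVC; the next cycle will retry
        logger.warning("Previous commits still not pushed")
# Continue with current evaluation
Metric Calculation Errors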
Strategy: Evaluation should continue with remaining metrics if one fails
Critical vs Optional: All metrics are conceptually critical, BUT:
- If a metric calculation fails (e.g., OU half-life non-stationary), continue with remaining metrics
- If enough metrics succeed to calculate confidence, generate recommendation WITH additional error information
- Include metric calculation errors in notification/dashboard
Implementation Approach:
- Calculate all metrics with error handling per metric
- Track which metrics succeeded vs failed
- If confidence can be calculated (even with partial metrics), proceed
- Include error context: “Recommendation based on 5/6 metrics (OU half-life calculation failed - data non-stationary)”
Error Notification: If confidence level is high enough for entry/exit recommendation, communicate the recommendation WITH error details about failed metrics
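Per-metric error isolation might look like the sketch below. The calculators here are toy stand-ins rather than the real metric implementations, and the rule for how many metrics are "enough" to compute confidence is deliberately left out because it is not specified above.

```python
from typing import Callable, Dict, Mapping, Sequence, Tuple

Calculator = Callable[[Sequence[float]], float]


def calculate_metrics(
    calculators: Mapping[str, Calculator], closes: Sequence[float]
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Run every calculator; one failure never stops the others."""
    values: Dict[str, float] = {}
    errors: Dict[str, str] = {}
    for name, calculate in calculators.items():
        try:
            values[name] = calculate(closes)
        except Exception as exc:
            errors[name] = str(exc)
    return values, errors


def error_context(values: Dict[str, float], errors: Dict[str, str]) -> str:
    """Build the error context line for notifications and dashboards."""
    total = len(values) + len(errors)
    summary = f"Recommendation based on {len(values)}/{total} metrics"
    if not errors:
        return summary
    details = "; ".join(
        f"{name} calculation failed - {reason}" for name, reason in sorted(errors.items())
    )
    return f"{summary} ({details})"


def failing_half_life(closes: Sequence[float]) -> float:
    raise ValueError("data non-stationary")


# Toy calculators, for illustration only.
toy_calculators = {
    "net_change": lambda closes: closes[-1] - closes[0],
    "price_range": lambda closes: max(closes) - min(closes),
    "ou_half_life": failing_half_life,
}
values, errors = calculate_metrics(toy_calculators, [100.0, 101.5, 99.0, 100.5])
print(error_context(values, errors))
# Recommendation based on 2/3 metrics (ou_half_life calculation failed - data non-stationary)
```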
Configuration Validation Failures
Strategy: Fail fast on startup if config invalid
Acceptable: Yes - pod won’t start if config has errors
Blue-Green Deployment: Previous version should continue running if new version fails validation (Kubernetes deployment strategy)
Performance & Scalability
Processing Time Constraints
Maximum Acceptable Time: Not a concern functionally (even >1 hour would work), but:
- Error threshold: If evaluation takes >5 minutes, log ERROR (potential performance issue)
- Warning threshold: If evaluation takes >1 minute, log WARNING
- Target: Complete evaluation in <30 seconds for typical case
No specific hard performance requirements - hourly cadence provides plenty of buffer
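The two thresholds map directly onto log levels. A minimal sketch, assuming the evaluation duration is measured in seconds by the job wrapper (the function name is illustrative):

```python
import logging
import time

ERROR_THRESHOLD_S = 5 * 60  # >5 minutes: potential performance issue
WARNING_THRESHOLD_S = 60    # >1 minute: worth watching


def duration_log_level(seconds: float) -> int:
    """Map an evaluation duration to the log level it should be reported at."""
    if seconds > ERROR_THRESHOLD_S:
        return logging.ERROR
    if seconds > WARNING_THRESHOLD_S:
        return logging.WARNING
    return logging.INFO


logger = logging.getLogger("evaluation")
started = time.monotonic()
# ... the hourly evaluation would run here ...
elapsed = time.monotonic() - started
logger.log(duration_log_level(elapsed), "Evaluation took %.1fs", elapsed)
```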
Git Repository Size Management
Retention Policy: Keep all historical data forever for MVP
Cleanup: No automated deletion for MVP
Action Item: Create action in RAIA log to revisit data retention policy at end of MVP (after validation phase complete)
Current Assessment: Not a concern - estimate ~10-50 KB per evaluation × 24 hours × 365 days ≈ 87-438 MB/year (manageable)
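The estimate can be reproduced with a few lines of arithmetic, taking 1 MB as 1,000 KB. The helper name is illustrative, and the per-evaluation size is the only input.

```python
EVALUATIONS_PER_YEAR = 24 * 365  # hourly cadence: 8,760 evaluations


def yearly_growth_mb(kb_per_evaluation: float) -> float:
    """Approximate repository growth per year in MB (1 MB = 1,000 KB)."""
    return kb_per_evaluation * EVALUATIONS_PER_YEAR / 1000


print(yearly_growth_mb(10), yearly_growth_mb(50))  # 87.6 438.0
```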
Configuration Management
Hot-Reload
Not required: Configuration changes take effect on next CronJob execution (no need for hot-reload)
Acceptable Delay: Up to 1 hour between config change and effect (next hourly run)
Versioning
Primary Versioning: Git commit hash of config file (tracked in decision records)
Image Versioning: Docker image version stored in image metadata (immutable)
Enhancement: Consider writing Docker image version to output files alongside config Git hash
- Provides complete traceability: “This decision used config version X running on image version Y”
- Useful for debugging if image code changes behavior
Implementation:
# In metrics files
system_version:
  config_git_hash: "a3f8d92e"
  image_version: "v1.2.3"  # From Docker image label
Monitoring & Observability
Metrics Collection: All metrics (Kubernetes pod metrics, application metrics, timing data) sent to Grafana Loki
Required Monitoring/Alerting:
- CronJob Execution Failures: Job didn’t run at expected time
- Evaluation Errors: Job ran but threw exceptions
- KuCoin API Degradation: High failure rate (3+ consecutive failures)
- Git Push Failures: Persistent issues (3+ consecutive push failures)
- Metric Calculation Anomalies: Values out of expected ranges or calculation failures
- Exit State Transitions: Log all WARNING/LATEST_ACCEPTABLE_EXIT/MANDATORY_EXIT transitions
- Performance Degradation: Evaluation taking >5 minutes (ERROR) or >1 minute (WARNING)
Raw Metrics to Loki:
- KuCoin API response times
- KuCoin API error counts and types
- CronJob execution duration
- Internal processing step timings (metric calculation, Git operations, dashboard generation)
- Errors and exceptions with full stack traces
Alert Channels:
- Pushover (direct API) for critical alerts
- Grafana for historical metrics and dashboards
- Optional: Webhook to n8n for advanced routing (future enhancement)
Data Retention & Cleanup
Current Policy: No automated cleanup for MVP
All Data Retained:
- Raw metrics: Forever
- Decision records: Forever (audit trail requirement)
- Exit state transitions: Forever
- Dashboards: Forever
Future Review: Action item in RAIA log to revisit retention policy after MVP validation phase
Technology Stack
Core Stack:
- Python 3.11+
- Pydantic for schema validation
- GitPython for Git operations
- PyYAML for YAML parsing
- Requests for KuCoin API calls
- Chart.js for dashboard visualizations
Additional Dependencies:
- Jinja2 (or similar) for HTML template rendering
- JSON for embedded data in dashboards
Deployment:
- Kubernetes CronJob
- PVC for Git repository persistence
- exit_strategy_config.yaml loaded from the Git repository on the PVC (no ConfigMap needed)
- ExternalSecrets for KuCoin API keys (central secret store)
Deployment & Operations
Kubernetes Deployment
Current Status: Already working in Kubernetes
CronJob Configuration:
- Schedule: 0 * * * * (hourly, on the hour)
- PVC mount: Git repository persists between runs
  - No need to clone repo each time
  - Retry capability for failed Git pushes
PVC Design:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: market-maker-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi  # Adjust based on growth
Configuration Sources
Configuration managed via:
- Git Repository: exit_strategy_config.yaml committed to Git, loaded from PVC
- ExternalSecrets: KuCoin API keys from central secret store (already implemented)
- Environment Variables: Overrides for deployment-specific settings (data paths, logging levels)
No ConfigMap needed - configuration comes from Git repo on PVC
Configuration Flow:
- Configuration YAML committed to Git (workspace-root)
- Git repo cloned/updated on PVC by CronJob
- Python reads config from PVC path
- Config Git hash recorded in decision records
Implementation Considerations
File Organization:
repos/market-making/metrics-service/
├── src/
│   ├── regime/          # Regime detection (Phase 1 complete)
│   ├── exit_strategy/   # Exit strategy (Phase 2 target)
│   ├── schemas/         # Pydantic models (Phase 2)
│   ├── persistence/     # Git operations with retry (Phase 2)
│   ├── dashboards/      # Dashboard generation (Phase 5)
│   └── monitoring/      # Loki integration (Phase 5)
├── config/
│   └── exit_strategy_config.yaml
└── k8s/
    ├── cronjob.yaml
    ├── pvc.yaml
    └── external-secrets.yaml
Git Operations with PVC:
- First run: Clone repository to PVC
- Subsequent runs: git pull to update, commit new files, push with retry
- Failed push: Accumulates commits on PVC, retries on next cycle
- PVC ensures no data loss even if push fails
Dashboard Generation:
- Generate HTML file per hour: dashboards/{symbol}/{YYYY-MM-DD}-{HH}.html
- Embed evaluation data as JSON in <script> tag
- Chart.js renders interactive charts client-side
- Self-contained files (no external API calls)
- Commit dashboards to Git for version control and distribution
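A minimal sketch of the file naming and data embedding described in this list, using only the standard library. The function names, element ids, and page skeleton are illustrative; the Chart.js setup that would read the embedded data is omitted.

```python
import html
import json
from datetime import datetime, timezone
from pathlib import PurePosixPath


def dashboard_path(symbol: str, evaluated_at: datetime) -> PurePosixPath:
    """Build dashboards/{symbol}/{YYYY-MM-DD}-{HH}.html for one evaluation."""
    filename = f"{evaluated_at:%Y-%m-%d}-{evaluated_at:%H}.html"
    return PurePosixPath("dashboards") / symbol / filename


def render_dashboard(title: str, data: dict) -> str:
    """Return a page with the evaluation data embedded as JSON."""
    # Escape "</" so a string value cannot close the script element early;
    # "<\/" is still valid JSON and parses back to "</".
    payload = json.dumps(data).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        "<body>\n"
        '<canvas id="metrics-chart"></canvas>\n'
        f'<script id="evaluation-data" type="application/json">{payload}</script>\n'
        "</body></html>\n"
    )


evaluated_at = datetime(2026, 2, 1, 14, 0, tzinfo=timezone.utc)
print(dashboard_path("ETH-USDT", evaluated_at))  # dashboards/ETH-USDT/2026-02-01-14.html
```

Client-side code can then recover the data with `JSON.parse(document.getElementById("evaluation-data").textContent)`, so the file needs no API call to display its data.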
Stateless Job Execution:
- CronJob pod starts, mounts PVC with Git repo
- Loads config and historical data from Git
- Performs evaluation
- Writes new files to Git (on PVC)
- Commits and pushes
- Generates dashboard
- Pod exits
- Next run starts fresh (but Git repo on PVC persists)
Project Scoping & Phased Development
MVP Strategy & Philosophy
MVP Approach: Validation-First Capital Protection System
This MVP follows a prove-before-scale philosophy. The system must demonstrate capital protection capability with £1K before committing £10K. The MVP is NOT “minimum features to launch” - it’s “minimum features to confidently scale capital.”
Why This Scope:
- Can’t skip validation: Backtesting + live testing required before £10K deployment
- Need visibility: Position risk quantification essential for informed exit decisions
- Must measure success: KPI tracking proves system works (not just feels right)
- Investor readiness: Complete audit trail + track record enables external capital (12-month vision)
Resource Requirements:
- Development: Solo developer (Craig) with AI assistance
- Capital: £1K validation → £10K scale → £100K+ external investment
- Timeline: 2-4 weeks validation after Phases 2-5 complete
- Infrastructure: Kubernetes cluster (already operational), KuCoin API access
What This MVP Proves:
- Exit strategy preserves capital during regime transitions (75%+ profit retention)
- System provides actionable warnings before catastrophic exits (95%+ stop-loss avoidance)
- False positive rate acceptable (<30% - not stopping grids unnecessarily)
- Human-in-loop execution viable (operator responds within acceptable windows)
- Audit trail sufficient for investor scrutiny
MVP Feature Set (Phases 2-5)
Current State (Phase 1 - COMPLETE):
- ✅ Six regime metrics operational (ADX, Efficiency Ratio, Autocorrelation, OU Half-Life, Normalized Slope, Bollinger Bandwidth)
- ✅ Regime classification working (RANGE_OK, RANGE_WEAK, TRANSITION, TREND)
- ✅ Git-backed storage with Kubernetes CronJob (hourly evaluation)
- ✅ Basic Pushover notifications functional
- ⚠️ Data Quality Issue: Hardcoded dummy values in engine.py must be fixed before Phases 2-5 (see implementation-plan.md Phase 1)
Core User Journeys Supported:
Journey 1: Active Grid Trader (Exit Protection)
- Real-time exit state evaluation (WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT)
- Push notifications with actionable recommendations
- Position risk visibility (capital at risk, profit give-back estimates)
- Manual exit execution with state tracking
Journey 2: Historical Decision Reviewer (Self-Validation)
- Git-backed immutable decision records
- KPI analysis framework (SLAR, PRR, TTDR, FER metrics)
- Backtesting framework showing system would have worked
- Track record for personal scaling decision
Journey 5: Kubernetes CronJob (Scheduled Evaluation)
- Stateless hourly execution with PVC-backed Git persistence
- Retry logic for Git push failures
- Direct Pushover API integration (no n8n dependency)
- Static HTML dashboard generation with Chart.js
Must-Have Capabilities:
Phase 2: Exit Strategy Core
- Exit State Machine: Progressive urgency states (NORMAL → WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT)
- Three-Gate Restart Logic: Sequential validation (Directional Energy Decay → Mean Reversion Return → Tradable Volatility)
- Multi-Condition Triggering: Require 2+ warning conditions to prevent false alarms
- State Transition Tracking: Git-logged transitions with timestamps and reasoning
- Historical Data Loading: Load last 12-24 hours of metrics for persistence checks
Trigger Logic Implemented:
- MANDATORY_EXIT: TREND regime detected, 2+ consecutive closes outside range, directional structure confirmed
- LATEST_ACCEPTABLE_EXIT: TRANSITION persists (≥2×4h OR ≥4×1h bars), OU half-life ≥2× baseline, volatility expansion >1.25×
- WARNING: 2+ conditions met (TRANSITION probability ≥40%, confidence declining, efficiency ratio rising, mean reversion slowing, volatility expanding)
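The persistence part of the LATEST_ACCEPTABLE_EXIT trigger (≥2×4h OR ≥4×1h bars) can be sketched as a trailing-run check. This assumes "persists" means the most recent bars are consecutively classified TRANSITION, which is an interpretation, and it covers only this one condition rather than the full trigger.

```python
from typing import Sequence


def trailing_run(labels: Sequence[str], target: str) -> int:
    """Count how many of the most recent labels equal target, consecutively."""
    count = 0
    for label in reversed(labels):
        if label != target:
            break
        count += 1
    return count


def transition_persists(
    regimes_1h: Sequence[str],
    regimes_4h: Sequence[str],
    min_1h_bars: int = 4,
    min_4h_bars: int = 2,
) -> bool:
    """True if TRANSITION has held for >=2 4h bars OR >=4 1h bars."""
    return (
        trailing_run(regimes_4h, "TRANSITION") >= min_4h_bars
        or trailing_run(regimes_1h, "TRANSITION") >= min_1h_bars
    )


hourly = ["RANGE_OK", "RANGE_WEAK", "TRANSITION", "TRANSITION", "TRANSITION"]
four_hourly = ["RANGE_OK", "TRANSITION"]
print(transition_persists(hourly, four_hourly))  # False
```

In this example neither timeframe qualifies yet: there are three trailing 1h TRANSITION bars (four needed) and one trailing 4h bar (two needed).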
Phase 3: Position Risk Quantification
- KuCoin Position Tracking: Fetch real-time position data via API
- Capital Risk Calculator: Quantify capital at risk, profit give-back estimates, stop-loss distance in ATR
- Enhanced Notifications: All exit state alerts include position risk context
- Graceful Degradation: System continues if KuCoin API unavailable (uses last known positions)
Notification Enhancements:
- WARNING: “Capital at risk: $120.50, Review within 24h”
- LATEST_ACCEPTABLE_EXIT: “Expected give-back if delayed 12h: $4-7, Exit within 4-12h”
- MANDATORY_EXIT: “Stop-loss distance: 0.6 ATR (CRITICAL), Exit NOW”
Phase 4: Testing & Validation
- Unit Tests: 60+ tests covering metric calculations, exit triggers, state transitions
- Integration Tests: End-to-end flow (regime → exit state → notification → Git commit)
- Backtesting Framework: Replay historical metrics (3-6 months data), validate exit quality
- CI/CD Pipeline: GitHub Actions with quality gates (80%+ coverage, all tests pass)
Backtesting Success Criteria:
- Profit Retention Ratio ≥75% (preserved majority of range profits)
- Stop-Loss Avoidance Rate ≥95% (exited before stop-loss in 95%+ scenarios)
- False Exit Rate ≤30% (acceptable false positive rate)
- Average warning lead time ≥30 minutes (met timing requirements)
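These four criteria lend themselves to an automated pass/fail check at the end of a backtest run. A sketch, assuming the KPIs are already computed as fractions and minutes; the `BacktestKpis` type and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BacktestKpis:
    profit_retention_ratio: float    # PRR, as a fraction
    stop_loss_avoidance_rate: float  # SLAR, as a fraction
    false_exit_rate: float           # FER, as a fraction
    avg_warning_lead_minutes: float


def failed_criteria(kpis: BacktestKpis) -> List[str]:
    """Return the names of the success criteria the backtest did not meet."""
    checks = {
        "profit_retention_ratio": kpis.profit_retention_ratio >= 0.75,
        "stop_loss_avoidance_rate": kpis.stop_loss_avoidance_rate >= 0.95,
        "false_exit_rate": kpis.false_exit_rate <= 0.30,
        "avg_warning_lead_minutes": kpis.avg_warning_lead_minutes >= 30,
    }
    return [name for name, met in checks.items() if not met]


# Made-up results, for illustration only.
example = BacktestKpis(0.81, 0.97, 0.34, 45.0)
print(failed_criteria(example))  # ['false_exit_rate']
```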
Phase 5: Operational Foundation
- Evaluation Cadence: 1-hour CronJob execution (0 * * * *) - matches 12-24h warning window assumption
- Audit Logging: Complete Git-backed state transitions, notification delivery tracking, operator action recording
- KPI Tracking Framework: Calculate SLAR, PRR, TTDR, FER, MEC metrics from audit logs
- Static Dashboards: HTML/Chart.js visualizations generated hourly, committed to Git
- Monitoring Integration: All metrics/logs sent to Grafana Loki for observability
Dashboard Visualizations:
- Current regime classification and confidence score
- Exit state (NORMAL/WARNING/LATEST_ACCEPTABLE_EXIT/MANDATORY_EXIT)
- All 6 metrics with current values and trends
- Gate evaluation status (if grid stopped)
- Recent state transition history
Out of Scope for MVP
Explicitly NOT Included:
Multi-Symbol Support:
- MVP: Single ETH-USDT grid only (SINGLE_GRID mode)
- Rationale: Prove exit strategy works for one symbol before scaling
- Future: Multi-symbol portfolio management (post-MVP growth feature)
Automated Grid Creation:
- MVP: Human approval required for all grid starts
- Rationale: Preserve human judgment, reduce regulatory complexity
- Future: Automated creation with high-confidence thresholds (post-MVP)
Advanced Dashboards:
- MVP: Static HTML/Chart.js files generated hourly
- Rationale: Sufficient for validation phase, investor-presentable
- Future: Real-time interactive dashboards, performance attribution analysis (post-MVP)
15-Minute Evaluation Cadence:
- MVP: 1-hour evaluation cycle
- Rationale: Research indicates 12-24h warning windows (RAIA A001, A004) - hourly sufficient
- Future: Adaptive cadence (state-based frequency) if validation shows need (post-MVP)
Automated Cleanup:
- MVP: Keep all data forever (no retention policy)
- Rationale: Preserve complete audit trail for validation analysis
- Future: Revisit after MVP complete (RAIA action item)
Multi-Exchange Support:
- MVP: KuCoin only
- Rationale: Single exchange simplifies integration, acceptable for validation
- Future: Multi-exchange diversification reduces outage risk (post-MVP)
Performance Optimization:
- MVP: Functional performance (evaluation <5 minutes acceptable)
- Rationale: 1-hour cadence provides plenty of buffer
- Future: Caching, async processing if needed (post-MVP)
Post-MVP Roadmap
Phase 6: Capital Scaling (3-Month Horizon)
Objective: Operate at £10K capital with proven exit strategy
Prerequisites:
- MVP validation complete (2-4 weeks live operation with £1K)
- Capital doubled to £2K during validation
- Zero stop-loss breaches during validation period
- KPIs meet targets (SLAR ≥95%, PRR ≥75%, TTDR ≥70%)
Enhancements:
- Track record documentation for personal scaling decision
- Threshold tuning based on real performance data
- KPI trend analysis (monthly reports)
Timeline: Month 4-6 after MVP complete
Phase 7: Investor Preparation (6-Month Horizon)
Objective: Package track record for external capital raise (£100K+)
Prerequisites:
- 3+ months operation at £10K capital
- Consistent monthly capital growth (4%+ average)
- Clean failure analysis documentation
- Backtesting validated against 3+ years historical data
Deliverables:
- Investor Presentation: Track record visualization, backtesting evidence, failure analysis
- Separation of Concerns: “System recommendation quality” vs “Operator action quality” metrics
- Regulatory Review: Legal assessment before external capital (RAIA A006)
- Multi-Symbol Validation: Expand beyond ETH-USDT, prove approach generalizes
Timeline: Month 7-12 after MVP complete
Phase 8: Growth Features (12-Month+ Horizon)
Objective: Scale operations with enhanced automation and intelligence
Enhanced Automation:
- Automated grid creation with high-confidence thresholds (human override available)
- Multi-symbol portfolio management (concurrent grids across symbols)
- Dynamic capital allocation based on regime confidence
Analytics & Reporting:
- Real-time visual dashboards (replace static HTML)
- Automated investor reports (monthly performance summaries)
- Performance attribution analysis (which decisions drove returns)
- Regime classification accuracy tracking (learn from misclassifications)
Intelligence Enhancements:
- Machine learning for regime classification refinement (adaptive to market structure changes)
- Adaptive gate thresholds based on market conditions (not static YAML config)
- Predictive exit timing optimization (earlier warnings for faster regime transitions)
Risk Management Expansion:
- Portfolio-level risk limits (not just per-grid)
- Correlation analysis across symbols (avoid concentrated exposure)
- Multi-exchange support (KuCoin + Binance + others for outage protection)
Timeline: Month 13+ after MVP complete
Progressive Feature Roadmap Summary
MVP (Phases 2-5): Capital Protection Foundation
- Exit strategy + validation + operational foundation
- £1K validation → confident £10K scale
- 2-4 weeks live operation
- Done When: KPIs proven, audit trail complete, zero stop-loss breaches
Phase 6 (Post-MVP): Capital Scaling
- Operate at £10K with proven system
- 3 months track record building
- Done When: Consistent 4%+ monthly growth, ready for investor presentation
Phase 7 (6-Month): Investor Readiness
- Multi-symbol validation
- External capital preparation (£100K+)
- Done When: Investor presentation complete, regulatory review done
Phase 8 (12-Month+): Growth & Intelligence
- Enhanced automation (within asymmetric philosophy)
- ML-based refinements
- Multi-exchange portfolio management
- Done When: Operating at £100K+ scale with external investment
Risk Mitigation Strategy
Technical Risks:
Innovation Risk 1: 1-Hour Cadence Insufficient
- Risk: Regime transitions may occur faster than hourly evaluation can detect (RAIA R001)
- Mitigation: Backtesting validates actual warning windows in historical data (Phase 4)
- Fallback: Implement 15-minute cadence if >20% of transitions provide <2h warning
- Validation Trigger: Monitor during Phases 2-5, measure warning lead times in KPI framework
Innovation Risk 2: False Exit Rate Too High
- Risk: 2+ condition WARNING logic may still generate excessive false exits (FER >30%)
- Mitigation: Tunable thresholds via YAML config, conservative/aggressive presets
- Fallback: Increase WARNING requirement to 3+ conditions, or tighten individual thresholds
- Validation Trigger: Track FER in Phase 4 backtesting, adjust before live deployment
Innovation Risk 3: Sequential Gates Too Restrictive
- Risk: Three-gate restart logic prevents timely re-entry, causing excessive opportunity cost
- Mitigation: Track time-to-restart and profitability of missed ranging periods (KPI framework)
- Fallback: Parallel gate evaluation or reduce to 2 gates
- Validation Trigger: If average time-to-restart >48h AND missed profit >20% of preserved capital
Market Risks:
Fast Regime Transitions (RAIA R001)
- Risk: Market moves faster than 1-hour cycle can detect, insufficient warning time
- Mitigation:
- Backtesting validates 12-24h warning window assumption (RAIA Action 1)
- Monitor near-miss scenarios during validation
- Prepared to implement 15-minute cadence if needed
- Trigger: If >20% of transitions provide <2h warning window
Exchange Outage During Critical Exit (RAIA R002)
- Risk: Cannot execute manual exit when KuCoin unavailable during MANDATORY_EXIT
- Mitigation: Accept as known limitation (manual execution dependency)
- Future: Multi-exchange diversification (Phase 8)
- Monitoring: Track incidents during validation (RAIA Action 2)
Capital Loss from False Positives (RAIA R004)
- Risk: Excessive false exits erode capital through missed ranging periods
- Mitigation:
- Three-gate restart logic prevents premature re-entry
- Backtesting validates FER <30% (RAIA A005, Action 3)
- KPI tracking measures false exit impact
- Trigger: If FER >30% in backtesting, tighten WARNING thresholds
Resource Risks:
Data Quality Issues Block Progress
- Risk: Phase 1 hardcoded dummy values must be fixed before Phases 2-5 results can be trusted
- Mitigation: Phase 1 prioritized, 40-60 hours estimated (see implementation-plan.md)
- Status: In progress (ADX complete, 11% of Phase 1 done)
- Contingency: Allocate 20% buffer time for unexpected data issues
Testing Reveals Major Bugs
- Risk: Phase 4 backtesting shows exit logic fundamentally flawed
- Mitigation: Test early (consider Phase 4 before Phases 2-3), iterate on thresholds
- Fallback: Simplify trigger logic (remove complex conditions), use proven baselines
- Contingency: Budget 50% additional time if major redesign needed
Scope Validation & Constraints
What Makes This the Right MVP:
Can Validate Core Value Proposition:
- Exit strategy proven to preserve capital (backtesting + live testing)
- Tiered urgency model tested (WARNING → LATEST_ACCEPTABLE → MANDATORY progression)
- Sequential gates validated (prevents premature re-entry)
- Human-in-loop execution proven viable (operator can respond in time)
Can Make Confident Scaling Decision:
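- KPI framework provides objective success measures: Stop-Loss Avoidance Rate (SLAR), Profit Retention Ratio (PRR), and True Transition Detection Rate (TTDR)
- Audit trail separates “did the system work?” from “did I follow its advice?”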
- Backtesting + 2-4 weeks live operation = sufficient confidence for £10K
- Track record foundation for future investor presentation
Can Be Completed in Reasonable Timeframe:
- Phase 1: 2-3 weeks (data quality fix)
- Phases 2-5: 4-6 weeks (exit strategy + validation + operational)
- Total: 6-9 weeks development + 2-4 weeks validation = 2-3 months to “MVP Done”
Boundaries Tested:
✅ Could validate without Phase 3 (Position Risk)? NO
- Need “capital at risk: $120” visibility for informed exit decisions
- Essential for £10K scale confidence
- Position risk quantification is must-have
✅ Could validate with basic text dashboards (no Chart.js)? NO
- User explicitly requires charts for regime trend assessment
- Visual confirmation of exit state transitions aids decision-making
- Chart.js is lightweight, not over-engineering
✅ Could validate without backtesting (Phase 4)? NO
- Can’t trust exit logic without historical validation
- Need objective proof of 75% profit retention, 95% stop-loss avoidance
- De-risks £10K capital deployment
- Backtesting is must-have
✅ Could simplify Phase 5 (Operational)? YES - Potential optimization
- Could defer fancy KPI dashboards (manual calculation acceptable)
- 1-hour evaluation cadence is already sufficient (a 15-minute cadence is not needed for the MVP)
- Simple YAML audit logs sufficient initially (enhance later)
- Simplification Opportunity: Streamline Phase 5 to basic logging + manual KPIs
Phase Sequence Validation:
Current Plan: Phase 1 → 2 → 3 → 4 → 5 (sequential)
Alternative Considered: Phase 1 → 4 → 2 → 3 → 5 (backtest-first)
- Benefit: Validate exit logic via backtesting BEFORE building Phases 2-3
- Risk: Delays getting operational system, harder to iterate without working code
- Decision: Keep current sequence (2→3→4) for faster feedback loop, but Phase 4 can start in parallel with Phase 3
Recommended Optimization:
- Phase 1: Data Quality (BLOCKER - must complete first)
- Phase 2 + Phase 4 (partial): Build exit strategy WHILE creating backtesting framework
- Phase 3: Position Risk (can parallelize with Phase 4 backtesting)
- Phase 4 (complete): Validate everything before deployment
- Phase 5: Operational polish
Success Criteria (MVP “Done”)
Completion Criteria (All Must Be Met):
✅ Code Complete:
- All Phase 2-5 code implemented with 100% test pass rate
- No critical bugs, no hardcoded dummy values
- Configuration complete and validated
✅ Backtesting Validation (Phase 4):
- Exit logic tested against 3-6 months historical data
- Profit Retention Ratio ≥75%
- Stop-Loss Avoidance Rate ≥95%
- False Exit Rate ≤30%
- Average warning lead time ≥30 minutes
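The four backtesting thresholds above can be expressed as one pass/fail check. This is a sketch under assumed names (`BacktestKpis`, `failed_criteria`); ratios are assumed to be stored as fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacktestKpis:
    prr: float                   # Profit Retention Ratio, as a fraction
    slar: float                  # Stop-Loss Avoidance Rate, as a fraction
    fer: float                   # False Exit Rate, as a fraction
    avg_warning_lead_min: float  # average warning lead time in minutes


def failed_criteria(k: BacktestKpis) -> list[str]:
    """Return the names of the Phase 4 criteria that are NOT met."""
    checks = {
        "PRR >= 75%": k.prr >= 0.75,
        "SLAR >= 95%": k.slar >= 0.95,
        "FER <= 30%": k.fer <= 0.30,
        "warning lead >= 30 min": k.avg_warning_lead_min >= 30,
    }
    return [name for name, ok in checks.items() if not ok]
```

An empty list means the backtesting criterion is satisfied; otherwise the list names what to tune before live deployment.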
✅ Live Capital Validation (2-4 Weeks):
- Operated with £1K live capital for 2-4 weeks
- Experienced multiple regime cycles (at least 2-3 TRANSITION events)
- Zero stop-loss breaches during validation period (excluding black swan events)
- KPIs meet targets in live operation (not just backtesting)
✅ Capital Scaling Milestone:
- Capital doubled from £1K to £2K during validation period
- Proves system protects capital WHILE capturing ranging profits
- Demonstrates profitability, not just capital preservation
✅ Audit Trail Complete:
- All decision records committed to Git with timestamps
- State transitions logged with reasoning and metrics
- Can answer “why didn’t you exit here?” for any historical moment
- System recommendations and operator actions are tracked separately
✅ System Ready for £10K:
- Risk calculations scale correctly (position sizing, stop-loss placement)
- Position tracking handles larger capital amounts
- Notification system tested and reliable
- Operator confident in decision-making process
MVP Declared “Done” When: All six completion criteria met + personal decision: “I’m ready to deploy £10K confidently.”
3-Month Success (Post Phase 2-5):
- Operating at £10K capital with same exit quality metrics
- Consistent monthly growth (4%+ average)
- Clean track record of exit decisions with measurable outcomes
- Investor presentation materials ready (if pursuing external capital)
12-Month Vision:
- £100K+ capital with external investment
- Exit strategy proven across multiple market regimes (bull, bear, ranging, volatile)
- Published track record of regime classification accuracy
- Multi-symbol support (beyond single ETH-USDT grid)
Functional Requirements
Regime Analysis & Classification
FR1: System can fetch OHLCV market data from exchange API
FR2: System can calculate six regime metrics (ADX, Efficiency Ratio, Autocorrelation, OU Half-Life, Normalized Slope, Bollinger Bandwidth)
FR3: System can classify market regime into four states (RANGE_OK, RANGE_WEAK, TRANSITION, TREND)
FR4: System can calculate regime confidence score
FR5: System can persist regime analysis results to version-controlled storage
FR6: System can load historical regime analysis for trend evaluation
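FR2 names the six metrics without giving formulas. As an illustration, two of them have widely used textbook definitions: Kaufman's Efficiency Ratio (net price move divided by total path length) and Bollinger Bandwidth (band width divided by the middle band). The sketch below assumes the whole input list is the lookback window, a band multiplier of 2, and population standard deviation; the system's actual parameters may differ.

```python
from statistics import mean, pstdev


def efficiency_ratio(closes: list[float]) -> float:
    """Kaufman Efficiency Ratio: |net move| / sum of |bar-to-bar moves|."""
    if len(closes) < 2:
        raise ValueError("need at least two closes")
    net = abs(closes[-1] - closes[0])
    path = sum(abs(b - a) for a, b in zip(closes, closes[1:]))
    return net / path if path else 0.0  # flat series: treat as no trend


def bollinger_bandwidth(closes: list[float], k: float = 2.0) -> float:
    """(upper - lower) / middle, with bands k standard deviations from the mean."""
    mid = mean(closes)
    sd = pstdev(closes)  # assumption: population std dev over the window
    return (2 * k * sd) / mid
```

A ratio near 1 indicates a directional move, while a ratio near 0 indicates price oscillating without net progress, which is the behaviour a grid depends on.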
Exit Strategy Management
FR7: System can evaluate current exit state based on regime analysis (NORMAL, WARNING, LATEST_ACCEPTABLE_EXIT, MANDATORY_EXIT)
FR8: System can detect MANDATORY_EXIT conditions (TREND regime, consecutive closes outside range, directional structure confirmed)
FR9: System can detect LATEST_ACCEPTABLE_EXIT conditions (TRANSITION persistence, mean reversion degradation, volatility expansion)
FR10: System can detect WARNING conditions requiring 2+ triggering metrics
FR11: System can track exit state transitions with timestamps and reasons
FR12: System can evaluate three sequential restart gates (Directional Energy Decay, Mean Reversion Return, Tradable Volatility)
FR13: System can enforce gate sequencing (Gate N+1 only evaluated if Gate N passes)
FR14: System can track gate status history for stopped grids
FR15: System can determine grid eligibility for restart based on gate progression
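FR10 and FR12-FR15 describe two small pieces of control logic: WARNING needs at least two triggering metrics, and restart gates are evaluated strictly in order. A minimal sketch follows; the gate keys, status strings, and function names are assumptions chosen for illustration.

```python
from typing import Callable

# Gate order follows FR12.
GATES = ["directional_energy_decay", "mean_reversion_return", "tradable_volatility"]


def evaluate_gates(checks: dict[str, Callable[[], bool]]) -> dict[str, str]:
    """Evaluate gates in order; stop at the first failure (FR13)."""
    status = {name: "NOT_EVALUATED" for name in GATES}
    for name in GATES:
        passed = checks[name]()
        status[name] = "PASSED" if passed else "FAILED"
        if not passed:
            break  # later gates are never called
    return status


def restart_eligible(status: dict[str, str]) -> bool:
    """A stopped grid may restart only when every gate has passed (FR15)."""
    return all(s == "PASSED" for s in status.values())


def is_warning(triggered_metrics: set[str]) -> bool:
    """WARNING requires two or more triggering metrics (FR10)."""
    return len(triggered_metrics) >= 2
```

Passing the checks as callables means a later gate's calculation is skipped entirely when an earlier gate fails, rather than being computed and ignored.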
Risk Assessment
FR16: System can fetch active position data from exchange API
FR17: System can calculate unrealized PnL for active positions
FR18: System can calculate capital at risk based on current positions and stop-loss distance
FR19: System can estimate profit give-back if exit delayed by specified hours
FR20: System can calculate stop-loss distance in ATR units
FR21: System can track grid position health relative to configured boundaries
FR22: System can gracefully degrade when position data unavailable (use last known state)
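FR18 and FR20 can be illustrated with simple arithmetic. The formulas below are assumptions for a long-only position (loss if price moves straight to the stop, and stop distance divided by ATR); the PRD does not specify the exact calculation.

```python
def stop_distance_atr(price: float, stop_price: float, atr: float) -> float:
    """Distance from current price to the stop-loss, in ATR units (FR20)."""
    if atr <= 0:
        raise ValueError("ATR must be positive")
    return abs(price - stop_price) / atr


def capital_at_risk(position_qty: float, price: float, stop_price: float) -> float:
    """Loss in quote currency if price fell from `price` to the stop (FR18).

    Assumption: long position only; a stop above the price yields zero risk.
    """
    return position_qty * max(price - stop_price, 0.0)
```

For example, 1 ETH at a price of 3000 with a stop at 2880 and an ATR of 40 gives 120 at risk and a stop 3 ATR away.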
Notification & Alerting
FR23: Operator can receive exit state notifications via push notification service
FR24: System can rate-limit notifications based on exit state urgency (WARNING: 4h, LATEST_ACCEPTABLE: 2h, MANDATORY: 1h)
FR25: System can include position risk context in notifications (capital at risk, profit give-back, stop-loss distance)
FR26: System can include regime metrics in notifications (confidence, verdict, triggering conditions)
FR27: System can track notification delivery status (sent, delivered, failed)
FR28: System can prevent duplicate notifications for unchanged exit states
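One way to combine FR24 and FR28 is a single decision function. The sketch assumes the durations in FR24 are minimum intervals between repeat notifications for an unchanged state, that a state change always notifies immediately, and that NORMAL never notifies; all three are interpretations, not stated requirements.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed reading of FR24: minimum gap between repeats of the same state.
MIN_INTERVAL = {
    "WARNING": timedelta(hours=4),
    "LATEST_ACCEPTABLE_EXIT": timedelta(hours=2),
    "MANDATORY_EXIT": timedelta(hours=1),
}


def should_notify(state: str, last_state: Optional[str],
                  last_sent: Optional[datetime], now: datetime) -> bool:
    """Decide whether to send a notification for the current exit state."""
    if state not in MIN_INTERVAL:
        return False  # assumption: NORMAL produces no alert
    if last_sent is None or state != last_state:
        return True  # first alert, or the exit state changed
    # Unchanged state: suppress duplicates until the interval has elapsed.
    return now - last_sent >= MIN_INTERVAL[state]
```

Under this reading, an escalation from WARNING to MANDATORY_EXIT is never delayed by the WARNING rate limit.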
Audit & Decision Tracking
FR29: System can create immutable decision records with timestamps, regime state, and exit recommendations
FR30: System can commit decision records to version-controlled storage before sending notifications
FR31: System can track configuration version (Git hash) used for each decision
FR32: System can track system image version used for each decision
FR33: Operator can query historical decision records by date range, symbol, or exit state
FR34: System can track operator actions (grid stopped, grid started, exit declined) separately from system recommendations
FR35: System can maintain separation between “system recommendation quality” and “operator action quality”
FR36: System can provide complete audit trail for investor scrutiny
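FR29-FR32 suggest a record that cannot be modified after creation and that is committed before any notification goes out. A minimal sketch, assuming a frozen dataclass, JSON serialisation, and injected `commit` and `notify` callables (field names and the serialisation format are hypothetical):

```python
import json
from dataclasses import dataclass, asdict, FrozenInstanceError
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class DecisionRecord:
    timestamp: str        # ISO 8601, second precision
    symbol: str
    regime: str
    exit_state: str
    config_git_hash: str  # FR31
    image_version: str    # FR32


def make_record(symbol: str, regime: str, exit_state: str,
                config_git_hash: str, image_version: str,
                now: Optional[datetime] = None) -> DecisionRecord:
    now = now or datetime.now(timezone.utc)
    ts = now.replace(microsecond=0).isoformat()  # truncate to the second
    return DecisionRecord(ts, symbol, regime, exit_state,
                          config_git_hash, image_version)


def publish(record: DecisionRecord,
            commit: Callable[[str], None],
            notify: Callable[[DecisionRecord], None]) -> None:
    """Commit first (FR30); notify only if the commit did not raise."""
    commit(json.dumps(asdict(record), sort_keys=True))
    notify(record)
```

Freezing the dataclass guards against accidental in-process edits; immutability in storage still relies on Git history, as described in the requirements.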
Validation & Analysis
FR37: System can replay historical metrics for backtesting exit strategy
FR38: System can calculate Profit Retention Ratio (PRR) from historical data
FR39: System can calculate Stop-Loss Avoidance Rate (SLAR) from historical data
FR40: System can calculate True Transition Detection Rate (TTDR) from historical data
FR41: System can calculate False Exit Rate (FER) from historical data
FR42: System can calculate Exit Reaction Time (ERT) when operator action data available
FR43: System can generate KPI reports for specified time periods
FR44: System can identify false positive exits (regime returned to RANGE after exit)
FR45: System can identify false negative exits (regime transitioned but no exit signal)
FR46: Operator can compare backtesting results against live operation results
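FR41 and FR44 together imply a simple ratio: false exits divided by total exits. The sketch below assumes an exit counts as false if the regime returns to RANGE_OK or RANGE_WEAK within a lookahead window of hourly evaluations; the 24-evaluation window and the inclusion of RANGE_WEAK are assumptions.

```python
RANGING = {"RANGE_OK", "RANGE_WEAK"}  # assumption: both count as "returned to RANGE"


def is_false_exit(regimes_after_exit: list[str], lookahead: int = 24) -> bool:
    """FR44-style check: did the regime return to ranging soon after the exit?"""
    return any(r in RANGING for r in regimes_after_exit[:lookahead])


def false_exit_rate(exits: list[list[str]], lookahead: int = 24) -> float:
    """FER = false exits / total exits; each item is the regime series after one exit."""
    if not exits:
        return 0.0
    false_count = sum(is_false_exit(e, lookahead) for e in exits)
    return false_count / len(exits)
```

The lookahead length materially changes the result, so it would need to be fixed in configuration before comparing backtesting and live figures (FR46).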
System Operations
FR47: System can execute regime evaluation on scheduled intervals (hourly)
FR48: System can validate configuration schema on startup
FR49: System can retry failed Git push operations on subsequent evaluation cycles
FR50: System can generate static HTML dashboards with embedded visualizations
FR51: System can track evaluation execution time and log performance warnings
FR52: System can send operational metrics to logging infrastructure (Grafana Loki)
FR53: System can handle partial metric calculation failures (continue with available metrics)
FR54: System can log metric calculation errors in decision records
FR55: Operator can override configuration via environment variables
FR56: System can detect and alert on persistent API failures (3+ consecutive failures)
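FR53, FR54, and FR56 describe failure handling that can be sketched in a few lines: compute whatever metrics succeed, record the errors, and alert only after three consecutive failures. Function names and the error string format are assumptions.

```python
from typing import Callable


def compute_metrics(calculators: dict[str, Callable[[], float]]
                    ) -> tuple[dict[str, float], dict[str, str]]:
    """Run each metric; keep successes and record failures (FR53, FR54)."""
    values: dict[str, float] = {}
    errors: dict[str, str] = {}
    for name, fn in calculators.items():
        try:
            values[name] = fn()
        except Exception as exc:  # broad on purpose: one bad metric must not stop the cycle
            errors[name] = f"{type(exc).__name__}: {exc}"
    return values, errors


def should_alert(recent_outcomes: list[bool], threshold: int = 3) -> bool:
    """True when the last `threshold` outcomes were all failures (FR56)."""
    tail = recent_outcomes[-threshold:]
    return len(tail) == threshold and not any(tail)
```

The `errors` mapping is what would be written into the decision record, so a reviewer can see which metrics were missing when a recommendation was made.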
Non-Functional Requirements
Performance
NFR-P1: Regime evaluation completes within 5 minutes (allows 55-minute buffer before next hourly cycle)
NFR-P2: Evaluation time exceeding 1 minute triggers WARNING log entry
NFR-P3: Evaluation time exceeding 5 minutes triggers ERROR log entry and operator alert
NFR-P4: Notification delivery latency <60 seconds from decision record creation
NFR-P5: Git commit and push operations complete within 10 seconds under normal conditions
NFR-P6: Historical data loading (12-24 hours of metrics) completes within 30 seconds
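NFR-P2 and NFR-P3 map evaluation time to a log severity. A minimal sketch using the standard `logging` levels; the function name is hypothetical and the operator alert required by NFR-P3 is left out.

```python
import logging


def evaluation_log_level(elapsed_s: float) -> int:
    """Map evaluation duration to a log level per NFR-P2 and NFR-P3."""
    if elapsed_s > 300:   # more than 5 minutes
        return logging.ERROR
    if elapsed_s > 60:    # more than 1 minute
        return logging.WARNING
    return logging.INFO
```

Both thresholds are exclusive, matching the "exceeding" wording, so an evaluation of exactly 5 minutes still logs at WARNING.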
Reliability
NFR-R1: System availability ≥99% during validation phase (acceptable: ~7 hours downtime per month)
NFR-R2: CronJob execution success rate ≥98% (missed evaluations acceptable if isolated)
NFR-R3: Failed Git push operations retry automatically on subsequent evaluation cycles
NFR-R4: Failed KuCoin API calls retry up to 3 times with exponential backoff before declaring failure
NFR-R5: System continues exit state evaluation when position data unavailable (graceful degradation)
NFR-R6: Persistent failures (3+ consecutive cycles) trigger operator alerts
NFR-R7: Configuration errors detected on startup prevent deployment (fail-fast with rollback to previous version)
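NFR-R4 calls for up to three retries with exponential backoff. The sketch below assumes a 1-second base delay that doubles each time and retries on any exception; a real client would likely retry only on timeouts and transient HTTP errors. The `sleep` parameter is injectable so the behaviour can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], retries: int = 3,
                    base_delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn; on failure retry up to `retries` times, doubling the delay."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: declare failure to the caller
            sleep(base_delay_s * 2 ** attempt)  # 1s, 2s, 4s with the defaults
            attempt += 1
```

With the defaults this makes at most four calls in total (one initial attempt plus three retries) and waits at most 7 seconds, well inside the 5-minute evaluation budget.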
Security
NFR-S1: Exchange API keys stored in external secrets management (not in code/config files)
NFR-S2: API keys restricted with IP whitelist and no-withdrawal permissions
NFR-S3: Decision records stored in private Git repository with access limited to operator
NFR-S4: All API communications use HTTPS/TLS encryption
NFR-S5: Pushover notifications encrypted in transit
NFR-S6: No credentials or API keys logged in application logs or decision records
NFR-S7: Git repository authentication uses SSH keys (not HTTPS credentials)
Integration
NFR-I1: System tolerates KuCoin API response times up to 5 seconds (retry if exceeded)
NFR-I2: KuCoin API rate limits are not exceeded (hourly cadence well within limits)
NFR-I3: Pushover API failures do not block evaluation completion
NFR-I4: Git push failures do not prevent decision record creation (stored locally, pushed later)
NFR-I5: System handles KuCoin API maintenance windows gracefully (uses last known data, alerts operator)
NFR-I6: Failed notification delivery tracked and retried on next evaluation cycle
NFR-I7: API integration errors include actionable context (error type, retry count, next action)
Data Integrity
NFR-D1: Decision records are immutable once committed to Git (no retroactive editing)
NFR-D2: All decision records include Git commit hash for configuration version traceability
NFR-D3: All decision records include Docker image version for code traceability
NFR-D4: Metric calculation errors logged in decision records (transparent failure tracking)
NFR-D5: Partial metric failures documented with specific metrics unavailable
NFR-D6: Timestamp precision to the second for all decision records and state transitions
NFR-D7: Operator actions tracked separately from system recommendations (no conflation)
NFR-D8: Historical data retained indefinitely for MVP (no automated deletion)