Product Requirements Document - Grid Exit Strategy - Phases 2-5

Author: Craig
Date: 2026-02-01

Success Criteria

User Success (You as Grid Trader)

Decision Confidence:

  • You can articulate WHY you entered and exited every position using audit trail evidence
  • You have at least 30 minutes of warning between exit state transitions in 90%+ of cases
  • You can answer “why didn’t you exit here?” for any historical moment using immutable decision records

Capital Protection:

  • Zero stop-loss breaches during normal market conditions (excluding “world-defining moments”)
  • System provides the WARNING state 1-2 hours (minimum 1 hour) before LATEST_ACCEPTABLE_EXIT
  • At least 2 hours between LATEST_ACCEPTABLE_EXIT and MANDATORY_EXIT states
  • No catastrophic exits (defined as: hitting exchange stop-loss instead of graceful exit)

Operational Clarity:

  • Exit state transitions are clear and actionable (you know what WARNING/LATEST_ACCEPTABLE/MANDATORY mean in real-time)
  • System evaluates regime hourly with consistent decision logic
  • Restart gates prevent premature re-entry after trend stops

Business Success (Capital Scaling & Investor Readiness)

Capital Scaling Milestone:

  • Double capital stake from £1K to £2K within Phase 2-5 validation period (2-4 weeks live operation)
  • System proven ready to support £10K capital allocation (risk calculations, position sizing, audit trails all scale)

Investor Credibility:

  • Complete immutable audit trail in Git showing every decision with timestamps
  • Backtesting results demonstrate exit strategy would have prevented historical drawdowns
  • Ability to generate “decision quality” reports showing regime classification accuracy vs outcomes
  • Clean separation of “recommendation quality” (was regime correct?) vs “action quality” (did I follow the recommendation?)

Exit Quality Metrics:

  • KPI framework operational and tracking exit quality (how early did we exit vs when we should have?)
  • Historical analysis showing system identified trend breakouts before significant capital loss
  • Documented evidence of false-positive rate (stopped grids that stayed range-bound)

Technical Success

Phase 2 Complete:

  • Three-gate restart logic implemented and tested (Directional Energy Decay → Mean Reversion Return → Tradable Volatility)
  • Exit state transitions functional (WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT)
  • State transition tracking in decision records
  • Historical data loading supports gate evaluation

Phase 3 Complete:

  • KuCoin position tracker integrated and returning accurate position data
  • Capital risk calculator quantifying exposure in real-time
  • Enhanced notifications include risk metrics (current exposure, distance to stop-loss, time in exit state)

Phase 4 Complete:

  • 100% test coverage for new exit logic (matching Phase 1 quality: 60+ tests, all passing)
  • Backtesting framework operational and validated against 3-6 months of historical data
  • CI/CD integration preventing regression
  • Documented test scenarios covering edge cases (volatility spikes, gap moves, data failures)

Phase 5 Complete:

  • Hourly evaluation cadence operational with monitoring
  • Audit logging captures all state transitions with context (regime metrics, confidence scores, gate status)
  • KPI tracking framework operational
  • Documentation complete for investor presentation

Measurable Outcomes

Completion Criteria (Phases 2-5 “Done”):

  • ✅ All code implemented with 100% test pass rate
  • ✅ Backtested against historical trend breakouts (3-6 months data)
  • ✅ Validated with £1K live capital for 2-4 weeks
  • ✅ Capital doubled to £2K during validation period
  • ✅ Zero stop-loss breaches during validation period (excluding black swan events)
  • ✅ Audit trail complete and investor-ready
  • ✅ System ready to scale to £10K capital allocation

3-Month Success (Post Phase 2-5):

  • Operating at £10K capital with same exit quality metrics
  • Clean track record of exit decisions with measurable outcomes
  • Investor presentation materials complete with backtesting evidence

12-Month Vision:

  • £100K+ capital with external investment
  • Exit strategy proven across multiple market regimes
  • Published track record of regime classification accuracy
  • Multi-symbol support (beyond single grid)

Product Scope

MVP - Minimum Viable Product (Phases 2-5)

Core Exit Strategy:

  • Exit state machine (WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT)
  • Three-gate restart logic preventing premature grid restart
  • Position risk quantification from KuCoin API
  • Enhanced notifications with risk context
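The exit state machine named above can be sketched as a small transition table. This is an illustrative sketch, not the system's actual implementation: the `ExitState`/`can_transition` names and the de-escalation policy (states can drop back to NORMAL if the regime recovers) are assumptions; the PRD itself only specifies the escalation path.

```python
from enum import Enum

class ExitState(Enum):
    """Exit urgency states, ordered by severity (names from this PRD)."""
    NORMAL = 0
    WARNING = 1
    LATEST_ACCEPTABLE_EXIT = 2
    MANDATORY_EXIT = 3

# Allowed transitions: escalate one step at a time, or de-escalate back
# to NORMAL when the regime recovers (the de-escalation rule is an
# assumption; the PRD only describes escalation).
ALLOWED = {
    ExitState.NORMAL: {ExitState.WARNING},
    ExitState.WARNING: {ExitState.LATEST_ACCEPTABLE_EXIT, ExitState.NORMAL},
    ExitState.LATEST_ACCEPTABLE_EXIT: {ExitState.MANDATORY_EXIT, ExitState.NORMAL},
    ExitState.MANDATORY_EXIT: set(),
}

def can_transition(current: ExitState, target: ExitState) -> bool:
    """Return True if the state machine permits current -> target."""
    return target in ALLOWED[current]
```

Encoding the transitions as data (rather than scattered `if` statements) makes the escalation path auditable, which matches the decision-record requirements elsewhere in this document.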

Quality & Validation:

  • Comprehensive test coverage (60+ tests, 100% pass)
  • Backtesting framework with 3-6 months historical validation
  • CI/CD integration

Operational Foundation:

  • Hourly evaluation cadence with monitoring
  • Complete audit logging in Git
  • KPI tracking framework
  • Static HTML dashboards with Chart.js

Explicitly Out of Scope for MVP:

  • Multi-symbol concurrent grids (single ETH-USDT only)
  • Automated grid creation (human approval required)
  • 15-minute evaluation cadence (hourly sufficient based on research)
  • Advanced real-time dashboards
  • Performance optimization beyond functional requirements

Post-MVP Growth Path

Detailed roadmap documented in Project Scoping section below, including:

  • Phase 6 (3-month): Capital scaling to £10K with proven system
  • Phase 7 (6-month): Investor preparation and multi-symbol validation
  • Phase 8 (12-month+): Enhanced automation, ML refinements, multi-exchange support

User Journeys

Journey 1: Craig - Active Grid Trader (Exit Protection)

Situation: It’s Tuesday morning, 9:15 AM. Craig has an active ETH-USDT grid running with £1,200 capital deployed. The grid has been harvesting profitable oscillations for 3 days in a clean range around the 3,200 level. His phone buzzes with a Pushover notification: “⚠️ WARNING - ETH regime transitioning. Confidence 0.68 → 0.54. Review recommended.”

Opening Scene - Warning Detection:

Craig opens the notification link on his phone. The decision interface shows:

  • Current regime: TRANSITION (was RANGE_OK 15 minutes ago)
  • Confidence: 0.54 (dropped from 0.68)
  • Exit state: WARNING
  • Key metrics: ADX rising (25 → 32), Bollinger Bandwidth expanding (0.034 → 0.041)
  • Gate status: Gate 1 (Directional Energy Decay) FAILING - TrendScore crossed 35 threshold
  • Time in WARNING: 15 minutes
  • Estimated time to LATEST_ACCEPTABLE_EXIT: 1-2 hours

Craig thinks: “This is exactly what Phase 1 was built for - early warning before things get ugly.”

Rising Action - Monitoring Escalation:

45 minutes later, another notification: “🔶 LATEST_ACCEPTABLE_EXIT - ETH trend strengthening. ADX 38, efficiency ratio 0.72. Exit recommended within 2 hours.”

Craig checks the decision record:

  • Regime: TRANSITION → TREND (confirmed for 3/5 bars)
  • ADX: 38 and rising
  • Efficiency Ratio: 0.72 (directional persistence strong)
  • Exit state: LATEST_ACCEPTABLE_EXIT
  • Current position: Grid is net long 0.8 ETH (market moving up, sold into strength)
  • Distance to stop-loss: $280 (still 8.7% buffer)
  • Audit trail shows: “WARNING triggered at 09:15, LATEST_ACCEPTABLE_EXIT at 10:00”

Craig has a decision to make: Exit now with graceful unwinding, or wait and risk MANDATORY_EXIT?

Climax - Decisive Action:

Craig decides to exit. He manually stops the grid in KuCoin (3 clicks: Stop Grid → Keep Assets → Confirm). Within 2 minutes, the grid is stopped. Current PnL: +£47 profit on this grid session.

He updates the decision record via the web interface:

  • Action taken: STOP_GRID
  • Reason: “Trend confirmed, ADX rising through 35, efficiency ratio shows directional persistence”
  • Outcome: Graceful exit with profit intact

The system records:

  • Exit state progression: WARNING (09:15) → LATEST_ACCEPTABLE_EXIT (10:00) → USER_STOPPED (10:45)
  • Total warning time: 90 minutes
  • Stop-loss distance at exit: 8.7% (never threatened)
  • Grid cooldown: 60 minutes before restart eligibility

Resolution - Post-Exit Validation:

Two hours later, ETH has moved to 3,280. The system records this as a “catastrophic exit avoided.”

24 hours later, the system performs automatic evaluation:

  • Regime classification: CORRECT (remained TREND for 18 hours)
  • Exit timing: OPTIMAL (exited 90 minutes into trend, avoided 8% adverse move)
  • Warning lead time: 90 minutes (met success criteria: 30+ min warning)
  • KPI recorded: Exit quality score 9/10 (early exit, preserved capital, clean audit trail)

Craig’s new reality:

  • Capital preserved with profit
  • Clean audit trail showing “system warned → I exited → trend confirmed”
  • Confidence in system’s protective capability
  • Restart gates now active: waiting for directional energy decay before re-entry

Journey 2: Craig - Historical Decision Reviewer (Investor Preparation)

Situation: It’s Friday evening, 3 months into validation. Craig is preparing materials for his personal decision to scale from £1K to £10K. He needs to demonstrate to himself that the exit system actually works before committing serious capital.

Opening Scene:

Craig opens the market-maker-data Git repository containing 3 months of immutable decision records. He runs the analysis script:

task analyze-exit-quality --period 2026-01-01 to 2026-03-31

Rising Action:

The KPI dashboard generates:

  • Total exit events: 12
  • SLAR (Stop-Loss Avoidance Rate): 100% (12/12 exits before stop-loss)
  • PRR (Profit Retention Ratio): 82% average (preserved £394 of £480 potential profit)
  • TTDR (True Transition Detection Rate): 83% (10/12 regime breaks correctly identified)
  • FER (False Exit Rate): 17% (2/12 exits after which the range resumed)
  • Average warning time: 95 minutes (exceeds 30-minute minimum)
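The headline KPIs above can be computed from the recorded exit events. The sketch below shows the arithmetic only; the `ExitEvent` field names are illustrative placeholders, not the system's actual decision-record schema.

```python
from dataclasses import dataclass

@dataclass
class ExitEvent:
    """One recorded exit. Field names are illustrative, not the real schema."""
    hit_stop_loss: bool        # did the exchange stop-loss fire instead of a graceful exit?
    range_resumed: bool        # did the range resume after exiting (a false exit)?
    profit_preserved: float    # GBP retained at exit
    profit_potential: float    # GBP that was available at the regime peak

def exit_kpis(events: list[ExitEvent]) -> dict[str, float]:
    """Compute SLAR, FER and PRR over a period's exit events."""
    n = len(events)
    graceful = sum(1 for e in events if not e.hit_stop_loss)
    false_exits = sum(1 for e in events if e.range_resumed)
    preserved = sum(e.profit_preserved for e in events)
    potential = sum(e.profit_potential for e in events)
    return {
        "SLAR": graceful / n,          # Stop-Loss Avoidance Rate
        "FER": false_exits / n,        # False Exit Rate
        "PRR": preserved / potential,  # Profit Retention Ratio
    }
```

A report such as “SLAR 100% (12/12), PRR 82%” is then a direct ratio over the immutable Git records, which keeps “recommendation quality” measurable independently of the narrative.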

Climax:

Craig reviews the 2 false positives in detail:

  • Exit #3 (Feb 12): WARNING → stopped grid → range resumed after 6 hours. Lost £18 in potential profit but preserved £67 existing profit. Restart gates prevented immediate re-entry; missed 2 days of ranging.
  • Exit #7 (Mar 5): Similar pattern - cautious exit, range continued.

The audit trail shows his reasoning at the time: “ADX rising, efficiency ratio climbing, better safe than sorry.” Looking back, the system was correctly identifying volatility expansion, even though the regime ultimately held.

Resolution:

Craig’s conclusion: “2 false exits cost me £45 in missed profit. But the 10 true exits saved me from an estimated £620 in stop-loss hits. Net benefit: £575. More importantly - I can articulate WHY every decision was made, and the false positives were defensible given the data available.”

He updates his personal scaling decision document: “Exit system validated. Ready for £10K capital.”


Journey 3: External Investor - Track Record Evaluation

Situation: 18 months later. Craig is meeting with Sarah, an angel investor considering deploying £100K into his systematic grid trading fund. She’s reviewing his track record before committing capital.

Opening Scene:

Sarah receives access to Craig’s investor presentation repository. She’s evaluating whether this is “real systematic trading” or “lucky gambling with post-hoc justification.”

Rising Action:

Sarah reviews the evidence:

  1. Immutable Decision Records (Git):

    • Every recommendation timestamped and committed before action
    • No retroactive editing (Git history proves it)
    • Clear separation: “What did the system recommend?” vs “What did Craig do?”
  2. Exit Quality Metrics (18 months):

    • 87 total exit events
    • SLAR: 97% (3 stop-loss hits during black swan events)
    • PRR: 79% (preserved majority of range-trading profits)
    • Monthly capital growth: 4.2% average (compounded)
  3. Failure Analysis:

    • Craig documents the 3 stop-loss hits:
      • May 2026: Exchange outage prevented manual exit (system correctly identified MANDATORY_EXIT, Craig couldn’t execute)
      • Aug 2026: “Ignored LATEST_ACCEPTABLE_EXIT recommendation - my mistake, learned lesson”
      • Nov 2026: Flash crash exceeded all historical volatility bounds (unpredictable)

Climax:

Sarah asks the critical question: “How do I know you didn’t just get lucky? What happens when regimes behave differently?”

Craig shows her the backtesting framework:

  • Exit logic backtested against 3 years of historical data
  • Would have avoided 23/27 major drawdown periods
  • The 4 missed signals all occurred in low-liquidity Asian hours (now monitored)

Resolution:

Sarah’s conclusion: “This isn’t perfect, but it’s systematic, transparent, and learns from failures. The audit trail gives me confidence that capital is protected by process, not luck. I’m in.”


Journey 4: System Administrator - Deployment & Monitoring

Situation: Craig needs to deploy the Phase 2 restart gates logic to production after completing testing.

Opening Scene:

Craig (wearing his DevOps hat) reviews the deployment checklist:

  • All tests passing (62 tests, 100% coverage)
  • Backtesting complete
  • Configuration updated with new gate thresholds
  • Docker image built and pushed to registry

Rising Action:

He deploys using the standard workflow:

task deploy-metrics-service --env production
kubectl apply -f k8s/metrics-service/deployment.yaml

The ArgoCD pipeline automatically:

  • Validates configuration schema
  • Runs smoke tests against production API
  • Gradually rolls out new pods (blue-green deployment)
  • Monitors error rates and latency

Climax:

15 minutes after deployment, Craig receives a Slack alert: “Metrics service error rate: 0.2% (Gate evaluation failing for BTC-USDT)”

He checks the logs:

ERROR: Gate 1 evaluation failed - insufficient historical data for OU half-life calculation
Symbol: BTC-USDT, Required: 240 bars, Available: 187 bars

Resolution:

Craig realizes BTC-USDT is a newly added symbol without enough historical data. He updates the configuration to delay gate evaluation until sufficient data is collected:

grids:
  - id: btc-grid-1
    symbol: BTC-USDT
    gate_evaluation_delay: 48h  # Wait for data collection

He redeploys. The error rate returns to 0% and the system is stable.

The incident is logged in the decision record system: “Deployment incident - insufficient data for new symbol gate evaluation. Resolution: delay gate evaluation. Prevention: add data sufficiency check to deployment validation.”


Journey 5: Kubernetes CronJob - Scheduled Evaluation

Situation: The regime evaluation system runs as a Kubernetes CronJob, executing hourly without external orchestration.

Opening Scene:

Every hour, Kubernetes triggers the metrics-service cronjob pod:

# k8s/metrics-service/cronjob.yaml
schedule: "0 * * * *"
command: ["task", "evaluate-regime"]
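For context, a fuller manifest implied by the fragment above might look like the following sketch. The image name, concurrency policy, and resource settings are illustrative assumptions, not the project's actual deployment values:

```yaml
# Hedged sketch of the full CronJob manifest; image name and
# backoff/concurrency settings are illustrative assumptions.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: metrics-service-evaluate
spec:
  schedule: "0 * * * *"          # hourly, on the hour
  concurrencyPolicy: Forbid      # never overlap evaluations
  jobTemplate:
    spec:
      backoffLimit: 2            # retry a failed evaluation twice
      template:
        spec:
          restartPolicy: Never   # stateless: each run is independent
          containers:
            - name: metrics-service
              image: registry.example/metrics-service:latest
              command: ["task", "evaluate-regime"]
              env:
                - name: MARKET_MAKER_DATA_REPOSITORY_BASE_PATH
                  value: /data/market-maker-data
```

`concurrencyPolicy: Forbid` matters here: if one evaluation hangs past the hour, overlapping runs could race on the Git repository.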

Rising Action:

The cronjob pod executes:

  1. Reads configuration from environment variables (overriding environment.yaml defaults)
  2. Fetches latest market data from KuCoin API
  3. Calculates all 6 regime metrics (ADX, efficiency ratio, autocorrelation, OU half-life, slope, Bollinger bandwidth)
  4. Evaluates three restart gates (if grid is stopped)
  5. Classifies regime and determines exit state
  6. Creates decision record and commits to Git
  7. Sends notifications via configured channels (Pushover directly, or webhook to n8n if available)

Climax:

The evaluation detects a regime transition:

  • Regime: RANGE_OK → TRANSITION
  • Exit state: NORMAL → WARNING
  • Decision record created: decisions/2026-02-01/dec-eth-091500.yaml

The cronjob attempts to commit to Git repository:

git add decisions/2026-02-01/dec-eth-091500.yaml
git commit -m "[ETH-USDT] WARNING state detected - regime TRANSITION"
git push origin main

Potential Issue:

Git push fails (network timeout). The cronjob implements retry logic:

  • Attempt 1: Failed (timeout)
  • Attempt 2 (30s delay): Failed
  • Attempt 3 (60s delay): Success

Decision record committed. Audit trail intact.
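The retry schedule above (immediate, then 30s, then 60s) can be sketched as a generic backoff helper. The function name and the injectable `sleep` parameter are illustrative choices for testability, not the system's real API:

```python
import time
from typing import Callable

def retry_with_backoff(op: Callable[[], bool],
                       attempts: int = 3,
                       base_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run `op` until it succeeds, waiting 30s then 60s between attempts
    (the schedule shown in this journey). `sleep` is injectable so the
    policy can be unit-tested without real delays."""
    for attempt in range(attempts):
        if op():
            return True
        if attempt < attempts - 1:
            sleep(base_delay * (attempt + 1))  # 30s after attempt 1, 60s after attempt 2
    return False
```

The `op` callable would wrap the actual push, e.g. `lambda: subprocess.run(["git", "push", "origin", "main"]).returncode == 0`. On final failure the commit still exists locally, so the audit record is preserved and can be pushed later.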

Resolution:

Notification sent via Pushover API (direct integration, no n8n dependency):

POST https://api.pushover.net/1/messages.json
{
  "token": "...",
  "user": "...",
  "message": "⚠️ WARNING - ETH regime transitioning. Confidence 0.68 → 0.54",
  "priority": 1,
  "url": "https://regime-dashboard/decisions/dec-eth-091500"
}

Craig receives notification on phone. System continues evaluating hourly.
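A direct call like the POST shown above can be built with the standard library alone, with no n8n dependency. This is a sketch: Pushover also accepts form encoding, and the helper name is illustrative.

```python
import json
import urllib.request

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"

def build_pushover_request(token: str, user: str, message: str,
                           priority: int, link: str) -> urllib.request.Request:
    """Build the Pushover POST shown above as a JSON body.
    Sending is left to the caller: urllib.request.urlopen(req)."""
    body = json.dumps({"token": token, "user": user, "message": message,
                       "priority": priority, "url": link}).encode()
    return urllib.request.Request(PUSHOVER_URL, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")
```

Separating request construction from sending keeps the notification payload unit-testable without network access, which fits the stateless-cronjob design.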

Operational Notes:

  • Cronjob pod uses same Taskfile commands available locally: task evaluate-regime
  • Configuration via environment variables: MARKET_MAKER_DATA_REPOSITORY_BASE_PATH=/data/market-maker-data
  • Logs streamed to stdout, captured by Kubernetes logging
  • Pod exits cleanly after each evaluation (stateless execution)
  • Next evaluation triggered by Kubernetes scheduler in 1 hour

Journey Requirements Summary

These five journeys reveal the following capability requirements:

From Journey 1 (Active Trading):

  • Hourly regime evaluation with exit state classification
  • Exit state machine (NORMAL → WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT)
  • Three-gate restart logic
  • Push notifications with context
  • Manual action recording
  • Cooldown enforcement

From Journey 2 (Historical Review):

  • KPI analysis framework (SLAR, PRR, TTDR, FER, ERT)
  • Git-backed immutable decision records
  • Analysis tooling (scripts, dashboards)
  • Time-period filtering
  • False positive/negative identification
  • Audit trail completeness

From Journey 3 (Investor Evaluation):

  • Investor-grade reporting
  • Backtesting framework (3+ years historical data)
  • Failure analysis documentation
  • Separation of recommendation vs action
  • Track record visualization
  • Credibility evidence (immutability, transparency)

From Journey 4 (DevOps):

  • Production deployment workflow
  • Blue-green deployment support
  • Error monitoring and alerting
  • Configuration validation
  • Data sufficiency checks
  • Incident logging
  • Rollback capability

From Journey 5 (Kubernetes CronJob):

  • Kubernetes CronJob deployment support
  • Taskfile-based execution (local simulation possible)
  • Environment variable configuration override system
  • Git commit retry logic with backoff
  • Direct Pushover API integration (no n8n dependency initially)
  • Stateless execution (each run independent)
  • Kubernetes logging integration
  • Graceful error handling and exit codes
  • Configuration validation on startup

Optional n8n Integration (Growth Feature):

  • Webhook endpoint for manual triggering
  • n8n workflow orchestration for advanced notification routing
  • Multi-channel notification distribution (Email, Slack, SMS via n8n)

Domain-Specific Requirements

Project Classification:

  • Domain: Fintech - Algorithmic Trading
  • Complexity: High
  • Context: Brownfield (adding Phases 2-5 exit strategy to existing regime management system)

Compliance & Regulatory

Current Scope (Phases 2-5 - Personal Capital Trading):

  • Personal capital trading (£1K-£10K scale) - no regulatory oversight required
  • Regulatory compliance deferred to post-Phase 5 (external capital threshold)
  • Git commit history provides sufficient audit integrity without independent verification or cryptographic signing
  • Assumption: Personal capital trading does not trigger FCA algorithmic trading requirements (see RAIA log A006)

Out of Scope for MVP:

  • FCA registration or compliance
  • MiFID II algorithmic trading requirements
  • External investor regulatory framework
  • Legal review scheduled before £100K external capital raise

Security Architecture

API Security:

  • KuCoin API keys with IP whitelist required + no-withdrawal permissions enforced
  • Threat model: Prevent unauthorized trading and capital extraction
  • Kubernetes secrets for sensitive configuration (not in code/config files)

Data Protection:

  • Decision records repository: Private Git repository (market-maker-data)
  • Access control: Restricted to operator only during validation phase
  • Data in transit: Pushover notifications encrypted, HTTPS for all API calls
  • GDPR: Personal trading data only (no third-party PII)

Decision Interface:

  • Authentication: Not required for MVP (local K8s cluster + VPN access)
  • Network isolation: Accessible only within VPN perimeter
  • Future enhancement: OAuth2 ingress mechanism available for public exposure post-MVP
  • Hosting: Private Kubernetes cluster (not public-facing)

Technical Constraints

Evaluation Cadence:

  • MVP (Phases 2-5): 1-hour evaluation cycle (schedule: "0 * * * *")
  • Rationale: Research indicates 12-24 hour warning window for regime transitions (see RAIA log A001, A004)
  • Future enhancement: Adaptive cadence (state-based evaluation frequency) if validation shows need for faster response
  • Action: Validate assumption via backtesting in Phase 4 (see RAIA Action 1)

Exchange Integration - KuCoin:

  • Grid management limitation: KuCoin spot grids cannot be managed via API (manual stop/start via UI only)
  • Human-in-loop requirement: System generates recommendations, human executes in KuCoin UI
  • Data dependencies: Market data (OHLCV), account balance, position tracking all via KuCoin API
  • Rate limits: 1-hour evaluation cycle well within KuCoin API rate limits
  • API call volume: Reduced overhead compared to 15-minute cadence

Configuration Management:

  • Schema validation: Configuration validated on startup with retry logic
  • Deployment safety: Invalid configuration keeps previous deployment running (blue-green deployment)
  • Environment overrides: Support environment variable overrides for Kubernetes deployment flexibility
  • Validation checks: Pre-deployment validation catches configuration errors before production rollout

Data Availability:

  • Historical data requirements: Sufficient data needed for gate evaluation (240+ bars for OU half-life)
  • Data sufficiency checks: Validate sufficient data exists before enabling gate evaluation for new symbols
  • Backfill support: Tools to collect historical data for new symbols before production use
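The sufficiency check described above (and the `gate_evaluation_delay` fix from Journey 4) can be sketched as a single eligibility predicate. The 240-bar figure comes from the incident log in this document; the function name and 48-hour default are illustrative:

```python
from datetime import datetime, timedelta

REQUIRED_BARS = 240  # OU half-life requirement quoted in the deployment incident

def gate_evaluation_ready(bars_available: int,
                          symbol_added_at: datetime,
                          now: datetime,
                          evaluation_delay: timedelta = timedelta(hours=48)) -> bool:
    """A symbol is eligible for gate evaluation only when it has enough
    history AND its configured collection delay has elapsed. The 48h
    default mirrors the gate_evaluation_delay used in Journey 4."""
    return (bars_available >= REQUIRED_BARS
            and now - symbol_added_at >= evaluation_delay)
```

Running this predicate as a pre-deployment validation step is exactly the “prevention” noted in the Journey 4 incident record.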

Resilience & Failure Handling

Exchange Outage (Acceptable Risk):

  • Scenario: KuCoin unavailable during MANDATORY_EXIT state
  • Mitigation: Document as known limitation (see RAIA R002)
  • Rationale: Manual execution dependency means system cannot auto-execute anyway
  • Monitoring: Track exchange availability incidents for future multi-exchange planning (see RAIA Action 2)

Market Data Feed Failure (Retry with Backoff):

  • Scenario: KuCoin market data API fails during evaluation cycle
  • Mitigation: Retry 2-3 times with exponential backoff before declaring failure
  • Failure handling: Log error, skip current cycle, attempt next cycle in 1 hour
  • Alert threshold: After N consecutive failures, send “DATA UNAVAILABLE - MANUAL MONITORING REQUIRED” alert
  • Rationale: Transient API issues shouldn’t trigger false alarms, but prolonged outage needs operator awareness

Git Commit Failure (Retry, Then Log and Continue):

  • Scenario: Decision record created but Git push fails
  • Mitigation: Retry the push with backoff (as in Journey 5); if all attempts fail, log locally and continue operation (see RAIA R003)
  • Rationale: Notification is still delivered (Pushover) so the operator can act; a temporary audit gap is non-critical for the validation phase
  • Future enhancement: Persistent retry queue for commits that fail all attempts (post-Phase 5)

Configuration Errors (Validation with Rollback):

  • Scenario: Invalid configuration deployed to production
  • Mitigation:
    • Pre-deployment: Schema validation in deployment pipeline
    • Startup validation: Validate configuration on pod startup, retry with backoff if validation fails
    • Deployment safety: Blue-green deployment keeps previous version running if new version fails validation
  • Rationale: Configuration errors are preventable and should never reach production

Notification Delivery Failure:

  • Scenario: Pushover API unavailable or rate-limited
  • Mitigation: Log failure, attempt retry on next evaluation cycle
  • Monitoring: Track notification delivery success rate
  • Rationale: Missing single notification is acceptable if subsequent cycle succeeds

Integration Requirements

KuCoin Exchange API:

  • Market data: OHLCV data at multiple timeframes (1m, 15m, 1h, 4h)
  • Account data: Balance queries for capital allocation calculations
  • Position tracking: Current grid status, order fills, PnL tracking
  • Authentication: API key + secret + passphrase with IP whitelist
  • Error handling: Graceful degradation on API failures, retry logic for transient errors

Git Repository (market-maker-data):

  • Decision records: Immutable YAML files, one per recommendation
  • Metrics history: Hourly snapshots of system state
  • Commit strategy: Atomic commits with descriptive messages including symbol and state
  • Push failures: Log and continue (acceptable gap in audit trail during outages)
  • Access control: Private repository, SSH key authentication from Kubernetes pods

Pushover Notifications:

  • Direct API integration: No n8n dependency for MVP
  • Priority levels: NORMAL, WARNING, LATEST_ACCEPTABLE_EXIT, MANDATORY_EXIT map to Pushover priority
  • Rate limiting: Prevent notification spam (max 1 notification per state transition)
  • Delivery tracking: Log notification attempts and responses

Optional n8n Integration (Post-MVP):

  • Webhook triggers: Manual evaluation triggering
  • Multi-channel notifications: Email, Slack, SMS routing
  • Workflow orchestration: Complex notification logic

Risk Mitigations

Domain-Specific Risks:

Fast Regime Transitions:

  • Risk: Regime may transition faster than 1-hour evaluation cycle can detect (see RAIA R001)
  • Mitigation:
    • Backtesting to validate 12-24 hour warning window assumption (see RAIA Action 1)
    • Monitor near-miss scenarios during validation
    • Prepared to implement 15-minute cadence if needed
  • Trigger: If >20% of regime transitions provide <2 hour warning window

Exchange Outage During Critical Exit:

  • Risk: Cannot execute manual exit when KuCoin is unavailable (see RAIA R002)
  • Mitigation: Accept as known limitation (manual execution dependency)
  • Future: Multi-exchange diversification (post-Phase 5)
  • Monitoring: Track incidents during validation (see RAIA Action 2)

API Rate Limiting:

  • Risk: Excessive API calls trigger rate limits, blocking market data access
  • Mitigation:
    • 1-hour evaluation cycle well within KuCoin rate limits
    • Retry logic with exponential backoff prevents rapid retry storms
    • Monitor API usage to stay under limits

Data Staleness:

  • Risk: Stale market data leads to incorrect regime classification (see RAIA R006)
  • Mitigation:
    • Timestamp all market data fetches
    • Retry logic ensures fresh data attempts before failure
    • Alert operator if data age exceeds acceptable threshold

Capital Loss from False Positives:

  • Risk: Excessive false exits erode capital through missed ranging periods (see RAIA R004)
  • Mitigation:
    • Three-gate restart logic prevents premature re-entry
    • Backtesting validates false positive rate <30% (see RAIA A005, Action 3)
    • KPI tracking measures false exit impact

Regulatory Change:

  • Risk: Crypto regulations change, grid trading becomes restricted (see RAIA R005)
  • Mitigation: Monitor regulatory landscape, prepared to halt operations if needed
  • Legal review: Scheduled before external capital raise (see RAIA Action 4)

Crypto Trading Domain Specifics

24/7 Market Operations:

  • Implication: No market close, regime can shift anytime (overnight, weekends)
  • Mitigation: 1-hour evaluation cycle runs continuously via Kubernetes CronJob
  • Monitoring: System uptime monitoring, alert on CronJob failures

High Volatility Environment:

  • Implication: Crypto moves faster than traditional markets, tighter response windows
  • Mitigation: Gate thresholds calibrated for crypto volatility patterns (not traditional asset volatility)
  • Validation: Backtesting with crypto-specific volatility scenarios (Phase 4)

Single Exchange Dependency (KuCoin):

  • Risk: Exchange-specific outages, API changes, or policy changes affect operations (see RAIA I002)
  • Mitigation: Accept as validation phase limitation
  • Future: Multi-exchange architecture (post-Phase 5)

Grid Trading Mechanics:

  • KuCoin limitation: Spot grids not manageable via API (manual UI interaction required) (see RAIA I001)
  • Implication: System is decision support only, not automated execution
  • Benefit: Human-in-loop preserves control, reduces regulatory complexity

Assumptions & Actions

Critical Assumptions Requiring Validation:

  • A001: Regime transitions provide 12-24 hour warning windows → Validate in Phase 4 backtesting
  • A004: 1-hour evaluation cadence sufficient for capital protection → Monitor during Phases 2-5
  • A005: False positive rate <30% is acceptable → Measure via KPI framework
  • A006: Personal capital trading exempt from FCA regulation → Legal review before £100K

Key Actions:

  1. Action 1: Validate 1-hour cadence assumption via backtesting (Phase 4, Due: 2026-04-01)
  2. Action 3: Measure false positive rate via KPI framework (Phase 4-5, Due: 2026-04-15)
  3. Action 5: Return to domain requirements after validation data available (Due: 2026-05-01)
  4. Action 6: Quarterly RAIA review (Next: 2026-05-01)

Full RAIA Log: See .ai/projects/market-making/RAIA.md for complete Risks, Assumptions, Issues, and Actions tracking.

Innovation & Novel Patterns

Detected Innovation Areas

1. Tiered Exit Urgency Model

Innovation: Progressive exit states with explicit time windows for human decision-making, replacing binary stop-loss logic.

Differentiator: Traditional grid trading uses binary stop-losses (triggered or not triggered). This system implements a tiered urgency model:

  • WARNING: Early signal (2+ warning conditions met), 4-hour notification rate limit, provides 1-2 hour buffer to LATEST_ACCEPTABLE_EXIT
  • LATEST_ACCEPTABLE_EXIT: Regime assumptions failing, 2-hour notification rate limit, recommended exit window of 4-8 hours
  • MANDATORY_EXIT: Confirmed regime break, 1-hour notification rate limit, immediate exit recommended

Why This Matters: Provides graduated response time appropriate to signal strength. Users aren’t forced to choose between “no alert” or “emergency exit” - there are intermediate states that allow thoughtful decision-making while preserving capital protection.

Novel Aspect: Explicit modeling of decision urgency as progressive states with corresponding time buffers, rather than treating all exit signals as equivalent.


2. Sequential Three-Gate Restart Logic

Innovation: Post-exit restart requires sequential validation through three gates (not parallel checks), preventing premature re-entry during trend continuations.

Gate Structure:

  • Gate 1 (Directional Energy Decay): Must pass FIRST - validates trend strength has subsided (ADX falling, TrendScore low, no persistent directional swings)
  • Gate 2 (Mean Reversion Return): Evaluated ONLY after Gate 1 passes - validates mean-reverting behavior has returned (negative autocorrelation, short OU half-life, price oscillations reverting)
  • Gate 3 (Tradable Volatility): Evaluated ONLY after Gate 2 passes - validates volatility is in tradable range (not too low, not expanding)
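The sequential structure above (Gate N+1 is reached only if Gate N passes) can be sketched as a short-circuiting check. All threshold values here are illustrative placeholders, not the calibrated production thresholds:

```python
def evaluate_restart_gates(metrics: dict) -> tuple[bool, str]:
    """Sequential three-gate restart check: each gate is evaluated only
    if the previous one passed. Thresholds are illustrative placeholders."""
    # Gate 1 - Directional Energy Decay: trend strength must have subsided.
    if not (metrics["adx"] < 25 and metrics["trend_score"] < 35):
        return False, "Gate 1: directional energy still present"
    # Gate 2 - Mean Reversion Return: reached only after Gate 1 passes.
    if not (metrics["autocorr_lag1"] < 0 and metrics["ou_half_life"] < 24):
        return False, "Gate 2: mean reversion not confirmed"
    # Gate 3 - Tradable Volatility: reached only after Gate 2 passes.
    if not (0.01 <= metrics["bb_bandwidth"] <= 0.05):
        return False, "Gate 3: volatility outside tradable range"
    return True, "All gates passed: restart eligible"
```

The early returns enforce the ordering by construction: a failing Gate 1 means Gates 2 and 3 are never even computed, which is the forced progression the design calls for.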

Differentiator: Traditional trading systems use simple cooldown periods (“don’t trade for N hours after stop”). This implements sequential validation: you can’t evaluate mean reversion until directional energy has decayed, and you can’t evaluate volatility until mean reversion is confirmed.

Why This Matters: Prevents “stop-restart churn” where a grid is exited during a trend, then immediately re-entered before the trend fully resolves, leading to multiple stop-losses.

Novel Aspect: Sequential gating architecture (Gate N+1 only evaluated if Gate N passes) creates a forced progression through stability checks.


3. Multi-Metric Regime Consensus with 2+ Condition Triggering

Innovation: WARNING state requires 2+ warning conditions to trigger (not a single condition), using consensus across 6 regime metrics.

Metrics Used:

  1. ADX (trend strength)
  2. Efficiency Ratio (directional persistence)
  3. Lag-1 Autocorrelation (mean reversion detection)
  4. OU Half-Life (mean reversion speed)
  5. Normalized Slope (directional bias)
  6. Bollinger Bandwidth (volatility regime)

Consensus Logic:

  • Single warning condition = NORMAL state (no alert)
  • 2+ warning conditions = WARNING state (alert sent)
  • This prevents false alarms from single noisy indicators
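
A sketch of the voting rule (condition names are illustrative):

```python
def consensus_state(conditions: dict) -> str:
    """2+ independent warning conditions escalate to WARNING;
    a single noisy indicator stays NORMAL."""
    met = [name for name, is_met in conditions.items() if is_met]
    return "WARNING" if len(met) >= 2 else "NORMAL"


# A lone ADX spike does not alert:
#   consensus_state({"adx_rising": True}) -> "NORMAL"
# Multiple independent signals do:
#   consensus_state({"adx_rising": True, "confidence_declining": True}) -> "WARNING"
```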

Differentiator: Most technical analysis uses individual indicators or simple “AND” logic. This implements a voting mechanism - regime classification emerges from consensus, and WARNING requires multiple independent signals.

Why This Matters: Reduces false positive rate while maintaining sensitivity to genuine regime transitions. A single spike in ADX doesn’t trigger an alert, but ADX rising + confidence declining + efficiency ratio increasing = legitimate warning.

Novel Aspect: The explicit 2+ condition requirement to trigger WARNING, preventing single-indicator noise from generating actionable alerts.


4. Asymmetric Automation Philosophy

Innovation: System can automatically reduce risk (send alerts), but NEVER automatically deploys capital.

Design Principle:

  • Auto-Alert, Manual-Execute: System generates exit recommendations 24/7, but human must execute in KuCoin UI
  • Asymmetric Authority: System can escalate warnings (NORMAL → WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT) but cannot create grids or deploy capital without explicit approval
  • Human-in-Loop by Design: Not an afterthought or “manual override” - it’s the core architecture

Differentiator: Most trading systems are either fully automated (system trades without human input) or fully manual (human monitors 24/7). This explicitly separates monitoring (automated) from execution (manual).

Why This Matters:

  • Regulatory: Simpler compliance (no automated trading license needed)
  • Risk: Capital deployment requires human judgment, reducing catastrophic automation failures
  • Control: Operator maintains final authority while benefiting from 24/7 monitoring

Novel Aspect: The explicit articulation and implementation of “asymmetric automation” as a design philosophy, not just “we’ll add automation later.”


5. Investor-First Audit Trail Architecture

Innovation: Git-backed immutable decision records designed for investor scrutiny from day one (not added later).

Architecture:

  • Every recommendation committed to Git BEFORE notification sent
  • State transitions logged with timestamps, metrics, and reasoning
  • Separation of “system recommendation” vs “user action” tracked independently
  • No database, no retroactive editing - immutable audit trail via version control
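
A sketch of the commit-before-notify ordering; the commit and notify callables stand in for the real GitPython and Pushover integrations, and all names here are assumptions:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def record_decision(
    data_dir: Path,
    decision: dict,
    commit: Callable[[Path], None],
    notify: Callable[[str], None],
) -> Path:
    """Write an immutable decision record, commit it, and only then notify.

    Injecting `commit` and `notify` enforces the audit-first ordering
    in one place."""
    ts = datetime.now(timezone.utc)
    payload = json.dumps(decision, sort_keys=True, indent=2)
    # A content hash in the filename makes retroactive edits detectable.
    digest = hashlib.sha256(payload.encode()).hexdigest()[:12]
    path = data_dir / f"{ts:%Y-%m-%dT%H%M%S}-{digest}.json"
    path.write_text(payload)
    commit(path)                  # audit trail FIRST...
    notify(decision["summary"])   # ...notification second
    return path
```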

Differentiator: Most trading systems add logging as an afterthought. This makes audit credibility a first-class design requirement, shaping the entire data architecture.

Why This Matters:

  • Investor Credibility: Can answer “why didn’t you exit here?” for any historical moment
  • Performance Analysis: Separate tracking of recommendation quality (was the system right?) vs action quality (did the operator follow advice?)
  • Scaling Enabler: Clean audit trail is prerequisite for external capital (£100K+)

Novel Aspect: Using Git version control as the primary data store specifically for investor-credible audit trails, rather than traditional database logging.


Market Context & Competitive Landscape

Existing Approaches to Grid Exit:

  1. Manual Monitoring: Trader watches markets 24/7, decides when to exit grids

    • Limitation: Doesn’t scale, requires constant attention, subject to emotion/fatigue
  2. Simple Stop-Loss: Set stop-loss at X% below grid range, exit when hit

    • Limitation: Binary decision, often triggers at maximum loss, no early warning
  3. Trailing Stops: Stop-loss moves with price, locks in some profit

    • Limitation: Still binary, no regime awareness, can trigger during normal volatility
  4. Automated Trading Bots: Fully automated grid management with various exit rules

    • Limitation: Black-box decision-making, no human judgment, regulatory complexity

How This Differs:

This system combines:

  • Regime structure analysis (not just price levels)
  • Tiered urgency (not binary triggers)
  • Multi-metric consensus (not single indicators)
  • Human-in-loop (not fully automated)
  • Sequential restart validation (not simple cooldowns)
  • Investor-grade audit trails (not just operator logs)

Positioning: Structured decision support for systematic grid traders who want to scale capital while maintaining human judgment and building credible track records.


Validation Approach

Critical Questions to Answer:

Q1: Does tiered exit urgency preserve more capital than binary stop-losses?

  • Validation Method: Backtesting (Phase 4) - compare tiered exit vs simple stop-loss on 3-6 months historical data
  • Success Metric: 75%+ profit retention ratio (preserve majority of range-trading profits)
  • Measure: Average exit timing (how early do we exit vs when stop-loss would have hit?)

Q2: Does the 2+ condition WARNING logic reduce false positives without missing real transitions?

  • Validation Method: Track False Exit Rate (FER) during validation phase
  • Success Metric: FER <30% (see RAIA A005)
  • Measure: Exits where range resumed after stop vs exits where trend confirmed

Q3: Do sequential restart gates prevent stop-restart churn?

  • Validation Method: Track re-entry timing after exits, measure stop-loss hits on restarted grids
  • Success Metric: <10% of restarted grids hit stop-loss within 24 hours
  • Measure: Time between exit and successful re-entry, profitability of restarted grids

Q4: Does 1-hour evaluation cadence provide sufficient warning time?

  • Validation Method: Backtesting to measure actual regime transition warning windows (see RAIA A001, A004)
  • Success Metric: ≥80% of transitions provide >2 hour warning window
  • Measure: Time from WARNING to MANDATORY_EXIT in historical data
  • Fallback: If <80%, implement 15-minute cadence or adaptive evaluation frequency

Q5: Does multi-metric consensus improve regime classification accuracy?

  • Validation Method: Compare 6-metric consensus vs individual metrics
  • Success Metric: Higher True Transition Detection Rate (TTDR) with consensus vs single indicators
  • Measure: Regime classification accuracy in backtesting (correctly identified RANGE vs TREND)

Risk Mitigation

Innovation Risk 1: Excessive Complexity

  • Risk: Tiered states, sequential gates, and multi-metric consensus add complexity that may not improve outcomes vs simpler approaches
  • Mitigation: Backtesting comparison against simpler baselines (binary stop-loss, single indicator, no gates)
  • Fallback: If complex approach doesn’t outperform, simplify to best-performing baseline
  • Validation Trigger: If backtesting shows <10% improvement vs simple stop-loss, question complexity

Innovation Risk 2: False Positive Rate Too High

  • Risk: 2+ condition WARNING logic may still generate too many false exits (FER >30%)
  • Mitigation: Tunable thresholds via YAML config, conservative/aggressive presets available
  • Fallback: Increase WARNING requirement to 3+ conditions, or tighten individual condition thresholds
  • Validation Trigger: Track FER in Phase 4, adjust thresholds if >30%

Innovation Risk 3: 1-Hour Cadence Insufficient

  • Risk: Regime transitions may occur faster than 1-hour evaluation can detect (see RAIA R001)
  • Mitigation: Backtesting measures actual warning windows in historical data
  • Fallback: Implement 15-minute cadence or adaptive evaluation (NORMAL: 1h, WARNING: 15min, LATEST_ACCEPTABLE: 5min)
  • Validation Trigger: If >20% of transitions provide <2 hour warning, implement faster cadence

Innovation Risk 4: Sequential Gates Too Restrictive

  • Risk: Three-gate restart logic prevents timely re-entry, causing excessive opportunity cost
  • Mitigation: Track time-to-restart and profitability of missed ranging periods
  • Fallback: Parallel gate evaluation (all gates checked simultaneously) or reduce to 2 gates
  • Validation Trigger: If average time-to-restart >48 hours and missed profit >20% of preserved capital

Innovation Risk 5: Human-in-Loop Execution Delay

  • Risk: Manual execution introduces delay that negates early warning benefits
  • Mitigation: Measure Exit Reaction Time (ERT) - time from alert to actual exit
  • Fallback: If ERT consistently >30 minutes, consider API-based grid management (if KuCoin adds support) or multi-exchange architecture
  • Validation Trigger: Track ERT in operational phase, identify if manual execution is bottleneck

Innovation Risk 6: Audit Trail Overhead

  • Risk: Git commits for every decision create operational friction or repository bloat
  • Mitigation: Lightweight JSON/YAML files, daily aggregation, automated cleanup for old data
  • Fallback: Database logging with Git export for investor presentation
  • Validation Trigger: If Git operations slow evaluation >500ms or repo size >1GB, reconsider architecture

Backend Decision Support System - Specific Requirements

Project-Type Overview

This is a batch processing system with Git-based persistence, not a web API. The system runs as a Kubernetes CronJob executing Python modules directly with file-based output to a Git repository mounted on a Persistent Volume Claim (PVC).

Architecture:

KuCoin API → Python Evaluation → Git Commit (PVC) → Static Dashboard Generation → Git Push

Key Characteristics:

  • No HTTP API endpoints, no REST services, no client-server architecture
  • Scheduled Python execution (hourly via Kubernetes CronJob)
  • Git repository on PVC for persistence and retry capability
  • Static HTML dashboards with Chart.js visualizations generated every hour
  • Stateless job execution with all state loaded from/saved to Git

Data Pipeline

Processing Flow:

  1. Data Acquisition: Fetch OHLCV from KuCoin API, load recent metrics from Git PVC
  2. Regime Analysis: Calculate 6 metrics, classify regime, calculate confidence
  3. Exit State Evaluation: Evaluate WARNING/LATEST_ACCEPTABLE_EXIT/MANDATORY_EXIT conditions
  4. Gate Evaluation: If grid stopped, evaluate three sequential gates
  5. State Transition Tracking: Log state changes with rate limiting
  6. Decision Record Creation: Create immutable decision records
  7. Dashboard Generation: Generate HTML/JavaScript dashboard with Chart.js
  8. Data Persistence: Commit all files to Git (on PVC), push to remote with retry

No additional data transformation or aggregation stages are needed for the MVP - the pipeline is complete as described.
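
Under the stateless-job model, the eight stages above reduce to a strictly ordered run loop; a sketch, with stage names as placeholders for the real modules:

```python
def run_cycle(stages, context):
    """Run pipeline stages strictly in order; any exception aborts the cycle.
    Because all state lives in Git on the PVC, the next hourly run starts clean."""
    completed = []
    for name, stage in stages:
        stage(context)  # each stage reads/writes the shared context dict
        completed.append(name)
    return completed


# The eight MVP stages, in order (names illustrative):
STAGE_NAMES = [
    "acquire_data", "analyze_regime", "evaluate_exit_state", "evaluate_gates",
    "track_transitions", "create_decision_records", "generate_dashboard",
    "persist_to_git",
]
```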

Data Schemas: See SCHEMA.md for complete schema definitions (metrics, exit states, decision records, configuration).


Static Dashboard Generation

Execution: Dashboards generated as part of the same CronJob (not separate process)

Frequency: Every hour (regenerated with each evaluation)

Format: HTML with JavaScript charts (Chart.js library)

Structure: One dashboard HTML file per hour with embedded data for that evaluation period

  • File naming: dashboards/{symbol}/{YYYY-MM-DD}-{HH}.html
  • Self-contained: Data embedded in HTML (no external API calls)
  • Viewable via: file:// protocol locally, or simple HTTP server, or Git hosting

Visualizations (Essential - support recommendations/decisions):

  • Current regime classification and confidence
  • Exit state (NORMAL/WARNING/LATEST_ACCEPTABLE_EXIT/MANDATORY_EXIT)
  • All 6 metrics with current values and trends
  • Gate evaluation status (if grid stopped)
  • Recent state transition history
  • Decision recommendation (if actionable)

Technology Stack:

  • Chart.js for interactive visualizations
  • HTML5/CSS3 for layout
  • Embedded JSON data in <script> tags
  • No server-side rendering needed
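
A minimal sketch of the self-contained dashboard idea - evaluation data embedded as JSON in a script tag so the file needs no external API. Plain string templating is used here for brevity; the real service would use Jinja2 and Chart.js per the stack above, and all names are assumptions:

```python
import json
from datetime import datetime
from pathlib import Path

PAGE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{symbol} {stamp}</title></head>
<body>
<h1>{symbol} exit state: {exit_state}</h1>
<script id="evaluation-data" type="application/json">{payload}</script>
<!-- Chart.js would read the JSON above and render charts client-side -->
</body></html>
"""


def write_dashboard(root: Path, symbol: str, evaluation: dict, now: datetime) -> Path:
    # File naming follows dashboards/{symbol}/{YYYY-MM-DD}-{HH}.html
    out = root / "dashboards" / symbol / f"{now:%Y-%m-%d}-{now:%H}.html"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(PAGE.format(
        symbol=symbol,
        stamp=f"{now:%Y-%m-%d %H}:00",
        exit_state=evaluation["exit_state"],
        payload=json.dumps(evaluation),
    ))
    return out
```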

Error Handling & Resilience

KuCoin API Failures

Strategy: Retry 2-3 times with exponential backoff, then skip cycle

Acceptable for MVP: Yes

Persistent Failure Alerting: Yes - if API fails for multiple consecutive cycles (threshold: 3+ consecutive failures), send alert notification

Monitoring: Track API response times, error rates, success/failure counts → send to Grafana Loki for observability
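
A sketch of the retry policy; the fetch callable stands in for the actual KuCoin client, and the delay schedule is illustrative:

```python
import logging
import time

logger = logging.getLogger("kucoin")


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry with exponential backoff (1s, 2s, 4s, ...); on exhaustion return
    None so the caller skips this cycle instead of crashing the CronJob."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # real code would catch requests.RequestException
            logger.warning("KuCoin fetch failed (attempt %d/%d): %s",
                           attempt + 1, attempts, exc)
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None  # skip cycle; consecutive-failure alerting is handled separately
```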

Git Push Failures

Strategy: Log locally on PVC, continue operation (acceptable gap in audit trail for MVP)

PVC Design: Yes - CronJob should use PVC so Git repo persists between runs and doesn’t require full clone each time

Retry on Subsequent Cycles: Yes - if push failed previously, retry push on next cycle before committing new data

Implementation:

# Retry a previously failed push before committing new data (GitPython sketch)
from git import GitCommandError

if has_unpushed_commits():
    try:
        repo.remote("origin").push()
        logger.info("Pushed previously failed commits")
    except GitCommandError as exc:
        logger.warning("Previous commits still not pushed: %s", exc)

# Continue with current evaluation either way

Metric Calculation Errors

Strategy: Evaluation should continue with remaining metrics if one fails

Critical vs Optional: All metrics are conceptually critical, BUT:

  • If a metric calculation fails (e.g., OU half-life non-stationary), continue with remaining metrics
  • If enough metrics succeed to calculate confidence, generate recommendation WITH additional error information
  • Include metric calculation errors in notification/dashboard

Implementation Approach:

  • Calculate all metrics with error handling per metric
  • Track which metrics succeeded vs failed
  • If confidence can be calculated (even with partial metrics), proceed
  • Include error context: “Recommendation based on 5/6 metrics (OU half-life calculation failed - data non-stationary)”

Error Notification: If confidence level is high enough for entry/exit recommendation, communicate the recommendation WITH error details about failed metrics
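
A sketch of the per-metric error handling; the calculator registry and error-message wording are assumptions:

```python
import logging

logger = logging.getLogger("metrics")


def calculate_all_metrics(calculators: dict, candles):
    """Run every metric with per-metric error handling; return successes and
    failures separately so confidence can be computed from partial results."""
    succeeded, failed = {}, {}
    for name, calc in calculators.items():
        try:
            succeeded[name] = calc(candles)
        except Exception as exc:
            failed[name] = str(exc)
            logger.warning("Metric %s failed: %s", name, exc)
    return succeeded, failed


def describe_coverage(succeeded: dict, failed: dict) -> str:
    """Error context for notifications/dashboards."""
    total = len(succeeded) + len(failed)
    note = f"Recommendation based on {len(succeeded)}/{total} metrics"
    if failed:
        note += " (" + "; ".join(f"{k} failed: {v}" for k, v in failed.items()) + ")"
    return note
```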

Configuration Validation Failures

Strategy: Fail fast on startup if config invalid

Acceptable: Yes - pod won’t start if config has errors

Blue-Green Deployment: Previous version should continue running if new version fails validation (Kubernetes deployment strategy)
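
A stdlib sketch of the fail-fast behavior (the real service validates with Pydantic models; the config keys shown are hypothetical):

```python
import sys

REQUIRED_KEYS = {"warning_conditions_required", "evaluation_interval_hours", "symbol"}


def validate_config(config: dict) -> dict:
    """Fail fast: raise on any structural problem so the pod never starts
    with a half-valid config."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"Missing config keys: {sorted(missing)}")
    if config["warning_conditions_required"] < 1:
        raise ValueError("warning_conditions_required must be >= 1")
    return config


def load_or_die(config: dict) -> dict:
    try:
        return validate_config(config)
    except ValueError as exc:
        # Non-zero exit -> pod fails to start -> previous version keeps running
        print(f"FATAL: invalid configuration: {exc}", file=sys.stderr)
        sys.exit(1)
```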


Performance & Scalability

Processing Time Constraints

Maximum Acceptable Time: Not a concern functionally (even >1 hour would work), but:

  • Error threshold: If evaluation takes >5 minutes, log ERROR (potential performance issue)
  • Warning threshold: If evaluation takes >1 minute, log WARNING
  • Target: Complete evaluation in <30 seconds for typical case

No specific hard performance requirements - hourly cadence provides plenty of buffer
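
The thresholds above can be sketched as a classification plus a timing wrapper (names assumed):

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("perf")


# Thresholds from the constraints above: target <30s, WARNING >1min, ERROR >5min.
def classify_duration(elapsed_seconds: float) -> str:
    if elapsed_seconds > 300:
        return "ERROR"
    if elapsed_seconds > 60:
        return "WARNING"
    return "OK"


@contextmanager
def timed_evaluation():
    start = time.monotonic()
    yield
    elapsed = time.monotonic() - start
    level = classify_duration(elapsed)
    # "OK" has no logging level, so it falls back to INFO.
    logger.log(getattr(logging, level, logging.INFO),
               "Evaluation completed in %.1fs (%s)", elapsed, level)
```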

Git Repository Size Management

Retention Policy: Keep all historical data forever for MVP

Cleanup: No automated deletion for MVP

Action Item: Create action in RAIA log to revisit data retention policy at end of MVP (after validation phase complete)

Current Assessment: Not a concern - estimate ~10-50 KB per evaluation × 24 hours × 365 days ≈ 87-438 MB/year (manageable)


Configuration Management

Hot-Reload

Not required: Configuration changes take effect on next CronJob execution (no need for hot-reload)

Acceptable Delay: Up to 1 hour between config change and effect (next hourly run)

Versioning

Primary Versioning: Git commit hash of config file (tracked in decision records)

Image Versioning: Docker image version stored in image metadata (immutable)

Enhancement: Consider writing Docker image version to output files alongside config Git hash

  • Provides complete traceability: “This decision used config version X running on image version Y”
  • Useful for debugging if image code changes behavior

Implementation:

# In metrics files
system_version:
  config_git_hash: "a3f8d92e"
  image_version: "v1.2.3"  # From Docker image label

Monitoring & Observability

Metrics Collection: All metrics (Kubernetes pod metrics, application metrics, timing data) sent to Grafana Loki

Required Monitoring/Alerting:

  1. CronJob Execution Failures: Job didn’t run at expected time
  2. Evaluation Errors: Job ran but threw exceptions
  3. KuCoin API Degradation: High failure rate (3+ consecutive failures)
  4. Git Push Failures: Persistent issues (3+ consecutive push failures)
  5. Metric Calculation Anomalies: Values out of expected ranges or calculation failures
  6. Exit State Transitions: Log all WARNING/LATEST_ACCEPTABLE_EXIT/MANDATORY_EXIT transitions
  7. Performance Degradation: Evaluation taking >5 minutes (ERROR) or >1 minute (WARNING)

Raw Metrics to Loki:

  • KuCoin API response times
  • KuCoin API error counts and types
  • CronJob execution duration
  • Internal processing step timings (metric calculation, Git operations, dashboard generation)
  • Errors and exceptions with full stack traces
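
Loki’s push API accepts JSON streams with nanosecond-epoch timestamps; a sketch of building the payload (labels and the endpoint hostname in the comment are assumptions):

```python
import time


def loki_payload(labels, lines, ts_ns=None):
    """Build a Loki push payload for POST /loki/api/v1/push.

    Each value is a [nanosecond-epoch-string, log-line] pair."""
    ts = str(ts_ns if ts_ns is not None else time.time_ns())
    return {"streams": [{"stream": labels, "values": [[ts, line] for line in lines]}]}


# Sending (sketch; endpoint assumed):
#   urllib.request.urlopen(urllib.request.Request(
#       "http://loki.monitoring:3100/loki/api/v1/push",
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"}))
```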

Alert Channels:

  • Pushover (direct API) for critical alerts
  • Grafana for historical metrics and dashboards
  • Optional: Webhook to n8n for advanced routing (future enhancement)

Data Retention & Cleanup

Current Policy: No automated cleanup for MVP

All Data Retained:

  • Raw metrics: Forever
  • Decision records: Forever (audit trail requirement)
  • Exit state transitions: Forever
  • Dashboards: Forever

Future Review: Action item in RAIA log to revisit retention policy after MVP validation phase


Technology Stack

Core Stack:

  • Python 3.11+
  • Pydantic for schema validation
  • GitPython for Git operations
  • PyYAML for YAML parsing
  • Requests for KuCoin API calls
  • Chart.js for dashboard visualizations

Additional Dependencies:

  • Jinja2 (or similar) for HTML template rendering
  • JSON for embedded data in dashboards

Deployment:

  • Kubernetes CronJob
  • PVC for Git repository persistence
  • exit_strategy_config.yaml loaded from the Git repo on the PVC (no ConfigMap; see Configuration Sources)
  • ExternalSecrets for KuCoin API keys (central secret store)

Deployment & Operations

Kubernetes Deployment

Current Status: Already working in Kubernetes

CronJob Configuration:

  • Schedule: 0 * * * * (hourly, on the hour)
  • PVC mount: Git repository persists between runs
  • No need to clone repo each time
  • Retry capability for failed Git pushes

PVC Design:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: market-maker-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi  # Adjust based on growth

Configuration Sources

Configuration managed via:

  1. Git Repository: exit_strategy_config.yaml committed to Git, loaded from PVC
  2. ExternalSecrets: KuCoin API keys from central secret store (already implemented)
  3. Environment Variables: Overrides for deployment-specific settings (data paths, logging levels)

No ConfigMap needed - configuration comes from Git repo on PVC

Configuration Flow:

  1. Configuration YAML committed to Git (workspace-root)
  2. Git repo cloned/updated on PVC by CronJob
  3. Python reads config from PVC path
  4. Config Git hash recorded in decision records

Implementation Considerations

File Organization:

repos/market-making/metrics-service/
├── src/
│   ├── regime/              # Regime detection (Phase 1 complete)
│   ├── exit_strategy/       # Exit strategy (Phase 2 target)
│   ├── schemas/             # Pydantic models (Phase 2)
│   ├── persistence/         # Git operations with retry (Phase 2)
│   ├── dashboards/          # Dashboard generation (Phase 5)
│   └── monitoring/          # Loki integration (Phase 5)
├── config/
│   └── exit_strategy_config.yaml
└── k8s/
    ├── cronjob.yaml
    ├── pvc.yaml
    └── external-secrets.yaml

Git Operations with PVC:

  • First run: Clone repository to PVC
  • Subsequent runs: git pull to update, commit new files, push with retry
  • Failed push: Accumulates commits on PVC, retries on next cycle
  • PVC ensures no data loss even if push fails

Dashboard Generation:

  • Generate HTML file per hour: dashboards/{symbol}/{YYYY-MM-DD}-{HH}.html
  • Embed evaluation data as JSON in <script> tag
  • Chart.js renders interactive charts client-side
  • Self-contained files (no external API calls)
  • Commit dashboards to Git for version control and distribution

Stateless Job Execution:

  • CronJob pod starts, mounts PVC with Git repo
  • Loads config and historical data from Git
  • Performs evaluation
  • Writes new files to Git (on PVC)
  • Commits and pushes
  • Generates dashboard
  • Pod exits
  • Next run starts fresh (but Git repo on PVC persists)

Project Scoping & Phased Development

MVP Strategy & Philosophy

MVP Approach: Validation-First Capital Protection System

This MVP follows a prove-before-scale philosophy. The system must demonstrate capital protection capability with £1K before committing £10K. The MVP is NOT “minimum features to launch” - it’s “minimum features to confidently scale capital.”

Why This Scope:

  • Can’t skip validation: Backtesting + live testing required before £10K deployment
  • Need visibility: Position risk quantification essential for informed exit decisions
  • Must measure success: KPI tracking proves system works (not just feels right)
  • Investor readiness: Complete audit trail + track record enables external capital (12-month vision)

Resource Requirements:

  • Development: Solo developer (Craig) with AI assistance
  • Capital: £1K validation → £10K scale → £100K+ external investment
  • Timeline: 2-4 weeks validation after Phases 2-5 complete
  • Infrastructure: Kubernetes cluster (already operational), KuCoin API access

What This MVP Proves:

  1. Exit strategy preserves capital during regime transitions (75%+ profit retention)
  2. System provides actionable warnings before catastrophic exits (95%+ stop-loss avoidance)
  3. False positive rate acceptable (<30% - not stopping grids unnecessarily)
  4. Human-in-loop execution viable (operator responds within acceptable windows)
  5. Audit trail sufficient for investor scrutiny

MVP Feature Set (Phases 2-5)

Current State (Phase 1 - COMPLETE):

  • ✅ Six regime metrics operational (ADX, Efficiency Ratio, Autocorrelation, OU Half-Life, Normalized Slope, Bollinger Bandwidth)
  • ✅ Regime classification working (RANGE_OK, RANGE_WEAK, TRANSITION, TREND)
  • ✅ Git-backed storage with Kubernetes CronJob (hourly evaluation)
  • ✅ Basic Pushover notifications functional
  • ⚠️ Data Quality Issue: Hardcoded dummy values in engine.py must be fixed before Phases 2-5 (see implementation-plan.md Phase 1)

Core User Journeys Supported:

Journey 1: Active Grid Trader (Exit Protection)

  • Real-time exit state evaluation (WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT)
  • Push notifications with actionable recommendations
  • Position risk visibility (capital at risk, profit give-back estimates)
  • Manual exit execution with state tracking

Journey 2: Historical Decision Reviewer (Self-Validation)

  • Git-backed immutable decision records
  • KPI analysis framework (SLAR, PRR, TTDR, FER metrics)
  • Backtesting framework showing system would have worked
  • Track record for personal scaling decision

Journey 5: Kubernetes CronJob (Scheduled Evaluation)

  • Stateless hourly execution with PVC-backed Git persistence
  • Retry logic for Git push failures
  • Direct Pushover API integration (no n8n dependency)
  • Static HTML dashboard generation with Chart.js

Must-Have Capabilities:

Phase 2: Exit Strategy Core

  • Exit State Machine: Progressive urgency states (NORMAL → WARNING → LATEST_ACCEPTABLE_EXIT → MANDATORY_EXIT)
  • Three-Gate Restart Logic: Sequential validation (Directional Energy Decay → Mean Reversion Return → Tradable Volatility)
  • Multi-Condition Triggering: Require 2+ warning conditions to prevent false alarms
  • State Transition Tracking: Git-logged transitions with timestamps and reasoning
  • Historical Data Loading: Load last 12-24 hours of metrics for persistence checks

Trigger Logic Implemented:

  • MANDATORY_EXIT: TREND regime detected, 2+ consecutive closes outside range, directional structure confirmed
  • LATEST_ACCEPTABLE_EXIT: TRANSITION persists (≥2×4h OR ≥4×1h bars), OU half-life ≥2× baseline, volatility expansion >1.25×
  • WARNING: 2+ conditions met (TRANSITION probability ≥40%, confidence declining, efficiency ratio rising, mean reversion slowing, volatility expanding)
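
The tiers compose by precedence - most urgent first, with the 2+ consensus rule applying only to WARNING. A sketch, where the boolean inputs stand in for the concrete trigger checks above:

```python
def evaluate_exit_state(mandatory_met, latest_met, warning_conditions):
    """Most urgent tier wins; WARNING needs 2+ of its five conditions."""
    if mandatory_met:
        return "MANDATORY_EXIT"
    if latest_met:
        return "LATEST_ACCEPTABLE_EXIT"
    if sum(warning_conditions) >= 2:
        return "WARNING"
    return "NORMAL"
```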

Phase 3: Position Risk Quantification

  • KuCoin Position Tracking: Fetch real-time position data via API
  • Capital Risk Calculator: Quantify capital at risk, profit give-back estimates, stop-loss distance in ATR
  • Enhanced Notifications: All exit state alerts include position risk context
  • Graceful Degradation: System continues if KuCoin API unavailable (uses last known positions)

Notification Enhancements:

  • WARNING: “Capital at risk: $120.50, Review within 24h”
  • LATEST_ACCEPTABLE_EXIT: “Expected give-back if delayed 12h: $4-7, Exit within 4-12h”
  • MANDATORY_EXIT: “Stop-loss distance: 0.6 ATR (CRITICAL), Exit NOW”

Phase 4: Testing & Validation

  • Unit Tests: 60+ tests covering metric calculations, exit triggers, state transitions
  • Integration Tests: End-to-end flow (regime → exit state → notification → Git commit)
  • Backtesting Framework: Replay historical metrics (3-6 months data), validate exit quality
  • CI/CD Pipeline: GitHub Actions with quality gates (80%+ coverage, all tests pass)

Backtesting Success Criteria:

  • Profit Retention Ratio ≥75% (preserved majority of range profits)
  • Stop-Loss Avoidance Rate ≥95% (exited before stop-loss in 95%+ scenarios)
  • False Exit Rate ≤30% (acceptable false positive rate)
  • Average warning lead time ≥30 minutes (met timing requirements)
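
Illustrative formulas for the ratio metrics above; the PRD fixes targets rather than exact math, so these definitions are assumptions:

```python
def profit_retention_ratio(profit_at_exit: float, peak_profit: float) -> float:
    """PRR: fraction of peak range-trading profit preserved at exit."""
    return profit_at_exit / peak_profit if peak_profit > 0 else 0.0


def stop_loss_avoidance_rate(graceful_exits: int, total_exits: int) -> float:
    """SLAR: share of exits executed before the exchange stop-loss fired."""
    return graceful_exits / total_exits if total_exits else 1.0


def false_exit_rate(false_exits: int, total_exits: int) -> float:
    """FER: share of exits where the range resumed (exit was unnecessary)."""
    return false_exits / total_exits if total_exits else 0.0
```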

Phase 5: Operational Foundation

  • Evaluation Cadence: 1-hour CronJob execution (0 * * * *) - matches 12-24h warning window assumption
  • Audit Logging: Complete Git-backed state transitions, notification delivery tracking, operator action recording
  • KPI Tracking Framework: Calculate SLAR, PRR, TTDR, FER, MEC metrics from audit logs
  • Static Dashboards: HTML/Chart.js visualizations generated hourly, committed to Git
  • Monitoring Integration: All metrics/logs sent to Grafana Loki for observability

Dashboard Visualizations:

  • Current regime classification and confidence score
  • Exit state (NORMAL/WARNING/LATEST_ACCEPTABLE_EXIT/MANDATORY_EXIT)
  • All 6 metrics with current values and trends
  • Gate evaluation status (if grid stopped)
  • Recent state transition history

Out of Scope for MVP

Explicitly NOT Included:

Multi-Symbol Support:

  • MVP: Single ETH-USDT grid only (SINGLE_GRID mode)
  • Rationale: Prove exit strategy works for one symbol before scaling
  • Future: Multi-symbol portfolio management (post-MVP growth feature)

Automated Grid Creation:

  • MVP: Human approval required for all grid starts
  • Rationale: Preserve human judgment, reduce regulatory complexity
  • Future: Automated creation with high-confidence thresholds (post-MVP)

Advanced Dashboards:

  • MVP: Static HTML/Chart.js files generated hourly
  • Rationale: Sufficient for validation phase, investor-presentable
  • Future: Real-time interactive dashboards, performance attribution analysis (post-MVP)

15-Minute Evaluation Cadence:

  • MVP: 1-hour evaluation cycle
  • Rationale: Research indicates 12-24h warning windows (RAIA A001, A004) - hourly sufficient
  • Future: Adaptive cadence (state-based frequency) if validation shows need (post-MVP)

Automated Cleanup:

  • MVP: Keep all data forever (no retention policy)
  • Rationale: Preserve complete audit trail for validation analysis
  • Future: Revisit after MVP complete (RAIA action item)

Multi-Exchange Support:

  • MVP: KuCoin only
  • Rationale: Single exchange simplifies integration, acceptable for validation
  • Future: Multi-exchange diversification reduces outage risk (post-MVP)

Performance Optimization:

  • MVP: Functional performance (evaluation <5 minutes acceptable)
  • Rationale: 1-hour cadence provides plenty of buffer
  • Future: Caching, async processing if needed (post-MVP)

Post-MVP Roadmap

Phase 6: Capital Scaling (3-Month Horizon)

Objective: Operate at £10K capital with proven exit strategy

Prerequisites:

  • MVP validation complete (2-4 weeks live operation with £1K)
  • Capital doubled to £2K during validation
  • Zero stop-loss breaches during validation period
  • KPIs meet targets (SLAR ≥95%, PRR ≥75%, TTDR ≥70%)

Enhancements:

  • Track record documentation for personal scaling decision
  • Threshold tuning based on real performance data
  • KPI trend analysis (monthly reports)

Timeline: Month 4-6 after MVP complete


Phase 7: Investor Preparation (6-Month Horizon)

Objective: Package track record for external capital raise (£100K+)

Prerequisites:

  • 3+ months operation at £10K capital
  • Consistent monthly capital growth (4%+ average)
  • Clean failure analysis documentation
  • Backtesting validated against 3+ years historical data

Deliverables:

  • Investor Presentation: Track record visualization, backtesting evidence, failure analysis
  • Separation of Concerns: “System recommendation quality” vs “Operator action quality” metrics
  • Regulatory Review: Legal assessment before external capital (RAIA A006)
  • Multi-Symbol Validation: Expand beyond ETH-USDT, prove approach generalizes

Timeline: Month 7-12 after MVP complete


Phase 8: Growth Features (12-Month+ Horizon)

Objective: Scale operations with enhanced automation and intelligence

Enhanced Automation:

  • Automated grid creation with high-confidence thresholds (human override available)
  • Multi-symbol portfolio management (concurrent grids across symbols)
  • Dynamic capital allocation based on regime confidence

Analytics & Reporting:

  • Real-time visual dashboards (replace static HTML)
  • Automated investor reports (monthly performance summaries)
  • Performance attribution analysis (which decisions drove returns)
  • Regime classification accuracy tracking (learn from misclassifications)

Intelligence Enhancements:

  • Machine learning for regime classification refinement (adaptive to market structure changes)
  • Adaptive gate thresholds based on market conditions (not static YAML config)
  • Predictive exit timing optimization (earlier warnings for faster regime transitions)

Risk Management Expansion:

  • Portfolio-level risk limits (not just per-grid)
  • Correlation analysis across symbols (avoid concentrated exposure)
  • Multi-exchange support (KuCoin + Binance + others for outage protection)

Timeline: Month 13+ after MVP complete


Progressive Feature Roadmap Summary

MVP (Phases 2-5): Capital Protection Foundation

  • Exit strategy + validation + operational foundation
  • £1K validation → confident £10K scale
  • 2-4 weeks live operation
  • Done When: KPIs proven, audit trail complete, zero stop-loss breaches

Phase 6 (Post-MVP): Capital Scaling

  • Operate at £10K with proven system
  • 3 months track record building
  • Done When: Consistent 4%+ monthly growth, ready for investor presentation

Phase 7 (6-Month): Investor Readiness

  • Multi-symbol validation
  • External capital preparation (£100K+)
  • Done When: Investor presentation complete, regulatory review done

Phase 8 (12-Month+): Growth & Intelligence

  • Enhanced automation (within asymmetric philosophy)
  • ML-based refinements
  • Multi-exchange portfolio management
  • Done When: Operating at £100K+ scale with external investment

Risk Mitigation Strategy

Technical Risks:

Innovation Risk 1: 1-Hour Cadence Insufficient

  • Risk: Regime transitions may occur faster than hourly evaluation can detect (RAIA R001)
  • Mitigation: Backtesting validates actual warning windows in historical data (Phase 4)
  • Fallback: Implement 15-minute cadence if >20% of transitions provide <2h warning
  • Validation Trigger: Monitor during Phases 2-5, measure warning lead times in KPI framework

Innovation Risk 2: False Exit Rate Too High

  • Risk: 2+ condition WARNING logic may still generate excessive false exits (FER >30%)
  • Mitigation: Tunable thresholds via YAML config, conservative/aggressive presets
  • Fallback: Increase WARNING requirement to 3+ conditions, or tighten individual thresholds
  • Validation Trigger: Track FER in Phase 4 backtesting, adjust before live deployment
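
As a sketch of the tunable-threshold mitigation, a hypothetical YAML preset layout is shown below. All key names are illustrative assumptions, not the actual config schema; the point is that switching presets (or moving from 2+ to 3+ WARNING conditions) is a one-line config change, not a code change.

```yaml
# Illustrative exit-threshold presets (key names are hypothetical).
exit_thresholds:
  active_preset: conservative     # or "aggressive"
  presets:
    conservative:
      warning_min_conditions: 3   # fallback: require 3+ triggering metrics
      adx_trend_threshold: 30
      efficiency_ratio_max: 0.45
    aggressive:
      warning_min_conditions: 2
      adx_trend_threshold: 25
      efficiency_ratio_max: 0.55
```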

Innovation Risk 3: Sequential Gates Too Restrictive

  • Risk: Three-gate restart logic prevents timely re-entry, incurring excessive opportunity cost
  • Mitigation: Track time-to-restart and profitability of missed ranging periods (KPI framework)
  • Fallback: Parallel gate evaluation or reduce to 2 gates
  • Validation Trigger: If average time-to-restart >48h AND missed profit >20% of preserved capital

Market Risks:

Fast Regime Transitions (RAIA R001)

  • Risk: Market moves faster than the 1-hour cycle can detect, leaving insufficient warning time
  • Mitigation:
    • Backtesting validates 12-24h warning window assumption (RAIA Action 1)
    • Monitor near-miss scenarios during validation
    • Prepared to implement 15-minute cadence if needed
  • Trigger: If >20% of transitions provide <2h warning window

Exchange Outage During Critical Exit (RAIA R002)

  • Risk: Cannot execute a manual exit if KuCoin is unavailable during MANDATORY_EXIT
  • Mitigation: Accept as known limitation (manual execution dependency)
  • Future: Multi-exchange diversification (Phase 8)
  • Monitoring: Track incidents during validation (RAIA Action 2)

Capital Loss from False Positives (RAIA R004)

  • Risk: Excessive false exits erode capital through missed ranging periods
  • Mitigation:
    • Three-gate restart logic prevents premature re-entry
    • Backtesting validates FER <30% (RAIA A005, Action 3)
    • KPI tracking measures false exit impact
  • Trigger: If FER >30% in backtesting, tighten WARNING thresholds

Resource Risks:

Data Quality Issues Block Progress

  • Risk: Phase 1's hardcoded dummy values must be fixed before Phase 2-5 outputs can be trusted
  • Mitigation: Phase 1 prioritized, 40-60 hours estimated (see implementation-plan.md)
  • Status: In progress (ADX complete, 11% of Phase 1 done)
  • Contingency: Allocate 20% buffer time for unexpected data issues

Testing Reveals Major Bugs

  • Risk: Phase 4 backtesting shows exit logic fundamentally flawed
  • Mitigation: Test early (consider Phase 4 before Phases 2-3), iterate on thresholds
  • Fallback: Simplify trigger logic (remove complex conditions), use proven baselines
  • Contingency: Budget 50% additional time if major redesign needed

Scope Validation & Constraints

What Makes This the Right MVP:

Can Validate Core Value Proposition:

  • Exit strategy proven to preserve capital (backtesting + live testing)
  • Tiered urgency model tested (WARNING → LATEST_ACCEPTABLE → MANDATORY progression)
  • Sequential gates validated (prevent premature re-entry)
  • Human-in-loop execution proven viable (operator can respond in time)

Can Make Confident Scaling Decision:

  • KPI framework provides objective success measures (SLAR, PRR, TTDR)
  • Audit trail shows “did system work?” vs “did I follow advice?”
  • Backtesting + 2-4 weeks live operation = sufficient confidence for £10K
  • Track record foundation for future investor presentation

Can Be Completed in Reasonable Timeframe:

  • Phase 1: 2-3 weeks (data quality fix)
  • Phases 2-5: 4-6 weeks (exit strategy + validation + operational)
  • Total: 6-9 weeks development + 2-4 weeks validation = 2-3 months to “MVP Done”

Boundaries Tested:

Could validate without Phase 3 (Position Risk)? NO

  • Need “capital at risk: $120” visibility for informed exit decisions
  • Essential for £10K scale confidence
  • Position risk quantification is must-have

Could validate with basic text dashboards (no Chart.js)? NO

  • User explicitly requires charts for regime trend assessment
  • Visual confirmation of exit state transitions aids decision-making
  • Chart.js is lightweight, not over-engineering

Could validate without backtesting (Phase 4)? NO

  • Can’t trust exit logic without historical validation
  • Need objective proof of 75% profit retention, 95% stop-loss avoidance
  • De-risks £10K capital deployment
  • Backtesting is must-have

Could simplify Phase 5 (Operational)? YES - Potential optimization

  • Could defer fancy KPI dashboards (manual calculation acceptable)
  • 1-hour cadence already correct (not 15-min)
  • Simple YAML audit logs sufficient initially (enhance later)
  • Simplification Opportunity: Streamline Phase 5 to basic logging + manual KPIs

Phase Sequence Validation:

Current Plan: Phase 1 → 2 → 3 → 4 → 5 (sequential)

Alternative Considered: Phase 1 → 4 → 2 → 3 → 5 (backtest-first)

  • Benefit: Validate exit logic via backtesting BEFORE building Phases 2-3
  • Risk: Delays an operational system, and iterating on thresholds is harder without working code
  • Decision: Keep current sequence (2→3→4) for faster feedback loop, but Phase 4 can start in parallel with Phase 3

Recommended Optimization:

  • Phase 1: Data Quality (BLOCKER - must complete first)
  • Phase 2 + Phase 4 (partial): Build exit strategy WHILE creating backtesting framework
  • Phase 3: Position Risk (can parallelize with Phase 4 backtesting)
  • Phase 4 (complete): Validate everything before deployment
  • Phase 5: Operational polish

Success Criteria (MVP “Done”)

Completion Criteria (All Must Be Met):

Code Complete:

  • All Phase 2-5 code implemented with 100% test pass rate
  • No critical bugs, no hardcoded dummy values
  • Configuration complete and validated

Backtesting Validation (Phase 4):

  • Exit logic tested against 3-6 months historical data
  • Profit Retention Ratio ≥75%
  • Stop-Loss Avoidance Rate ≥95%
  • False Exit Rate ≤30%
  • Average warning lead time ≥30 minutes

Live Capital Validation (2-4 Weeks):

  • Operated with £1K live capital for 2-4 weeks
  • Experienced multiple regime cycles (at least 2-3 TRANSITION events)
  • Zero stop-loss breaches during validation period (excluding black swan events)
  • KPIs meet targets in live operation (not just backtesting)

Capital Scaling Milestone:

  • Capital doubled from £1K to £2K during validation period
  • Proves system protects capital WHILE capturing ranging profits
  • Demonstrates profitability, not just capital preservation

Audit Trail Complete:

  • All decision records committed to Git with timestamps
  • State transitions logged with reasoning and metrics
  • Can answer “why didn’t you exit here?” for any historical moment
  • Separation of system recommendations vs operator actions tracked

System Ready for £10K:

  • Risk calculations scale correctly (position sizing, stop-loss placement)
  • Position tracking handles larger capital amounts
  • Notification system tested and reliable
  • Operator confident in decision-making process

MVP Declared “Done” When: All six completion criteria met + personal decision: “I’m ready to deploy £10K confidently.”

3-Month Success (Post Phase 2-5):

  • Operating at £10K capital with same exit quality metrics
  • Consistent monthly growth (4%+ average)
  • Clean track record of exit decisions with measurable outcomes
  • Investor presentation materials ready (if pursuing external capital)

12-Month Vision:

  • £100K+ capital with external investment
  • Exit strategy proven across multiple market regimes (bull, bear, ranging, volatile)
  • Published track record of regime classification accuracy
  • Multi-symbol support (beyond single ETH-USDT grid)

Functional Requirements

Regime Analysis & Classification

FR1: System can fetch OHLCV market data from exchange API
FR2: System can calculate six regime metrics (ADX, Efficiency Ratio, Autocorrelation, OU Half-Life, Normalized Slope, Bollinger Bandwidth)
FR3: System can classify market regime into four states (RANGE_OK, RANGE_WEAK, TRANSITION, TREND)
FR4: System can calculate regime confidence score
FR5: System can persist regime analysis results to version-controlled storage
FR6: System can load historical regime analysis for trend evaluation
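
Of the six regime metrics in FR2, the Efficiency Ratio gives a feel for the kind of calculation involved. A minimal sketch, assuming Kaufman's standard definition (net directional move divided by total path length); the function name and window handling are ours, not the system's API:

```python
def efficiency_ratio(closes: list[float]) -> float:
    """Kaufman Efficiency Ratio over a window of closing prices.
    Near 1.0 = cleanly trending; near 0.0 = choppy/ranging."""
    net_move = abs(closes[-1] - closes[0])
    # Total path length: sum of absolute bar-to-bar moves.
    path = sum(abs(b - a) for a, b in zip(closes, closes[1:]))
    return net_move / path if path else 0.0

# Perfectly directional series -> 1.0
print(efficiency_ratio([100, 101, 102, 103, 104]))
# Choppy series that ends where it started -> 0.0
print(efficiency_ratio([100, 102, 100, 102, 100]))
```

A low ER is one input the classifier (FR3) would weigh toward RANGE_OK; a sustained high ER weighs toward TREND.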

Exit Strategy Management

FR7: System can evaluate current exit state based on regime analysis (NORMAL, WARNING, LATEST_ACCEPTABLE_EXIT, MANDATORY_EXIT)
FR8: System can detect MANDATORY_EXIT conditions (TREND regime, consecutive closes outside range, directional structure confirmed)
FR9: System can detect LATEST_ACCEPTABLE_EXIT conditions (TRANSITION persistence, mean reversion degradation, volatility expansion)
FR10: System can detect WARNING conditions requiring 2+ triggering metrics
FR11: System can track exit state transitions with timestamps and reasons
FR12: System can evaluate three sequential restart gates (Directional Energy Decay, Mean Reversion Return, Tradable Volatility)
FR13: System can enforce gate sequencing (Gate N+1 only evaluated if Gate N passes)
FR14: System can track gate status history for stopped grids
FR15: System can determine grid eligibility for restart based on gate progression
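
The strict gate sequencing in FR13 (Gate N+1 only evaluated if Gate N passes) can be sketched as below. The gate names follow FR12, but the predicate thresholds are illustrative assumptions, not the system's tuned values:

```python
from typing import Callable

# A gate is a named predicate over the current regime metrics.
GateCheck = Callable[[dict], bool]

def evaluate_restart_gates(metrics: dict,
                           gates: list[tuple[str, GateCheck]]) -> dict:
    """Evaluate restart gates strictly in order (FR13): a later gate is
    only checked once every earlier gate has passed this cycle."""
    status: dict = {}
    for name, check in gates:
        passed = check(metrics)
        status[name] = passed
        if not passed:
            break  # remaining gates are not evaluated at all
    status["eligible_for_restart"] = all(
        status.get(name, False) for name, _ in gates
    )
    return status

# Hypothetical thresholds for the three FR12 gates.
gates = [
    ("directional_energy_decay", lambda m: m["adx"] < 20),
    ("mean_reversion_return",    lambda m: m["autocorrelation"] < 0),
    ("tradable_volatility",      lambda m: m["bollinger_bandwidth"] > 0.02),
]

# ADX still elevated: gate 1 fails, so gates 2-3 never run this cycle.
print(evaluate_restart_gates(
    {"adx": 25, "autocorrelation": -0.1, "bollinger_bandwidth": 0.05}, gates))
```

Persisting each cycle's `status` dict gives the gate history required by FR14, and `eligible_for_restart` is the FR15 determination.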

Risk Assessment

FR16: System can fetch active position data from exchange API
FR17: System can calculate unrealized PnL for active positions
FR18: System can calculate capital at risk based on current positions and stop-loss distance
FR19: System can estimate profit give-back if exit delayed by specified hours
FR20: System can calculate stop-loss distance in ATR units
FR21: System can track grid position health relative to configured boundaries
FR22: System can gracefully degrade when position data unavailable (use last known state)
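
FR18 and FR20 reduce to simple arithmetic. A hedged sketch, ignoring fees, slippage, and partial fills; function names are ours for illustration:

```python
def capital_at_risk(position_size: float, entry_price: float,
                    stop_loss_price: float) -> float:
    """Worst-case loss if the stop-loss fills exactly at its level (FR18).
    Illustrative only: ignores fees, slippage, and funding."""
    return position_size * abs(entry_price - stop_loss_price)

def stop_distance_atr(current_price: float, stop_loss_price: float,
                      atr: float) -> float:
    """Distance to the stop-loss expressed in ATR units (FR20)."""
    return abs(current_price - stop_loss_price) / atr

# 0.5 ETH long entered at 2000 USDT with stop at 1900:
print(capital_at_risk(0.5, 2000.0, 1900.0))     # 50.0 USDT at risk
print(stop_distance_atr(1980.0, 1900.0, 40.0))  # 2.0 ATRs of headroom
```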

Notification & Alerting

FR23: Operator can receive exit state notifications via push notification service
FR24: System can rate-limit notifications based on exit state urgency (WARNING: 4h, LATEST_ACCEPTABLE: 2h, MANDATORY: 1h)
FR25: System can include position risk context in notifications (capital at risk, profit give-back, stop-loss distance)
FR26: System can include regime metrics in notifications (confidence, verdict, triggering conditions)
FR27: System can track notification delivery status (sent, delivered, failed)
FR28: System can prevent duplicate notifications for unchanged exit states
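
The rate limits in FR24 and the duplicate suppression in FR28 combine into a single check per evaluation cycle. A sketch with illustrative function and constant names (NORMAL is assumed to never notify):

```python
from datetime import datetime, timedelta
from typing import Optional

# Minimum interval between repeat notifications per exit state (FR24).
RATE_LIMITS = {
    "WARNING": timedelta(hours=4),
    "LATEST_ACCEPTABLE_EXIT": timedelta(hours=2),
    "MANDATORY_EXIT": timedelta(hours=1),
}

def should_notify(state: str, last_sent: Optional[datetime],
                  now: datetime) -> bool:
    """Suppress duplicate notifications for an unchanged exit state (FR28)
    until that state's rate-limit window has elapsed."""
    if state not in RATE_LIMITS:   # NORMAL and unknown states never notify
        return False
    if last_sent is None:          # first notification for this state
        return True
    return now - last_sent >= RATE_LIMITS[state]

now = datetime(2026, 2, 1, 12, 0)
print(should_notify("WARNING", now - timedelta(hours=3), now))        # still inside 4h window
print(should_notify("MANDATORY_EXIT", now - timedelta(hours=2), now)) # 1h window elapsed
```

A state *transition* (e.g. WARNING to MANDATORY_EXIT) would reset `last_sent`, so escalations notify immediately rather than waiting out the old window.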

Audit & Decision Tracking

FR29: System can create immutable decision records with timestamps, regime state, and exit recommendations
FR30: System can commit decision records to version-controlled storage before sending notifications
FR31: System can track configuration version (Git hash) used for each decision
FR32: System can track system image version used for each decision
FR33: Operator can query historical decision records by date range, symbol, or exit state
FR34: System can track operator actions (grid stopped, grid started, exit declined) separately from system recommendations
FR35: System can maintain separation between “system recommendation quality” and “operator action quality”
FR36: System can provide complete audit trail for investor scrutiny
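
A decision record satisfying FR29-FR32 and FR34 might look like the YAML below. Every field name is hypothetical, shown only to make the traceability requirements concrete; the actual schema is a design decision for Phase 2.

```yaml
# Illustrative decision record, committed to Git BEFORE notification (FR30).
timestamp: "2026-02-01T14:00:03Z"
symbol: ETH-USDT
regime:
  verdict: TRANSITION
  confidence: 0.64
exit_state: WARNING
triggering_conditions: [adx_rising, efficiency_ratio_elevated]
config_version: "a1b2c3d"            # Git hash of config used (FR31)
image_version: "grid-exit:0.4.2"     # system image version (FR32)
operator_action: null                # recorded separately from the
                                     # recommendation (FR34/FR35)
```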

Validation & Analysis

FR37: System can replay historical metrics for backtesting exit strategy
FR38: System can calculate Profit Retention Ratio (PRR) from historical data
FR39: System can calculate Stop-Loss Avoidance Rate (SLAR) from historical data
FR40: System can calculate True Transition Detection Rate (TTDR) from historical data
FR41: System can calculate False Exit Rate (FER) from historical data
FR42: System can calculate Exit Reaction Time (ERT) when operator action data available
FR43: System can generate KPI reports for specified time periods
FR44: System can identify false positive exits (regime returned to RANGE after exit)
FR45: System can identify false negative exits (regime transitioned but no exit signal)
FR46: Operator can compare backtesting results against live operation results
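
FER (FR41, using the false-positive definition in FR44) and PRR (FR38) are straightforward ratios. A sketch with illustrative field names; the real backtester would derive these from replayed decision records:

```python
def false_exit_rate(exits: list[dict]) -> float:
    """FER: share of exits where the regime returned to a RANGE state
    after the exit (false positive per FR44). 'regime_after' is an
    illustrative field name."""
    if not exits:
        return 0.0
    false_exits = sum(1 for e in exits
                      if e["regime_after"].startswith("RANGE"))
    return false_exits / len(exits)

def profit_retention_ratio(profit_at_exit: float,
                           peak_profit: float) -> float:
    """PRR: profit actually banked at exit versus peak open profit before
    the transition (FR38). 1.0 = exited at the top; target is >= 0.75."""
    if peak_profit <= 0:
        return 0.0
    return profit_at_exit / peak_profit

exits = [{"regime_after": "RANGE_OK"}, {"regime_after": "TREND"},
         {"regime_after": "TREND"},    {"regime_after": "RANGE_WEAK"}]
print(false_exit_rate(exits))           # 0.5 -- above the 30% target
print(profit_retention_ratio(80, 100))  # 0.8 -- meets the >=75% target
```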

System Operations

FR47: System can execute regime evaluation on scheduled intervals (hourly)
FR48: System can validate configuration schema on startup
FR49: System can retry failed Git push operations on subsequent evaluation cycles
FR50: System can generate static HTML dashboards with embedded visualizations
FR51: System can track evaluation execution time and log performance warnings
FR52: System can send operational metrics to logging infrastructure (Grafana Loki)
FR53: System can handle partial metric calculation failures (continue with available metrics)
FR54: System can log metric calculation errors in decision records
FR55: Operator can override configuration via environment variables
FR56: System can detect and alert on persistent API failures (3+ consecutive failures)

Non-Functional Requirements

Performance

NFR-P1: Regime evaluation completes within 5 minutes (allows 55-minute buffer before next hourly cycle)
NFR-P2: Evaluation time exceeding 1 minute triggers WARNING log entry
NFR-P3: Evaluation time exceeding 5 minutes triggers ERROR log entry and operator alert
NFR-P4: Notification delivery latency <60 seconds from decision record creation
NFR-P5: Git commit and push operations complete within 10 seconds under normal conditions
NFR-P6: Historical data loading (12-24 hours of metrics) completes within 30 seconds

Reliability

NFR-R1: System availability ≥99% during validation phase (acceptable: ~7 hours downtime per month)
NFR-R2: CronJob execution success rate ≥98% (missed evaluations acceptable if isolated)
NFR-R3: Failed Git push operations retry automatically on subsequent evaluation cycles
NFR-R4: Failed KuCoin API calls retry up to 3 times with exponential backoff before declaring failure
NFR-R5: System continues exit state evaluation when position data unavailable (graceful degradation)
NFR-R6: Persistent failures (3+ consecutive cycles) trigger operator alerts
NFR-R7: Configuration errors detected on startup prevent deployment (fail-fast with rollback to previous version)
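
The retry policy in NFR-R4 is standard exponential backoff. A minimal sketch, not the system's actual client code:

```python
import time

def call_with_retry(fn, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry a flaky API call up to max_attempts times with exponential
    backoff (NFR-R4): delays of base_delay, 2x, 4x, ... between attempts.
    Re-raises the last exception once attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # declare failure after the final attempt
            time.sleep(base_delay * (2 ** attempt))
```

Declared failures would then feed the consecutive-failure counter that drives the NFR-R6 operator alert.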

Security

NFR-S1: Exchange API keys stored in external secrets management (not in code/config files)
NFR-S2: API keys restricted with IP whitelist and no-withdrawal permissions
NFR-S3: Decision records stored in private Git repository with access limited to operator
NFR-S4: All API communications use HTTPS/TLS encryption
NFR-S5: Pushover notifications encrypted in transit
NFR-S6: No credentials or API keys logged in application logs or decision records
NFR-S7: Git repository authentication uses SSH keys (not HTTPS credentials)

Integration

NFR-I1: System tolerates KuCoin API response times up to 5 seconds (retry if exceeded)
NFR-I2: KuCoin API rate limits are not exceeded (hourly cadence well within limits)
NFR-I3: Pushover API failures do not block evaluation completion
NFR-I4: Git push failures do not prevent decision record creation (stored locally, pushed later)
NFR-I5: System handles KuCoin API maintenance windows gracefully (uses last known data, alerts operator)
NFR-I6: Failed notification delivery tracked and retried on next evaluation cycle
NFR-I7: API integration errors include actionable context (error type, retry count, next action)

Data Integrity

NFR-D1: Decision records are immutable once committed to Git (no retroactive editing)
NFR-D2: All decision records include Git commit hash for configuration version traceability
NFR-D3: All decision records include Docker image version for code traceability
NFR-D4: Metric calculation errors logged in decision records (transparent failure tracking)
NFR-D5: Partial metric failures documented with specific metrics unavailable
NFR-D6: Timestamp precision to the second for all decision records and state transitions
NFR-D7: Operator actions tracked separately from system recommendations (no conflation)
NFR-D8: Historical data retained indefinitely for MVP (no automated deletion)