Backend Decision Support System - Project-Type Requirements
Project: Grid Exit Strategy - Phases 2-5
Project Type: Backend Decision Support API (Batch Processing)
Date: 2026-02-01
Instructions
Please answer the questions below by filling in your responses directly in this file.
Once complete, let me know and I’ll incorporate your answers into the PRD.
1. Data Storage Schema
Status: ✅ COMPLETE - Documented in SCHEMA.md
All data schemas defined (metrics files, exit state transitions, decision records, configuration).
2. Data Pipeline Architecture
2.1 Processing Flow
Current Understanding:
KuCoin API → Python Evaluation → Git Commit
Questions:
Q2.1.1: Is this the complete pipeline, or are there additional stages?
YOUR ANSWER: No additional stages; this is the complete pipeline.
Q2.1.2: Are there any data transformation or aggregation steps beyond what’s in the pipeline?
YOUR ANSWER: Not at the moment.
2.2 Static Dashboard Generation
Q2.2.1: Will dashboards be generated as part of the same CronJob, or as a separate process?
YOUR ANSWER: The same CronJob.
Q2.2.2: How often should dashboards regenerate? (Every evaluation, daily, on-demand?)
YOUR ANSWER: every hour
Q2.2.3: What format for dashboards? (HTML with JavaScript charts, static images, both?)
YOUR ANSWER: HTML with JavaScript charts.
Q2.2.4: What visualizations are essential for MVP vs nice-to-have?
Essential:
- Whatever is needed to support the recommendations.
Nice-to-have:
3. Error Handling & Failure Modes
3.1 KuCoin API Failures
Current Strategy: Retry 2-3 times with exponential backoff, then skip cycle
Q3.1.1: Is this acceptable, or do you need different behavior?
YOUR ANSWER: fine for now
Q3.1.2: Should persistent API failures (multiple cycles) trigger alerts?
YOUR ANSWER: yes
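The retry strategy above could be sketched as follows. This is a minimal illustration, not the project's implementation; the function names and jitter amount are assumptions.

```python
import random
import time

def fetch_with_retry(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() up to max_attempts times with exponential backoff.

    Returns the fetched payload, or None when every attempt failed,
    in which case the caller skips this evaluation cycle.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                return None  # retries exhausted: skip the cycle
            # exponential backoff with a little jitter: ~1s, ~2s, ~4s, ...
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    return None
```

Persistent failures across multiple cycles (Q3.1.2) would be detected downstream from the skipped-cycle records, not inside this helper.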
3.2 Git Push Failures
Current Strategy: Log locally, continue operation (acceptable gap in audit trail)
Q3.2.1: Is this acceptable for MVP?
YOUR ANSWER: Yes. Should the CronJob use a PVC so it doesn't clone the repo on every run, and can push accumulated data later if a Git issue occurs?
Q3.2.2: Should the system retry Git pushes on subsequent cycles?
YOUR ANSWER: Yes, if possible.
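One way the PVC idea makes retry-on-subsequent-cycles almost free: commit locally first, then attempt the push. On a PVC-backed clone, unpushed commits simply ride along until a later cycle's push sends the whole backlog. A sketch with injected callables (a real version would use GitPython's `repo.index.commit(...)` and `repo.remote("origin").push()`):

```python
def persist_evaluation(write_files, repo_commit, repo_push, log=print):
    """Write outputs and commit locally, then attempt the push.

    The commit always lands on the local (PVC-backed) working copy, so a
    push failure only delays publication: the next cycle's push carries
    the backlog of unpushed commits automatically.
    """
    write_files()
    repo_commit()      # e.g. GitPython: repo.index.commit("hourly evaluation")
    try:
        repo_push()    # e.g. GitPython: repo.remote("origin").push()
        return True
    except Exception as exc:
        log(f"git push failed; data committed locally, will retry next cycle: {exc}")
        return False
```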
3.3 Metric Calculation Errors
Q3.3.1: If one metric calculation fails (e.g., OU half-life non-stationary), should evaluation continue with remaining metrics or abort?
YOUR ANSWER: Evaluation should continue. Consideration needs to be given to how this results in an error notification. (Note that if the calculations that do run produce a confidence level high enough for entry/exit, that should be communicated, with additional information about the error.)
Q3.3.2: Which metrics are critical vs optional? (If ADX fails, can we still classify regime?)
Critical metrics:
- All, as far as I know, with the caveat above (continue on failure).
Optional metrics:
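The continue-on-error behavior described in Q3.3.1 could look like this: each calculator runs independently, failures are collected rather than fatal, and both successes and errors flow into the decision and notification steps. Names are illustrative assumptions.

```python
def evaluate_metrics(calculators):
    """Run each metric calculator; a failure is recorded, not fatal.

    Returns (values, errors): the decision step can still act on the
    metrics that succeeded, while the notification step attaches the
    error details alongside any entry/exit recommendation.
    """
    values, errors = {}, {}
    for name, calc in calculators.items():
        try:
            values[name] = calc()
        except Exception as exc:
            errors[name] = str(exc)
    return values, errors
```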
3.4 Configuration Validation Failures
Current Strategy: Fail fast on startup if config invalid
Q3.4.1: Is this acceptable? (Pod won’t start if config has errors)
YOUR ANSWER: Yes; the previous version should continue to run.
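The fail-fast behavior could be sketched as below. This is illustrative only: the real implementation would express the rules as a Pydantic model (per the stack in section 10), and the field names here are assumptions.

```python
def load_config(raw):
    """Validate before any evaluation work; exit non-zero on invalid config.

    Failing at startup means a bad rollout never half-runs an evaluation,
    and the previously deployed (valid) version keeps operating.
    """
    problems = []
    if not raw.get("symbol"):
        problems.append("symbol: required non-empty string")
    interval = raw.get("evaluation_interval_minutes")
    if not isinstance(interval, int) or interval <= 0:
        problems.append("evaluation_interval_minutes: required positive integer")
    if problems:
        raise SystemExit("invalid config: " + "; ".join(problems))
    return raw
```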
4. Performance & Scalability
4.1 Processing Time Constraints
Q4.1.1: What’s the maximum acceptable evaluation time? (Must complete before next hourly cycle)
YOUR ANSWER: I don't think this is likely to be an issue; even more than an hour would be fine functionally. However, we should probably raise an error if it takes more than 5 minutes.
Q4.1.2: Are there any specific performance requirements? (e.g., “must process in <30 seconds”)
YOUR ANSWER: no
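The "error if over 5 minutes" idea from Q4.1.1 could be a soft guard around the evaluation: the result is still produced (a slow run is not a failed run), and the overrun is surfaced for alerting. The alert hook and names are assumptions.

```python
import time

MAX_RUNTIME_SECONDS = 5 * 60  # soft limit per Q4.1.1

def run_with_runtime_check(evaluate, limit=MAX_RUNTIME_SECONDS,
                           clock=time.monotonic, alert=print):
    """Run the evaluation and flag (not abort) runs over the soft limit."""
    start = clock()
    result = evaluate()
    elapsed = clock() - start
    if elapsed > limit:
        alert(f"evaluation took {elapsed:.0f}s, over the {limit}s soft limit")
    return result, elapsed
```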
4.2 Git Repository Size Management
Q4.2.1: How long should raw data be retained? (All historical data forever, or cleanup after N months?)
YOUR ANSWER: Forever for the moment; create an action to revisit at the end of MVP.
Q4.2.2: Should old data be archived/compressed, or just deleted?
YOUR ANSWER: No deletion; review at a later date.
Q4.2.3: Any concerns about Git repo size growth? (Rough estimate: X KB per evaluation × 24 hours × 365 days)
YOUR ANSWER: Not at the moment.
5. Configuration Management
5.1 Configuration Hot-Reload
Q5.1.1: Should configuration changes take effect immediately (hot-reload), or require pod restart?
YOUR ANSWER: No need for hot-reload; picking up changes on the next CronJob run is fine.
5.2 Configuration Versioning
Current Strategy: Config version hash tracked in decision records
Q5.2.1: Is Git commit hash sufficient for config versioning?
YOUR ANSWER: yes
Q5.2.2: Do you need semantic versioning (v1.0, v1.1, etc.) or is Git hash enough?
YOUR ANSWER: We track a version on the image. It could be worth storing that in the image (immutable) and then writing it into the output files.
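Stamping outputs with the image version could be as simple as reading it from the environment at write time. The `IMAGE_VERSION` and `CONFIG_GIT_SHA` variable names and field names below are assumptions, shown only to illustrate the traceability idea.

```python
import json
import os

def decision_record(payload):
    """Stamp every output file with the immutable image version and the
    config commit hash, so each record traces to exact code + config.
    """
    stamped = {
        **payload,
        "image_version": os.environ.get("IMAGE_VERSION", "unknown"),
        "config_git_sha": os.environ.get("CONFIG_GIT_SHA", "unknown"),
    }
    return json.dumps(stamped, indent=2)
```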
6. Monitoring & Observability
Q6.1: What monitoring/alerting do you need for Phase 2-5?
Check all that apply:
- CronJob execution failures (job didn’t run)
- Evaluation errors (job ran but threw exceptions)
- KuCoin API degradation (high failure rate)
- Git push failures (persistent issues)
- Metric calculation anomalies (values out of expected ranges)
- Exit state transitions (log all WARNING/LATEST_ACCEPTABLE/MANDATORY)
- Performance degradation (evaluation taking too long)
- Other: ______________________
YOUR ANSWER: Expect raw numbers (KuCoin response times and errors, CronJob execution times and errors, internal step timings, etc.) to go into LGTM. All of the above seem sensible alerts.
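The "raw numbers into LGTM" approach could mean one structured log line per pipeline step, with rates, latencies, and alert rules derived in the LGTM stack rather than pre-aggregated in the job. A sketch; the field names are assumptions.

```python
import json
import time

def emit_metric(step, duration_ms, ok, extra=None, now=time.time):
    """Emit one structured JSON line per pipeline step (KuCoin call,
    each metric calculation, git push, whole CronJob run)."""
    record = {"ts": now(), "step": step, "duration_ms": duration_ms, "ok": ok}
    if extra:
        record.update(extra)
    return json.dumps(record)
```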
7. Data Retention & Cleanup
Q7.1: Should there be automated cleanup of old data?
YOUR ANSWER: Not at the moment.
Q7.2: If yes, what’s the retention policy?
Example:
- Raw metrics: Keep 90 days
- Decision records: Keep forever (audit trail)
- Exit state transitions: Keep forever
- Daily aggregations: Keep 1 year
YOUR ANSWER:
8. Static Dashboard Requirements (Detailed)
Q8.1: What are the must-have visualizations for Phase 5?
Rank in priority order (1 = highest priority):
- Exit state timeline chart (showing NORMAL → WARNING → LATEST_ACCEPTABLE → MANDATORY over time)
- Metric trends (ADX, OU half-life, etc. over days/weeks)
- Decision history table (all exit decisions with outcomes)
- KPI dashboard (SLAR, PRR, TTDR, FER current values)
- Gate evaluation status (which gates passing/failing over time)
- Confidence score trend
- Other: ______________________
YOUR ANSWER: Whatever is needed to support the recommendations/decisions.
Q8.2: Interactive (JavaScript) or static (pre-rendered images)?
YOUR ANSWER: JavaScript (interactive).
Q8.3: Single page dashboard or multiple views?
YOUR ANSWER: I expected a dashboard per hour in the first instance, with the data for that period embedded. This might change over time (or sooner, if this turns out to be a terrible idea).
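The per-hour, data-embedded dashboard could be generated as a fully static HTML page: the period's metrics are serialized into the page, Chart.js renders them client-side, and the file works straight out of the Git repo with no server. A minimal sketch; the series structure and chart choice are assumptions.

```python
import json

CHART_JS_CDN = "https://cdn.jsdelivr.net/npm/chart.js"  # pin a version in practice

def render_hourly_dashboard(period, series):
    """One self-contained HTML page per evaluation hour."""
    data = json.dumps(series)
    return f"""<!DOCTYPE html>
<html><head><title>Exit Strategy {period}</title>
<script src="{CHART_JS_CDN}"></script></head>
<body><canvas id="metrics"></canvas>
<script>
const series = {data};  // the period's data, embedded so the page is fully static
new Chart(document.getElementById('metrics'), {{
  type: 'line',
  data: {{ labels: series.labels,
           datasets: [{{ label: 'ADX', data: series.adx }}] }}
}});
</script></body></html>"""
```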
9. Batch Processing Specifics
9.1 Stateless Execution
Current Design: Each CronJob execution is completely independent (no state carried between runs)
Q9.1.1: Is stateless execution acceptable, or do you need persistent in-memory state?
YOUR ANSWER: Stateless is acceptable; I think the Git repo will likely live on a PVC.
9.2 Historical Data Caching
Current Design: Cache last 24 hours of metrics in memory during evaluation
Q9.2.1: Is in-memory caching sufficient, or do you need a more sophisticated caching layer?
YOUR ANSWER: I don't think we need the last 24 hours cached; I think the Python code will read back files as it needs them for each period.
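Reading back per-period files on demand could look like the sketch below. The `metrics/<period>.json` layout is an assumption; the actual layout is whatever SCHEMA.md defines.

```python
import json
from pathlib import Path

def load_period_metrics(repo_root, periods):
    """Read back only the per-period metric files a calculation needs,
    instead of holding a fixed 24h window in memory. Missing periods
    (e.g. skipped cycles) are simply absent from the result.
    """
    out = {}
    for period in periods:
        path = Path(repo_root) / "metrics" / f"{period}.json"
        if path.exists():
            out[period] = json.loads(path.read_text())
    return out
```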
10. Technology Stack Validation
Current Stack:
- Python 3.11+
- Pydantic for schema validation
- GitPython for Git operations
- PyYAML for YAML parsing
- Requests for KuCoin API
- (TBD: Chart library for dashboards - matplotlib? plotly? Chart.js?)
Q10.1: Any changes or additions to the technology stack?
YOUR ANSWER: The stack is fine; use Chart.js for the dashboards.
Q10.2: Preference for dashboard charting library?
Options:
- matplotlib (static PNG images)
- plotly (interactive HTML)
- Chart.js (interactive JavaScript)
- Other: ______________________
YOUR ANSWER: Chart.js is fine.
11. Deployment & Operations
Q11.1: Any specific deployment considerations for Phase 2-5?
YOUR ANSWER: No; this is already working in k8s.
Q11.2: How should configuration be managed in Kubernetes?
Current options:
- ConfigMap for exit_strategy_config.yaml
- Secrets for KuCoin API keys (already done)
- Environment variable overrides
YOUR ANSWER: Configuration comes from the Git repo; secrets come from the central secret store.
Summary of Answered Questions
Once you’ve answered the questions above, please indicate completion:
- All questions answered
- Ready to incorporate into PRD
Additional notes or clarifications:
Next Steps:
Once you’ve completed this file, I’ll:
- Review your answers
- Generate the project-type specific requirements section
- Append to PRD
- Continue to Step 8 (Scoping)