Backend Decision Support System - Project-Type Requirements

Project: Grid Exit Strategy - Phases 2-5
Project Type: Backend Decision Support API (Batch Processing)
Date: 2026-02-01


Instructions

Please answer the questions below by filling in your responses directly in this file.

Once complete, let me know and I’ll incorporate your answers into the PRD.


1. Data Storage Schema

Status: ✅ COMPLETE - Documented in SCHEMA.md

All data schemas defined (metrics files, exit state transitions, decision records, configuration).


2. Data Pipeline Architecture

2.1 Processing Flow

Current Understanding:

KuCoin API → Python Evaluation → Git Commit

Questions:

Q2.1.1: Is this the complete pipeline, or are there additional stages?

YOUR ANSWER: No additional stages; that is the complete pipeline.

Q2.1.2: Are there any data transformation or aggregation steps beyond what’s in the pipeline?

YOUR ANSWER: Not at the moment.


2.2 Static Dashboard Generation

Q2.2.1: Will dashboards be generated as part of the same CronJob, or as a separate process?

YOUR ANSWER: Same CronJob.

Q2.2.2: How often should dashboards regenerate? (Every evaluation, daily, on-demand?)

YOUR ANSWER: Every hour.

Q2.2.3: What format for dashboards? (HTML with JavaScript charts, static images, both?)

YOUR ANSWER: HTML with JavaScript charts.

Q2.2.4: What visualizations are essential for MVP vs nice-to-have?

Essential:

  • Whatever is needed to support the recommendations.

Nice-to-have:


3. Error Handling & Failure Modes

3.1 KuCoin API Failures

Current Strategy: Retry 2-3 times with exponential backoff, then skip cycle

Q3.1.1: Is this acceptable, or do you need different behavior?

YOUR ANSWER: Fine for now.

Q3.1.2: Should persistent API failures (multiple cycles) trigger alerts?

YOUR ANSWER: yes
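The retry strategy above can be sketched as follows. This is a minimal illustration, not the decided implementation; the function names and the 1s/2s/4s backoff schedule are assumptions:

```python
import time


def fetch_with_retry(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch` up to `max_attempts` times with exponential backoff.

    Returns the result on success, or None if every attempt failed, in
    which case the caller skips this evaluation cycle (per the strategy
    above). `sleep` is injectable so tests can avoid real delays.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return None
```

Persistent-failure alerting (Q3.1.2) would then hang off the `None` result: a counter of consecutive skipped cycles that fires an alert past some threshold.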


3.2 Git Push Failures

Current Strategy: Log locally, continue operation (acceptable gap in audit trail)

Q3.2.1: Is this acceptable for MVP?

YOUR ANSWER: Yes. Should the CronJob mount a PVC so it doesn't clone the repo every time, and can push data later if a Git issue occurs?

Q3.2.2: Should the system retry Git pushes on subsequent cycles?

YOUR ANSWER: Yes, if possible.
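With the repo on a PVC, retry-on-subsequent-cycles falls out of Git's behavior naturally: a failed push leaves the commit in the local clone, and the next cycle's push sends all pending commits together. A minimal sketch with injectable commit/push callables (names are illustrative, not the real API):

```python
def commit_and_push(commit, push, log=print):
    """Commit results locally, then attempt a push.

    Returns True if the push succeeded. A failed push is logged and
    tolerated: because the clone persists on the PVC, the unpushed
    commit rides along with the next cycle's push attempt.
    """
    commit()
    try:
        push()
        return True
    except Exception as exc:
        log(f"git push failed, will retry next cycle: {exc}")
        return False
```

In practice `commit` and `push` would wrap GitPython operations on the PVC-mounted clone; the audit-trail gap then shrinks from "lost" to "delayed".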


3.3 Metric Calculation Errors

Q3.3.1: If one metric calculation fails (e.g., OU half-life non-stationary), should evaluation continue with remaining metrics or abort?

YOUR ANSWER: Evaluation should continue. Consideration needs to be given to how this produces an error notification. (Note that if the calculations that do run yield a confidence level high enough for an entry/exit decision, that should still be communicated, with additional information about the error.)

Q3.3.2: Which metrics are critical vs optional? (If ADX fails, can we still classify regime?)

Critical metrics:

  • All of them, as far as I know, subject to the caveat above.

Optional metrics:

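The continue-on-failure behavior described in Q3.3.1 could look like the sketch below: run every calculator, collect both values and errors, and hand both to the decision step so a high-confidence recommendation can still be reported alongside the error details. Calculator names and signatures are illustrative:

```python
def evaluate_metrics(calculators, market_data):
    """Run every metric calculator; a failure in one does not abort
    the rest.

    Returns (values, errors): `values` feeds the regime/confidence
    logic as usual, while `errors` is attached to the decision record
    and to the notification so partial results are never silent.
    """
    values, errors = {}, {}
    for name, calc in calculators.items():
        try:
            values[name] = calc(market_data)
        except Exception as exc:
            errors[name] = str(exc)
    return values, errors
```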

3.4 Configuration Validation Failures

Current Strategy: Fail fast on startup if config invalid

Q3.4.1: Is this acceptable? (Pod won’t start if config has errors)

YOUR ANSWER:

Yes; the previous version should continue to run.
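Since Pydantic is already in the stack, fail-fast validation is essentially one constructor call at startup. A minimal sketch; the field names are illustrative, not the real `exit_strategy_config.yaml` schema:

```python
from pydantic import BaseModel, ValidationError


class ExitStrategyConfig(BaseModel):
    """Illustrative subset of the config schema."""
    evaluation_interval_minutes: int
    confidence_threshold: float
    kucoin_symbol: str


def load_config(raw: dict) -> ExitStrategyConfig:
    """Validate at startup: any missing or ill-typed field raises a
    ValidationError immediately, before any market data is touched.
    The CronJob run then fails fast and the previously deployed image
    keeps running on its existing (valid) config."""
    return ExitStrategyConfig(**raw)
```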


4. Performance & Scalability

4.1 Processing Time Constraints

Q4.1.1: What’s the maximum acceptable evaluation time? (Must complete before next hourly cycle)

YOUR ANSWER: I don't think this is likely to be an issue; even more than an hour would be fine functionally. We should probably raise an error if it takes more than 5 minutes.

Q4.1.2: Are there any specific performance requirements? (e.g., “must process in <30 seconds”)

YOUR ANSWER: no
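The 5-minute soft limit suggested in Q4.1.1 could be enforced as a post-run check rather than a hard timeout, since an overrun is an error signal, not a functional failure. A sketch with an injectable clock (the limit and reporting mechanism are open):

```python
import time

MAX_EVALUATION_SECONDS = 5 * 60  # soft limit from Q4.1.1


def run_with_time_budget(evaluate, clock=time.monotonic,
                         limit=MAX_EVALUATION_SECONDS):
    """Run the evaluation and measure wall-clock time.

    Returns (result, elapsed_seconds, overran). The caller emits an
    error notification when `overran` is True, but the evaluation
    result is still used.
    """
    start = clock()
    result = evaluate()
    elapsed = clock() - start
    return result, elapsed, elapsed > limit
```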


4.2 Git Repository Size Management

Q4.2.1: How long should raw data be retained? (All historical data forever, or cleanup after N months?)

YOUR ANSWER: Forever for the moment; create an action to revisit at the end of the MVP.

Q4.2.2: Should old data be archived/compressed, or just deleted?

YOUR ANSWER: No deletion; review at a later date.

Q4.2.3: Any concerns about Git repo size growth? (Rough estimate: X KB per evaluation × 24 hours × 365 days)

YOUR ANSWER: Not at the moment.


5. Configuration Management

5.1 Configuration Hot-Reload

Q5.1.1: Should configuration changes take effect immediately (hot-reload), or require pod restart?

YOUR ANSWER: No need; picking changes up on the next CronJob run is fine.


5.2 Configuration Versioning

Current Strategy: Config version hash tracked in decision records

Q5.2.1: Is Git commit hash sufficient for config versioning?

YOUR ANSWER: yes

Q5.2.2: Do you need semantic versioning (v1.0, v1.1, etc.) or is Git hash enough?

YOUR ANSWER: We track a version on the image. It could be worth storing that inside the image (immutable) and then writing it into the output files?
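The suggestion above could be sketched as follows: a version string baked into the image, plus the config commit hash, stamped onto every output record. The `/app/VERSION` path and `CONFIG_GIT_SHA` variable are made-up names for illustration, not decided conventions:

```python
import os


def stamp_versions(record: dict, version_file="/app/VERSION") -> dict:
    """Attach the immutable image version and the config commit hash
    to an output record before it is written to the repo.

    Missing inputs degrade to "unknown" rather than failing the run.
    """
    out = dict(record)
    try:
        with open(version_file) as fh:
            out["image_version"] = fh.read().strip()
    except OSError:
        out["image_version"] = "unknown"
    out["config_commit"] = os.environ.get("CONFIG_GIT_SHA", "unknown")
    return out
```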


6. Monitoring & Observability

Q6.1: What monitoring/alerting do you need for Phase 2-5?

Check all that apply:

  • CronJob execution failures (job didn’t run)
  • Evaluation errors (job ran but threw exceptions)
  • KuCoin API degradation (high failure rate)
  • Git push failures (persistent issues)
  • Metric calculation anomalies (values out of expected ranges)
  • Exit state transitions (log all WARNING/LATEST_ACCEPTABLE/MANDATORY)
  • Performance degradation (evaluation taking too long)
  • Other: ______________________

YOUR ANSWER: I expect raw numbers (KuCoin response times and errors, CronJob execution times and errors, internal step timings, etc.) to go into LGTM.

All of the above seem like sensible alerts.
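One assumed ingestion path into the LGTM stack (Loki/Grafana/Tempo/Mimir): emit each raw measurement as a structured JSON log line on stdout, which the cluster's log collector scrapes into Loki for dashboards and alert rules. A sketch; the field names are illustrative:

```python
import json
import sys
import time


def emit_metric(name, value, unit=None, stream=sys.stdout, **labels):
    """Write one raw measurement as a single-line JSON record.

    In Kubernetes, stdout lines from the CronJob pod can be shipped to
    the LGTM stack and alerted on; `labels` carries free-form context
    such as the API endpoint or pipeline step.
    """
    record = {"ts": time.time(), "metric": name, "value": value}
    if unit is not None:
        record["unit"] = unit
    record.update(labels)
    stream.write(json.dumps(record) + "\n")
```

Example: `emit_metric("kucoin_response_ms", 412, unit="ms", endpoint="candles")`.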


7. Data Retention & Cleanup

Q7.1: Should there be automated cleanup of old data?

YOUR ANSWER: Not at the moment.

Q7.2: If yes, what’s the retention policy?

Example:

  • Raw metrics: Keep 90 days
  • Decision records: Keep forever (audit trail)
  • Exit state transitions: Keep forever
  • Daily aggregations: Keep 1 year

YOUR ANSWER:


8. Static Dashboard Requirements (Detailed)

Q8.1: What are the must-have visualizations for Phase 5?

Rank in priority order (1 = highest priority):

  • Exit state timeline chart (showing NORMAL → WARNING → LATEST_ACCEPTABLE → MANDATORY over time)
  • Metric trends (ADX, OU half-life, etc. over days/weeks)
  • Decision history table (all exit decisions with outcomes)
  • KPI dashboard (SLAR, PRR, TTDR, FER current values)
  • Gate evaluation status (which gates passing/failing over time)
  • Confidence score trend
  • Other: ______________________

YOUR ANSWER: Whatever is needed to support the recommendations/decisions.

Q8.2: Interactive (JavaScript) or static (pre-rendered images)?

YOUR ANSWER: JavaScript (interactive).

Q8.3: Single page dashboard or multiple views?

YOUR ANSWER: I expected a dashboard per hour in the first instance, with the data for that period embedded. This might change over time (or sooner, if this is a terrible idea?).
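A per-hour page with its data embedded is straightforward to generate as a fully static artifact: serialize the period's data to JSON inside the HTML, and let Chart.js render it client-side. A sketch; the CDN URL, page layout, and dataset names are illustrative assumptions:

```python
import json

PAGE = """<!doctype html>
<html>
<head>
  <meta charset="utf-8">
  <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
  <canvas id="metrics"></canvas>
  <script>
    const data = {data_json};  // embedded: page works without a server
    new Chart(document.getElementById('metrics'), {{
      type: 'line',
      data: {{
        labels: data.timestamps,
        datasets: [{{label: 'confidence', data: data.confidence}}]
      }}
    }});
  </script>
</body>
</html>
"""


def render_hourly_dashboard(timestamps, confidence):
    """Return a self-contained HTML page for one hourly evaluation,
    viewable straight from a Git repo checkout."""
    payload = json.dumps({"timestamps": timestamps,
                          "confidence": confidence})
    return PAGE.format(data_json=payload)
```

Because each page embeds only its own period's data, repo growth per evaluation stays bounded and old dashboards never need regenerating.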


9. Batch Processing Specifics

9.1 Stateless Execution

Current Design: Each CronJob execution is completely independent (no state carried between runs)

Q9.1.1: Is stateless execution acceptable, or do you need persistent in-memory state?

YOUR ANSWER: I think the Git repo is likely to be on a PVC.


9.2 Historical Data Caching

Current Design: Cache last 24 hours of metrics in memory during evaluation

Q9.2.1: Is in-memory caching sufficient, or do you need a more sophisticated caching layer?

YOUR ANSWER:

I don't think we need the last 24 hours in memory; I think the Python will read back files as it needs them for each period.
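The read-as-needed approach could look like the sketch below: load only the metric files a given calculation asks for, one per period, straight from the repo checkout. The `metrics/<period>.json` layout is an assumption, not the agreed schema:

```python
import json
from pathlib import Path


def load_period_metrics(repo_root, periods):
    """Read back only the metric files a calculation needs.

    Missing periods are simply absent from the result, so callers can
    decide whether a gap is tolerable for their lookback window.
    """
    out = {}
    for period in periods:
        path = Path(repo_root) / "metrics" / f"{period}.json"
        if path.exists():
            out[period] = json.loads(path.read_text())
    return out
```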


10. Technology Stack Validation

Current Stack:

  • Python 3.11+
  • Pydantic for schema validation
  • GitPython for Git operations
  • PyYAML for YAML parsing
  • Requests for KuCoin API
  • (TBD: Chart library for dashboards - matplotlib? plotly? Chart.js?)

Q10.1: Any changes or additions to the technology stack?

YOUR ANSWER:

The stack is fine; Chart.js for the chart library.

Q10.2: Preference for dashboard charting library?

Options:

  • matplotlib (static PNG images)
  • plotly (interactive HTML)
  • Chart.js (interactive JavaScript)
  • Other: ______________________

YOUR ANSWER:

Chart.js is fine.


11. Deployment & Operations

Q11.1: Any specific deployment considerations for Phase 2-5?

YOUR ANSWER: No; this is already working in Kubernetes.

Q11.2: How should configuration be managed in Kubernetes?

Current options:

  • ConfigMap for exit_strategy_config.yaml
  • Secrets for KuCoin API keys (already done)
  • Environment variable overrides

YOUR ANSWER: Configuration comes from the Git repo; API keys come from Secrets (central secret store).


Summary of Answered Questions

Once you’ve answered the questions above, please indicate completion:

  • All questions answered
  • Ready to incorporate into PRD

Additional notes or clarifications:


Next Steps:

Once you’ve completed this file, I’ll:

  1. Review your answers
  2. Generate the project-type specific requirements section
  3. Append to PRD
  4. Continue to Step 8 (Scoping)