Design: Contextual Vulnerability Recommendation Engine
Issue: CAS-85
Status: Draft
Author: CTO
Problem Statement
CascadeGuard secure images will always carry some vulnerabilities. A blanket CVE exemption is too blunt — it ignores the fact that the same CVE poses vastly different risk levels depending on who is running the image and how they deploy it. We need a system that asks the right questions about a customer’s business context and workload deployment, then cross-references those answers against the image’s actual vulnerability data to produce personalised, actionable recommendations rather than a binary “fix or exempt” decision.
Design Overview
Core Concept: Risk Profiles → Recommendation Engine
+-----------------+     +------------------+      +-------------------------+
| Company Profile |  +  | Workload Profile |  ->  | Risk-Weighted Vulns     |
| (10 questions)  |     | (10 questions)   |      | + Recommendations       |
+-----------------+     +------------------+      +-------------------------+
        |                        |                            |
   Risk factors          Exposure factors            Per-CVE actions:
   (jurisdiction,        (network, env,              - Must Fix (SLA)
   compliance,           data sensitivity,           - Recommended Fix
   industry)             runtime config)             - Acceptable Risk
                                                     - Mitigatable
Company Profile Questionnaire (~10 questions)
These questions establish the regulatory and organisational risk context:
| # | Question | Options | Risk Signal |
|---|---|---|---|
| 1 | Primary jurisdiction? | UK, EU, US, APAC, Other | Determines applicable regulations |
| 2 | Industry sector? | Financial services, Healthcare, Government, Technology, Retail, Other | Sector-specific compliance |
| 3 | Subject to specific regulations? | PCI-DSS, HIPAA, SOC2, FedRAMP, DORA, NIS2, ISO 27001, None | Hard compliance requirements |
| 4 | Company size? | Startup (<50), SMB (50-500), Enterprise (500+) | Risk tolerance and audit exposure |
| 5 | Do you handle PII/PHI? | Yes at scale, Yes limited, No | Data protection obligations |
| 6 | Do you process payment card data? | Yes directly, Yes via processor, No | PCI scope |
| 7 | Subject to external security audits? | Annually, Quarterly, Ad-hoc, None | Compliance verification frequency |
| 8 | Supply chain security requirements? | SBOM required, Signed images required, Both, None | Provenance needs |
| 9 | Incident response SLA obligations? | < 24h, < 72h, < 7d, None | Breach notification windows |
| 10 | Risk appetite for known vulnerabilities? | Zero tolerance, Low (critical/high only), Moderate, Accept with mitigation | Overall posture |
Workload Profile Questionnaire (~10 questions)
A customer may use the same image in multiple workloads (e.g. the same Node.js base image running an internet-facing API server and an internal batch job). Each deployment context gets its own workload profile, and the recommendation engine generates a separate recommendation set per (company profile, workload profile, image) combination. This means the same CVE can receive different actions for different deployments of the same image within the same organisation.
These questions establish the deployment and exposure context for a specific workload using a given image:
| # | Question | Options | Risk Signal |
|---|---|---|---|
| 1 | Network exposure? | Internet-facing / DMZ, Internal network only, Air-gapped | Attack surface |
| 2 | Environment type? | Production, Staging, Development, CI/CD only | Blast radius |
| 3 | Data classification? | Public, Internal, Confidential, Restricted/Secret | Data sensitivity |
| 4 | Authentication to this workload? | Public / anonymous, Authenticated users, Service-to-service only | Access control |
| 5 | Container runtime privileges? | Privileged / host network, Standard, Restricted (read-only root, no caps) | Exploitability |
| 6 | Runs as root? | Yes, No, Unknown | Privilege escalation risk |
| 7 | Persistent storage with sensitive data? | Yes, No | Data exfiltration risk |
| 8 | Accepts untrusted input? | Yes user uploads/forms, Yes API input, No | Injection surface |
| 9 | Outbound network access? | Unrestricted, Restricted egress, No egress | C2/exfil potential |
| 10 | Update frequency tolerance? | Continuous (GitOps), Weekly maintenance window, Monthly, Quarterly | Remediation cadence |
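Because the questions are meant to be versioned in code, a small definition structure plus answer validation is one way to represent them. The sketch below is illustrative only: the `Question` type, the keys such as `network_exposure`, and the option slugs are hypothetical names, and just three of the ten workload questions are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    key: str                  # hypothetical stable identifier, not defined in this design
    text: str
    options: tuple[str, ...]  # hypothetical slugs for the options in the table


# Version 1 of the workload questionnaire (three questions shown for brevity).
WORKLOAD_QUESTIONS_V1 = (
    Question("network_exposure", "Network exposure?",
             ("internet_facing", "internal_only", "air_gapped")),
    Question("environment", "Environment type?",
             ("production", "staging", "development", "ci_only")),
    Question("runs_as_root", "Runs as root?", ("yes", "no", "unknown")),
)


def validate_answers(questions, answers: dict) -> list[str]:
    """Return a list of problems; an empty list means the answers are valid."""
    errors = []
    for q in questions:
        if q.key not in answers:
            errors.append(f"missing answer: {q.key}")
        elif answers[q.key] not in q.options:
            errors.append(f"invalid option for {q.key}: {answers[q.key]!r}")
    known = {q.key for q in questions}
    errors.extend(f"unknown question: {k}" for k in answers if k not in known)
    return errors
```

Validated answers can then be serialised to the JSON `answers` column unchanged, which keeps older answer sets readable when a later questionnaire version adds questions.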
Recommendation Engine Logic
Step 1: Compute Risk Score Modifiers
Each answer maps to risk factor weights that modify the base severity of vulnerabilities:
- Company modifiers handle jurisdiction multipliers and regulatory flags
- Workload modifiers handle severity bumps based on network exposure, environment type, data sensitivity, and runtime configuration
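A minimal sketch of how answers might map to modifiers follows. Every weight, answer slug, and flag name here is a placeholder assumption; the design does not yet specify concrete values, and jurisdiction multipliers are omitted for brevity.

```python
# All weights below are illustrative placeholders, not values from this design.
WORKLOAD_WEIGHTS = {
    "network_exposure": {"internet_facing": 1.0, "internal_only": 0.0, "air_gapped": -1.0},
    "environment": {"production": 0.5, "staging": 0.0, "development": -0.5, "ci_only": -0.5},
    "data_classification": {"public": -0.5, "internal": 0.0, "confidential": 0.5, "restricted": 1.0},
}

# Regulations assumed (for this sketch) to raise a hard compliance flag.
REGULATION_FLAGS = {"PCI-DSS", "HIPAA", "FedRAMP", "DORA", "NIS2"}


def workload_modifier(answers: dict) -> float:
    """Sum the severity bumps for every answered, weighted question."""
    return sum(
        WORKLOAD_WEIGHTS[question].get(answer, 0.0)
        for question, answer in answers.items()
        if question in WORKLOAD_WEIGHTS
    )


def company_flags(answers: dict) -> set[str]:
    """Derive regulatory flags from the company questionnaire answers."""
    regulations = set(answers.get("regulations", []))
    flags = {f"reg:{name}" for name in regulations & REGULATION_FLAGS}
    if answers.get("risk_appetite") == "zero_tolerance":
        flags.add("zero_tolerance")
    return flags
```

Keeping the weights in plain data tables makes them easy to review and unit test separately from the classification logic.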
Step 2: CVE Classification
For each vulnerability, combine:
- Base severity (from scanner)
- CVSS vector components (network vs local, user interaction, etc.)
- Package context (runtime vs build-only dependency)
- Fix availability (fixed_version present or not)
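The four inputs above could be carried in a single record, with the CVSS vector parsed into its components. The sketch below assumes the scanner reports a CVSS v3.x vector string; the `VulnInput` type and its field names (other than `fixed_version`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VulnInput:
    base_severity: str            # as reported by the scanner
    cvss_vector: str              # e.g. "CVSS:3.1/AV:N/AC:L/..."
    runtime_dependency: bool      # False for build-only packages
    fixed_version: Optional[str]  # None when no fix has been released


def parse_cvss_vector(vector: str) -> dict[str, str]:
    """Split a CVSS v3.x vector string into a metric -> value mapping."""
    metrics = {}
    for part in vector.split("/"):
        name, _, value = part.partition(":")
        metrics[name] = value
    return metrics


def is_remotely_exploitable(vuln: VulnInput) -> bool:
    """True when the attack vector is Network and no user interaction is needed."""
    metrics = parse_cvss_vector(vuln.cvss_vector)
    return metrics.get("AV") == "N" and metrics.get("UI") == "N"
```

A check like `is_remotely_exploitable` matters most when combined with the workload profile: a network-reachable flaw weighs far less on an air-gapped workload than on an internet-facing one.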
Fix Availability and Regulatory Treatment
Fix availability is a first-class factor in the recommendation engine. When no upstream fix exists, the recommendation shifts from “patch it” to “mitigate or accept with documentation”:
| Fix Status | Regulatory Treatment | Engine Behaviour |
|---|---|---|
| Fix available | All frameworks expect timely remediation (PCI-DSS: critical patches within one month, others within an appropriate window such as three months; FedRAMP: 30/90/180 days by severity; DORA: “without undue delay”) | Must Fix or Recommended Fix with SLA based on profile |
| Fix pending (upstream aware, no release yet) | Frameworks generally accept documented compensating controls while awaiting vendor fix. PCI-DSS 6.2 and ISO 27001 A.12.6.1 both recognise that remediation depends on vendor timeline. | Mitigatable — recommend runtime controls (network segmentation, WAF rules, restricted capabilities) with a review trigger when the fix ships |
| No fix / won’t fix | Regulators accept risk acceptance decisions when formally documented with justification and compensating controls. FedRAMP POA&M process, PCI-DSS compensating controls worksheet, and DORA’s risk assessment all provide mechanisms for this. | Acceptable Risk or Mitigatable depending on exploitability and exposure, with mandatory documented rationale |
| Not applicable (build-only dep, unreachable code path) | Not typically required to remediate, but must be documented if flagged during audit | Acceptable Risk with rationale noting non-runtime context |
The engine records the fix status at recommendation generation time. When an upstream fix later becomes available, regenerating the recommendation against a new scan can compare the recorded status with the new one and automatically escalate previously mitigated items.
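One possible way to derive the four fix statuses and detect escalation candidates is sketched below. The enum values, the `upstream_aware` and `runtime_reachable` inputs, and the precedence between the checks are all assumptions; the CVE identifiers in the usage note are made-up placeholders.

```python
from enum import Enum
from typing import Optional


class FixStatus(str, Enum):
    AVAILABLE = "fix_available"
    PENDING = "fix_pending"
    NONE = "no_fix"
    NOT_APPLICABLE = "not_applicable"


def fix_status(fixed_version: Optional[str], upstream_aware: bool,
               runtime_reachable: bool) -> FixStatus:
    """Classify a vulnerability into one of the four rows of the table above."""
    if not runtime_reachable:  # build-only dependency or unreachable code path
        return FixStatus.NOT_APPLICABLE
    if fixed_version:
        return FixStatus.AVAILABLE
    return FixStatus.PENDING if upstream_aware else FixStatus.NONE


def newly_fixable(previous: dict, current: dict) -> list:
    """CVE ids recorded as pending/no-fix earlier that now have a fix available."""
    return sorted(
        cve for cve, status in current.items()
        if status is FixStatus.AVAILABLE
        and previous.get(cve) in (FixStatus.PENDING, FixStatus.NONE)
    )
```

For example, if an earlier snapshot recorded `{"CVE-EXAMPLE-1": FixStatus.PENDING}` and the new scan yields `{"CVE-EXAMPLE-1": FixStatus.AVAILABLE}`, `newly_fixable` returns that id so the item can be moved out of the Mitigatable tier.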
Step 3: Per-CVE Recommendation
| Recommendation | Meaning | Action |
|---|---|---|
| Must Fix | Regulatory/risk profile demands remediation within SLA | Rebuild with fix or replace package |
| Recommended Fix | Best practice but not compliance-blocking | Schedule in next maintenance window |
| Acceptable Risk | Context shows low actual risk | Document acceptance, review periodically |
| Mitigatable | Runtime controls can reduce risk without patching | Apply network policy, seccomp, read-only FS |
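To make the tiers concrete, here is a rough sketch of a decision function. The severity scale, the integer `steps` adjustment, and every threshold are assumptions chosen for illustration; only the four action values come from this design.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]  # assumed scanner scale


def adjust_severity(base: str, steps: int) -> str:
    """Shift severity up or down by whole steps, clamped to the scale."""
    index = SEVERITY_ORDER.index(base) + steps
    return SEVERITY_ORDER[max(0, min(index, len(SEVERITY_ORDER) - 1))]


def recommend(base_severity: str, steps: int, fix_available: bool,
              runtime_dependency: bool, regulated: bool) -> dict:
    """Pick one of the four actions; the thresholds here are placeholders."""
    adjusted = adjust_severity(base_severity, steps)
    if not runtime_dependency:
        action = "acceptable_risk"   # build-only: document and move on
    elif not fix_available:
        action = "mitigatable" if adjusted in ("high", "critical") else "acceptable_risk"
    elif adjusted == "critical" or (adjusted == "high" and regulated):
        action = "must_fix"
    elif adjusted in ("medium", "high"):
        action = "recommended_fix"
    else:
        action = "acceptable_risk"
    return {
        "original_severity": base_severity,
        "adjusted_severity": adjusted,
        "action": action,
    }
```

With these placeholder rules, a scanner-rated `high` finding becomes `must_fix` when the workload profile adds one step (for instance an internet-facing deployment) and `acceptable_risk` when it subtracts two, which is the contextual behaviour the design is after.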
Step 4: Summary Report
- Risk posture summary (overall rating for image + profile)
- CVE breakdown by recommendation tier
- Top 3-5 priority actions
- Mitigation suggestions (network policies, seccomp, capabilities)
- Compliance notes (which regulations require which fixes)
- SLA comparison (why Acceptable Risk items are acceptable in context)
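The breakdown and priority list could be derived from the per-CVE items roughly as follows. The posture labels (`action_required`, `attention`, `acceptable`) and the ordering rule are hypothetical; the item keys mirror the `recommendation_items` columns.

```python
from collections import Counter

# Lower number = more urgent; used both for ordering and for the breakdown keys.
ACTION_PRIORITY = {
    "must_fix": 0,
    "recommended_fix": 1,
    "mitigatable": 2,
    "acceptable_risk": 3,
}


def build_summary(items: list[dict], top_n: int = 5) -> dict:
    """Aggregate per-CVE items into the summary stored with a recommendation."""
    counts = Counter(item["action"] for item in items)
    ranked = sorted(
        items,
        key=lambda item: (ACTION_PRIORITY[item["action"]], item["vulnerability_id"]),
    )
    priority = [
        item["vulnerability_id"]
        for item in ranked
        if item["action"] in ("must_fix", "recommended_fix")
    ][:top_n]
    if counts["must_fix"]:
        posture = "action_required"
    elif counts["recommended_fix"]:
        posture = "attention"
    else:
        posture = "acceptable"
    return {
        "posture": posture,
        "by_action": {action: counts.get(action, 0) for action in ACTION_PRIORITY},
        "priority_actions": priority,
    }
```

The returned dictionary is JSON-serialisable, so it could be stored directly in the `summary` column of `recommendations`.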
Data Model Changes
New Tables (Additive — no existing table changes)
company_profiles
| Column | Type | Notes |
|---|---|---|
| id | TEXT PK | UUID |
| name | TEXT | Profile display name |
| answers | TEXT (JSON) | Questionnaire answers |
| risk_flags | TEXT (JSON) | Derived risk flags |
| created_at | TEXT | ISO timestamp |
| updated_at | TEXT | ISO timestamp |
workload_profiles
| Column | Type | Notes |
|---|---|---|
| id | TEXT PK | UUID |
| name | TEXT | Profile display name |
| answers | TEXT (JSON) | Questionnaire answers |
| risk_score | REAL | Computed composite score |
| created_at | TEXT | ISO timestamp |
| updated_at | TEXT | ISO timestamp |
recommendations
| Column | Type | Notes |
|---|---|---|
| id | TEXT PK | UUID |
| image_id | TEXT FK | References images.id |
| scan_id | TEXT FK | References scans.id |
| company_profile_id | TEXT FK | References company_profiles.id |
| workload_profile_id | TEXT FK | References workload_profiles.id |
| summary | TEXT (JSON) | Aggregated stats and posture |
| generated_at | TEXT | ISO timestamp |
recommendation_items
| Column | Type | Notes |
|---|---|---|
| id | TEXT PK | UUID |
| recommendation_id | TEXT FK | References recommendations.id |
| vulnerability_id | TEXT FK | References vulnerabilities.id |
| original_severity | TEXT | From scanner |
| adjusted_severity | TEXT | After profile weighting |
| action | TEXT | must_fix / recommended_fix / acceptable_risk / mitigatable |
| rationale | TEXT | Human-readable explanation |
| mitigations | TEXT (JSON) | Suggested runtime mitigations |
| compliance_notes | TEXT (JSON) | Regulatory references |
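The tables above translate fairly directly into DDL. The sketch below assumes a SQLite-compatible store (suggested by the TEXT and REAL column types, but not confirmed here); the `NOT NULL` and `CHECK` constraints are additions for illustration, and the three stub tables merely stand in for existing tables whose real columns are not described in this document.

```python
import sqlite3

# Stand-ins for tables that already exist; only `id` is modelled here.
EXISTING_STUBS = """
CREATE TABLE images (id TEXT PRIMARY KEY);
CREATE TABLE scans (id TEXT PRIMARY KEY);
CREATE TABLE vulnerabilities (id TEXT PRIMARY KEY);
"""

NEW_TABLES = """
CREATE TABLE company_profiles (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    answers TEXT NOT NULL,        -- JSON
    risk_flags TEXT,              -- JSON
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
CREATE TABLE workload_profiles (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    answers TEXT NOT NULL,        -- JSON
    risk_score REAL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
CREATE TABLE recommendations (
    id TEXT PRIMARY KEY,
    image_id TEXT NOT NULL REFERENCES images(id),
    scan_id TEXT NOT NULL REFERENCES scans(id),
    company_profile_id TEXT NOT NULL REFERENCES company_profiles(id),
    workload_profile_id TEXT NOT NULL REFERENCES workload_profiles(id),
    summary TEXT,                 -- JSON
    generated_at TEXT NOT NULL
);
CREATE TABLE recommendation_items (
    id TEXT PRIMARY KEY,
    recommendation_id TEXT NOT NULL REFERENCES recommendations(id),
    vulnerability_id TEXT NOT NULL REFERENCES vulnerabilities(id),
    original_severity TEXT NOT NULL,
    adjusted_severity TEXT NOT NULL,
    action TEXT NOT NULL CHECK (action IN
        ('must_fix', 'recommended_fix', 'acceptable_risk', 'mitigatable')),
    rationale TEXT,
    mitigations TEXT,             -- JSON
    compliance_notes TEXT         -- JSON
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the stub tables plus the four new tables on an empty database."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(EXISTING_STUBS + NEW_TABLES)
```

Constraining `action` at the database level is optional, but it keeps the four tiers consistent if more than one service ever writes recommendation items.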
API Design
Authorization and Access Control
All recommendation endpoints sit behind the existing CascadeGuard auth layer. Access control follows the principle that profiles and recommendations are tenant-scoped — a user can only see and manage data belonging to their own organisation.
| Resource | Create | Read | Update | Delete | Notes |
|---|---|---|---|---|---|
| Company profiles | Admin | All org members | Admin | Admin | Org-wide settings; non-admins consume but don’t modify |
| Workload profiles | All org members | All org members | Creator + Admin | Creator + Admin | Any team member can define a deployment context |
| Recommendations | All org members | All org members | N/A (immutable) | Admin | Generated as point-in-time snapshots; no edits, only regenerate |
| Questionnaire definitions | N/A (system) | Public (unauthenticated) | N/A | N/A | Question schemas are read-only reference data |
Key security constraints:
- All profile and recommendation endpoints require a valid session/API key and enforce tenant isolation at the query layer (no cross-org data leakage)
- Recommendation generation is rate-limited to prevent abuse (e.g. 10 generations per image per hour)
- Questionnaire definition endpoints are public to support unauthenticated preview flows (e.g. marketing “see what we check” pages) — they contain no customer data
- PDF export of recommendations inherits the same access controls as the recommendation itself
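The permission matrix and the tenant-isolation rule can be expressed as a lookup plus an organisation check. This is a sketch under assumptions: the role names `admin` and `member`, the resource keys, and the `is_allowed` signature are hypothetical, and the real auth layer may model roles differently.

```python
# Roles allowed per (resource, action), transcribed from the table above.
# "creator" is a pseudo-role that is resolved per resource instance.
PERMISSIONS = {
    ("company_profile", "create"): {"admin"},
    ("company_profile", "read"): {"admin", "member"},
    ("company_profile", "update"): {"admin"},
    ("company_profile", "delete"): {"admin"},
    ("workload_profile", "create"): {"admin", "member"},
    ("workload_profile", "read"): {"admin", "member"},
    ("workload_profile", "update"): {"admin", "creator"},
    ("workload_profile", "delete"): {"admin", "creator"},
    ("recommendation", "create"): {"admin", "member"},
    ("recommendation", "read"): {"admin", "member"},
    ("recommendation", "delete"): {"admin"},
    # No ("recommendation", "update") entry: recommendations are immutable.
}


def is_allowed(resource: str, action: str, role: str,
               user_org: str, resource_org: str,
               is_creator: bool = False) -> bool:
    """Tenant isolation first, then the role matrix."""
    if user_org != resource_org:
        return False  # never cross organisation boundaries
    allowed = PERMISSIONS.get((resource, action), set())
    if role in allowed:
        return True
    return is_creator and "creator" in allowed
```

Checking the organisation before the role means that even an admin of one tenant is rejected for another tenant's data, matching the isolation constraint above.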
Profile Management
- `POST /api/profiles/company`: create company profile
- `GET /api/profiles/company`: list company profiles
- `POST /api/profiles/workload`: create workload profile
- `GET /api/profiles/workload`: list workload profiles
- `GET /api/questionnaires/company`: question definitions + options
- `GET /api/questionnaires/workload`: question definitions + options
Recommendation Generation
- `POST /api/images/:id/recommendations`: generate for given profiles
- `GET /api/images/:id/recommendations`: list recommendation sets
- `GET /api/images/:id/recommendations/:recId`: full recommendation + items
Frontend Changes
- Profile Wizard (`/profiles/new`): step-by-step questionnaire UI
- Recommendation View (`/dashboard/:imageId/recommendations/:recId`): summary + filterable CVE table with rationale
- Image Detail Enhancement: “Get Recommendations” CTA + previous recommendations sidebar
Implementation Phases
| Phase | Scope | Estimated Stories |
|---|---|---|
| 1 | Data model + questionnaire API | 1 |
| 2 | Recommendation engine logic | 1-2 |
| 3 | Frontend profile wizard | 1 |
| 4 | Frontend recommendation dashboard | 1 |
| 5 | Compliance packs + PDF export | 1-2 |
Key Design Decisions
- Contextualise, do not exempt — the same CVE can be “Must Fix” for one customer and “Acceptable Risk” for another
- Questions are defined in code, which makes them easy to update, test, and version; answers are stored as JSON for forward compatibility
- Point-in-time snapshots — recommendations tied to specific scans; regenerate on new scans
- Existing SLA untouched — recommendations are an advisory layer, SLA deadlines remain independent
- Additive schema — all new tables, zero risk to existing functionality