Design: Contextual Vulnerability Recommendation Engine

Issue: CAS-85
Status: Draft
Author: CTO

Problem Statement

CascadeGuard secure images will always carry some vulnerabilities. A blanket CVE exemption is too blunt — it ignores the fact that the same CVE poses vastly different risk levels depending on who is running the image and how they deploy it. We need a system that asks the right questions about a customer’s business context and workload deployment, then cross-references those answers against the image’s actual vulnerability data to produce personalised, actionable recommendations rather than a binary “fix or exempt” decision.

Design Overview

Core Concept: Risk Profiles -> Recommendation Engine

+-----------------+   +------------------+    +-------------------------+
| Company Profile | + | Workload Profile | -> | Risk-Weighted Vulns     |
| (10 questions)  |   | (10 questions)   |    | + Recommendations       |
+-----------------+   +------------------+    +-------------------------+
        |                      |                          |
   Risk factors           Exposure factors         Per-CVE actions:
   (jurisdiction,         (network, env,           - Must Fix (SLA)
    compliance,            data sensitivity,        - Recommended Fix
    industry)              runtime config)          - Acceptable Risk
                                                    - Mitigatable

Company Profile Questionnaire (~10 questions)

These questions establish the regulatory and organisational risk context:

| # | Question | Options | Risk Signal |
|---|----------|---------|-------------|
| 1 | Primary jurisdiction? | UK, EU, US, APAC, Other | Determines applicable regulations |
| 2 | Industry sector? | Financial services, Healthcare, Government, Technology, Retail, Other | Sector-specific compliance |
| 3 | Subject to specific regulations? | PCI-DSS, HIPAA, SOC2, FedRAMP, DORA, NIS2, ISO 27001, None | Hard compliance requirements |
| 4 | Company size? | Startup (<50), SMB (50-500), Enterprise (500+) | Risk tolerance and audit exposure |
| 5 | Do you handle PII/PHI? | Yes at scale, Yes limited, No | Data protection obligations |
| 6 | Do you process payment card data? | Yes directly, Yes via processor, No | PCI scope |
| 7 | Subject to external security audits? | Annually, Quarterly, Ad-hoc, None | Compliance verification frequency |
| 8 | Supply chain security requirements? | SBOM required, Signed images required, Both, None | Provenance needs |
| 9 | Incident response SLA obligations? | < 24h, < 72h, < 7d, None | Breach notification windows |
| 10 | Risk appetite for known vulnerabilities? | Zero tolerance, Low (critical/high only), Moderate, Accept with mitigation | Overall posture |

Workload Profile Questionnaire (~10 questions)

A customer may use the same image in multiple workloads (e.g. the same Node.js base image running an internet-facing API server and an internal batch job). Each deployment context gets its own workload profile, and the recommendation engine generates a separate recommendation set per (company profile, workload profile, image) combination. This means the same CVE can receive different actions for different deployments of the same image within the same organisation.

These questions establish the deployment and exposure context for a specific workload using a given image:

| # | Question | Options | Risk Signal |
|---|----------|---------|-------------|
| 1 | Network exposure? | Internet-facing / DMZ, Internal network only, Air-gapped | Attack surface |
| 2 | Environment type? | Production, Staging, Development, CI/CD only | Blast radius |
| 3 | Data classification? | Public, Internal, Confidential, Restricted/Secret | Data sensitivity |
| 4 | Authentication to this workload? | Public / anonymous, Authenticated users, Service-to-service only | Access control |
| 5 | Container runtime privileges? | Privileged / host network, Standard, Restricted (read-only root, no caps) | Exploitability |
| 6 | Runs as root? | Yes, No, Unknown | Privilege escalation risk |
| 7 | Persistent storage with sensitive data? | Yes, No | Data exfiltration risk |
| 8 | Accepts untrusted input? | Yes user uploads/forms, Yes API input, No | Injection surface |
| 9 | Outbound network access? | Unrestricted, Restricted egress, No egress | C2/exfil potential |
| 10 | Update frequency tolerance? | Continuous (GitOps), Weekly maintenance window, Monthly, Quarterly | Remediation cadence |

Recommendation Engine Logic

Step 1: Compute Risk Score Modifiers

Each answer maps to risk factor weights that modify the base severity of vulnerabilities:

  • Company modifiers handle jurisdiction multipliers and regulatory flags
  • Workload modifiers handle severity bumps based on network exposure, environment type, data sensitivity, and runtime configuration
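
A minimal sketch of how answers might map to modifiers. The weights, multipliers, answer keys, and the compute_modifiers name are illustrative assumptions; this design does not yet specify actual values.

```python
from dataclasses import dataclass, field

# Hypothetical values: the design does not define real weights.
JURISDICTION_MULTIPLIER = {"EU": 1.2, "UK": 1.1, "US": 1.0}
REGULATION_FLAGS = {"PCI-DSS": "pci", "HIPAA": "hipaa", "FedRAMP": "fedramp", "DORA": "dora"}
WORKLOAD_WEIGHTS = {
    "network_exposure": {"internet_facing": 1.0, "internal": 0.0, "air_gapped": -1.0},
    "environment": {"production": 0.5, "staging": 0.0, "development": -0.5},
    "data_classification": {"public": -0.5, "internal": 0.0, "confidential": 0.5, "restricted": 1.0},
}


@dataclass
class Modifiers:
    jurisdiction_multiplier: float = 1.0
    regulatory_flags: set = field(default_factory=set)
    severity_bump: float = 0.0  # added to the base severity rank


def compute_modifiers(company: dict, workload: dict) -> Modifiers:
    mods = Modifiers()
    # Company modifiers: jurisdiction multiplier and regulatory flags.
    mods.jurisdiction_multiplier = JURISDICTION_MULTIPLIER.get(company.get("jurisdiction"), 1.0)
    for regulation in company.get("regulations", []):
        if regulation in REGULATION_FLAGS:
            mods.regulatory_flags.add(REGULATION_FLAGS[regulation])
    # Workload modifiers: sum the severity bumps for each answered question.
    for question, weights in WORKLOAD_WEIGHTS.items():
        mods.severity_bump += weights.get(workload.get(question), 0.0)
    return mods
```

Unknown or missing answers contribute nothing in this sketch, so a partially completed profile still yields a usable (if less precise) modifier set.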

Step 2: CVE Classification

For each vulnerability, combine:

  • Base severity (from scanner)
  • CVSS vector components (network vs local, user interaction, etc.)
  • Package context (runtime vs build-only dependency)
  • Fix availability (fixed_version present or not)
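
The four inputs above could be carried in a single record per vulnerability. The sketch below assumes the scanner reports a CVSS v3.x vector string (for example "CVSS:3.1/AV:N/AC:L/..."); the VulnInput class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def parse_cvss_vector(vector: str) -> dict:
    """Split a CVSS vector string into metric/value pairs, dropping the version prefix."""
    metrics = dict(part.split(":", 1) for part in vector.split("/") if ":" in part)
    metrics.pop("CVSS", None)
    return metrics


@dataclass
class VulnInput:
    cve_id: str
    base_severity: str            # as reported by the scanner
    cvss_vector: str
    runtime_dependency: bool      # False for build-only packages
    fixed_version: Optional[str]  # None when no fix has been released

    @property
    def network_exploitable(self) -> bool:
        # AV:N means the attack vector is "Network".
        return parse_cvss_vector(self.cvss_vector).get("AV") == "N"

    @property
    def needs_user_interaction(self) -> bool:
        # UI:R means user interaction is "Required".
        return parse_cvss_vector(self.cvss_vector).get("UI") == "R"

    @property
    def fix_available(self) -> bool:
        return self.fixed_version is not None
```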

Fix Availability and Regulatory Treatment

Fix availability is a first-class factor in the recommendation engine. When no upstream fix exists, the recommendation shifts from “patch it” to “mitigate or accept with documentation”:

| Fix Status | Regulatory Treatment | Engine Behaviour |
|------------|----------------------|------------------|
| Fix available | All frameworks expect timely remediation (PCI-DSS: 30 days critical, 90 days high; FedRAMP: 30/90/180 by severity; DORA: “without undue delay”) | Must Fix or Recommended Fix with SLA based on profile |
| Fix pending (upstream aware, no release yet) | Frameworks generally accept documented compensating controls while awaiting vendor fix. PCI-DSS 6.2 and ISO 27001 A.12.6.1 both recognise that remediation depends on vendor timeline. | Mitigatable: recommend runtime controls (network segmentation, WAF rules, restricted capabilities) with a review trigger when the fix ships |
| No fix / won’t fix | Regulators accept risk acceptance decisions when formally documented with justification and compensating controls. FedRAMP POA&M process, PCI-DSS compensating controls worksheet, and DORA’s risk assessment all provide mechanisms for this. | Acceptable Risk or Mitigatable depending on exploitability and exposure, with mandatory documented rationale |
| Not applicable (build-only dep, unreachable code path) | Not typically required to remediate, but must be documented if flagged during audit | Acceptable Risk with rationale noting non-runtime context |

The engine records the fix status at recommendation generation time so that when an upstream fix later becomes available, re-running the recommendation against a new scan will automatically escalate previously-mitigated items.
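
The escalation check described above might look like the following sketch. It assumes each stored item carries a fix_status value ("fix_pending" or "no_fix") and that scan results expose fixed_version; both field names and the status strings are hypothetical.

```python
def items_to_escalate(previous_items: list, new_scan_vulns: list) -> list:
    """Return IDs of vulnerabilities recorded without a fix that now have one."""
    now_fixed = {
        vuln["vulnerability_id"]
        for vuln in new_scan_vulns
        if vuln.get("fixed_version")
    }
    return [
        item["vulnerability_id"]
        for item in previous_items
        if item["fix_status"] in {"fix_pending", "no_fix"}
        and item["vulnerability_id"] in now_fixed
    ]
```

Items returned here would be re-evaluated under the "Fix available" row rather than silently keeping their earlier Mitigatable or Acceptable Risk action.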

Step 3: Per-CVE Recommendation

| Recommendation | Meaning | Action |
|----------------|---------|--------|
| Must Fix | Regulatory/risk profile demands remediation within SLA | Rebuild with fix or replace package |
| Recommended Fix | Best practice but not compliance-blocking | Schedule in next maintenance window |
| Acceptable Risk | Context shows low actual risk | Document acceptance, review periodically |
| Mitigatable | Runtime controls can reduce risk without patching | Apply network policy, seccomp, read-only FS |
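
One possible shape for the decision rule, as a sketch only. The numeric severity ranks, the thresholds, and the recommend function are assumptions chosen to illustrate how profile weighting can move the same CVE between tiers.

```python
from enum import Enum


class Action(str, Enum):
    MUST_FIX = "must_fix"
    RECOMMENDED_FIX = "recommended_fix"
    ACCEPTABLE_RISK = "acceptable_risk"
    MITIGATABLE = "mitigatable"


SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def recommend(base_severity: str, severity_bump: float, fix_available: bool,
              runtime_dependency: bool, regulated: bool) -> Action:
    # Build-only dependencies are not reachable at runtime.
    if not runtime_dependency:
        return Action.ACCEPTABLE_RISK
    adjusted = SEVERITY_RANK[base_severity] + severity_bump
    # Without an upstream fix, the choice is between mitigating and accepting.
    if not fix_available:
        return Action.MITIGATABLE if adjusted >= 3 else Action.ACCEPTABLE_RISK
    # Hypothetical thresholds: regulated profiles escalate one tier earlier.
    if adjusted >= 4 or (adjusted >= 3 and regulated):
        return Action.MUST_FIX
    if adjusted >= 2:
        return Action.RECOMMENDED_FIX
    return Action.ACCEPTABLE_RISK
```

Under these assumed thresholds, a high-severity CVE with a fix becomes Must Fix for a regulated, internet-facing workload (bump +1.0) and Acceptable Risk for an air-gapped development workload (bump -1.5).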

Step 4: Summary Report

  • Risk posture summary (overall rating for image + profile)
  • CVE breakdown by recommendation tier
  • Top 3-5 priority actions
  • Mitigation suggestions (network policies, seccomp, capabilities)
  • Compliance notes (which regulations require which fixes)
  • SLA comparison (why Acceptable Risk items are acceptable in context)
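
A sketch of how the tier breakdown and priority list could be aggregated from per-CVE items. The adjusted_rank field (a numeric severity after weighting) and the ordering of tiers are assumptions.

```python
from collections import Counter

# Assumed ordering for prioritisation, most urgent first.
ACTION_PRIORITY = ["must_fix", "recommended_fix", "mitigatable", "acceptable_risk"]


def build_summary(items: list, top_n: int = 5) -> dict:
    counts = Counter(item["action"] for item in items)
    # Sort by tier first, then by adjusted severity (highest first) within a tier.
    ranked = sorted(
        items,
        key=lambda item: (ACTION_PRIORITY.index(item["action"]), -item["adjusted_rank"]),
    )
    actionable = [item for item in ranked if item["action"] != "acceptable_risk"]
    return {
        "breakdown": {action: counts.get(action, 0) for action in ACTION_PRIORITY},
        "top_priorities": [item["vulnerability_id"] for item in actionable[:top_n]],
    }
```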

Data Model Changes

New Tables (Additive — no existing table changes)

company_profiles

| Column | Type | Notes |
|--------|------|-------|
| id | TEXT PK | UUID |
| name | TEXT | Profile display name |
| answers | TEXT (JSON) | Questionnaire answers |
| risk_flags | TEXT (JSON) | Derived risk flags |
| created_at | TEXT | ISO timestamp |
| updated_at | TEXT | ISO timestamp |

workload_profiles

| Column | Type | Notes |
|--------|------|-------|
| id | TEXT PK | UUID |
| name | TEXT | Profile display name |
| answers | TEXT (JSON) | Questionnaire answers |
| risk_score | REAL | Computed composite score |
| created_at | TEXT | ISO timestamp |
| updated_at | TEXT | ISO timestamp |

recommendations

| Column | Type | Notes |
|--------|------|-------|
| id | TEXT PK | UUID |
| image_id | TEXT FK | References images.id |
| scan_id | TEXT FK | References scans.id |
| company_profile_id | TEXT FK | References company_profiles.id |
| workload_profile_id | TEXT FK | References workload_profiles.id |
| summary | TEXT (JSON) | Aggregated stats and posture |
| generated_at | TEXT | ISO timestamp |

recommendation_items

| Column | Type | Notes |
|--------|------|-------|
| id | TEXT PK | UUID |
| recommendation_id | TEXT FK | References recommendations.id |
| vulnerability_id | TEXT FK | References vulnerabilities.id |
| original_severity | TEXT | From scanner |
| adjusted_severity | TEXT | After profile weighting |
| action | TEXT | must_fix / recommended_fix / acceptable_risk / mitigatable |
| rationale | TEXT | Human-readable explanation |
| mitigations | TEXT (JSON) | Suggested runtime mitigations |
| compliance_notes | TEXT (JSON) | Regulatory references |
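
The four tables could be created with DDL along these lines. This sketch assumes a SQLite-compatible store, which the TEXT and REAL column types suggest but this document does not state. Column names and types follow the tables above; the NOT NULL and CHECK constraints and the create_recommendation_tables helper are additions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS company_profiles (
    id          TEXT PRIMARY KEY,   -- UUID
    name        TEXT NOT NULL,
    answers     TEXT NOT NULL,      -- JSON
    risk_flags  TEXT,               -- JSON
    created_at  TEXT NOT NULL,      -- ISO timestamp
    updated_at  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS workload_profiles (
    id          TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    answers     TEXT NOT NULL,      -- JSON
    risk_score  REAL,
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS recommendations (
    id                  TEXT PRIMARY KEY,
    image_id            TEXT NOT NULL REFERENCES images(id),
    scan_id             TEXT NOT NULL REFERENCES scans(id),
    company_profile_id  TEXT NOT NULL REFERENCES company_profiles(id),
    workload_profile_id TEXT NOT NULL REFERENCES workload_profiles(id),
    summary             TEXT,       -- JSON
    generated_at        TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS recommendation_items (
    id                TEXT PRIMARY KEY,
    recommendation_id TEXT NOT NULL REFERENCES recommendations(id),
    vulnerability_id  TEXT NOT NULL REFERENCES vulnerabilities(id),
    original_severity TEXT NOT NULL,
    adjusted_severity TEXT NOT NULL,
    action            TEXT NOT NULL CHECK (action IN
        ('must_fix', 'recommended_fix', 'acceptable_risk', 'mitigatable')),
    rationale         TEXT,
    mitigations       TEXT,         -- JSON
    compliance_notes  TEXT          -- JSON
);
"""


def create_recommendation_tables(conn: sqlite3.Connection) -> None:
    # Only new tables are created; existing tables are not altered.
    conn.executescript(SCHEMA)
```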

API Design

Authorization and Access Control

All recommendation endpoints sit behind the existing CascadeGuard auth layer. Access control follows the principle that profiles and recommendations are tenant-scoped — a user can only see and manage data belonging to their own organisation.

| Resource | Create | Read | Update | Delete | Notes |
|----------|--------|------|--------|--------|-------|
| Company profiles | Admin | All org members | Admin | Admin | Org-wide settings; non-admins consume but don’t modify |
| Workload profiles | All org members | All org members | Creator + Admin | Creator + Admin | Any team member can define a deployment context |
| Recommendations | All org members | All org members | N/A (immutable) | Admin | Generated as point-in-time snapshots; no edits, only regenerate |
| Questionnaire definitions | N/A (system) | Public (unauthenticated) | N/A | N/A | Question schemas are read-only reference data |
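
The tenant-scoped rows of the matrix can be expressed as a lookup plus an ownership check. This is a sketch: the role names ("admin", "member", "creator"), the user and resource dictionary fields, and the can function are hypothetical, and the public questionnaire definitions are left out because they need no per-user check.

```python
PERMISSIONS = {
    "company_profile": {
        "create": {"admin"}, "read": {"admin", "member"},
        "update": {"admin"}, "delete": {"admin"},
    },
    "workload_profile": {
        "create": {"admin", "member"}, "read": {"admin", "member"},
        "update": {"admin", "creator"}, "delete": {"admin", "creator"},
    },
    "recommendation": {
        "create": {"admin", "member"}, "read": {"admin", "member"},
        "update": set(),  # immutable: nobody may update
        "delete": {"admin"},
    },
}


def can(user: dict, action: str, resource_type: str, resource: dict = None) -> bool:
    # Tenant isolation comes first: never act on another organisation's data.
    if resource is not None and resource["org_id"] != user["org_id"]:
        return False
    roles = {user["role"]}
    # The creator of a resource gains the "creator" role for that resource only.
    if resource is not None and resource.get("created_by") == user["id"]:
        roles.add("creator")
    return bool(roles & PERMISSIONS[resource_type][action])
```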

Key security constraints:

  • All profile and recommendation endpoints require a valid session/API key and enforce tenant isolation at the query layer (no cross-org data leakage)
  • Recommendation generation is rate-limited to prevent abuse (e.g. 10 generations per image per hour)
  • Questionnaire definition endpoints are public to support unauthenticated preview flows (e.g. marketing “see what we check” pages) — they contain no customer data
  • PDF export of recommendations inherits the same access controls as the recommendation itself
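
The rate limit on generation could be enforced with a sliding window, sketched below. This version is in-memory and single-process, and keys the limit by (organisation, image); both choices are assumptions, as is the GenerationRateLimiter name. A production deployment would likely need shared storage.

```python
import time
from collections import defaultdict, deque


class GenerationRateLimiter:
    def __init__(self, limit: int = 10, window_seconds: float = 3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self._events = defaultdict(deque)

    def allow(self, org_id: str, image_id: str) -> bool:
        now = self.clock()
        events = self._events[(org_id, image_id)]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True
```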

Profile Management

  • POST /api/profiles/company — create company profile
  • GET /api/profiles/company — list company profiles
  • POST /api/profiles/workload — create workload profile
  • GET /api/profiles/workload — list workload profiles
  • GET /api/questionnaires/company — question definitions + options
  • GET /api/questionnaires/workload — question definitions + options

Recommendation Generation

  • POST /api/images/:id/recommendations — generate for given profiles
  • GET /api/images/:id/recommendations — list recommendation sets
  • GET /api/images/:id/recommendations/:recId — full recommendation + items
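
A sketch of what a client call to the generation endpoint might look like. The request body fields (company_profile_id, workload_profile_id), the bearer-token header, and the build_generate_request helper are assumptions; this document only specifies the path and that generation is "for given profiles". The request is built but not sent.

```python
import json
import urllib.request


def build_generate_request(base_url: str, image_id: str, company_profile_id: str,
                           workload_profile_id: str, api_key: str) -> urllib.request.Request:
    # Hypothetical payload: one company profile and one workload profile per request.
    body = json.dumps({
        "company_profile_id": company_profile_id,
        "workload_profile_id": workload_profile_id,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/images/{image_id}/recommendations",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )
```

Passing the result to urllib.request.urlopen would send it; the response would then be the generated recommendation set, retrievable later via the GET endpoints above.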

Frontend Changes

  1. Profile Wizard (/profiles/new) — step-by-step questionnaire UI
  2. Recommendation View (/dashboard/:imageId/recommendations/:recId) — summary + filterable CVE table with rationale
  3. Image Detail Enhancement — “Get Recommendations” CTA + previous recommendations sidebar

Implementation Phases

| Phase | Scope | Estimated Stories |
|-------|-------|-------------------|
| 1 | Data model + questionnaire API | 1 |
| 2 | Recommendation engine logic | 1-2 |
| 3 | Frontend profile wizard | 1 |
| 4 | Frontend recommendation dashboard | 1 |
| 5 | Compliance packs + PDF export | 1-2 |

Key Design Decisions

  1. Contextualise, do not exempt — the same CVE can be “Must Fix” for one customer and “Acceptable Risk” for another
  2. Questions versioned in code — easy to update, test, and version; answers stored as JSON for forward-compat
  3. Point-in-time snapshots — recommendations tied to specific scans; regenerate on new scans
  4. Existing SLA untouched — recommendations are an advisory layer, SLA deadlines remain independent
  5. Additive schema — all new tables, zero risk to existing functionality
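
Decision 2 could be realised roughly as follows: question definitions live in code with a version number, and stored answers carry that version so older records stay interpretable. The option keys, the envelope format, and the function names are illustrative assumptions, and only single-choice questions are handled.

```python
import json

QUESTIONNAIRE_VERSION = 1  # bumped whenever questions or options change

# A two-question excerpt of the workload questionnaire, with hypothetical option keys.
WORKLOAD_QUESTIONS = {
    "network_exposure": ["internet_facing", "internal", "air_gapped"],
    "runs_as_root": ["yes", "no", "unknown"],
}


def validate_answers(answers: dict, questions: dict = WORKLOAD_QUESTIONS) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key, options in questions.items():
        if key not in answers:
            errors.append(f"missing answer: {key}")
        elif answers[key] not in options:
            errors.append(f"invalid option for {key}: {answers[key]}")
    return errors


def to_stored_json(answers: dict) -> str:
    # Store the questionnaire version alongside the answers for forward compatibility.
    return json.dumps({"version": QUESTIONNAIRE_VERSION, "answers": answers}, sort_keys=True)
```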