PRD: “Test Us” Flow — One-Off Prospect Scanning

Status: Draft
Author: CTO Agent
Date: 2026-04-02
Issue: CAS-25


Problem

Prospects evaluating CascadeGuard have no way to see the product work on their images before committing to sign-up and CI integration. The existing onboarding requires GitHub SSO, API key setup, and installing reusable workflows — too much friction for a “just show me” moment.

We need a lightweight “Test Us” flow that lets a prospect submit a container image artifact, see a real scan report, and understand the value CascadeGuard provides — all within minutes.

Two-tier model: The simplest input methods (Dockerfile paste and lockfile upload) work without any sign-up at all — zero friction, instant results. This gives the prospect an immediate “wow” moment. More advanced methods (zip upload, GitHub repo, git push) and full features (PDF export, saved history, drip campaign) require sign-up with marketing consent.

Goals

  1. Zero-friction first touch — prospect gets scan results from a Dockerfile paste without any sign-up at all
  2. Demonstrate real value — show actual vulnerabilities, SBOM, and remediation guidance on their own image
  3. Protect our infrastructure — all prospect-supplied artifacts are treated as untrusted and processed in full isolation
  4. Drive conversion — scan results naturally lead to sign-up and “monitor this image continuously” CTA
  5. Build marketing pipeline — signed-up users enter a drip campaign; scan results (stored as YAML) power personalized follow-up emails
  6. Arm the champion — PDF export (sign-up required) serves as a stakeholder-ready sales deck for why the prospect’s org should buy CascadeGuard

Non-Goals

  • Replacing the full CI-integrated scanning pipeline (this is a one-shot preview)
  • Supporting private registry pulls (prospect must supply the artifact directly)
  • Persistent image monitoring (results expire after 7 days)
  • Paid feature gating (this is a free top-of-funnel tool)
  • Choosing a marketing platform (handled as a separate strategic decision)

Input Methods

1. Dockerfile Paste (+ optional lockfiles) — NO SIGN-UP REQUIRED

Flow: Prospect pastes Dockerfile text into a web editor, and optionally uploads zero or more lockfiles alongside. No account needed — this is the zero-friction entry point.

  • Assumes no build context (no COPY/ADD from local filesystem will resolve)
  • We parse the Dockerfile to extract base image references
  • We scan the referenced base images (pull from public registries only)
  • We analyze the Dockerfile itself for best-practice violations (running as root, missing healthcheck, pinning issues, etc.)
  • If lockfiles are uploaded, we scan them for known vulnerabilities (same analysis as the zip flow)

Supported lockfiles: package-lock.json, yarn.lock, pnpm-lock.yaml, requirements.txt, Pipfile.lock, poetry.lock, go.sum, Gemfile.lock, Cargo.lock, composer.lock, pom.xml, gradle.lockfile, and similar dependency manifests.

Constraints:

  • Max 100KB Dockerfile size
  • Max 10 lockfiles, each up to 5MB
  • Only public base images are pullable
  • COPY/ADD instructions are flagged but cannot be resolved (no build context)
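The base-image extraction step above can be sketched as a small pure function. This is an illustrative helper, not the production parser; it handles multi-stage builds by skipping `FROM` references that name an earlier stage alias, so only externally pullable images are returned.

```typescript
// Sketch: extract external base image references from untrusted
// Dockerfile text (hypothetical helper, static analysis only).
export function extractBaseImages(dockerfile: string): string[] {
  const stageAliases = new Set<string>();
  const images: string[] = [];
  for (const raw of dockerfile.split("\n")) {
    const line = raw.trim();
    // Match: FROM [--platform=...] <image> [AS <alias>]
    const m = /^FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?/i.exec(line);
    if (!m) continue;
    const [, image, alias] = m;
    // A FROM that names a previous stage is not an external pull
    if (!stageAliases.has(image.toLowerCase())) images.push(image);
    if (alias) stageAliases.add(alias.toLowerCase());
  }
  return images;
}
```

Each returned reference would then be checked against the public-registry allowlist before any pull is attempted.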

Anonymous scan limits:

  • Cloudflare Turnstile CAPTCHA required (prevents bot abuse without requiring sign-up)
  • Rate limited by IP: 3 scans per IP per 24 hours
  • Results shown inline on the page but NOT saved — no persistent report URL, no PDF export
  • Clear CTA after results: “Sign up to save this report, export PDF, and get continuous monitoring”
  • If the prospect signs up after scanning, the current scan results are retroactively saved to their account

2. Zip Upload — SIGN-UP REQUIRED

Flow: Prospect uploads a zip archive containing a Dockerfile and optionally build context.

  • We extract and inspect the archive in a quarantined environment
  • We do NOT build the image — we analyze the Dockerfile and any lockfiles/manifests found in the archive
  • Lockfiles (package-lock.json, requirements.txt, go.sum, Gemfile.lock, etc.) are scanned for known vulnerabilities
  • Base images referenced in the Dockerfile are pulled and scanned

Constraints:

  • Max 50MB upload size
  • Archive must contain a Dockerfile at root or in a clearly named subdirectory
  • No executable content is run from the archive — static analysis only
  • Archive is virus-scanned before extraction

3. GitHub Repository — SIGN-UP REQUIRED

Flow: Prospect authorizes temporary read access to a repository via a GitHub App installation.

  • Prospect clicks “Connect GitHub” and selects a repository
  • We use a GitHub App installation with fine-grained, repository-scoped permissions (contents: read only)
  • We clone the repository into a quarantined environment
  • We locate Dockerfiles and lockfiles, perform the same analysis as the zip flow
  • Access is immediately revoked after cloning completes (uninstall the app installation or revoke the token)
  • Total access window: < 60 seconds

Implementation:

  • GitHub App with repository:contents:read permission, configured for user-initiated install
  • On callback: clone repo, revoke access token, process offline
  • Alternatively: use a short-lived fine-grained personal access token via OAuth device flow
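The mint-clone-revoke sequence can be sketched as follows. Endpoint paths follow the GitHub REST API for App installation tokens; `cloneRepo` is a hypothetical helper (e.g. shelling out to `git clone --depth 1`), and the whole flow is a sketch of the < 60 second access window, not a final implementation.

```typescript
// Sketch of the repo-scan access window (assumed helpers noted inline).
const GITHUB_API = "https://api.github.com";

// Pure helper: embed the short-lived installation token in the clone
// URL using GitHub's x-access-token convention.
export function buildCloneUrl(token: string, repoFullName: string): string {
  return `https://x-access-token:${token}@github.com/${repoFullName}.git`;
}

async function cloneRepo(url: string, opts: { depth: number }): Promise<void> {
  // Hypothetical: shallow-clone into an ephemeral workspace
}

export async function scanRepo(
  appJwt: string, installationId: number, repoFullName: string,
): Promise<void> {
  // 1. Mint a short-lived installation token scoped to contents:read
  const res = await fetch(
    `${GITHUB_API}/app/installations/${installationId}/access_tokens`,
    {
      method: "POST",
      headers: { Authorization: `Bearer ${appJwt}`, Accept: "application/vnd.github+json" },
      body: JSON.stringify({ permissions: { contents: "read" } }),
    },
  );
  const { token } = await res.json() as { token: string };
  try {
    // 2. Shallow-clone the default branch HEAD
    await cloneRepo(buildCloneUrl(token, repoFullName), { depth: 1 });
  } finally {
    // 3. Revoke the token immediately, keeping the access window short
    await fetch(`${GITHUB_API}/installation/token`, {
      method: "DELETE",
      headers: { Authorization: `token ${token}` },
    });
  }
}
```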

Constraints:

  • Public repositories only for unauthenticated flow; private repos require GitHub OAuth
  • Access token TTL: 60 seconds max, revoked immediately after clone
  • Repository size limit: 500MB
  • We clone only the default branch HEAD (shallow clone, depth=1)

4. Git Push — SIGN-UP REQUIRED

Flow: Prospect pushes to a temporary Git remote we provide.

  • On the “Test Us” page (post sign-up), we generate a unique temporary remote URL: https://scan.cascadeguard.com/incoming/{uuid}.git
  • The UUID in the URL acts as a short-lived shared key — it authenticates the push without requiring separate credentials
  • Prospect runs git remote add cascadeguard <url> && git push cascadeguard HEAD
  • We receive the push, extract the repository content, and process it
  • The remote is destroyed after processing (or after 15 minutes TTL, whichever comes first)

Implementation:

  • Lightweight Git HTTP backend (smart protocol) running as a Cloudflare Worker + R2 for pack storage
  • Push triggers a queue message to kick off analysis
  • No additional authentication required — the UUID in the URL is the credential (cryptographically random, 128-bit, time-limited)

Constraints:

  • Max push size: 100MB
  • Only one push per URL (subsequent pushes rejected)
  • Remote URL expires after 15 minutes regardless of use (short-lived to minimize exposure)
  • UUID is cryptographically random (128-bit)
  • Tied to the authenticated user’s session — cannot be reused by another account

Quarantined Processing Environment

All prospect-supplied input is untrusted. We must protect CascadeGuard infrastructure from:

  • Malicious archive contents (zip bombs, symlink attacks, path traversal)
  • Malicious Dockerfiles (attempts to exfiltrate via FROM with attacker-controlled registries)
  • Oversized inputs designed to exhaust resources
  • Supply chain attacks embedded in lockfiles or manifests

Architecture

Why Cloudflare Workers (not Fly.io): We keep everything on the Cloudflare stack to minimize operational complexity, reduce vendor count, and stay within our existing billing/security posture. Workers can handle Dockerfile parsing, lockfile analysis, and coordinating vulnerability DB lookups. For the heavier OCI image pull + scan workloads, we use Cloudflare Workers with R2 streaming and a pre-compiled WASM scanner, or defer to a lightweight container-based Worker (Workers for Platforms / Container Workers — currently in beta) if pure Workers can’t meet memory requirements. If container scanning proves impossible on Workers alone, we fall back to a $5/month Hetzner VPS with gVisor isolation as the minimal external compute.

Prospect Browser
      │
      ▼
┌────────────────────────┐     ┌──────────────────────────────┐
│  CF Workers            │────▶│  R2 (Quarantine Bucket)      │
│  (API + Auth + Scan)   │     │  - Uploaded artifacts        │
│                        │     │  - Isolated per scan-id      │
│  - Sign-up / auth      │     └──────────────┬───────────────┘
│  - Dockerfile parse    │                    │
│  - Lockfile analysis   │                    │
│  - Vuln DB lookup      │     ┌──────────────▼───────────────┐
│  - SBOM generation     │     │  R2 (Results Bucket)         │
│  - Report render       │     │  - Scan results (YAML+JSON)  │
│  - PDF generation      │     │  - Pre-rendered HTML report  │
└───────────┬────────────┘     │  - PDF exports               │
            │                  │  - SBOM artifacts            │
            ▼                  └──────────────────────────────┘
┌────────────────────────┐
│  CF Queue              │
│  (scan-jobs)           │
│  - Dead letter after   │
│    3 retries           │
└────────────────────────┘

Fallback for heavy OCI scanning (if CF Workers can’t handle it):

  • Single Hetzner VPS (~$5/month) with gVisor sandbox per scan
  • Workers dispatch scan jobs to VPS via authenticated webhook
  • VPS pulls OCI images, runs Grype/Trivy/Syft, POSTs results back to Workers
  • No persistent state on VPS — ephemeral per scan

Isolation Requirements

| Layer | Control | Purpose |
|---|---|---|
| Upload validation | Size limits, content-type checking, magic-byte validation | Reject obviously malicious inputs early |
| Archive extraction | Custom extractor with: path traversal prevention, symlink resolution blocking, file count limits (max 10,000 files), decompression bomb detection (max 10:1 ratio) | Prevent zip bomb and path traversal attacks |
| Compute isolation | Each scan runs in an isolated Workers invocation (V8 isolate) or gVisor sandbox on fallback VPS | Process-level isolation; no shared filesystem or network namespace between scans |
| Network isolation | Egress allowlist: Docker Hub, GHCR, Quay.io registries only. No access to CascadeGuard internal services | Prevent exfiltration and SSRF |
| Resource limits | 2 vCPU, 2GB RAM, 10 min timeout per scan | Prevent resource exhaustion |
| Artifact lifecycle | All scan artifacts deleted after results are extracted. R2 quarantine objects TTL: 1 hour | Minimize attack surface window |
| Registry pull safety | Only pull FROM images matching a public registry allowlist. Block FROM directives pointing to private/unknown registries | Prevent information leakage via attacker-controlled registries |
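The archive-extraction controls can be sketched as a per-entry validator that runs before any byte is written to disk. `ZipEntry` and `validateEntry` are hypothetical names; the limits mirror the ones stated above (10,000 files, 10:1 decompression ratio).

```typescript
// Sketch of per-entry safety checks for untrusted archives.
export interface ZipEntry {
  path: string;
  isSymlink: boolean;
  compressedSize: number;
  uncompressedSize: number;
}

const MAX_FILES = 10_000;
const MAX_RATIO = 10; // decompression bomb threshold (10:1)

export function validateEntry(entry: ZipEntry, fileIndex: number): string | null {
  if (fileIndex >= MAX_FILES) return "too many files";
  if (entry.isSymlink) return "symlinks not allowed";
  // Reject absolute paths and any ".." component before touching the filesystem
  const parts = entry.path.split(/[\\/]+/);
  if (entry.path.startsWith("/") || parts.includes("..")) return "path traversal";
  if (entry.compressedSize > 0 &&
      entry.uncompressedSize / entry.compressedSize > MAX_RATIO) {
    return "decompression ratio exceeded";
  }
  return null; // entry is safe to extract into the isolated temp dir
}
```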

Dockerfile Analysis Safety

When parsing Dockerfiles from untrusted sources:

  • Do NOT execute RUN commands — static analysis only
  • Flag FROM images pointing to non-allowlisted registries as warnings
  • Parse multi-stage builds to identify all base images
  • Analyze for best-practice violations using a rule engine (no shell execution)
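One rule from the engine can be sketched to show that the analysis is purely textual. This illustrates the `no-root-user` check; the function name and exact semantics are assumptions (real Dockerfile `USER` handling is per-stage and more nuanced).

```typescript
// Sketch: static "no-root-user" rule — no shell execution involved.
export function runsAsRoot(dockerfile: string): boolean {
  let user = "root"; // Docker's default when no USER directive is present
  for (const raw of dockerfile.split("\n")) {
    const m = /^USER\s+(\S+)/i.exec(raw.trim());
    if (m) user = m[1]; // last USER directive wins
  }
  return user === "root" || user === "0";
}
```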

Sign-Up & Lead Capture

Sign-up is required for advanced scan methods (zip, GitHub, git push) and premium features (saved reports, PDF export, scan history). The Dockerfile+lockfile paste flow works without any account to maximize top-of-funnel reach.

When Sign-Up Is Triggered

  • Before scanning: Prospect selects zip upload, GitHub repo, or git push → redirect to sign-up first
  • After anonymous scan: Inline CTA on results page: “Sign up to save this report and export PDF” → sign-up, then results retroactively saved
  • Organic: Prospect visits sign-up page directly

Sign-Up Flow

Two sign-up paths:

Option A: Email sign-up

  1. Prospect lands on “Test Us” page
  2. Clicks “Get Started” → lightweight sign-up form:
    • Email (required)
    • Name (optional)
    • Company (optional)
    • Marketing consent checkbox (required, pre-checked): “I agree to receive security insights and product updates from CascadeGuard”
  3. Email verification (magic link or 6-digit code)
  4. On verified → redirect to scan input page
  5. User enters drip marketing campaign immediately

Option B: GitHub OAuth sign-up

  1. Prospect clicks “Continue with GitHub”
  2. GitHub OAuth flow — we request read:user and user:email scopes only (no repo access at this stage)
  3. On callback: extract email + name + GitHub username from profile
  4. Marketing consent screen (still required): “I agree to receive security insights and product updates from CascadeGuard”
  5. On consent → redirect to scan input page, user enters drip campaign
  6. Bonus: if they later choose the “GitHub Repository” scan method, we already have their GitHub identity — they only need to grant the additional repo-scoped permission via our GitHub App install

Note: GitHub OAuth for sign-up is separate from the GitHub App installation used for repo scanning (Phase 3). Sign-up uses OAuth for identity only; repo scanning uses a GitHub App with fine-grained contents:read permission that is revoked immediately after cloning.

Drip Campaign Integration

Marketing platform selection is handled as a separate strategic decision. This flow integrates via a marketing event API — an internal abstraction layer so the “Test Us” flow doesn’t couple to a specific platform.

Events emitted to marketing platform:

  • user.signed_up — tags: trial-user, scan-pending, source: test-us
  • scan.completed — tags updated to scan-complete, scan summary attached as contact properties
  • report.viewed — tracks engagement for lead scoring
  • report.pdf_exported — high-intent signal
  • report.expiring — triggers final CTA email
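The abstraction layer can be sketched as a typed event envelope plus a platform-agnostic emitter. The event names come from the list above; the envelope shape, `toPayload`, and the internal endpoint in the comment are assumptions.

```typescript
// Sketch of the internal marketing-event abstraction (names from the
// event list; payload shape is an assumption).
export type MarketingEvent =
  | { name: "user.signed_up"; tags: string[]; source: "test-us" }
  | { name: "scan.completed"; scanSummary: Record<string, unknown> }
  | { name: "report.viewed" }
  | { name: "report.pdf_exported" }
  | { name: "report.expiring" };

export function toPayload(userId: string, event: MarketingEvent) {
  // Platform-agnostic envelope; a per-platform adapter maps this to
  // whichever marketing platform is eventually chosen
  return { userId, occurredAt: new Date().toISOString(), ...event };
}

// Illustrative emit (endpoint is hypothetical):
// await fetch("https://internal.example/marketing/events", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(toPayload(userId, event)),
// });
```

Keeping the envelope platform-agnostic means swapping marketing vendors only requires rewriting the adapter, not the "Test Us" flow.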

Drip sequence (implemented in marketing platform, triggered by events above):

  • Day 0: report ready notification
  • Day 2: key findings follow-up with remediation tips
  • Day 5: comparison with continuous monitoring (“here’s what you’re missing”)
  • Day 7: report expiring — upgrade CTA

Scan Report

The scan produces a shareable report (unique URL, valid for 7 days) containing:

Report Sections

  1. Summary Card

    • Overall risk score (Critical / High / Medium / Low counts)
    • Base image(s) identified
    • Quick verdict: “X critical vulnerabilities found” or “No critical vulnerabilities”
  2. Vulnerability Table

    • CVE ID, severity, affected package, installed version, fixed version
    • Sorted by severity (critical first)
    • Expandable details per CVE (description, references, exploit availability)
  3. Dockerfile Analysis

    • Best-practice violations (non-root user, multi-stage builds, layer optimization)
    • Security recommendations (pinned versions, minimal base images)
    • Base image freshness (days since last update)
  4. SBOM Preview

    • Package count by ecosystem (OS packages, language deps)
    • License summary
    • Link to download full SBOM (SPDX + CycloneDX formats)
  5. CTA Section

    • “Monitor this image continuously” — leads to paid plan page
    • “Set up automated rebuilds” — leads to CI integration guide
    • Comparison: one-off scan vs. continuous monitoring
  6. PDF Export (download button)

    • Report rendered as a branded PDF designed to be shared with stakeholders
    • Serves as a sales deck: executive summary, risk highlights, remediation roadmap, and a “Why CascadeGuard” section
    • Includes CascadeGuard branding, prospect’s scan data, and a clear CTA to schedule a demo
    • Generated server-side (Workers + a PDF library like @react-pdf/renderer or Puppeteer on fallback VPS)

Report & Results Storage

  • Structured results stored as YAML in R2, keyed by scan-id — this powers:
    • The web report rendering
    • PDF generation
    • Drip campaign personalization (scan summary fields are synced to the marketing platform)
    • Aggregate trend data (opt-in, anonymized)
  • Pre-rendered HTML report also stored in R2 for fast serving
  • Served via Cloudflare Pages/Workers at https://scan.cascadeguard.com/reports/{scan-id}
  • 7-day TTL, then automatically deleted from R2
  • YAML results are retained (associated with the user account) for drip campaign use, even after the public report URL expires
  • No PII stored in the report itself (no GitHub usernames, repo names — unless prospect opts in)

API Design

New Endpoints (under /api/v1/trial)

Anonymous (no auth required, Turnstile token required)

POST   /api/v1/trial/scans/anonymous               # Anonymous Dockerfile + optional lockfiles
  Content-Type: multipart/form-data
  Body: turnstileToken=<token>, dockerfile=<text>, lockfiles[]=@package-lock.json
  Response: { "scanId": "...", "status": "scanning" }
  Note: Results returned inline via polling. No persistent report URL.

GET    /api/v1/trial/scans/{scanId}                # Poll scan status (works for both anon + authed)
  Response: { "status": "queued|scanning|complete|failed", "reportUrl": null|"...", "progress": 0.75 }

GET    /api/v1/trial/scans/{scanId}/report         # Get scan report (anon: ephemeral, no PDF)
  Response: { "summary": {...}, "vulnerabilities": [...], "dockerfile": {...}, "sbom": {...} }
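A client polling loop against the status endpoint above can be sketched as follows. `pollScan` and the interval are illustrative; the terminal states are taken from the response shape shown.

```typescript
// Sketch: poll GET /api/v1/trial/scans/{scanId} until a terminal state.
export function isTerminal(status: string): boolean {
  return status === "complete" || status === "failed";
}

export async function pollScan(scanId: string, intervalMs = 2000) {
  for (;;) {
    const res = await fetch(`/api/v1/trial/scans/${scanId}`);
    const body = await res.json() as {
      status: string; reportUrl: string | null; progress: number;
    };
    if (isTerminal(body.status)) return body;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```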

Authenticated (session/JWT required)

POST   /api/v1/trial/scans/dockerfile              # Dockerfile + optional lockfiles (saved)
  Content-Type: multipart/form-data
  Body: dockerfile=<text>, lockfiles[]=@package-lock.json, lockfiles[]=@requirements.txt

POST   /api/v1/trial/scans/upload                  # Upload zip for scanning
  Content-Type: multipart/form-data
  Body: file=@context.zip

POST   /api/v1/trial/scans/github                  # Initiate GitHub repo scan
  Body: { "installationId": "...", "repoFullName": "org/repo" }

POST   /api/v1/trial/scans/git-remote              # Request a temporary git remote
  Response: { "remoteUrl": "https://scan.cascadeguard.com/incoming/{uuid}.git", "expiresAt": "..." }

POST   /api/v1/trial/scans/{scanId}/claim          # Claim an anonymous scan after sign-up
  Body: { "anonymousScanId": "..." }
  Note: Retroactively saves results to user account, enables PDF export + persistent URL
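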

GET    /api/v1/trial/scans/{scanId}/report/pdf     # Download PDF export (authed only)

Rate Limiting

| Scope | Limit | Window | Notes |
|---|---|---|---|
| Per IP (anonymous scans) | 3 completed scans | 24 hours | Turnstile CAPTCHA required. Results shown inline only (no persistent URL, no PDF). |
| Per user (authenticated scans) | 1 completed scan | 24 hours | A "completed scan" means the scan ran to completion. Failed scans due to fundamentally broken uploads (e.g., not a valid Dockerfile, corrupt zip) do NOT count; the user can retry with a corrected input. Missing build context is flagged as out-of-scope, not a failure. |
| Per user (submissions) | 5 submissions | 24 hours | Prevents abuse of the retry mechanism |
| Per IP (sign-up attempts) | 3 sign-up attempts | 1 hour | Prevents sign-up spam |
| Global | 1,000 scans | 24 hours | Circuit breaker for cost control (includes both anon and authed) |

Rate limits enforced via Cloudflare Workers KV with sliding window counters. Anonymous scans keyed by IP (via CF-Connecting-IP); authenticated scans keyed by user ID.
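The sliding-window check can be sketched with the windowing logic kept pure and the KV read/write around it illustrative (`RATE_KV` is an assumed binding; note that KV's eventual consistency makes this a soft limit, which is acceptable for abuse throttling).

```typescript
// Sketch: sliding-window counter over a stored list of event timestamps.
export function slideWindow(
  timestamps: number[], now: number, windowMs: number, limit: number,
): { allowed: boolean; timestamps: number[] } {
  // Keep only events still inside the window, then test against the limit
  const recent = timestamps.filter((t) => now - t < windowMs);
  if (recent.length >= limit) return { allowed: false, timestamps: recent };
  return { allowed: true, timestamps: [...recent, now] };
}

// Illustrative Worker usage (key = CF-Connecting-IP or user ID):
// const prev = JSON.parse((await env.RATE_KV.get(key)) ?? "[]");
// const { allowed, timestamps } =
//   slideWindow(prev, Date.now(), 24 * 60 * 60 * 1000, 3);
// if (allowed) await env.RATE_KV.put(key, JSON.stringify(timestamps),
//                                    { expirationTtl: 24 * 60 * 60 });
```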

Abuse Prevention

  • CAPTCHA (Cloudflare Turnstile) required for all anonymous scans and at sign-up
  • Anonymous scans: IP-based rate limiting, no persistent storage, results ephemeral
  • Authenticated scans: user-ID-based rate limiting, results stored
  • Automated monitoring for scanning pattern anomalies (same IP/user scanning many different images = potential reconnaissance)
  • Report URLs (authenticated only) are unguessable (128-bit random IDs) — viewable by anyone with the link (so prospects can share with stakeholders for buy-in)

UX Flow

Landing Page → Scan (No Sign-Up for Dockerfile)

┌────────────────────────────────────────────────────────┐
│                                                         │
│   See how secure your container images are.             │
│   Paste a Dockerfile — no sign-up needed.               │
│                                                         │
│   ┌─────────────────────────────────────────────────┐   │
│   │ Paste your Dockerfile here...                    │   │
│   │                                                  │   │
│   │                                                  │   │
│   └─────────────────────────────────────────────────┘   │
│                                                         │
│   + Add lockfiles (optional)                            │
│     [ Drop lockfiles here or click to upload ]          │
│     (package-lock.json, requirements.txt, go.sum, etc.) │
│                                                         │
│   [✓ I'm not a robot (Turnstile)]                       │
│                                                         │
│   [ Scan Now → ]                                        │
│                                                         │
│   ── or sign up for more scan methods ──                │
│                                                         │
│   [ Continue with GitHub ]  [ Sign up with email ]      │
│   (unlocks: zip upload, GitHub repo, git push,          │
│    saved reports, PDF export)                            │
│                                                         │
└────────────────────────────────────────────────────────┘
         │ (anonymous scan)
         ▼
┌────────────────────────────────────────────────────────┐
│                                                         │
│   ✅ Scan Complete — 3 critical, 7 high, 12 medium      │
│                                                         │
│   [Vulnerability table, Dockerfile analysis, SBOM       │
│    preview — all shown inline]                          │
│                                                         │
│   ┌─────────────────────────────────────────────────┐   │
│   │ 🔒 Sign up to save this report & export PDF     │   │
│   │                                                  │   │
│   │ [ Continue with GitHub ]  [ Sign up with email ] │   │
│   │                                                  │   │
│   │ • Save report with a shareable link              │   │
│   │ • Export as a stakeholder-ready PDF               │   │
│   │ • Unlock zip upload, GitHub, and git push scans  │   │
│   │ • Get continuous monitoring for this image        │   │
│   └─────────────────────────────────────────────────┘   │
│                                                         │
└────────────────────────────────────────────────────────┘

Authenticated Scan Flow (zip, GitHub, git push)

┌────────────────────────────────────────────────────┐
│   Sign Up (required for advanced methods)           │
│                                                     │
│   [ Continue with GitHub ]                          │
│                                                     │
│   ── or ──                                          │
│                                                     │
│   Email: [________________________]                 │
│   Name:  [________________________] (optional)      │
│   Company: [______________________] (optional)      │
│                                                     │
│   [x] I agree to receive security insights and      │
│       product updates from CascadeGuard             │
│                                                     │
│   [ Continue → ]                                    │
│                                                     │
└────────────────────────────────────────────────────┘
         │ (verify email / GitHub OAuth)
         ▼
┌────────────────────────────────────────────────────┐
│                                                     │
│   Choose how to scan:                               │
│                                                     │
│   ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────┐ │
│   │Dockerfile│ │Upload Zip│ │  GitHub  │ │ Git  │ │
│   │  Paste   │ │          │ │  Repo    │ │ Push │ │
│   └──────────┘ └──────────┘ └──────────┘ └──────┘ │
│                                                     │
│   (All methods available, results saved + PDF)      │
│                                                     │
└────────────────────────────────────────────────────┘

Progress States

  1. Uploading — “Uploading your artifact…”
  2. Queued — “Your scan is queued (position X)”
  3. Scanning — “Scanning base images…” → “Analyzing vulnerabilities…” → “Generating SBOM…”
  4. Complete — Redirect to report page
  5. Failed — Clear error message with retry option

Target: < 2 minutes from submission to report for Dockerfile paste, < 5 minutes for zip/repo.


Infrastructure

Compute: Cloudflare Workers (Primary)

Why Cloudflare Workers:

  • Keeps everything on one stack — simpler ops, single vendor billing, consistent security model
  • Workers handle Dockerfile parsing, lockfile analysis, vulnerability DB lookups, report rendering, and PDF generation
  • Workers Unbound plan provides 30-second CPU time, sufficient for static analysis workloads
  • Scale-to-zero by default, pay-per-invocation

For Dockerfile + lockfile scanning (Phase 1-2):

  • Pure Workers: parse Dockerfiles, scan lockfiles against vulnerability databases (stored in KV or D1), generate SBOM
  • Vulnerability data: sync CVE databases periodically into D1/KV for fast in-Worker lookups
  • No OCI image pull needed — we analyze the Dockerfile and dependency manifests statically

For OCI image scanning (if needed later):

  • If we need to pull and scan actual container images (layers, OS packages), Workers alone won’t suffice
  • Fallback: single Hetzner VPS (~$5/month) with gVisor isolation per scan, Workers dispatch via authenticated webhook
  • Workers remain the API layer; VPS is a dumb compute backend

Cost estimate: ~$5/month flat if OCI scanning is needed.

Storage

| Artifact | Location | TTL |
|---|---|---|
| Uploaded zip/context | R2 quarantine bucket | 1 hour |
| Cloned repository | Worker ephemeral / VPS tmpfs | Scan duration only |
| Scan results (YAML) | R2 results bucket | Retained (for drip campaign) |
| Scan results (JSON, for API) | R2 results bucket | 7 days (public URL) |
| Pre-rendered report (HTML) | R2 reports bucket (via Pages proxy) | 7 days |
| PDF export | R2 reports bucket | 7 days |
| SBOM artifacts | R2 reports bucket | 7 days |
| User/lead data | D1 | Retained (marketing) |

Queue

  • Cloudflare Queue: trial-scan-jobs
  • Message schema: { scanId, userId, type, artifactLocation, createdAt }
  • Consumer: Cloudflare Worker (or VPS webhook for heavy scans)
  • Dead letter queue after 3 retries

Security Considerations

Threat Model

| Threat | Mitigation |
|---|---|
| Zip bomb (gigabytes from small archive) | Decompression ratio limit (10:1), file count limit, pre-check compressed vs uncompressed size |
| Path traversal (../../etc/passwd) | Normalize all paths, reject any with .. components, extract to isolated temp dir |
| Symlink escape | Do not follow symlinks during extraction |
| Malicious Dockerfile (FROM attacker.registry/exfil) | Allowlist public registries only, log and block unknown registries |
| SSRF via Dockerfile | No network access from Dockerfile parser; image pulls are a separate, controlled step |
| Reconnaissance (scanning public images to map vulnerabilities) | Rate limiting, monitoring for patterns, results are not cached across users |
| DDoS via scan queue flooding | Cloudflare Turnstile, IP rate limits, queue depth limits |
| Data exfiltration via report URL | Reports contain no PII, URLs are unguessable, 7-day TTL |

Data Handling

  • Scan artifacts and public reports are deleted at the 7-day TTL; structured YAML results are retained only for signed-up users under their marketing consent (see Report & Results Storage)
  • No prospect data is used for training or analytics without opt-in
  • GDPR-compliant: no cookies required for anonymous scans, data deletion on TTL expiry
  • Scan artifacts are encrypted at rest (R2 server-side encryption)

Vulnerability Data Sources & Update Strategy

Data Sources

We use multiple vulnerability databases to maximize coverage:

| Source | Coverage | Format | Update Frequency |
|---|---|---|---|
| NVD (National Vulnerability Database) | All CVEs — the canonical source | JSON (NVD API 2.0) | Incremental sync every 2 hours |
| OSV (Open Source Vulnerabilities) | Language ecosystem packages (npm, PyPI, Go, Cargo, etc.) | JSON (OSV API) | Incremental sync every 1 hour |
| GitHub Advisory Database (GHSA) | npm, pip, Go, Cargo, Maven, NuGet, pub, RubyGems, Erlang | JSON (GitHub GraphQL API) | Webhook on new advisory + hourly poll |
| Alpine SecDB | Alpine Linux OS packages | JSON/YAML | Daily sync |
| Debian Security Tracker | Debian OS packages | JSON | Daily sync |
| Ubuntu CVE Tracker | Ubuntu OS packages | OVAL/JSON | Daily sync |

How We Keep It Up to Date

Sync architecture (Cloudflare stack):

  1. Scheduled Workers (cron triggers) run on cadence per source:

    • Every 1hr: OSV incremental sync (uses modified_since parameter)
    • Every 2hrs: NVD incremental sync (uses lastModStartDate API param)
    • Every 1hr: GHSA poll (GraphQL securityAdvisories with updatedSince)
    • Daily: OS-level distro trackers (Alpine, Debian, Ubuntu)
  2. Storage: Vulnerability records are normalized into a common schema and stored in D1 (Cloudflare’s edge SQLite). Schema:

    • vulnerability_id (CVE/GHSA/OSV ID)
    • source (nvd, osv, ghsa, alpine, debian, ubuntu)
    • affected_package (name + ecosystem)
    • affected_versions (version ranges)
    • fixed_version (if available)
    • severity (CVSS v3.1 score + qualitative: critical/high/medium/low)
    • description, references, published_at, modified_at
    • exploit_known (boolean, from CISA KEV catalog)
  3. CISA KEV (Known Exploited Vulnerabilities): Synced daily. If a CVE is on KEV, we flag it prominently in reports as “known to be actively exploited.”

  4. Deduplication: NVD is the canonical ID. When the same vulnerability appears in multiple sources, we merge records and prefer the most specific fix-version info (e.g., OSV often has more precise ecosystem-specific ranges than NVD).

  5. Freshness guarantee: At scan time, we check last_sync_at per source. If any source is >4 hours stale, the scan still proceeds but the report includes a notice: “Vulnerability data last updated X hours ago.”
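The per-source cadence above can be sketched as a single scheduled Worker dispatching on its cron expression. The cron strings and per-source sync helpers are assumptions; the grouping mirrors the cadence list.

```typescript
// Sketch: map each cron trigger to the sources it should sync
// (cron expressions and source handlers are illustrative).
export function sourcesForCron(cron: string): string[] {
  switch (cron) {
    case "0 * * * *":   return ["osv", "ghsa"];               // hourly
    case "0 */2 * * *": return ["nvd"];                        // every 2 hours
    case "0 4 * * *":   return ["alpine", "debian", "ubuntu", "cisa_kev"]; // daily
    default:            return [];
  }
}

// Illustrative Worker entry point (syncSource is hypothetical and would
// use each source's modified-since cursor for incremental sync):
// export default {
//   async scheduled(event: ScheduledEvent, env: Env) {
//     for (const source of sourcesForCron(event.cron)) {
//       await syncSource(source, env);
//     }
//   },
// };
```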

Cost: All sources are free to use. D1 free tier supports up to 5GB which covers several million vulnerability records. NVD API requires an API key (free) for higher rate limits.

Lockfile / Package Scanning

For lockfile analysis, we match each dependency (package, version, ecosystem) against the D1 vulnerability table. This avoids needing to shell out to Grype/Trivy at scan time for lockfile-only scans — the data is already local to the Worker.
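The lookup can be sketched as a D1 query per dependency followed by a version-range check. The table and binding names are taken loosely from the schema section but are assumptions, and `versionInRange` below is a deliberately simplified stand-in: real matching must honor each ecosystem's version semantics (semver, PEP 440, etc.).

```typescript
// Toy range check supporting only "< X.Y.Z" upper bounds with numeric
// segments — a stand-in for real per-ecosystem range matching.
export function versionInRange(installed: string, range: string): boolean {
  const m = /^<\s*(\S+)$/.exec(range);
  if (!m) return false;
  const a = installed.split(".").map(Number);
  const b = m[1].split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0, y = b[i] ?? 0;
    if (x !== y) return x < y;
  }
  return false; // equal versions are not "less than" the bound
}

// Illustrative Worker usage against D1 (binding/table names assumed):
// const rows = await env.VULN_DB.prepare(
//   "SELECT * FROM vulnerabilities WHERE affected_package = ?1 AND ecosystem = ?2",
// ).bind(dep.name, dep.ecosystem).all();
// const hits = rows.results.filter(
//   (v) => versionInRange(dep.version, v.affected_versions));
```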

For OCI base image scanning (pulling actual layers), we use Grype/Trivy on the fallback VPS if/when that path is enabled. Those tools manage their own vulnerability DB downloads.


Scan Results YAML Schema

The canonical scan result format. Stored in R2 keyed by scans/{scan-id}/result.yaml. Powers the web report, PDF generation, drip campaign personalization, and aggregate analytics.

# CascadeGuard Trial Scan Result Schema v1
version: "1"
 
scan:
  id: "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  type: "dockerfile" | "dockerfile_lockfiles" | "zip" | "github" | "git_push"
  status: "complete" | "failed" | "partial"
  created_at: "2026-04-02T12:00:00Z"          # ISO 8601
  completed_at: "2026-04-02T12:01:23Z"
  duration_ms: 83000
  anonymous: true | false                       # whether scan was run without auth
  user_id: null | "usr_..."                     # null if anonymous
 
input:
  dockerfile_present: true
  dockerfile_size_bytes: 1842
  lockfiles:                                    # empty array if none provided
    - filename: "package-lock.json"
      ecosystem: "npm"
      size_bytes: 245000
      dependency_count: 312
    - filename: "requirements.txt"
      ecosystem: "pip"
      size_bytes: 1200
      dependency_count: 45
 
base_images:
  - reference: "node:20-slim"
    registry: "docker.io"
    digest: "sha256:abc123..."
    os: "debian"
    os_version: "bookworm"
    last_updated: "2026-03-15T00:00:00Z"
    days_since_update: 18
 
summary:
  risk_level: "critical" | "high" | "medium" | "low" | "none"
  vulnerability_counts:
    critical: 3
    high: 7
    medium: 12
    low: 22
    negligible: 5
  total_vulnerabilities: 49
  fixable_vulnerabilities: 31                   # have a known fixed version
  exploit_known_count: 2                        # on CISA KEV list
  dockerfile_issues_count: 4
  package_count: 312
  ecosystem_breakdown:
    os: 89                                      # OS-level packages from base image
    npm: 312
    pip: 45
 
vulnerabilities:
  - id: "CVE-2026-1234"
    source: "nvd"                               # nvd | osv | ghsa | alpine | debian | ubuntu
    severity: "critical"
    cvss_score: 9.8
    cvss_vector: "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
    package: "openssl"
    ecosystem: "os"                             # os | npm | pip | go | cargo | gem | composer | maven
    installed_version: "3.0.2"
    fixed_version: "3.0.14"                     # null if no fix available
    fixable: true
    exploit_known: true                         # appears in CISA KEV
    title: "OpenSSL buffer overflow in X.509 certificate verification"
    description: "A buffer overrun can be triggered..."
    references:
      - "https://nvd.nist.gov/vuln/detail/CVE-2026-1234"
      - "https://www.openssl.org/news/secadv/20260301.txt"
    published_at: "2026-03-01T00:00:00Z"
    modified_at: "2026-03-15T00:00:00Z"
 
dockerfile_analysis:
  issues:
    - rule: "no-root-user"
      severity: "high"
      line: 15
      message: "Container runs as root. Add a USER directive."
      recommendation: "Add 'RUN addgroup -S app && adduser -S app -G app' and 'USER app'"
    - rule: "unpinned-base-image"
      severity: "medium"
      line: 1
      message: "Base image 'node:20-slim' uses a mutable tag."
      recommendation: "Pin to a specific digest: node:20-slim@sha256:abc123..."
    - rule: "missing-healthcheck"
      severity: "low"
      line: null
      message: "No HEALTHCHECK instruction found."
      recommendation: "Add HEALTHCHECK CMD curl -f http://localhost:3000/ || exit 1"
    - rule: "unnecessary-layer"
      severity: "info"
      line: 8
      message: "Multiple consecutive RUN instructions could be combined."
      recommendation: "Chain with && to reduce image layers."
  stages:                                       # multi-stage build info
    - name: "builder"
      base_image: "node:20-slim"
      line: 1
    - name: "runtime"
      base_image: "node:20-slim"
      line: 22
 
sbom:
  format: "cyclonedx"                           # cyclonedx | spdx
  component_count: 312
  license_summary:
    MIT: 180
    Apache-2.0: 65
    ISC: 30
    BSD-3-Clause: 20
    GPL-3.0-only: 2                             # flagged in report
    unknown: 15
  download_urls:
    cyclonedx: "/api/v1/trial/scans/{scanId}/sbom?format=cyclonedx"
    spdx: "/api/v1/trial/scans/{scanId}/sbom?format=spdx"
 
data_freshness:
  nvd_last_sync: "2026-04-02T10:00:00Z"
  osv_last_sync: "2026-04-02T11:00:00Z"
  ghsa_last_sync: "2026-04-02T11:00:00Z"
  alpine_last_sync: "2026-04-02T06:00:00Z"
  debian_last_sync: "2026-04-02T06:00:00Z"
  ubuntu_last_sync: "2026-04-02T06:00:00Z"
  cisa_kev_last_sync: "2026-04-02T06:00:00Z"
  stale_sources: []                             # list of sources >4hrs stale, shown as notice in report

Schema Notes

  • version: Schema version for forward compatibility. Consumers check this before parsing.
  • vulnerabilities: Sorted by severity (critical first), then by exploit_known (true first). Deduped by CVE ID; when multiple sources report the same CVE, we merge and prefer the most specific fix-version info.
  • ecosystem: Normalized ecosystem identifier. os = OS-level packages from the base image scan. Language ecosystems match the lockfile source.
  • dockerfile_analysis.issues[].rule: Machine-readable rule ID. Rules are defined in a separate configuration and can be extended.
  • sbom: The full SBOM is stored separately as a downloadable artifact. The YAML result contains only the summary and download URLs.
  • data_freshness: Included so the report can display a notice if vulnerability data is stale. Consumers should check stale_sources and surface a warning banner when it is non-empty.
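Two of these rules are mechanical enough to pin down in code. Below is a sketch of the vulnerability ordering/dedupe and the staleness derivation, assuming a trimmed-down Vuln type that carries only the fields the rules touch (function names are illustrative, not final API):

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "negligible";

type Vuln = {
  id: string;
  severity: Severity;
  exploit_known: boolean;
  fixed_version: string | null;
};

const SEVERITY_RANK: Record<Severity, number> = {
  critical: 0, high: 1, medium: 2, low: 3, negligible: 4,
};

// Dedupe by CVE id (prefer an entry that carries fix-version info), then
// sort critical-first, with exploit_known entries first within a severity.
function normalizeVulns(vulns: Vuln[]): Vuln[] {
  const byId = new Map<string, Vuln>();
  for (const v of vulns) {
    const seen = byId.get(v.id);
    if (!seen || (seen.fixed_version === null && v.fixed_version !== null)) {
      byId.set(v.id, v);
    }
  }
  return [...byId.values()].sort(
    (a, b) =>
      SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
      Number(b.exploit_known) - Number(a.exploit_known),
  );
}

// Derive stale_sources from the data_freshness timestamps (>4h old).
const STALE_THRESHOLD_MS = 4 * 60 * 60 * 1000;

function staleSources(
  lastSync: Record<string, string>, // e.g. { nvd: "2026-04-02T10:00:00Z" }
  now: Date,
): string[] {
  return Object.entries(lastSync)
    .filter(([, iso]) => now.getTime() - Date.parse(iso) > STALE_THRESHOLD_MS)
    .map(([source]) => source)
    .sort();
}
```

Deriving stale_sources at write time (rather than recomputing in every consumer) keeps the warning-banner logic in the report a simple presence check.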

Implementation Phases

Phase 1: Anonymous Dockerfile Scan + Vuln DB (1–2 weeks)

  • Build the “Test Us” landing page with Dockerfile paste + lockfile upload — no sign-up required
  • Turnstile CAPTCHA for anonymous scans, IP-based rate limiting (3/day)
  • Implement anonymous scan Worker endpoint
  • Build vulnerability DB sync pipeline — scheduled Workers syncing NVD, OSV, GHSA, and distro trackers into D1
  • Build inline scan results rendering (vulnerability table, Dockerfile analysis, SBOM preview)
  • Build sign-up flow (two paths: email + GitHub OAuth) with marketing consent
  • Anonymous scan claim endpoint (retroactively saves results on sign-up)
  • PDF export for authenticated users (stakeholder-ready sales deck format)
  • Emit marketing events via internal abstraction layer (platform TBD as strategic decision)
  • Rate limiting: 3 anon scans/IP/day, 1 completed authed scan/user/day

Delivers: Prospects paste a Dockerfile (+ optional lockfiles) and immediately see vulnerability results with zero sign-up. Sign-up unlocks saved reports, PDF export, and drip campaign enrollment.
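The 3-per-IP-per-day limit amounts to a fixed-window counter. A minimal in-memory sketch of the decision logic — in the actual Worker the Map would be replaced by Workers KV or a Durable Object, and the class name is illustrative:

```typescript
// Fixed-window rate limiter: at most `limit` scans per key per 24h window.
// The in-memory Map stands in for whatever store the Worker actually uses.
const WINDOW_MS = 24 * 60 * 60 * 1000;

class ScanRateLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();
  constructor(private limit: number) {}

  // Returns true if the scan is allowed (and records it).
  tryAcquire(key: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= WINDOW_MS) {
      this.counts.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

The same shape covers both limits: key by IP with limit 3 for anonymous scans, key by user id with limit 1 for authenticated scans.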

Phase 2: Zip Upload (1 week)

  • Add multipart upload endpoint
  • Implement quarantined archive extraction (path traversal protection, size limits)
  • Extend scanner to analyze lockfiles found in archive
  • Add lockfile vulnerability scanning
  • Clear error messaging distinguishing invalid uploads (retryable failures) from missing build context (flagged as out-of-scope, not a failure)

Delivers: Prospects can upload a project zip and get a comprehensive scan including dependency analysis.
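The path-traversal protection in the quarantined extraction step comes down to rejecting any archive entry that could escape the extraction root. A sketch of that check (function name is illustrative):

```typescript
// Reject archive entry paths that are absolute, contain ".." segments, or
// use Windows drive/backslash tricks. Safe entries stay under the
// extraction root when joined naively.
function isSafeEntryPath(entry: string): boolean {
  if (entry.length === 0) return false;
  const normalized = entry.replace(/\\/g, "/");
  if (normalized.startsWith("/")) return false;    // absolute path
  if (/^[a-zA-Z]:/.test(normalized)) return false; // Windows drive letter
  return !normalized.split("/").includes("..");    // no traversal segments
}
```

Size limits are enforced separately by tracking cumulative decompressed bytes during extraction, which also guards against zip bombs.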

Phase 3: GitHub Repo

  • Build GitHub App for temporary repo access
  • Implement OAuth flow with immediate token revocation
  • Shallow clone + scan pipeline
  • Repository size validation

Delivers: Prospects can link a GitHub repo and get scanned without manually packaging anything.

Phase 4: Git Push (1 week)

  • Deploy lightweight Git HTTP backend (smart protocol) on CF Worker + R2
  • Implement temporary remote URL generation (UUID-as-auth, 15-min TTL)
  • Connect push receipt to scan queue

Delivers: Prospects can git push their project to get scanned — appealing to CLI-first users.

Phase 5: Drip Campaign Polish + Conversion (1 week)

  • A/B test CTAs on report page and in drip emails
  • Add “Compare: one-off vs. continuous” section to report
  • Drip email: report expiry notification (“Your report expires in 2 days — upgrade to keep monitoring”)
  • Personalized drip emails using YAML scan data (e.g., “We found X critical CVEs in your base image — here’s how continuous monitoring catches new ones”)
  • Analytics: track funnel from sign-up → scan → report view → PDF download → paid conversion
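The personalized drip line is a straightforward template over the YAML summary block. A sketch, using field names from the schema above (the function name and copy are placeholders, not final email content):

```typescript
// Only the summary fields the template touches.
type Summary = {
  vulnerability_counts: { critical: number; high: number };
  exploit_known_count: number;
};

// Build the personalized opening line for a drip email from scan data.
function dripOpeningLine(baseImage: string, s: Summary): string {
  const { critical, high } = s.vulnerability_counts;
  const kev = s.exploit_known_count > 0
    ? ` (${s.exploit_known_count} with known exploits)`
    : "";
  return `We found ${critical} critical and ${high} high vulnerabilities${kev}` +
    ` in ${baseImage}. Continuous monitoring catches new ones as they land.`;
}
```

Since the YAML result is retained for signed-up users, these templates can be rendered at send time rather than at scan time, so email copy can be A/B tested without re-scanning.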

Success Metrics

Metric                          Target (3 months post-launch)
Trial scans per week            500+
Scan completion rate            > 90%
Median time to report           < 2 min (Dockerfile), < 5 min (zip/repo)
Trial → sign-up conversion      > 15%
Sign-up → CI integration        > 40%

Open Questions

  1. Should we support pre-built OCI image scanning? Prospect provides docker.io/library/nginx:latest and we pull + scan it directly. This is the simplest input method but potentially expensive (pulling large images) and could be used for reconnaissance. Consider adding as Phase 0 if cost is manageable.

  2. Marketing platform choice — Deferred: handled as a separate strategic decision. This PRD integrates via an internal marketing event API abstraction.

  3. PDF generation approach — @react-pdf/renderer in Workers (lightweight) vs. Puppeteer on the fallback VPS (higher fidelity). Depends on report design requirements.

  4. D1 row limits at scale — NVD alone has 200K+ CVEs. The D1 free tier is 5 GB, which should suffice, but if scan throughput grows we may need to shard or move hot-path lookups to KV. Monitor after launch.

Resolved Questions

  1. Report sharing — public or private? Unguessable URL, no auth required to view; prospects share the PDF/link with stakeholders as a sales deck. The email-gated alternative (captures a lead but adds friction) was set aside.
  2. Git push authentication — UUID in the URL as a short-lived shared key (15-min TTL, tied to the authenticated user session).
  3. Rate limiting — 1 completed scan per user per 24 hours, with retries allowed for broken uploads. Missing build context = out-of-scope flag, not a failure.
  4. Fly.io vs. self-hosted Firecracker on Hetzner — Cloudflare Workers as primary, Hetzner VPS ($5/mo) as fallback for OCI image scanning only.
  5. Should trial scan results feed into our aggregate vulnerability database? Yes — YAML scan results retained for drip campaign personalization and aggregate stats. Consent captured at sign-up.
  6. YAML schema for scan results — Defined in this PRD (see “Scan Results YAML Schema” section above).