PRD: “Test Us” Flow — One-Off Prospect Scanning
Status: Draft
Author: CTO Agent
Date: 2026-04-02
Issue: CAS-25
Problem
Prospects evaluating CascadeGuard have no way to see the product work on their images before committing to sign-up and CI integration. The existing onboarding requires GitHub SSO, API key setup, and installing reusable workflows — too much friction for a “just show me” moment.
We need a lightweight “Test Us” flow that lets a prospect submit a container image artifact, see a real scan report, and understand the value CascadeGuard provides — all within minutes.
Two-tier model: The simplest input methods (Dockerfile paste and lockfile upload) work without any sign-up at all — zero friction, instant results. This gives the prospect an immediate “wow” moment. More advanced methods (zip upload, GitHub repo, git push) and full features (PDF export, saved history, drip campaign) require sign-up with marketing consent.
Goals
- Zero-friction first touch — prospect gets scan results from a Dockerfile paste without any sign-up at all
- Demonstrate real value — show actual vulnerabilities, SBOM, and remediation guidance on their own image
- Protect our infrastructure — all prospect-supplied artifacts are treated as untrusted and processed in full isolation
- Drive conversion — scan results naturally lead to sign-up and “monitor this image continuously” CTA
- Build marketing pipeline — signed-up users enter a drip campaign; scan results (stored as YAML) power personalized follow-up emails
- Arm the champion — PDF export (sign-up required) serves as a stakeholder-ready sales deck for why the prospect’s org should buy CascadeGuard
Non-Goals
- Replacing the full CI-integrated scanning pipeline (this is a one-shot preview)
- Supporting private registry pulls (prospect must supply the artifact directly)
- Persistent image monitoring (results expire after 7 days)
- Paid feature gating (this is a free top-of-funnel tool)
- Choosing a marketing platform (handled as a separate strategic decision)
Input Methods
1. Dockerfile Paste (+ optional lockfiles) — NO SIGN-UP REQUIRED
Flow: Prospect pastes Dockerfile text into a web editor, and optionally uploads zero or more lockfiles alongside. No account needed — this is the zero-friction entry point.
- Assumes no build context (no COPY/ADD from local filesystem will resolve)
- We parse the Dockerfile to extract base image references
- We scan the referenced base images (pull from public registries only)
- We analyze the Dockerfile itself for best-practice violations (running as root, missing healthcheck, pinning issues, etc.)
- If lockfiles are uploaded, we scan them for known vulnerabilities (same analysis as the zip flow)
Supported lockfiles: package-lock.json, yarn.lock, pnpm-lock.yaml, requirements.txt, Pipfile.lock, poetry.lock, go.sum, Gemfile.lock, Cargo.lock, composer.lock, pom.xml, gradle.lockfile, and similar dependency manifests.
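The supported-lockfile list implies a filename-to-ecosystem mapping used to route each upload to the right analyzer. A minimal TypeScript sketch — the `detectEcosystem` helper and the exact table are illustrative, not the shipping registry:

```typescript
// Sketch: map an uploaded lockfile name to its package ecosystem.
// The mapping below is illustrative; the real registry may differ.
const LOCKFILE_ECOSYSTEMS: Record<string, string> = {
  "package-lock.json": "npm",
  "yarn.lock": "npm",
  "pnpm-lock.yaml": "npm",
  "requirements.txt": "pip",
  "Pipfile.lock": "pip",
  "poetry.lock": "pip",
  "go.sum": "go",
  "Gemfile.lock": "gem",
  "Cargo.lock": "cargo",
  "composer.lock": "composer",
  "pom.xml": "maven",
  "gradle.lockfile": "maven",
};

/** Returns the ecosystem for a lockfile name, or null if unsupported. */
function detectEcosystem(filename: string): string | null {
  // Match on the base name only, so "backend/go.sum" still resolves.
  const base = filename.split("/").pop() ?? filename;
  return LOCKFILE_ECOSYSTEMS[base] ?? null;
}
```

Unsupported files return `null` and can be rejected at upload time with a clear error rather than failing the scan later.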
Constraints:
- Max 100KB Dockerfile size
- Max 10 lockfiles, each up to 5MB
- Only public base images are pullable
- COPY/ADD instructions are flagged but cannot be resolved (no build context)
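The base-image extraction step can be done fully statically. The sketch below is illustrative, not the production parser; it assumes single-line `FROM` instructions and skips references to earlier build stages, which are not pullable images:

```typescript
interface BaseImageRef {
  reference: string;        // e.g. "node:20-slim"
  stageName: string | null; // AS alias for multi-stage builds
}

/**
 * Statically extract every pullable base image from a Dockerfile.
 * Nothing is executed; we only read FROM instructions.
 */
function extractBaseImages(dockerfile: string): BaseImageRef[] {
  const refs: BaseImageRef[] = [];
  const stageNames = new Set<string>();
  for (const raw of dockerfile.split("\n")) {
    // FROM [--platform=...] <image> [AS <name>]
    const m = /^FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?/i.exec(raw.trim());
    if (!m) continue;
    const reference = m[1];
    const stageName = m[2] ?? null;
    // A reference to a previously declared stage is not a registry image.
    const isStageRef = stageNames.has(reference.toLowerCase());
    if (stageName) stageNames.add(stageName.toLowerCase());
    if (!isStageRef) refs.push({ reference, stageName });
  }
  return refs;
}
```

Each extracted reference would then be checked against the public-registry allowlist before any pull is attempted.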
Anonymous scan limits:
- Cloudflare Turnstile CAPTCHA required (prevents bot abuse without requiring sign-up)
- Rate limited by IP: 3 scans per IP per 24 hours
- Results shown inline on the page but NOT saved — no persistent report URL, no PDF export
- Clear CTA after results: “Sign up to save this report, export PDF, and get continuous monitoring”
- If the prospect signs up after scanning, the current scan results are retroactively saved to their account
2. Zip Upload — SIGN-UP REQUIRED
Flow: Prospect uploads a zip archive containing a Dockerfile and optionally build context.
- We extract and inspect the archive in a quarantined environment
- We do NOT build the image — we analyze the Dockerfile and any lockfiles/manifests found in the archive
- Lockfiles (package-lock.json, requirements.txt, go.sum, Gemfile.lock, etc.) are scanned for known vulnerabilities
- Base images referenced in the Dockerfile are pulled and scanned
Constraints:
- Max 50MB upload size
- Archive must contain a Dockerfile at root or in a clearly named subdirectory
- No executable content is run from the archive — static analysis only
- Archive is virus-scanned before extraction
3. GitHub Repository Link — SIGN-UP REQUIRED
Flow: Prospect authorizes temporary read access to a repository via GitHub OAuth.
- Prospect clicks “Connect GitHub” and selects a repository
- We use a GitHub App installation with fine-grained, repository-scoped permissions (contents: read only)
- We clone the repository into a quarantined environment
- We locate Dockerfiles and lockfiles, perform the same analysis as the zip flow
- Access is immediately revoked after cloning completes (uninstall the app installation or revoke the token)
- Total access window: < 60 seconds
Implementation:
- GitHub App with `repository:contents:read` permission, configured for user-initiated install
- On callback: clone repo, revoke access token, process offline
- Alternatively: use a short-lived fine-grained personal access token via OAuth device flow
Constraints:
- Public repositories only for unauthenticated flow; private repos require GitHub OAuth
- Access token TTL: 60 seconds max, revoked immediately after clone
- Repository size limit: 500MB
- We clone only the default branch HEAD (shallow clone, depth=1)
4. Git Push — SIGN-UP REQUIRED
Flow: Prospect pushes to a temporary Git remote we provide.
- On the “Test Us” page (post sign-up), we generate a unique temporary remote URL: https://scan.cascadeguard.com/incoming/{uuid}.git
- The UUID in the URL acts as a short-lived shared key — it authenticates the push without requiring separate credentials
- Prospect runs `git remote add cascadeguard <url> && git push cascadeguard HEAD`
- We receive the push, extract the repository content, and process it
- The remote is destroyed after processing (or after 15 minutes TTL, whichever comes first)
Implementation:
- Lightweight Git HTTP backend (smart protocol) running as a Cloudflare Worker + R2 for pack storage
- Push triggers a queue message to kick off analysis
- No additional authentication required — the UUID in the URL is the credential (cryptographically random, 128-bit, time-limited)
Constraints:
- Max push size: 100MB
- Only one push per URL (subsequent pushes rejected)
- Remote URL expires after 15 minutes regardless of use (short-lived to minimize exposure)
- UUID is cryptographically random (128-bit)
- Tied to the authenticated user’s session — cannot be reused by another account
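One way the UUID-as-credential issuance could look, as a Node-style sketch (`createTempRemote` and `tokenMatches` are hypothetical names; on Workers, `crypto.getRandomValues` would stand in for `randomBytes`):

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

/**
 * Sketch of temporary remote-URL issuance for the git-push flow.
 * 128 bits of CSPRNG output serve as the bearer credential; the
 * 15-minute TTL and base URL are the values given in this PRD.
 */
function createTempRemote(now: Date = new Date()) {
  const token = randomBytes(16).toString("hex"); // 128-bit, URL-safe hex
  return {
    remoteUrl: `https://scan.cascadeguard.com/incoming/${token}.git`,
    expiresAt: new Date(now.getTime() + 15 * 60 * 1000).toISOString(),
    token,
  };
}

/** Constant-time comparison when validating an incoming push URL. */
function tokenMatches(presented: string, stored: string): boolean {
  const a = Buffer.from(presented);
  const b = Buffer.from(stored);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The constant-time compare avoids leaking token prefixes through timing, which matters when the URL itself is the only credential.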
Quarantined Processing Environment
All prospect-supplied input is untrusted. We must protect CascadeGuard infrastructure from:
- Malicious archive contents (zip bombs, symlink attacks, path traversal)
- Malicious Dockerfiles (attempts to exfiltrate via FROM with attacker-controlled registries)
- Oversized inputs designed to exhaust resources
- Supply chain attacks embedded in lockfiles or manifests
Architecture
Why Cloudflare Workers (not Fly.io): We keep everything on the Cloudflare stack to minimize operational complexity, reduce vendor count, and stay within our existing billing/security posture. Workers can handle Dockerfile parsing, lockfile analysis, and coordinating vulnerability DB lookups. For the heavier OCI image pull + scan workloads, we use Cloudflare Workers with R2 streaming and a pre-compiled WASM scanner, or defer to a lightweight container-based Worker (Workers for Platforms / Container Workers — currently in beta) if pure Workers can’t meet memory requirements. If container scanning proves impossible on Workers alone, we fall back to a $5/month Hetzner VPS with gVisor isolation as the minimal external compute.
Prospect Browser
            │
            ▼
┌───────────────────────┐      ┌────────────────────────────┐
│ CF Workers            │─────▶│ R2 (Quarantine Bucket)     │
│ (API + Auth + Scan)   │      │ - Uploaded artifacts       │
│                       │      │ - Isolated per scan-id     │
│ - Sign-up / auth      │      └─────────────┬──────────────┘
│ - Dockerfile parse    │                    │
│ - Lockfile analysis   │                    ▼
│ - Vuln DB lookup      │      ┌────────────────────────────┐
│ - SBOM generation     │      │ R2 (Results Bucket)        │
│ - Report render       │      │ - Scan results (YAML+JSON) │
│ - PDF generation      │      │ - Pre-rendered HTML report │
└───────────┬───────────┘      │ - PDF exports              │
            │                  │ - SBOM artifacts           │
            ▼                  └────────────────────────────┘
┌───────────────────────┐
│ CF Queue              │
│ (scan-jobs)           │
│ - Dead letter after   │
│   3 retries           │
└───────────────────────┘
Fallback for heavy OCI scanning (if CF Workers can’t handle it):
- Single Hetzner VPS (~$5/month) with gVisor sandbox per scan
- Workers dispatch scan jobs to VPS via authenticated webhook
- VPS pulls OCI images, runs Grype/Trivy/Syft, POSTs results back to Workers
- No persistent state on VPS — ephemeral per scan
Isolation Requirements
| Layer | Control | Purpose |
|---|---|---|
| Upload validation | Size limits, content-type checking, magic-byte validation | Reject obviously malicious inputs early |
| Archive extraction | Custom extractor with: path traversal prevention, symlink resolution blocking, file count limits (max 10,000 files), decompression bomb detection (max 10:1 ratio) | Prevent zip bomb and path traversal attacks |
| Compute isolation | Each scan runs in an isolated Workers invocation (V8 isolate) or gVisor sandbox on fallback VPS | Process-level isolation; no shared filesystem or network namespace between scans |
| Network isolation | Egress allowlist: Docker Hub, GHCR, Quay.io registries only. No access to CascadeGuard internal services | Prevent exfiltration and SSRF |
| Resource limits | 2 vCPU, 2GB RAM, 10 min timeout per scan | Prevent resource exhaustion |
| Artifact lifecycle | All scan artifacts deleted after results are extracted. R2 quarantine objects TTL: 1 hour | Minimize attack surface window |
| Registry pull safety | Only pull FROM images matching a public registry allowlist. Block FROM directives pointing to private/unknown registries | Prevent information leakage via controlled registries |
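The archive-extraction controls in the table above can be sketched as a per-entry validator. The `ArchiveEntry` shape and the `validateEntries` helper are illustrative; a real extractor would also enforce per-file size caps while streaming:

```typescript
import { posix as path } from "node:path";

// Limits from the isolation table above.
const MAX_FILES = 10_000;
const MAX_RATIO = 10; // decompression bomb threshold (10:1)

interface ArchiveEntry {
  name: string; // path as recorded in the archive
  isSymlink: boolean;
  compressedSize: number;
  uncompressedSize: number;
}

/**
 * Per-entry checks applied before anything is written to disk.
 * Returns a list of violations; an empty list means safe to extract.
 */
function validateEntries(entries: ArchiveEntry[]): string[] {
  const errors: string[] = [];
  if (entries.length > MAX_FILES) errors.push("too many files");
  let compressed = 0;
  let uncompressed = 0;
  for (const e of entries) {
    // Normalize, then reject absolute paths and any '..' escape.
    const norm = path.normalize(e.name);
    if (path.isAbsolute(norm) || norm === ".." || norm.startsWith("../")) {
      errors.push(`path traversal: ${e.name}`);
    }
    if (e.isSymlink) errors.push(`symlink blocked: ${e.name}`);
    compressed += e.compressedSize;
    uncompressed += e.uncompressedSize;
  }
  if (compressed > 0 && uncompressed / compressed > MAX_RATIO) {
    errors.push("decompression ratio exceeds 10:1");
  }
  return errors;
}
```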
Dockerfile Analysis Safety
When parsing Dockerfiles from untrusted sources:
- Do NOT execute `RUN` commands — static analysis only
- Flag `FROM` images pointing to non-allowlisted registries as warnings
- Parse multi-stage builds to identify all base images
- Analyze for best-practice violations using a rule engine (no shell execution)
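A minimal sketch of the rule engine, hard-coding two of the rules this PRD names (`no-root-user`, `unpinned-base-image`); the production engine would be table-driven and stage-aware:

```typescript
interface DockerfileIssue {
  rule: string;
  severity: "high" | "medium" | "low";
  line: number | null;
  message: string;
}

/**
 * Illustrative linter: no shell execution, pure text analysis.
 * (A real engine would also filter out FROM lines that reference
 * earlier build stages rather than registry images.)
 */
function lintDockerfile(dockerfile: string): DockerfileIssue[] {
  const issues: DockerfileIssue[] = [];
  let hasUser = false;
  dockerfile.split("\n").forEach((raw, i) => {
    const line = raw.trim();
    if (/^USER\s+\S+/i.test(line)) hasUser = true;
    // Mutable tag: a FROM without an @sha256 digest pin.
    const from = /^FROM\s+(?:--platform=\S+\s+)?(\S+)/i.exec(line);
    if (from && !from[1].includes("@sha256:")) {
      issues.push({
        rule: "unpinned-base-image",
        severity: "medium",
        line: i + 1,
        message: `Base image '${from[1]}' uses a mutable tag.`,
      });
    }
  });
  if (!hasUser) {
    issues.push({
      rule: "no-root-user",
      severity: "high",
      line: null,
      message: "Container runs as root. Add a USER directive.",
    });
  }
  return issues;
}
```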
Sign-Up & Lead Capture
Sign-up is required for advanced scan methods (zip, GitHub, git push) and premium features (saved reports, PDF export, scan history). The Dockerfile+lockfile paste flow works without any account to maximize top-of-funnel reach.
When Sign-Up Is Triggered
- Before scanning: Prospect selects zip upload, GitHub repo, or git push → redirect to sign-up first
- After anonymous scan: Inline CTA on results page: “Sign up to save this report and export PDF” → sign-up, then results retroactively saved
- Organic: Prospect visits sign-up page directly
Sign-Up Flow
Two sign-up paths:
Option A: Email sign-up
- Prospect lands on “Test Us” page
- Clicks “Get Started” → lightweight sign-up form:
- Email (required)
- Name (optional)
- Company (optional)
- Marketing consent checkbox (required, pre-checked): “I agree to receive security insights and product updates from CascadeGuard”
- Email verification (magic link or 6-digit code)
- On verified → redirect to scan input page
- User enters drip marketing campaign immediately
Option B: GitHub OAuth sign-up
- Prospect clicks “Continue with GitHub”
- GitHub OAuth flow — we request `read:user` and `user:email` scopes only (no repo access at this stage)
- On callback: extract email + name + GitHub username from profile
- Marketing consent screen (still required): “I agree to receive security insights and product updates from CascadeGuard”
- On consent → redirect to scan input page, user enters drip campaign
- Bonus: if they later choose the “GitHub Repository” scan method, we already have their GitHub identity — they only need to grant the additional repo-scoped permission via our GitHub App install
Note: GitHub OAuth for sign-up is separate from the GitHub App installation used for repo scanning (Phase 3). Sign-up uses OAuth for identity only; repo scanning uses a GitHub App with fine-grained contents:read permission that is revoked immediately after cloning.
Drip Campaign Integration
Marketing platform selection is handled as a separate strategic decision. This flow integrates via a marketing event API — an internal abstraction layer so the “Test Us” flow doesn’t couple to a specific platform.
Events emitted to marketing platform:
- `user.signed_up` — tags: `trial-user`, `scan-pending`, source: `test-us`
- `scan.completed` — tags updated to `scan-complete`, scan summary attached as contact properties
- `report.viewed` — tracks engagement for lead scoring
- `report.pdf_exported` — high-intent signal
- `report.expiring` — triggers final CTA email
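The abstraction layer might look like the following sketch; the `MarketingSink` interface and the event payload fields are assumptions, since the platform itself is a separate decision:

```typescript
// Sketch of the internal marketing-event abstraction. Event names
// match the list above; the transport and payload shape are illustrative.
type MarketingEvent =
  | { type: "user.signed_up"; email: string; tags: string[]; source: "test-us" }
  | { type: "scan.completed"; email: string; scanSummary: Record<string, unknown> }
  | { type: "report.viewed"; email: string; scanId: string }
  | { type: "report.pdf_exported"; email: string; scanId: string }
  | { type: "report.expiring"; email: string; scanId: string };

/** Platform adapters implement this; the flow never sees the vendor API. */
interface MarketingSink {
  send(event: MarketingEvent): Promise<void>;
}

/** In-memory sink, useful for tests and local development. */
class MemorySink implements MarketingSink {
  events: MarketingEvent[] = [];
  async send(event: MarketingEvent): Promise<void> {
    this.events.push(event);
  }
}
```

Swapping in the chosen platform later means writing one adapter class; no call sites in the “Test Us” flow change.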
Drip sequence (implemented in marketing platform, triggered by events above):
- Day 0: report ready notification
- Day 2: key findings follow-up with remediation tips
- Day 5: comparison with continuous monitoring (“here’s what you’re missing”)
- Day 7: report expiring — upgrade CTA
Scan Report
The scan produces a shareable report (unique URL, valid for 7 days) containing:
Report Sections
1. Summary Card
   - Overall risk score (Critical / High / Medium / Low counts)
   - Base image(s) identified
   - Quick verdict: “X critical vulnerabilities found” or “No critical vulnerabilities”
2. Vulnerability Table
   - CVE ID, severity, affected package, installed version, fixed version
   - Sorted by severity (critical first)
   - Expandable details per CVE (description, references, exploit availability)
3. Dockerfile Analysis
   - Best-practice violations (non-root user, multi-stage builds, layer optimization)
   - Security recommendations (pinned versions, minimal base images)
   - Base image freshness (days since last update)
4. SBOM Preview
   - Package count by ecosystem (OS packages, language deps)
   - License summary
   - Link to download full SBOM (SPDX + CycloneDX formats)
5. CTA Section
   - “Monitor this image continuously” — leads to paid plan page
   - “Set up automated rebuilds” — leads to CI integration guide
   - Comparison: one-off scan vs. continuous monitoring
6. PDF Export (download button)
   - Report rendered as a branded PDF designed to be shared with stakeholders
   - Serves as a sales deck: executive summary, risk highlights, remediation roadmap, and a “Why CascadeGuard” section
   - Includes CascadeGuard branding, prospect’s scan data, and a clear CTA to schedule a demo
   - Generated server-side (Workers + a PDF library like `@react-pdf/renderer`, or Puppeteer on the fallback VPS)
Report & Results Storage
- Structured results stored as YAML in R2, keyed by scan-id — this powers:
- The web report rendering
- PDF generation
- Drip campaign personalization (scan summary fields are synced to the marketing platform)
- Aggregate trend data (opt-in, anonymized)
- Pre-rendered HTML report also stored in R2 for fast serving
- Served via Cloudflare Pages/Workers at https://scan.cascadeguard.com/reports/{scan-id}
- 7-day TTL, then automatically deleted from R2
- YAML results are retained (associated with the user account) for drip campaign use, even after the public report URL expires
- No PII stored in the report itself (no GitHub usernames, repo names — unless prospect opts in)
API Design
New Endpoints (under /api/v1/trial)
Anonymous (no auth required, Turnstile token required)
POST /api/v1/trial/scans/anonymous # Anonymous Dockerfile + optional lockfiles
Content-Type: multipart/form-data
Body: turnstileToken=<token>, dockerfile=<text>, lockfiles[]=@package-lock.json
Response: { "scanId": "...", "status": "scanning" }
Note: Results returned inline via polling. No persistent report URL.
GET /api/v1/trial/scans/{scanId} # Poll scan status (works for both anon + authed)
Response: { "status": "queued|scanning|complete|failed", "reportUrl": null|"...", "progress": 0.75 }
GET /api/v1/trial/scans/{scanId}/report # Get scan report (anon: ephemeral, no PDF)
Response: { "summary": {...}, "vulnerabilities": [...], "dockerfile": {...}, "sbom": {...} }
Authenticated (session/JWT required)
POST /api/v1/trial/scans/dockerfile # Dockerfile + optional lockfiles (saved)
Content-Type: multipart/form-data
Body: dockerfile=<text>, lockfiles[]=@package-lock.json, lockfiles[]=@requirements.txt
POST /api/v1/trial/scans/upload # Upload zip for scanning
Content-Type: multipart/form-data
Body: file=@context.zip
POST /api/v1/trial/scans/github # Initiate GitHub repo scan
Body: { "installationId": "...", "repoFullName": "org/repo" }
POST /api/v1/trial/scans/git-remote # Request a temporary git remote
Response: { "remoteUrl": "https://scan.cascadeguard.com/incoming/{uuid}.git", "expiresAt": "..." }
POST /api/v1/trial/scans/{scanId}/claim # Claim an anonymous scan after sign-up
Body: { "anonymousScanId": "..." }
Note: Retroactively saves results to user account, enables PDF export + persistent URL
GET /api/v1/trial/scans/{scanId}/report/pdf # Download PDF export (authed only)
Rate Limiting
| Scope | Limit | Window | Notes |
|---|---|---|---|
| Per IP (anonymous scans) | 3 completed scans | 24 hours | Turnstile CAPTCHA required. Results shown inline only (no persistent URL, no PDF). |
| Per user (authenticated scans) | 1 completed scan | 24 hours | A “completed scan” means the scan ran to completion. Failed scans due to fundamentally broken uploads (e.g., not a valid Dockerfile, corrupt zip) do NOT count — user can retry with a corrected input. Missing build context is flagged as out-of-scope, not a failure. |
| Per user (submissions) | 5 submissions | 24 hours | Prevents abuse of the retry mechanism |
| Per IP (sign-up attempts) | 3 sign-up attempts | 1 hour | Prevents sign-up spam |
| Global | 1,000 scans | 24 hours | Circuit breaker for cost control (includes both anon and authed) |
Rate limits enforced via Cloudflare Workers KV with sliding window counters. Anonymous scans keyed by IP (via CF-Connecting-IP); authenticated scans keyed by user ID.
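The sliding-window logic can be sketched with an in-memory store standing in for KV (`SlidingWindowLimiter` is illustrative; a KV-backed version would persist the timestamp list per key):

```typescript
/**
 * Sliding-window rate limiter sketch. In production the timestamp
 * list would live in Workers KV keyed by IP or user ID; a Map keeps
 * the logic testable in isolation here.
 */
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  constructor(private limit: number, private windowMs: number) {}

  /** Returns true if the request is allowed, and records it. */
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter(t => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}

// Anonymous scans: 3 per IP per 24 hours, per the table above.
const anonLimiter = new SlidingWindowLimiter(3, 24 * 60 * 60 * 1000);
```

Unlike a fixed daily bucket, the sliding window prevents a burst of 6 scans straddling midnight.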
Abuse Prevention
- CAPTCHA (Cloudflare Turnstile) required for all anonymous scans and at sign-up
- Anonymous scans: IP-based rate limiting, no persistent storage, results ephemeral
- Authenticated scans: user-ID-based rate limiting, results stored
- Automated monitoring for scanning pattern anomalies (same IP/user scanning many different images = potential reconnaissance)
- Report URLs (authenticated only) are unguessable (128-bit random IDs) — viewable by anyone with the link (so prospects can share with stakeholders for buy-in)
UX Flow
Landing Page → Scan (No Sign-Up for Dockerfile)
┌────────────────────────────────────────────────────────┐
│ │
│ See how secure your container images are. │
│ Paste a Dockerfile — no sign-up needed. │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Paste your Dockerfile here... │ │
│ │ │ │
│ │ │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ + Add lockfiles (optional) │
│ [ Drop lockfiles here or click to upload ] │
│ (package-lock.json, requirements.txt, go.sum, etc.) │
│ │
│ [✓ I'm not a robot (Turnstile)] │
│ │
│ [ Scan Now → ] │
│ │
│ ── or sign up for more scan methods ── │
│ │
│ [ Continue with GitHub ] [ Sign up with email ] │
│ (unlocks: zip upload, GitHub repo, git push, │
│ saved reports, PDF export) │
│ │
└────────────────────────────────────────────────────────┘
│ (anonymous scan)
▼
┌────────────────────────────────────────────────────────┐
│ │
│ ✅ Scan Complete — 3 critical, 7 high, 12 medium │
│ │
│ [Vulnerability table, Dockerfile analysis, SBOM │
│ preview — all shown inline] │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ 🔒 Sign up to save this report & export PDF │ │
│ │ │ │
│ │ [ Continue with GitHub ] [ Sign up with email ] │ │
│ │ │ │
│ │ • Save report with a shareable link │ │
│ │ • Export as a stakeholder-ready PDF │ │
│ │ • Unlock zip upload, GitHub, and git push scans │ │
│ │ • Get continuous monitoring for this image │ │
│ └─────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────┘
Authenticated Scan Flow (zip, GitHub, git push)
┌────────────────────────────────────────────────────┐
│ Sign Up (required for advanced methods) │
│ │
│ [ Continue with GitHub ] │
│ │
│ ── or ── │
│ │
│ Email: [________________________] │
│ Name: [________________________] (optional) │
│ Company: [______________________] (optional) │
│ │
│ [x] I agree to receive security insights and │
│ product updates from CascadeGuard │
│ │
│ [ Continue → ] │
│ │
└────────────────────────────────────────────────────┘
│ (verify email / GitHub OAuth)
▼
┌────────────────────────────────────────────────────┐
│ │
│ Choose how to scan: │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────┐ │
│ │Dockerfile│ │Upload Zip│ │ GitHub │ │ Git │ │
│ │ Paste │ │ │ │ Repo │ │ Push │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────┘ │
│ │
│ (All methods available, results saved + PDF) │
│ │
└────────────────────────────────────────────────────┘
Progress States
- Uploading — “Uploading your artifact…”
- Queued — “Your scan is queued (position X)”
- Scanning — “Scanning base images…” → “Analyzing vulnerabilities…” → “Generating SBOM…”
- Complete — Redirect to report page
- Failed — Clear error message with retry option
Target: < 2 minutes from submission to report for Dockerfile paste, < 5 minutes for zip/repo.
Infrastructure
Compute: Cloudflare Workers (Primary)
Why Cloudflare Workers:
- Keeps everything on one stack — simpler ops, single vendor billing, consistent security model
- Workers handle Dockerfile parsing, lockfile analysis, vulnerability DB lookups, report rendering, and PDF generation
- Workers Unbound plan provides 30-second CPU time, sufficient for static analysis workloads
- Scale-to-zero by default, pay-per-invocation
For Dockerfile + lockfile scanning (Phase 1-2):
- Pure Workers: parse Dockerfiles, scan lockfiles against vulnerability databases (stored in KV or D1), generate SBOM
- Vulnerability data: sync CVE databases periodically into D1/KV for fast in-Worker lookups
- No OCI image pull needed — we analyze the Dockerfile and dependency manifests statically
For OCI image scanning (if needed later):
- If we need to pull and scan actual container images (layers, OS packages), Workers alone won’t suffice
- Fallback: single Hetzner VPS (~$5/month) with gVisor isolation per scan, Workers dispatch via authenticated webhook
- Workers remain the API layer; VPS is a dumb compute backend
Cost estimate: ~$5/month flat if OCI scanning is needed.
Storage
| Artifact | Location | TTL |
|---|---|---|
| Uploaded zip/context | R2 quarantine bucket | 1 hour |
| Cloned repository | Worker ephemeral / VPS tmpfs | Scan duration only |
| Scan results (YAML) | R2 results bucket | Retained (for drip campaign) |
| Scan results (JSON, for API) | R2 results bucket | 7 days (public URL) |
| Pre-rendered report (HTML) | R2 reports bucket (via Pages proxy) | 7 days |
| PDF export | R2 reports bucket | 7 days |
| SBOM artifacts | R2 reports bucket | 7 days |
| User/lead data | D1 | Retained (marketing) |
Queue
- Cloudflare Queue: `trial-scan-jobs`
- Message schema: `{ scanId, userId, type, artifactLocation, createdAt }`
- Consumer: Cloudflare Worker (or VPS webhook for heavy scans)
- Dead letter queue after 3 retries
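The message schema above, typed out as a sketch — `buildScanJob` is a hypothetical helper, and the queue binding named in the trailing comment is an assumption:

```typescript
// Scan types mirror the input methods defined in this PRD.
type ScanType = "dockerfile" | "dockerfile_lockfiles" | "zip" | "github" | "git_push";

interface ScanJobMessage {
  scanId: string;
  userId: string | null;    // null for anonymous Dockerfile scans
  type: ScanType;
  artifactLocation: string; // R2 key in the quarantine bucket
  createdAt: string;        // ISO 8601
}

/** Build a queue message, stamping createdAt at enqueue time. */
function buildScanJob(
  scanId: string,
  userId: string | null,
  type: ScanType,
  artifactLocation: string,
  now: Date = new Date(),
): ScanJobMessage {
  return { scanId, userId, type, artifactLocation, createdAt: now.toISOString() };
}

// In a Worker, producing would be roughly:
//   await env.TRIAL_SCAN_JOBS.send(buildScanJob(...));
// where TRIAL_SCAN_JOBS is the (assumed) queue binding name.
```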
Security Considerations
Threat Model
| Threat | Mitigation |
|---|---|
| Zip bomb (gigabytes from small archive) | Decompression ratio limit (10:1), file count limit, pre-check compressed vs uncompressed size |
| Path traversal (../../etc/passwd) | Normalize all paths, reject any with .. components, extract to isolated temp dir |
| Symlink escape | Do not follow symlinks during extraction |
| Malicious Dockerfile (FROM attacker.registry/exfil) | Allowlist public registries only, log and block unknown registries |
| SSRF via Dockerfile | No network access from Dockerfile parser; image pulls are a separate, controlled step |
| Reconnaissance (scanning public images to map vulnerabilities) | Rate limiting, monitoring for patterns, results are not cached across users |
| DDoS via scan queue flooding | Cloudflare Turnstile, IP rate limits, queue depth limits |
| Data exfiltration via report URL | Reports contain no PII, URLs are unguessable, 7-day TTL |
Data Handling
- No prospect data is retained beyond the 7-day report TTL
- No prospect data is used for training or analytics without opt-in
- GDPR-compliant: no cookies required for anonymous scans, data deletion on TTL expiry
- Scan artifacts are encrypted at rest (R2 server-side encryption)
Vulnerability Data Sources & Update Strategy
Data Sources
We use multiple vulnerability databases to maximize coverage:
| Source | Coverage | Format | Update Frequency |
|---|---|---|---|
| NVD (National Vulnerability Database) | All CVEs — the canonical source | JSON (NVD API 2.0) | Incremental sync every 2 hours |
| OSV (Open Source Vulnerabilities) | Language ecosystem packages (npm, PyPI, Go, Cargo, etc.) | JSON (OSV API) | Incremental sync every 1 hour |
| GitHub Advisory Database (GHSA) | npm, pip, Go, Cargo, Maven, NuGet, pub, RubyGems, Erlang | JSON (GitHub GraphQL API) | Webhook on new advisory + hourly poll |
| Alpine SecDB | Alpine Linux OS packages | JSON/YAML | Daily sync |
| Debian Security Tracker | Debian OS packages | JSON | Daily sync |
| Ubuntu CVE Tracker | Ubuntu OS packages | OVAL/JSON | Daily sync |
How We Keep It Up to Date
Sync architecture (Cloudflare stack):
1. Scheduled Workers (cron triggers) run on a cadence per source:
   - Every 1hr: OSV incremental sync (uses the `modified_since` parameter)
   - Every 2hrs: NVD incremental sync (uses the `lastModStartDate` API param)
   - Every 1hr: GHSA poll (GraphQL `securityAdvisories` with `updatedSince`)
   - Daily: OS-level distro trackers (Alpine, Debian, Ubuntu)
2. Storage: Vulnerability records are normalized into a common schema and stored in D1 (Cloudflare’s edge SQLite). Schema:
   - `vulnerability_id` (CVE/GHSA/OSV ID)
   - `source` (nvd, osv, ghsa, alpine, debian, ubuntu)
   - `affected_package` (name + ecosystem)
   - `affected_versions` (version ranges)
   - `fixed_version` (if available)
   - `severity` (CVSS v3.1 score + qualitative: critical/high/medium/low)
   - `description`, `references`, `published_at`, `modified_at`
   - `exploit_known` (boolean, from CISA KEV catalog)
3. CISA KEV (Known Exploited Vulnerabilities): Synced daily. If a CVE is on KEV, we flag it prominently in reports as “known to be actively exploited.”
4. Deduplication: NVD is the canonical ID. When the same vulnerability appears in multiple sources, we merge records and prefer the most specific fix-version info (e.g., OSV often has more precise ecosystem-specific ranges than NVD).
5. Freshness guarantee: At scan time, we check `last_sync_at` per source. If any source is >4 hours stale, the scan still proceeds but the report includes a notice: “Vulnerability data last updated X hours ago.”
Cost: All sources are free to use. D1 free tier supports up to 5GB which covers several million vulnerability records. NVD API requires an API key (free) for higher rate limits.
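The deduplication rule described above can be sketched as a merge over normalized records; the `VulnRecord` shape here is a simplified subset of the stored schema:

```typescript
interface VulnRecord {
  vulnerability_id: string;
  source: string;
  fixed_version: string | null;
  severity: number; // CVSS score
}

/**
 * Merge sketch: the same CVE from multiple sources collapses into one
 * record, preferring whichever source actually knows the fixed version
 * (e.g. OSV's ecosystem-specific ranges over a fix-less NVD entry).
 */
function mergeRecords(records: VulnRecord[]): VulnRecord[] {
  const byId = new Map<string, VulnRecord>();
  for (const rec of records) {
    const existing = byId.get(rec.vulnerability_id);
    if (!existing) {
      byId.set(rec.vulnerability_id, rec);
    } else if (rec.fixed_version !== null && existing.fixed_version === null) {
      // Prefer the record with concrete fix-version info.
      byId.set(rec.vulnerability_id, rec);
    }
  }
  return [...byId.values()];
}
```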
Lockfile / Package Scanning
For lockfile analysis, we match each dependency (package, version, ecosystem) against the D1 vulnerability table. This avoids needing to shell out to Grype/Trivy at scan time for lockfile-only scans — the data is already local to the Worker.
For OCI base image scanning (pulling actual layers), we use Grype/Trivy on the fallback VPS if/when that path is enabled. Those tools manage their own vulnerability DB downloads.
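The lockfile-matching step might look like this sketch; `isAffected` and the naive dotted-numeric version comparison are illustrative (real ecosystems need semver- or PEP 440-aware comparators):

```typescript
// Simplified row shape for a single affected-version range in D1.
interface DbRow {
  vulnerability_id: string;
  affected_package: string;
  ecosystem: string;
  introduced: string;   // first affected version (inclusive)
  fixed: string | null; // first fixed version (exclusive), null if unfixed
}

/** Naive dotted-numeric compare — enough for the sketch, not full semver. */
function cmp(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const d = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (d !== 0) return d;
  }
  return 0;
}

/** True if (pkg, version) falls inside the row's affected range. */
function isAffected(row: DbRow, pkg: string, ecosystem: string, version: string): boolean {
  if (row.affected_package !== pkg || row.ecosystem !== ecosystem) return false;
  if (cmp(version, row.introduced) < 0) return false;
  return row.fixed === null || cmp(version, row.fixed) < 0;
}
```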
Scan Results YAML Schema
The canonical scan result format. Stored in R2 keyed by scans/{scan-id}/result.yaml. Powers the web report, PDF generation, drip campaign personalization, and aggregate analytics.
# CascadeGuard Trial Scan Result Schema v1
version: "1"
scan:
id: "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
type: "dockerfile" | "dockerfile_lockfiles" | "zip" | "github" | "git_push"
status: "complete" | "failed" | "partial"
created_at: "2026-04-02T12:00:00Z" # ISO 8601
completed_at: "2026-04-02T12:01:23Z"
duration_ms: 83000
anonymous: true | false # whether scan was run without auth
user_id: null | "usr_..." # null if anonymous
input:
dockerfile_present: true
dockerfile_size_bytes: 1842
lockfiles: # empty array if none provided
- filename: "package-lock.json"
ecosystem: "npm"
size_bytes: 245000
dependency_count: 312
- filename: "requirements.txt"
ecosystem: "pip"
size_bytes: 1200
dependency_count: 45
base_images:
- reference: "node:20-slim"
registry: "docker.io"
digest: "sha256:abc123..."
os: "debian"
os_version: "bookworm"
last_updated: "2026-03-15T00:00:00Z"
days_since_update: 18
summary:
risk_level: "critical" | "high" | "medium" | "low" | "none"
vulnerability_counts:
critical: 3
high: 7
medium: 12
low: 22
negligible: 5
total_vulnerabilities: 49
fixable_vulnerabilities: 31 # have a known fixed version
exploit_known_count: 2 # on CISA KEV list
dockerfile_issues_count: 4
package_count: 312
ecosystem_breakdown:
os: 89 # OS-level packages from base image
npm: 312
pip: 45
vulnerabilities:
- id: "CVE-2026-1234"
source: "nvd" # nvd | osv | ghsa | alpine | debian | ubuntu
severity: "critical"
cvss_score: 9.8
cvss_vector: "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
package: "openssl"
ecosystem: "os" # os | npm | pip | go | cargo | gem | composer | maven
installed_version: "3.0.2"
fixed_version: "3.0.14" # null if no fix available
fixable: true
exploit_known: true # appears in CISA KEV
title: "OpenSSL buffer overflow in X.509 certificate verification"
description: "A buffer overrun can be triggered..."
references:
- "https://nvd.nist.gov/vuln/detail/CVE-2026-1234"
- "https://www.openssl.org/news/secadv/20260301.txt"
published_at: "2026-03-01T00:00:00Z"
modified_at: "2026-03-15T00:00:00Z"
dockerfile_analysis:
issues:
- rule: "no-root-user"
severity: "high"
line: 15
message: "Container runs as root. Add a USER directive."
recommendation: "Add 'RUN addgroup -S app && adduser -S app -G app' and 'USER app'"
- rule: "unpinned-base-image"
severity: "medium"
line: 1
message: "Base image 'node:20-slim' uses a mutable tag."
recommendation: "Pin to a specific digest: node:20-slim@sha256:abc123..."
- rule: "missing-healthcheck"
severity: "low"
line: null
message: "No HEALTHCHECK instruction found."
recommendation: "Add HEALTHCHECK CMD curl -f http://localhost:3000/ || exit 1"
- rule: "unnecessary-layer"
severity: "info"
line: 8
message: "Multiple consecutive RUN instructions could be combined."
recommendation: "Chain with && to reduce image layers."
stages: # multi-stage build info
- name: "builder"
base_image: "node:20-slim"
line: 1
- name: "runtime"
base_image: "node:20-slim"
line: 22
sbom:
format: "cyclonedx" # cyclonedx | spdx
component_count: 312
license_summary:
MIT: 180
Apache-2.0: 65
ISC: 30
BSD-3-Clause: 20
GPL-3.0-only: 2 # flagged in report
unknown: 15
download_urls:
cyclonedx: "/api/v1/trial/scans/{scanId}/sbom?format=cyclonedx"
spdx: "/api/v1/trial/scans/{scanId}/sbom?format=spdx"
data_freshness:
nvd_last_sync: "2026-04-02T10:00:00Z"
osv_last_sync: "2026-04-02T11:00:00Z"
ghsa_last_sync: "2026-04-02T11:00:00Z"
alpine_last_sync: "2026-04-02T06:00:00Z"
debian_last_sync: "2026-04-02T06:00:00Z"
ubuntu_last_sync: "2026-04-02T06:00:00Z"
cisa_kev_last_sync: "2026-04-02T06:00:00Z"
stale_sources: [] # list of sources >4hrs stale, shown as notice in report
Schema Notes
- `version`: Schema version for forward compatibility. Consumers check this before parsing.
- `vulnerabilities`: Sorted by severity (critical first), then by `exploit_known` (true first). Deduped by CVE ID; when multiple sources report the same CVE, we merge and prefer the most specific fix-version info.
- `ecosystem`: Normalized ecosystem identifier. `os` = OS-level packages from the base image scan. Language ecosystems match the lockfile source.
- `dockerfile_analysis.issues[].rule`: Machine-readable rule ID. Rules are defined in a separate configuration and can be extended.
- `sbom`: The full SBOM is stored separately as a downloadable artifact. The YAML result contains only the summary and download URLs.
- `data_freshness`: Included so the report can display a notice if vulnerability data is stale. Consumers should check `stale_sources` and surface a warning banner.
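The version and staleness checks a consumer performs can be sketched as a small guard (`checkResult` is a hypothetical helper; only the relevant header fields are typed):

```typescript
interface ScanResultHeader {
  version: string;
  data_freshness?: { stale_sources?: string[] };
}

const SUPPORTED_VERSIONS = new Set(["1"]);

/**
 * Consumer-side guard: refuse unknown schema versions before parsing
 * further, and surface a staleness warning banner when needed.
 */
function checkResult(result: ScanResultHeader): { ok: boolean; warning: string | null } {
  if (!SUPPORTED_VERSIONS.has(result.version)) {
    return { ok: false, warning: `Unsupported schema version ${result.version}` };
  }
  const stale = result.data_freshness?.stale_sources ?? [];
  return {
    ok: true,
    warning: stale.length > 0
      ? `Vulnerability data may be stale: ${stale.join(", ")}`
      : null,
  };
}
```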
Implementation Phases
Phase 1: Anonymous Dockerfile Scan + Vuln DB (1–2 weeks)
- Build the “Test Us” landing page with Dockerfile paste + lockfile upload — no sign-up required
- Turnstile CAPTCHA for anonymous scans, IP-based rate limiting (3/day)
- Implement anonymous scan Worker endpoint
- Build vulnerability DB sync pipeline — scheduled Workers syncing NVD, OSV, GHSA, and distro trackers into D1
- Build inline scan results rendering (vulnerability table, Dockerfile analysis, SBOM preview)
- Build sign-up flow (two paths: email + GitHub OAuth) with marketing consent
- Anonymous scan claim endpoint (retroactively saves results on sign-up)
- PDF export for authenticated users (stakeholder-ready sales deck format)
- Emit marketing events via internal abstraction layer (platform TBD as strategic decision)
- Rate limiting: 3 anon scans/IP/day, 1 completed authed scan/user/day
Delivers: Prospects paste a Dockerfile (+ optional lockfiles) and immediately see vulnerability results with zero sign-up. Sign-up unlocks saved reports, PDF export, and drip campaign enrollment.
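The anonymous rate limit above (3 scans/IP/day) could be a simple fixed-window counter. This is a sketch, not the implementation: `CounterStore` stands in for a Workers KV namespace (real KV would set a ~24-hour TTL on the key), and the key format is made up here:

```typescript
// Sketch of the 3-scans-per-IP-per-day anonymous rate limit.
// `store` is a stand-in for a Workers KV namespace, keyed by IP + UTC day.

const ANON_DAILY_LIMIT = 3;

interface CounterStore {
  get(key: string): number | undefined;
  set(key: string, value: number): void; // real KV would also set a 24h TTL
}

function tryConsumeScan(
  store: CounterStore,
  ip: string,
  now: Date = new Date(),
): boolean {
  const day = now.toISOString().slice(0, 10); // fixed UTC-day window
  const key = `anon-scans:${ip}:${day}`;
  const used = store.get(key) ?? 0;
  if (used >= ANON_DAILY_LIMIT) return false;
  store.set(key, used + 1);
  return true;
}
```

A fixed UTC-day window is coarser than a sliding window but is cheap (one KV read + write per scan) and sufficient for abuse control at a limit this low.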
Phase 2: Zip Upload (1 week)
- Add multipart upload endpoint
- Implement quarantined archive extraction (path traversal protection, size limits)
- Extend scanner to analyze lockfiles found in archive
- Add lockfile vulnerability scanning
- Clear error messaging for invalid uploads (vs. out-of-scope missing context)
Delivers: Prospects can upload a project zip and get a comprehensive scan including dependency analysis.
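One way to implement the path-traversal protection called out above (function and variable names are illustrative): resolve every archive entry against the quarantine extraction root and reject anything that escapes it, including absolute paths and `..` sequences.

```typescript
// Sketch of a per-entry path-traversal check for quarantined zip extraction:
// every archive entry must resolve to a path inside the extraction root.
import path from "node:path";

function isSafeEntry(extractRoot: string, entryName: string): boolean {
  // Absolute entry paths are never acceptable in an archive.
  if (path.isAbsolute(entryName)) return false;
  const root = path.resolve(extractRoot);
  const resolved = path.resolve(root, entryName);
  // Containment check: resolved path must be the root or strictly under it.
  return resolved === root || resolved.startsWith(root + path.sep);
}
```

Checked per entry before any write, this rejects the classic `../../etc/passwd` payload; size limits would be enforced separately on both the archive and its decompressed output.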
Phase 3: GitHub Repository Link (1–2 weeks)
- Build GitHub App for temporary repo access
- Implement OAuth flow with immediate token revocation
- Shallow clone + scan pipeline
- Repository size validation
Delivers: Prospects can link a GitHub repo and get scanned without manually packaging anything.
Phase 4: Git Push (1 week)
- Deploy lightweight Git HTTP backend (smart protocol) on CF Worker + R2
- Implement temporary remote URL generation (UUID-as-auth, 15-min TTL)
- Connect push receipt to scan queue
Delivers: Prospects can git push their project to get scanned — appealing to CLI-first users.
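The temporary remote URL scheme above (UUID-as-auth, 15-minute TTL) might look like the following sketch; the base URL and field names are hypothetical:

```typescript
// Sketch of temporary git-push remote URL generation: an unguessable UUID
// in the URL acts as the short-lived credential, expiring after 15 minutes.
import { randomUUID } from "node:crypto";

const PUSH_TTL_MS = 15 * 60 * 1000;

interface PushRemote {
  url: string;
  token: string;
  expiresAt: number; // epoch ms
}

// `baseUrl` is illustrative, not a real CascadeGuard endpoint.
function createPushRemote(baseUrl: string, now: number = Date.now()): PushRemote {
  const token = randomUUID();
  return {
    token,
    url: `${baseUrl}/push/${token}.git`,
    expiresAt: now + PUSH_TTL_MS,
  };
}

function isRemoteValid(remote: PushRemote, now: number = Date.now()): boolean {
  return now < remote.expiresAt;
}
```

The Worker handling the smart-protocol request would look up the token, verify it is unexpired and tied to the prospect's session, then stream the pack data to R2 and enqueue the scan.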
Phase 5: Drip Campaign Polish + Conversion (1 week)
- A/B test CTAs on report page and in drip emails
- Add “Compare: one-off vs. continuous” section to report
- Drip email: report expiry notification (“Your report expires in 2 days — upgrade to keep monitoring”)
- Personalized drip emails using YAML scan data (e.g., “We found X critical CVEs in your base image — here’s how continuous monitoring catches new ones”)
- Analytics: track funnel from sign-up → scan → report view → PDF download → paid conversion
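The personalization step above can be sketched as a pure function over the stored YAML scan summary. The `ScanSummary` shape and the non-critical copy variants are illustrative; only the "X critical CVEs" line comes from this PRD:

```typescript
// Sketch of drip-email subject personalization from the stored scan summary.

interface ScanSummary {
  critical: number;
  high: number;
  base_image: string;
}

function dripSubject(s: ScanSummary): string {
  if (s.critical > 0) {
    return `We found ${s.critical} critical CVE${s.critical === 1 ? "" : "s"} in ${s.base_image} — here's how continuous monitoring catches new ones`;
  }
  if (s.high > 0) {
    return `${s.high} high-severity issues in ${s.base_image} — keep them from coming back`;
  }
  return `${s.base_image} looks clean today — continuous monitoring keeps it that way`;
}
```

Keeping this as a pure function over the YAML makes every subject line A/B-testable and unit-testable without touching the (still-undecided) marketing platform.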
Success Metrics
| Metric | Target (3 months post-launch) |
|---|---|
| Trial scans per week | 500+ |
| Scan completion rate | > 90% |
| Median time to report | < 2 min (Dockerfile), < 5 min (zip/repo) |
| Trial → sign-up conversion | > 15% |
| Sign-up → CI integration | > 40% |
Open Questions
- Should we support pre-built OCI image scanning? Prospect provides docker.io/library/nginx:latest and we pull + scan it directly. This is the simplest input method but potentially expensive (pulling large images) and could be used for reconnaissance. Consider adding as Phase 0 if cost is manageable.
- Report sharing — public or private by default? Current proposal: unguessable URL, no auth required. Alternative: require email to view report (captures lead but adds friction).
- Fly.io vs. self-hosted Firecracker on Hetzner → Resolved: Cloudflare Workers as primary, Hetzner VPS ($5/mo) as fallback for OCI image scanning only.
- Should trial scan results feed into our aggregate vulnerability database? → Resolved: Yes — YAML scan results retained for drip campaign personalization and aggregate stats. Consent captured at sign-up.
Resolved Questions
- Report sharing — public or private? → Unguessable URL, no auth to view — prospects share the PDF/link with stakeholders as a sales deck.
- Git push authentication → UUID in the URL as a short-lived shared key (15-min TTL, tied to authenticated user session).
- Rate limiting → 1 completed scan per user per 24 hours; retries allowed for broken uploads. Missing build context = out-of-scope flag, not a failure.
Remaining Open Questions
- Marketing platform choice → Deferred: handled as a separate strategic decision. This PRD integrates via an internal marketing event API abstraction.
- PDF generation approach — @react-pdf/renderer in Workers (lightweight) vs. Puppeteer on fallback VPS (higher fidelity). Depends on report design requirements.
- YAML schema for scan results → Resolved: defined in this PRD (see "Scan Results YAML Schema" section above).
- D1 row limits at scale — NVD alone has 200K+ CVEs. D1's free tier is 5 GB, which should suffice, but if scan throughput grows we may need to shard or move hot-path lookups to KV. Monitor after launch.
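If hot-path lookups do move to KV, the likely shape is a read-through cache in front of D1. This is a sketch under that assumption — both store interfaces below are stubs, not the real Workers bindings:

```typescript
// Sketch of a KV read-through cache in front of D1 for hot-path CVE lookups.
// KvLike / D1Like are simplified stand-ins for the real Workers bindings.

interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>; // real KV would set a TTL
}
interface D1Like {
  lookupCve(id: string): Promise<object | null>;
}

async function getCve(kv: KvLike, d1: D1Like, id: string): Promise<object | null> {
  const cached = await kv.get(`cve:${id}`);
  if (cached !== null) return JSON.parse(cached);
  const row = await d1.lookupCve(id); // D1 stays the authoritative source
  if (row !== null) await kv.put(`cve:${id}`, JSON.stringify(row));
  return row;
}
```

Since CVE records change only on DB sync, a TTL on the KV entry (or invalidation from the sync Workers) keeps the cache honest while taking repeated lookups off D1.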