Design Document: CascadeGuard CLI v2

Overview

Align the CascadeGuard CLI (v0.1 pre-release) with the getting-started docs. Five changes: add a cg alias, rename the ci command group to build, add a cg init scaffolding command, unify images check so that it absorbs check-upstream and generate in a single pass, and remove the now-redundant check-upstream-tags.yaml workflow from open-secure-images.

Main Algorithm/Workflow

sequenceDiagram
    participant U as User
    participant CLI as cg CLI
    participant Seed as cascadeguard-seed repo
    participant FS as Filesystem
    participant Reg as Container Registry

    Note over U,CLI: cg init
    U->>CLI: cg init
    CLI->>Seed: clone/copy seed repo
    Seed-->>CLI: seed files
    CLI->>FS: scaffold files (skip existing)
    FS-->>U: .cascadeguard.yaml, images.yaml, workflows, etc.

    Note over U,CLI: cg images check (unified)
    U->>CLI: cg images check
    CLI->>FS: load images.yaml + .cascadeguard.yaml defaults
    loop each enabled image
        CLI->>FS: parse Dockerfile (local or remote clone)
        CLI->>FS: write/update .cascadeguard/images/{name}.yaml
        CLI->>FS: write/update .cascadeguard/base-images/{ref}.yaml
    end
    loop each discovered base image
        CLI->>Reg: HEAD manifest (digest check)
        Reg-->>CLI: Docker-Content-Digest
        CLI->>Reg: GET tags (upstream tag check)
        Reg-->>CLI: tag list
    end
    CLI-->>U: results (table or JSON)

    Note over U,CLI: cg build generate (renamed from ci)
    U->>CLI: cg build generate
    CLI->>FS: read images.yaml
    CLI->>FS: emit .github/workflows/*.yaml

Core Interfaces/Types

from dataclasses import dataclass
from pathlib import Path

# --- pyproject.toml entry points ---
# [project.scripts]
# cascadeguard = "app:main"
# cg = "app:main"              # NEW alias
 
# --- init command types ---
@dataclass
class InitOptions:
    seed_repo: str = "https://github.com/cascadeguard/cascadeguard-seed.git"
    target_dir: Path = Path(".")
    branch: str = "main"
 
SEED_FILES: list[str] = [
    ".cascadeguard.yaml",
    "images.yaml",
    ".github/workflows/check.yaml",
    ".github/workflows/ci.yaml",
    ".github/workflows/build-image.yaml",
    ".cascadeguard/actions-policy.yaml",
    ".gitignore",                          # append .cascadeguard/.cache/
    "images/",                             # example image directory
]
 
# --- unified check result types ---
@dataclass
class BaseImageCheckResult:
    name: str
    status: str          # "ok" | "drift" | "new" | "new-tags" | "error" | "skipped"
    recorded_digest: str | None = None
    live_digest: str | None = None
    new_upstream_tags: list[str] | None = None
    reason: str | None = None
 
@dataclass
class CheckResults:
    image_results: list[BaseImageCheckResult]
    has_drift: bool
    has_new_tags: bool
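
The aggregate flags on CheckResults can be derived from the per-image results rather than set by hand. A minimal sketch, assuming that is the intent; the summarise helper and the sample digests are hypothetical and not part of the design:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BaseImageCheckResult:
    name: str
    status: str  # e.g. "ok", "drift", "error"
    recorded_digest: str | None = None
    live_digest: str | None = None
    new_upstream_tags: list[str] | None = None
    reason: str | None = None


@dataclass
class CheckResults:
    image_results: list[BaseImageCheckResult]
    has_drift: bool
    has_new_tags: bool


def summarise(results: list[BaseImageCheckResult]) -> CheckResults:
    """Hypothetical helper: derive the aggregate flags from per-image results."""
    return CheckResults(
        image_results=results,
        has_drift=any(r.status == "drift" for r in results),
        has_new_tags=any(r.new_upstream_tags for r in results),
    )


summary = summarise([
    BaseImageCheckResult("alpine", "ok", "sha256:aaa", "sha256:aaa"),
    BaseImageCheckResult("node", "drift", "sha256:bbb", "sha256:ccc"),
    BaseImageCheckResult("python", "ok", new_upstream_tags=["3.13"]),
])
```

Deriving the flags in one place keeps them from disagreeing with the result list they describe.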

Key Functions with Formal Specifications

Function 1: cmd_init(args) -> int

def cmd_init(args) -> int:
    """Scaffold current directory from cascadeguard-seed."""

Preconditions:

  • args.target_dir is a valid writable directory (defaults to .)
  • Network access available to clone seed repo (or local seed path exists)

Postconditions:

  • All seed files exist in target directory
  • No pre-existing files were overwritten (skip with warning)
  • .gitignore has .cascadeguard/.cache/ entry (appended if file exists, created if not)
  • Returns 0 on success, 1 on fatal error

Loop Invariants:

  • For each seed file: if target / file exists, skip; otherwise copy from seed

Function 2: cmd_check(args) -> int (unified)

def cmd_check(args) -> int:
    """Unified check: generate state, discover bases, check drift, check upstream tags."""

Preconditions:

  • images.yaml exists and is a valid YAML list
  • .cascadeguard.yaml may or may not exist (defaults apply)
  • args.state_dir defaults to .cascadeguard

Postconditions:

  • .cascadeguard/images/{name}.yaml written/updated for each enabled image
  • .cascadeguard/base-images/{ref}.yaml written/updated for each discovered base
  • Registry queried for digest drift on all base images
  • Docker Hub queried for new upstream tags on all enrolled images
  • Returns 1 if drift detected OR new upstream tags found, 0 otherwise
  • Output format controlled by --format (table | json)

Loop Invariants:

  • After processing image i: all state files for images 0..i are up to date
  • all_base_refs accumulates all unique base image references seen so far

Function 3: cmd_build_generate(args) -> int (renamed from cmd_ci_generate)

def cmd_build_generate(args) -> int:
    """Generate CI pipeline files from images.yaml. Renamed from ci generate."""

Preconditions:

  • images.yaml exists at args.images_yaml
  • Output directory is writable

Postconditions:

  • GitHub Actions workflow files written to {output_dir}/.github/workflows/
  • Identical behavior to current cmd_ci_generate
  • Returns 0

Loop Invariants: N/A

Algorithmic Pseudocode

cg init Algorithm

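import shutil
import sys
import tempfile
from pathlib import Path

def cmd_init(args) -> int:
    target = Path(args.target_dir).resolve()
    try:
        # clone_seed_repo and walk_seed_files are helpers still to be implemented;
        # SEED_REPO_URL is the same URL as InitOptions.seed_repo
        seed_dir = clone_seed_repo(SEED_REPO_URL, branch="main", cache=tempfile.mkdtemp())
    except Exception as exc:
        print(f"error: could not fetch seed repo: {exc}", file=sys.stderr)
        return 1  # fatal error, per the postconditions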
 
    skipped, copied = 0, 0
    for rel_path in walk_seed_files(seed_dir):
        dest = target / rel_path
        if dest.exists():
            print(f"  skip (exists): {rel_path}", file=sys.stderr)
            skipped += 1
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(seed_dir / rel_path, dest)
        copied += 1
 
    # Ensure .gitignore has the cache entry (create the file if it is missing)
    gitignore = target / ".gitignore"
    cache_entry = ".cascadeguard/.cache/"
    content = gitignore.read_text() if gitignore.exists() else ""
    # Compare whole lines so a longer path containing the entry does not count
    if cache_entry not in content.splitlines():
        prefix = "" if not content or content.endswith("\n") else "\n"
        with open(gitignore, "a") as f:
            f.write(f"{prefix}{cache_entry}\n")
 
    print(f"Initialised: {copied} files copied, {skipped} skipped (already exist)")
    return 0
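
The pseudocode above relies on walk_seed_files without defining it. A minimal sketch of one possible shape, assuming the helper yields paths relative to the seed directory and that the clone's .git directory is not part of the scaffold (both are assumptions, not stated in the design):

```python
import tempfile
from pathlib import Path
from typing import Iterator


def walk_seed_files(seed_dir: Path) -> Iterator[Path]:
    """Yield every regular file under seed_dir as a path relative to it.

    Assumption: anything under the top-level .git directory is skipped.
    """
    for path in sorted(seed_dir.rglob("*")):
        rel = path.relative_to(seed_dir)
        if rel.parts[0] == ".git" or not path.is_file():
            continue
        yield rel


# Usage: build a throwaway seed tree and list what would be scaffolded.
with tempfile.TemporaryDirectory() as tmp:
    seed = Path(tmp)
    (seed / ".git").mkdir()
    (seed / ".git" / "HEAD").write_text("ref: refs/heads/main\n")
    (seed / ".github" / "workflows").mkdir(parents=True)
    (seed / ".github" / "workflows" / "check.yaml").write_text("name: check\n")
    (seed / "images.yaml").write_text("[]\n")
    found = [p.as_posix() for p in walk_seed_files(seed)]
```

Yielding relative paths lets the caller join them onto the target directory directly, as the loop in cmd_init does.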

Unified images check Algorithm

def cmd_check(args) -> int:
    images_yaml = Path(args.images_yaml)
    images = yaml.safe_load(images_yaml.read_text()) or []
    config = load_config(images_yaml.parent)
    resolved = merge_defaults(images, config)
 
    state_dir = Path(args.state_dir)
    images_dir = state_dir / "images"
    base_images_dir = state_dir / "base-images"
    cache_dir = state_dir / ".cache"
    for d in (images_dir, base_images_dir, cache_dir):
        d.mkdir(parents=True, exist_ok=True)
 
    image_filter = getattr(args, "image", None)
    fmt = getattr(args, "format", "table")
 
    # Phase 1: Discover base images from Dockerfiles, write image state
    all_base_refs: dict[str, str] = {}  # norm_name -> full_ref
    for image in resolved:
        name = image.get("name")
        if not name or not image.get("enabled", True):
            continue
        if image_filter and name != image_filter:
            continue
 
        base_images = discover_base_images(image, images_yaml.parent, cache_dir)
        for norm, ref in base_images:
            all_base_refs[norm] = ref
 
        write_image_state(images_dir / f"{name}.yaml", name, image, base_images)
 
    # Phase 2: Write base image state files
    for norm_name, full_ref in all_base_refs.items():
        write_or_update_base_image_state(base_images_dir / f"{norm_name}.yaml", full_ref)
 
    # Phase 3: Check registries for digest drift (existing logic)
    results = check_digest_drift(base_images_dir, image_filter)
 
    # Phase 4: Check upstream tags (absorbed from check-upstream)
    upstream_findings = check_upstream_tags(resolved, image_filter)
    for finding in upstream_findings:
        results.append({
            "image": finding["image"],
            "status": "new-tags",
            "new_tags": finding["new_tags"],
        })
 
    has_drift = any(r["status"] == "drift" for r in results)
    has_new_tags = any(r["status"] == "new-tags" for r in results)
 
    output_results(results, fmt)
    return 1 if (has_drift or has_new_tags) else 0

build command group (rename from ci)

# In build_parser():
# Replace:
#   ci = sub.add_parser("ci", ...)
#   ci_sub = ci.add_subparsers(dest="ci_command", ...)
# With:
#   build = sub.add_parser("build", ...)
#   build_sub = build.add_subparsers(dest="build_command", ...)
 
# In main() dispatch:
# Replace:  "ci": cmd_ci
# With:     "build": cmd_build
 
def cmd_build(args) -> int:
    """Dispatch 'build' subcommands."""
    return {"generate": cmd_build_generate}[args.build_command](args)
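
The rename can be sketched end to end with argparse. This is an illustrative sketch, not the existing parser: the prog name, help text, and the placeholder body of cmd_build_generate are assumptions, and only the build group is wired up.

```python
import argparse


def cmd_build_generate(args) -> int:
    # Placeholder body: the real command would emit workflow files.
    return 0


def cmd_build(args) -> int:
    """Dispatch 'build' subcommands."""
    return {"generate": cmd_build_generate}[args.build_command](args)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cg")
    sub = parser.add_subparsers(dest="command", required=True)
    build = sub.add_parser("build", help="Generate build pipelines")
    build_sub = build.add_subparsers(dest="build_command", required=True)
    generate = build_sub.add_parser("generate")
    generate.add_argument("--dry-run", action="store_true")
    return parser


def main(argv: list[str]) -> int:
    args = build_parser().parse_args(argv)
    return {"build": cmd_build}[args.command](args)


args = build_parser().parse_args(["build", "generate", "--dry-run"])
exit_code = main(["build", "generate"])
```

Marking both subparser groups as required means a bare cg build is rejected by argparse with a usage message instead of reaching the dispatch dictionary with a missing key.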

Example Usage

# After installation, both entry points work identically
cascadeguard images validate
cg images validate
 
# Scaffold a new state repo
mkdir my-images && cd my-images && git init
cg init
# → copies seed files, skips anything that already exists
 
# Edit images.yaml, then run unified check
cg images check
# → discovers base images from Dockerfiles
# → writes .cascadeguard/images/*.yaml and .cascadeguard/base-images/*.yaml
# → queries registries for digest drift
# → queries Docker Hub for new upstream tags
# → prints table of results
 
cg images check --format json
# → same but JSON output (for CI consumption)
 
# Generate CI pipelines (renamed from "ci generate")
cg build generate
cg build generate --dry-run
 
# Other commands unchanged
cg images validate
cg images enrol --name my-app --registry ghcr.io --repository org/my-app
cg images status
cg pipeline run
cg vuln report --image alpine --dir reports/
cg actions pin
cg scan

Correctness Properties

A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

Property 1: cg and cascadeguard alias equivalence

For any valid CLI subcommand and argument combination, invoking via cg and invoking via cascadeguard shall produce identical output and identical exit codes.

Validates: Requirement 1.2

Property 2: Init copies new files and skips existing files

For any set of seed files and for any subset of those files that already exist in the target directory, after running cg init: every seed file that did not previously exist is now present in the target directory with content matching the seed, and every file that previously existed retains its original content unchanged.

Validates: Requirements 2.1, 2.2
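
Properties 2 and 4 can be exercised against a small copy routine. A minimal sketch, assuming the copy loop from the cg init algorithm is factored into a helper; copy_seed_files and the file contents are hypothetical:

```python
import shutil
import tempfile
from pathlib import Path


def copy_seed_files(seed_dir: Path, target: Path, rel_paths: list[str]) -> tuple[int, int]:
    """Copy each seed file that is missing from target; return (copied, skipped)."""
    copied = skipped = 0
    for rel in rel_paths:
        dest = target / rel
        if dest.exists():
            skipped += 1  # never overwrite a pre-existing file
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(seed_dir / rel, dest)
        copied += 1
    return copied, skipped


with tempfile.TemporaryDirectory() as tmp:
    seed, target = Path(tmp, "seed"), Path(tmp, "target")
    seed.mkdir()
    target.mkdir()
    (seed / "images.yaml").write_text("[]\n")
    (seed / ".cascadeguard.yaml").write_text("defaults: {}\n")
    (target / "images.yaml").write_text("- name: mine\n")  # pre-existing file

    counts = copy_seed_files(seed, target, ["images.yaml", ".cascadeguard.yaml"])
    kept = (target / "images.yaml").read_text()
    added = (target / ".cascadeguard.yaml").read_text()
```

Returning the two counters from the helper gives the summary line in Property 4 a single source of truth.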

Property 3: Init .gitignore idempotence

For any target directory state (no .gitignore, .gitignore without cache entry, .gitignore with cache entry), after running cg init, the .gitignore file contains exactly one .cascadeguard/.cache/ entry and any pre-existing content is preserved.

Validates: Requirements 2.3, 2.4, 2.5
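
One way to obtain this idempotence is to compare whole lines before appending. A minimal sketch under that assumption; ensure_gitignore_entry is a hypothetical helper name, and it does not remove duplicates that were already in the file before the first run:

```python
import tempfile
from pathlib import Path

CACHE_ENTRY = ".cascadeguard/.cache/"


def ensure_gitignore_entry(gitignore: Path, entry: str = CACHE_ENTRY) -> bool:
    """Append entry unless a line already matches it; return True if written."""
    content = gitignore.read_text() if gitignore.exists() else ""
    if entry in (line.strip() for line in content.splitlines()):
        return False
    # Start on a fresh line when the existing file lacks a trailing newline.
    prefix = "" if not content or content.endswith("\n") else "\n"
    with open(gitignore, "a") as f:
        f.write(f"{prefix}{entry}\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, ".gitignore")
    path.write_text("node_modules/")       # existing content, no trailing newline
    first = ensure_gitignore_entry(path)    # appends the entry
    second = ensure_gitignore_entry(path)   # no-op on the second run
    final = path.read_text()
```

Because opening in append mode creates the file when it is absent, the same function covers all three starting states named in the property.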

Property 4: Init summary counts match file disposition

For any set of seed files and for any subset of pre-existing files in the target directory, the summary printed by cg init reports copied = total_seed_files - pre_existing_count and skipped = pre_existing_count.

Validates: Requirement 2.6

Property 5: build generate produces identical output to former ci generate

For any valid images.yaml configuration, running cg build generate produces the same set of workflow files with the same content as the former cg ci generate command.

Validates: Requirement 3.2

Property 6: Dry-run writes no files

For any valid images.yaml configuration, running cg build generate --dry-run writes zero files to the output directory.

Validates: Requirement 3.3

Property 7: Dockerfile base image extraction

For any valid Dockerfile containing one or more FROM statements (excluding scratch and build-stage aliases), the Check_Command parser extracts all base image references in order.

Validates: Requirement 4.2
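
The extraction rule can be illustrated with a small line-based parser. This is a sketch under simplifying assumptions, not the Check_Command parser itself: it ignores ARG substitution and line continuations, compares stage aliases case-insensitively, and the name extract_base_images is hypothetical.

```python
import re

FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(?P<image>\S+)(?:\s+AS\s+(?P<alias>\S+))?",
    re.IGNORECASE,
)


def extract_base_images(dockerfile_text: str) -> list[str]:
    """Return external base image refs in order, skipping scratch and stage aliases."""
    aliases: set[str] = set()
    refs: list[str] = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group("image")
        # A FROM that names an earlier stage or scratch is not an external base.
        if image.lower() != "scratch" and image.lower() not in aliases:
            refs.append(image)
        if match.group("alias"):
            aliases.add(match.group("alias").lower())
    return refs


DOCKERFILE = """\
FROM golang:1.22 AS builder
RUN go build -o /app .
FROM scratch AS rootfs
COPY --from=builder /app /app
FROM builder AS test
FROM alpine:3.20
COPY --from=rootfs /app /app
"""

bases = extract_base_images(DOCKERFILE)
```

Aliases are recorded only after the current line has been classified, so a stage cannot shadow its own base image.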

Property 8: State file creation matches discovered images

For any images.yaml listing enabled images with Dockerfiles, after running cg images check, there exists a state file under .cascadeguard/images/{name}.yaml for each enabled image and a state file under .cascadeguard/base-images/{ref}.yaml for each unique base image reference discovered.

Validates: Requirement 4.3

Property 9: Drift detection correctness

For any base image where the recorded digest differs from the live registry digest, the Check_Command reports that image with status drift. For any base image where the digests match, the Check_Command does not report drift for that image.

Validates: Requirement 4.5
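
The comparison behind this property is small enough to state directly. A minimal sketch; the function name is hypothetical, and the mapping of a missing recorded digest to "new" and a missing live digest to "error" is an assumption about what those status values mean:

```python
from __future__ import annotations


def classify_digest(recorded: str | None, live: str | None) -> str:
    """Map a recorded/live digest pair onto a check status."""
    if live is None:
        return "error"  # assumption: the registry lookup failed
    if recorded is None:
        return "new"    # assumption: nothing has been recorded yet
    return "ok" if recorded == live else "drift"


status_same = classify_digest("sha256:aaa", "sha256:aaa")
status_changed = classify_digest("sha256:aaa", "sha256:bbb")
```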

Property 10: Exit code reflects drift and upstream tag status

For any set of check results, the Check_Command returns exit code 1 if and only if at least one result has status drift or new-tags. Otherwise it returns exit code 0.

Validates: Requirements 4.7, 4.8
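
Expressed as code, the rule is a single predicate over the result list. A sketch assuming results are the dictionaries built in the unified check pseudocode; exit_code_for is a hypothetical name:

```python
FAILING_STATUSES = {"drift", "new-tags"}


def exit_code_for(results: list[dict]) -> int:
    """Return 1 if any result reports drift or new upstream tags, else 0."""
    return 1 if any(r["status"] in FAILING_STATUSES for r in results) else 0


clean = exit_code_for([{"image": "alpine", "status": "ok"}])
dirty = exit_code_for([
    {"image": "alpine", "status": "ok"},
    {"image": "node", "status": "new-tags", "new_tags": ["22.1"]},
])
```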

Property 11: Output format completeness

For any non-empty set of check results, both the table and json output formats include the image name, status, and relevant details for every result entry. The json format additionally produces output that is valid JSON parseable by a standard JSON parser.

Validates: Requirements 7.1, 7.2
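
A rough sketch of a formatter that satisfies both halves of this property. The column headings, the choice of details column, and the name format_results are assumptions; the design does not specify the table layout:

```python
import json


def format_results(results: list[dict], fmt: str = "table") -> str:
    """Render check results as JSON or as a fixed-width text table."""
    if fmt == "json":
        return json.dumps(results, indent=2)
    rows = [("IMAGE", "STATUS", "DETAILS")]
    for r in results:
        details = ", ".join(r.get("new_tags") or []) or (r.get("reason") or "")
        rows.append((r["image"], r["status"], details))
    # Pad every column to its widest cell so the rows line up.
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


sample = [
    {"image": "alpine", "status": "drift", "reason": "digest changed"},
    {"image": "node", "status": "new-tags", "new_tags": ["22.1", "22.2"]},
]
as_table = format_results(sample, "table")
as_json = format_results(sample, "json")
```

Rendering both formats from the same list keeps the table and JSON views from diverging on which entries they include.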

Property 12: Image filter scoping

For any images.yaml containing multiple images, when --image <name> is provided, the Check_Command output contains results only for the named image and no others.

Validates: Requirement 4.11
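
The filter can be applied once, before any state is written, so that every later phase sees only the selected image. A minimal sketch mirroring the skip conditions in the unified check pseudocode; select_images is a hypothetical helper:

```python
from __future__ import annotations


def select_images(images: list[dict], image_filter: str | None = None) -> list[dict]:
    """Keep enabled, named images; narrow to image_filter when one is given."""
    selected = []
    for image in images:
        name = image.get("name")
        if not name or not image.get("enabled", True):
            continue
        if image_filter and name != image_filter:
            continue
        selected.append(image)
    return selected


images = [
    {"name": "api"},
    {"name": "worker", "enabled": False},
    {"name": "web"},
]
only_web = select_images(images, "web")
all_enabled = select_images(images)
```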