Design Document: CascadeGuard CLI Improvements

Overview

CascadeGuard’s CLI currently requires every image in images.yaml to carry its own registry and repository fields, and lacks native CLI commands for state file generation and CI pipeline generation (both currently only accessible via Taskfile wrappers around standalone Python scripts). This feature introduces three interconnected improvements:

  1. Config inheritance via .cascadeguard.yaml: repo-level defaults for registry, repository, and local.dir that individual images inherit unless they override them. This eliminates repetition and fixes the 25 validation failures in cascadeguard-open-secure-images.

  2. New CLI commands: cascadeguard images generate and cascadeguard ci generate wrap the existing generate_state.py and generate_ci.py logic directly into the CLI, removing the need for Taskfile/Docker indirection.

  3. Validation fix: cascadeguard images validate is updated to resolve inherited defaults before checking required fields, so images that rely on repo-level defaults pass validation.

Architecture

graph TD
    subgraph ".cascadeguard.yaml"
        CG[Config File]
    end

    subgraph "images.yaml"
        IMG[Image Entries]
    end

    CG --> LOAD[load_config]
    IMG --> LOAD

    LOAD --> MERGE[merge_defaults]

    MERGE --> VAL[images validate]
    MERGE --> GEN[images generate]
    MERGE --> CI[ci generate]

    GEN --> STATE[state files<br/>base-images/ + images/]
    CI --> WF[.github/workflows/]

Config Resolution Flow

sequenceDiagram
    participant CLI as CLI Command
    participant CFG as ConfigLoader
    participant YAML as .cascadeguard.yaml
    participant IMG as images.yaml
    participant MERGE as merge_defaults()

    CLI->>CFG: load_config(repo_root)
    CFG->>YAML: read file
    YAML-->>CFG: {defaults: {registry, repository, local: {dir}}, ci: {platform}}
    CLI->>IMG: load images.yaml
    IMG-->>CLI: list of image dicts
    CLI->>MERGE: merge_defaults(images, config)
    MERGE-->>CLI: resolved images (each has registry, repository, etc.)
    CLI->>CLI: proceed with validate / generate / ci generate

Components and Interfaces

Component 1: ConfigLoader

Purpose: Loads and validates .cascadeguard.yaml, providing repo-level defaults.

Interface:

def load_config(repo_root: Path) -> dict:
    """Load .cascadeguard.yaml from repo_root. Returns {} if absent."""
    ...
 
def merge_defaults(images: list[dict], config: dict) -> list[dict]:
    """
    Return a new list of image dicts with repo-level defaults applied.
    Per-image fields take precedence over config defaults.
    Does NOT mutate the input list.
    """
    ...

Responsibilities:

  • Parse .cascadeguard.yaml and return a typed config dict
  • Apply defaults.registry, defaults.repository, defaults.local.dir to each image that lacks those fields
  • Per-image values always override repo-level defaults (per-key merge; local is merged one level deep so that local.dir can be inherited on its own)

Component 2: Updated Validator (cmd_validate)

Purpose: Validates images.yaml after merging config defaults.

Responsibilities:

  • Load config, load images, merge defaults, then validate
  • For enabled images: require name, registry, and dockerfile (checked after defaults are merged)
  • For disabled images (enabled: false): require only name
  • Report clear errors showing whether a missing field could be fixed by adding it to .cascadeguard.yaml

Component 3: CLI Commands (images generate, ci generate)

Purpose: Expose generate_state.py and generate_ci.py functionality as first-class CLI subcommands.

Responsibilities:

  • images generate: calls generate_state.generate_state_for_image() for each image, writing to --output-dir (default: the current working directory)
  • ci generate: calls generate_ci.generate_ci() with resolved platform from config
  • Both commands load config and merge defaults before processing

Data Models

.cascadeguard.yaml Schema (Extended)

# Repo-level defaults applied to every image in images.yaml
defaults:
  registry: ghcr.io/cascadeguard    # default registry for all images
  repository: cascadeguard           # default repository prefix (optional)
  local:
    dir: images                      # default local folder containing Dockerfiles
 
# CI configuration (existing)
ci:
  platform: github                   # github | gitlab (future)
 
# Tagging configuration (existing, unchanged)
tagging:
  stateRepo: true
  sourceRepo: false
  sourceRepoSecret: CROSS_REPO_PAT

# Python type representation
from typing import TypedDict
 
class LocalDefaults(TypedDict, total=False):
    dir: str  # e.g. "images"
 
class ConfigDefaults(TypedDict, total=False):
    registry: str       # e.g. "ghcr.io/cascadeguard"
    repository: str     # e.g. "cascadeguard"
    local: LocalDefaults
 
class CIConfig(TypedDict, total=False):
    platform: str  # "github" | "gitlab"
 
class CascadeGuardConfig(TypedDict, total=False):
    defaults: ConfigDefaults
    ci: CIConfig
    tagging: dict

Validation Rules:

  • defaults section is entirely optional
  • Each field within defaults is optional
  • defaults.registry must be a non-empty string if present
  • defaults.local.dir must be a valid relative path if present
  • Unknown keys are silently ignored (forward compatibility)
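
The rules above could be enforced by a small checker along the following lines. This is only a sketch: validate_config is a hypothetical helper that the design does not define, and it assumes that "valid relative path" means a non-absolute path with no ".." segments.

```python
from pathlib import PurePosixPath


def validate_config(config: dict) -> list[str]:
    """Return a list of error strings; an empty list means the config is acceptable."""
    errors: list[str] = []
    defaults = config.get("defaults")
    if defaults is None:
        return errors  # the defaults section is entirely optional
    if not isinstance(defaults, dict):
        return ["defaults must be a mapping"]

    # defaults.registry: optional, but must be a non-empty string when present
    if "registry" in defaults:
        registry = defaults["registry"]
        if not isinstance(registry, str) or not registry.strip():
            errors.append("defaults.registry must be a non-empty string")

    # defaults.local.dir: optional, but must be a relative path when present
    local = defaults.get("local")
    if isinstance(local, dict) and "dir" in local:
        local_dir = local["dir"]
        if not isinstance(local_dir, str) or not local_dir:
            errors.append("defaults.local.dir must be a non-empty string")
        else:
            path = PurePosixPath(local_dir)
            # Assumption: reject absolute paths and any ".." segment
            if path.is_absolute() or ".." in path.parts:
                errors.append("defaults.local.dir must be a relative path inside the repo")

    # Unknown keys are deliberately not inspected (forward compatibility)
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every config error in one run, which matches how cmd_validate() collects image errors.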

Image Entry (after merge)

An image entry after merge_defaults() has been applied. The merge fills in missing fields from config defaults:

# Before merge (in images.yaml):
{"name": "nginx", "dockerfile": "images/nginx/Dockerfile", "image": "nginx", "tag": "stable-alpine-slim"}
 
# After merge (with defaults.registry = "ghcr.io/cascadeguard"):
{"name": "nginx", "dockerfile": "images/nginx/Dockerfile", "image": "nginx", "tag": "stable-alpine-slim",
 "registry": "ghcr.io/cascadeguard"}

Key Functions with Formal Specifications

Function 1: merge_defaults()

def merge_defaults(images: list[dict], config: dict) -> list[dict]:
    """Apply repo-level defaults from config to each image."""

Preconditions:

  • images is a list of dicts (may be empty)
  • config is a dict (may be empty or lack defaults key)

Postconditions:

  • Returns a new list of the same length as images
  • For each returned image r[i] and original images[i]:
    • If images[i] has a key, r[i] has the same value for that key
    • If images[i] lacks a key and config["defaults"] has it, r[i] gets the default
    • images[i] is not mutated
  • Only these keys are inherited: registry, repository, local.dir

Loop Invariants:

  • All previously processed images have defaults applied correctly
  • Original images list is never modified
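
The postconditions above can be turned into an executable check. The sketch below is illustrative only: check_merge_postconditions and reference_merge are hypothetical names, and reference_merge is a compact stand-in for merge_defaults() so that the example runs on its own.

```python
import copy
from typing import Callable

MergeFn = Callable[[list[dict], dict], list[dict]]


def reference_merge(images: list[dict], config: dict) -> list[dict]:
    """Compact stand-in for merge_defaults(), used only to exercise the checker."""
    defaults = config.get("defaults") or {}
    default_dir = (defaults.get("local") or {}).get("dir")
    result = []
    for img in images:
        merged = dict(img)  # shallow copy; the original dict is left alone
        for key in ("registry", "repository"):
            if key not in merged and defaults.get(key):
                merged[key] = defaults[key]
        local = merged.get("local") or {}
        if default_dir and "dir" not in local:
            merged["local"] = {**local, "dir": default_dir}
        result.append(merged)
    return result


def check_merge_postconditions(merge: MergeFn, images: list[dict], config: dict) -> None:
    """Raise AssertionError if any of the listed postconditions is violated."""
    before = copy.deepcopy(images)
    result = merge(images, config)

    assert images == before, "input list or its dicts were mutated"
    assert result is not images, "a new list must be returned"
    assert len(result) == len(images), "length must be preserved"

    defaults = config.get("defaults") or {}
    for original, merged in zip(images, result):
        # Image-level values win for every top-level key except the nested 'local'
        for key, value in original.items():
            if key != "local":
                assert merged[key] == value, f"image value for {key!r} was changed"
        # Keys already present inside 'local' must survive the nested merge
        for key, value in (original.get("local") or {}).items():
            assert merged["local"][key] == value, f"local.{key} was changed"
        # Missing inheritable keys are filled from the defaults
        for key in ("registry", "repository"):
            if key not in original and defaults.get(key):
                assert merged[key] == defaults[key], f"default for {key!r} not applied"


if __name__ == "__main__":
    sample = [{"name": "nginx", "local": {"context": "."}}, {"name": "redis", "registry": "docker.io"}]
    cfg = {"defaults": {"registry": "ghcr.io/cascadeguard", "local": {"dir": "images"}}}
    check_merge_postconditions(reference_merge, sample, cfg)
```

Taking the merge function as a parameter means the same checker can be pointed at the real merge_defaults() in a unit test.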

Function 2: cmd_validate() (updated)

def cmd_validate(args) -> int:
    """Validate images.yaml with config inheritance."""

Preconditions:

  • args.images_yaml points to a readable file or a missing file (error case)
  • The repo root (taken as the directory containing images.yaml) may contain .cascadeguard.yaml; the file is optional

Postconditions:

  • Returns 0 if all images pass validation after merging defaults
  • Returns 1 if any validation errors exist, with errors printed to stderr
  • Disabled images (enabled: false) only require name
  • Enabled images require name, registry, and dockerfile

Function 3: cmd_images_generate()

def cmd_images_generate(args) -> int:
    """Generate state files from images.yaml."""

Preconditions:

  • args.images_yaml points to a readable images.yaml
  • args.output_dir is a writable directory path

Postconditions:

  • State files created/updated in {output_dir}/base-images/ and {output_dir}/images/
  • Returns 0 on success, 1 on failure
  • Idempotent: re-running produces the same result

Function 4: cmd_ci_generate()

def cmd_ci_generate(args) -> int:
    """Generate CI pipeline files from images.yaml."""

Preconditions:

  • args.images_yaml points to a readable images.yaml
  • args.output_dir is a writable directory path

Postconditions:

  • GitHub Actions workflow files created in {output_dir}/.github/workflows/
  • Platform resolved from: CLI flag > .cascadeguard.yaml > default (“github”)
  • Returns 0 on success, 1 on failure
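
The platform precedence stated above (CLI flag, then .cascadeguard.yaml, then the built-in default) might be expressed as a small helper. resolve_platform and DEFAULT_PLATFORM are hypothetical names introduced for this sketch; the design does not specify where the resolution lives.

```python
from typing import Optional

DEFAULT_PLATFORM = "github"


def resolve_platform(cli_platform: Optional[str], config: dict) -> str:
    """Pick the CI platform: CLI flag first, then config, then the built-in default."""
    if cli_platform:
        return cli_platform
    # 'ci' may be absent or explicitly null in .cascadeguard.yaml
    ci_section = config.get("ci") or {}
    return ci_section.get("platform") or DEFAULT_PLATFORM


if __name__ == "__main__":
    print(resolve_platform(None, {"ci": {"platform": "gitlab"}}))  # prints: gitlab
    print(resolve_platform(None, {}))                              # prints: github
```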

Algorithmic Pseudocode

Config Loading and Merging Algorithm

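from pathlib import Path

import yaml


def load_config(repo_root: Path) -> dict:
    """Load .cascadeguard.yaml from repo_root. Returns {} if absent."""
    config_path = repo_root / ".cascadeguard.yaml"
    if not config_path.exists():
        return {}
    with open(config_path) as f:
        config = yaml.safe_load(f) or {}
    # A non-dict root (e.g. a top-level list) is malformed; see Error Scenario 2
    if not isinstance(config, dict):
        raise ValueError(f"{config_path}: top-level YAML value must be a mapping")
    return config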
 
 
def merge_defaults(images: list[dict], config: dict) -> list[dict]:
    """
    Apply repo-level defaults to each image.
    
    Merge strategy: per-key, with 'local' merged one level deep. Image-level values always win.
    Only specific keys are inherited from defaults.
    """
    defaults = config.get("defaults", {})
    if not defaults:
        return [dict(img) for img in images]  # shallow copy, no defaults to apply
 
    default_registry = defaults.get("registry")
    default_repository = defaults.get("repository")
    default_local = defaults.get("local") or {}  # tolerate an explicit 'local: null'
    default_local_dir = default_local.get("dir")
 
    result = []
    for img in images:
        merged = dict(img)  # shallow copy
 
        # Apply registry default
        if "registry" not in merged and default_registry:
            merged["registry"] = default_registry
 
        # Apply repository default
        if "repository" not in merged and default_repository:
            merged["repository"] = default_repository
 
        # Apply local.dir default (nested merge)
        if default_local_dir:
            img_local = merged.get("local") or {}  # tolerate an explicit 'local: null'
            if "dir" not in img_local:
                merged_local = dict(img_local)
                merged_local["dir"] = default_local_dir
                merged["local"] = merged_local
 
        result.append(merged)
 
    return result

Updated Validation Algorithm

def cmd_validate(args) -> int:
    images_yaml = Path(args.images_yaml)
    if not images_yaml.exists():
        print(f"Error: images.yaml not found: {images_yaml}", file=sys.stderr)
        return 1
 
    with open(images_yaml) as f:
        images = yaml.safe_load(f) or []
 
    if not isinstance(images, list):
        print("Error: images.yaml must be a list", file=sys.stderr)
        return 1
 
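    # Load config and merge defaults BEFORE validation
    repo_root = images_yaml.parent
    try:
        config = load_config(repo_root)
    except (yaml.YAMLError, ValueError) as exc:
        # Malformed .cascadeguard.yaml (Error Scenario 2)
        print(f"Error: invalid .cascadeguard.yaml: {exc}", file=sys.stderr)
        return 1
    resolved_images = merge_defaults(images, config)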
 
    errors = []
    for i, image in enumerate(resolved_images):
        name = image.get("name")
        if not name:
            errors.append(f"Image {i}: missing 'name' field")
            continue
 
        # Disabled images only need a name
        if not image.get("enabled", True):
            continue
 
        # Enabled images need registry and dockerfile
        if not image.get("registry"):
            errors.append(f"Image '{name}': missing 'registry' (set in image or .cascadeguard.yaml defaults)")
        if not image.get("dockerfile"):
            errors.append(f"Image '{name}': missing 'dockerfile' field")
 
    if errors:
        print("Validation errors:", file=sys.stderr)
        for err in errors:
            print(f"  - {err}", file=sys.stderr)
        return 1
 
    print(f"Validated {len(resolved_images)} images in {images_yaml}")
    return 0

CLI Command Registration

# In build_parser(), add to images subcommands:
 
# images generate
images_generate = images_sub.add_parser(
    "generate", help="Generate state files from images.yaml"
)
images_generate.add_argument(
    "--output-dir", default=".",
    help="Output directory (default: current directory)"
)
images_generate.add_argument(
    "--cache-dir", default=None,
    help="Cache directory for cloned repos"
)
 
# In build_parser(), add new top-level 'ci' command:
 
ci = sub.add_parser("ci", help="CI/CD pipeline generation")
ci_sub = ci.add_subparsers(dest="ci_command", metavar="subcommand")
ci_sub.required = True
 
ci_generate = ci_sub.add_parser(
    "generate", help="Generate CI pipeline files from images.yaml"
)
ci_generate.add_argument(
    "--images-yaml", default="images.yaml",
    help="Path to images.yaml (default: images.yaml)"
)
ci_generate.add_argument(
    "--output-dir", default=".",
    help="Output directory (default: current directory)"
)
ci_generate.add_argument(
    "--platform", default=None,
    help="CI platform (github). Overrides .cascadeguard.yaml"
)
ci_generate.add_argument(
    "--dry-run", action="store_true",
    help="Preview without writing files"
)
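
The registration fragments above refer to sub and images_sub, which are created elsewhere in build_parser(). For orientation, here is a self-contained sketch of how the pieces might fit together. The dest names, the "images" help text, and the placement of --images-yaml on the images generate subcommand are assumptions, not confirmed details of the real parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Minimal parser containing only the two new subcommands."""
    parser = argparse.ArgumentParser(prog="cascadeguard")
    sub = parser.add_subparsers(dest="command", metavar="command")
    sub.required = True

    # 'images' group (help text and dest names here are assumptions)
    images = sub.add_parser("images", help="Image management")
    images_sub = images.add_subparsers(dest="images_command", metavar="subcommand")
    images_sub.required = True

    images_generate = images_sub.add_parser(
        "generate", help="Generate state files from images.yaml"
    )
    # Assumption: --images-yaml is declared on the subcommand itself
    images_generate.add_argument("--images-yaml", default="images.yaml")
    images_generate.add_argument("--output-dir", default=".")
    images_generate.add_argument("--cache-dir", default=None)

    # New top-level 'ci' group
    ci = sub.add_parser("ci", help="CI/CD pipeline generation")
    ci_sub = ci.add_subparsers(dest="ci_command", metavar="subcommand")
    ci_sub.required = True

    ci_generate = ci_sub.add_parser(
        "generate", help="Generate CI pipeline files from images.yaml"
    )
    ci_generate.add_argument("--images-yaml", default="images.yaml")
    ci_generate.add_argument("--output-dir", default=".")
    ci_generate.add_argument("--platform", default=None)
    ci_generate.add_argument("--dry-run", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["ci", "generate", "--platform", "github", "--dry-run"])
    print(args.command, args.ci_command, args.platform, args.dry_run)
    # prints: ci generate github True
```

Leaving --platform at None when the flag is omitted lets the handler distinguish "not given" from an explicit value, which the CLI flag > config > default precedence relies on.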

Command Handler Implementations

def cmd_images_generate(args) -> int:
    """Generate state files from images.yaml."""
    from generate_state import (
        load_images_yaml, load_config, generate_state_for_image,
        _generate_build_workflow
    )
 
    images_yaml = Path(args.images_yaml)
    output_dir = Path(args.output_dir)
 
    if not images_yaml.exists():
        print(f"Error: images.yaml not found: {images_yaml}", file=sys.stderr)
        return 1
 
    cache_dir = Path(args.cache_dir) if args.cache_dir else output_dir / ".cascadeguard-cache"
    cache_dir.mkdir(parents=True, exist_ok=True)
 
    config = load_config(output_dir)
    # Apply repo-level defaults before generating (see Component 3)
    images = merge_defaults(load_images_yaml(images_yaml), config)
 
    print(f"Found {len(images)} images in {images_yaml}")
 
    success = 0
    workflows = 0
    for image in images:
        if generate_state_for_image(image, output_dir, cache_dir):
            success += 1
        if _generate_build_workflow(image, output_dir, config):
            workflows += 1
 
    print(f"\nGenerated state for {success}/{len(images)} images, {workflows} workflows")
    return 0
 
 
def cmd_ci_generate(args) -> int:
    """Generate CI pipeline files from images.yaml."""
    from generate_ci import generate_ci
 
    images_yaml = Path(args.images_yaml)
    output_dir = Path(args.output_dir)
 
    if not images_yaml.exists():
        print(f"Error: images.yaml not found: {images_yaml}", file=sys.stderr)
        return 1
 
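    # Resolve platform: CLI flag > .cascadeguard.yaml > default ("github")
    config = load_config(images_yaml.parent)
    platform = args.platform or (config.get("ci") or {}).get("platform") or "github"

    generate_ci(
        images_yaml_path=images_yaml,
        output_dir=output_dir,
        dry_run=args.dry_run,
        platform=platform,
    )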
    return 0

Example Usage

# 1. .cascadeguard.yaml with defaults
# ─────────────────────────────────────
# defaults:
#   registry: ghcr.io/cascadeguard
#   local:
#     dir: images
# ci:
#   platform: github
 
# 2. images.yaml (no registry needed per-image)
# ──────────────────────────────────────────────
# - name: nginx
#   dockerfile: images/nginx/Dockerfile
#   image: nginx
#   tag: stable-alpine-slim
#
# - name: memcached
#   enabled: false
#   namespace: library
 
# 3. CLI usage
# ────────────
# Validate (now passes with config inheritance):
#   cascadeguard images validate --images-yaml images.yaml
#
# Generate state files:
#   cascadeguard images generate --images-yaml images.yaml --output-dir .
#
# Generate CI pipelines:
#   cascadeguard ci generate --images-yaml images.yaml --output-dir .
#
# Generate CI with explicit platform:
#   cascadeguard ci generate --platform github --dry-run

Correctness Properties

A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.

Property 1: Default inheritance

For any image entry and for any inheritable key (registry, repository, local.dir), if the image lacks that key and the config defaults provide it, then the merged image has the config default value for that key.

Validates: Requirements 2.1, 2.2, 2.3

Property 2: Override precedence

For any image entry and for any inheritable key (registry, repository, local.dir), if the image already has that key set, then the merged image retains the image’s original value regardless of the config default.

Validates: Requirements 2.4, 2.5, 2.6

Property 3: Non-mutation

For any list of image entries and for any config, calling merge_defaults does not modify the original image list or any of its contained dictionaries.

Validates: Requirement 2.7

Property 4: Length preservation

For any list of image entries and for any config, merge_defaults returns a list of the same length as the input.

Validates: Requirement 2.8

Property 5: No-defaults backward compatibility

For any list of image entries and a config with no defaults section (or an empty one), merge_defaults returns copies equivalent to the originals with no fields added or removed.

Validates: Requirements 2.9, 7.1

Property 6: Disabled image leniency

For any image entry with enabled: false that has a name field, validation passes regardless of which other fields are missing.

Validates: Requirement 3.4

Property 7: Missing name always fails validation

For any image entry (enabled or disabled) that lacks a name field, validation reports an error.

Validates: Requirement 3.5

Property 8: Validation correctness for enabled images

For any enabled image entry, validation passes if and only if the image has name, registry, and dockerfile fields present after merging defaults.

Validates: Requirements 3.2, 3.3, 3.6, 3.7

Property 9: Generation idempotency

For any valid images.yaml and config, running images generate or ci generate twice with the same inputs produces identical output files.

Validates: Requirements 4.6, 5.9
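
One way to check this property in a test is to hash every generated file after each run and compare the two snapshots. The sketch below assumes nothing about the real generators: snapshot_tree and is_idempotent are hypothetical helpers, and generate stands for any callable that writes into an output directory.

```python
import hashlib
from pathlib import Path
from typing import Callable


def snapshot_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def is_idempotent(generate: Callable[[Path], None], output_dir: Path) -> bool:
    """Run generate twice against output_dir and compare the resulting file trees."""
    generate(output_dir)
    first = snapshot_tree(output_dir)
    generate(output_dir)
    second = snapshot_tree(output_dir)
    return first == second
```

Comparing content hashes rather than modification times keeps the check independent of how often files are rewritten. A real test would probably exclude the cache directory used for cloned repos, since its contents are not part of the generated output.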

Error Handling

Error Scenario 1: Missing .cascadeguard.yaml

  • Condition: File does not exist in repo root
  • Response: load_config() returns {}, no defaults applied
  • Recovery: Validation proceeds with per-image fields only (backward compatible)

Error Scenario 2: Malformed .cascadeguard.yaml

  • Condition: YAML parse error or non-dict root
  • Response: Print error to stderr, return exit code 1
  • Recovery: User fixes YAML syntax

Error Scenario 3: Missing required fields after merge

  • Condition: An enabled image still lacks registry or dockerfile after defaults are applied
  • Response: Validation error message hints that the field can be set in .cascadeguard.yaml defaults
  • Recovery: User adds the field to either the image entry or config defaults

Error Scenario 4: images generate with unreachable source repo

  • Condition: Git clone fails for a source repo during state generation
  • Response: Warning printed, existing state preserved if available, generation continues for other images
  • Recovery: Existing behavior from generate_state.py (graceful degradation)

Testing Strategy

Unit Testing Approach

  • Test merge_defaults() with: empty config, partial defaults, full defaults, per-image overrides, disabled images
  • Test updated cmd_validate() with: valid images + config, missing fields, disabled images, no config file
  • Test CLI argument parsing for new images generate and ci generate subcommands
  • Mock generate_state and generate_ci module calls in command handler tests

Property-Based Testing Approach

Property Test Library: hypothesis

  • Property: merge_defaults never removes keys that exist on the original image
  • Property: merge_defaults output length equals input length
  • Property: if config has no defaults, merge_defaults returns copies identical to originals
  • Property: disabled images always pass validation regardless of missing fields
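
To illustrate the shape of such a test without pulling in a dependency, the sketch below uses the standard library's random module in place of hypothesis; a real suite would express the same checks with hypothesis strategies. validate_image is a hypothetical stand-in for the per-image checks in cmd_validate(), applied to an already merged image.

```python
import random

REQUIRED_WHEN_ENABLED = ("registry", "dockerfile")


def validate_image(image: dict, index: int) -> list[str]:
    """Stand-in for the per-image checks in cmd_validate()."""
    name = image.get("name")
    if not name:
        return [f"Image {index}: missing 'name' field"]
    if not image.get("enabled", True):
        return []  # disabled images only need a name
    return [
        f"Image '{name}': missing '{field}'"
        for field in REQUIRED_WHEN_ENABLED
        if not image.get(field)
    ]


def random_image(rng: random.Random) -> dict:
    """Build an image dict with a random subset of fields present."""
    candidates = {
        "name": "nginx",
        "registry": "ghcr.io/cascadeguard",
        "dockerfile": "images/nginx/Dockerfile",
        "enabled": rng.choice([True, False]),
    }
    return {key: value for key, value in candidates.items() if rng.random() < 0.5}


def run_property_checks(trials: int = 500, seed: int = 0) -> None:
    """Check the validation properties against randomly generated images."""
    rng = random.Random(seed)
    for i in range(trials):
        image = random_image(rng)
        errors = validate_image(image, i)
        if "name" not in image:
            assert errors, "missing name must always fail"
        elif image.get("enabled", True) is False:
            assert not errors, "disabled image with a name must pass"
        else:
            expected_ok = all(image.get(field) for field in REQUIRED_WHEN_ENABLED)
            assert (not errors) == expected_ok, "enabled image check mismatch"


if __name__ == "__main__":
    run_property_checks()
```

A fixed seed keeps failures reproducible; hypothesis provides the same through its own example database and shrinking, which is one reason to prefer it in the actual test suite.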

Integration Testing Approach

  • End-to-end test: create temp directory with .cascadeguard.yaml + images.yaml, run cascadeguard images validate, assert exit code 0
  • End-to-end test: run cascadeguard images generate, verify state files created
  • End-to-end test: run cascadeguard ci generate, verify workflow files created
  • Regression test: validate cascadeguard-open-secure-images repo with the new config defaults

Security Considerations

  • .cascadeguard.yaml is a local config file read from the repo root — no remote fetching
  • Registry URLs in defaults are used as-is; no URL validation beyond non-empty string check (same as current behavior)
  • generate_state.py already handles GitHub token securely via environment variables; no changes needed

Dependencies

  • Existing: pyyaml, argparse, pathlib (all already in use)
  • Existing modules: generate_state.py, generate_ci.py (imported by new CLI commands)
  • No new external dependencies required