Image Factory - Requirements

Problem Statement

When public base images are updated, our internal images that depend on them become stale and potentially vulnerable. We need an automated system that:

Monitors upstream base images for updates
Waits a configurable period (default: 7 days) to allow the community to discover vulnerabilities
Rebuilds our internal images that depend on the updated base image
Cascades rebuilds through the dependency chain
Operates in a federated model in which images are defined in multiple repositories owned by different teams

Functional Requirements

FR1: Image Enrollment

Developers can enroll images by adding them to images.yaml
Two types of images:
- Managed images: have a source repo and a Dockerfile (we build them)
- External images: no source repo (third-party images we monitor)
Enrollment specifies:
- Image registry and repository
- Source code location (optional - only for managed images)
- Dockerfile path (optional - only for managed images)
- Build workflow/pipeline (optional - only for managed images)
- Rebuild policies (delay, auto-rebuild)
- Warehouse configuration (repoURL, allowTags) for external images
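As an illustration of the enrollment rules above, here is a minimal Python sketch of one images.yaml entry. The field names (`source_repo`, `dockerfile`, `repo_url`, `allow_tags`, and so on) and the `Enrollment`/`validate` names are hypothetical, not a fixed schema. The only rule it encodes is the one stated here: an entry with a source repo and a Dockerfile is managed, and anything else is external and needs warehouse configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Enrollment:
    """One images.yaml entry. Field names are hypothetical, not a fixed schema."""

    name: str
    registry: str
    repository: str
    # Managed-image fields: all None for external images.
    source_repo: Optional[str] = None
    dockerfile: Optional[str] = None
    workflow: Optional[str] = None
    # Rebuild policy (delay and auto-rebuild).
    rebuild_delay_days: int = 7
    auto_rebuild: bool = True
    # Warehouse configuration, required for external images.
    repo_url: Optional[str] = None
    allow_tags: Optional[str] = None

    @property
    def is_managed(self) -> bool:
        # We can only build an image if we know where its source and Dockerfile live.
        return bool(self.source_repo and self.dockerfile)


def validate(entry: Enrollment) -> list[str]:
    """Return human-readable problems; an empty list means the entry is usable."""
    errors = []
    if bool(entry.source_repo) != bool(entry.dockerfile):
        errors.append(f"{entry.name}: managed images need both source_repo and dockerfile")
    if not entry.is_managed and not entry.repo_url:
        errors.append(f"{entry.name}: external images need repo_url for their Warehouse")
    return errors


# Example entries: one managed, one external (values are illustrative).
api = Enrollment("api", "ghcr.io", "acme/api",
                 source_repo="https://github.com/acme/api", dockerfile="Dockerfile")
nginx = Enrollment("nginx", "docker.io", "library/nginx",
                   repo_url="docker.io/library/nginx", allow_tags=r"^1\.27\.\d+$")
```

Under this sketch, promoting an external image to managed is just a matter of filling in `source_repo` and `dockerfile`; `is_managed` flips without any other change.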

FR2: Dependency Discovery

System automatically discovers base images from Dockerfiles (managed images only)
Parses all FROM statements
Handles multi-stage builds
Tracks dependency relationships
Base images are initially treated as external
Base images can be promoted to managed by adding source info to images.yaml
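Dependency discovery can be sketched with a regular expression over FROM lines. This is an assumption-laden sketch, not the tool's parser: `discover_base_images` is a hypothetical name. It skips references to earlier build stages and to `scratch`, but it does not expand ARG variables, follow line continuations, handle CRLF line endings, or look at `COPY --from=<image>` references, all of which a production parser would need to handle.

```python
import re

# Matches: FROM [--flag=value ...] <image> [AS <stage>], one instruction per line.
FROM_RE = re.compile(
    r"^[ \t]*FROM[ \t]+(?:--\S+[ \t]+)*(?P<image>\S+)(?:[ \t]+AS[ \t]+(?P<stage>\S+))?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def discover_base_images(dockerfile_text: str) -> list[str]:
    """Return external base images referenced by FROM, in first-seen order.

    References to earlier build stages (multi-stage builds) and the special
    `scratch` image are skipped, since neither is pulled from a registry.
    """
    stages: set[str] = set()
    bases: list[str] = []
    for match in FROM_RE.finditer(dockerfile_text):
        image = match.group("image")
        stage = match.group("stage")
        is_stage_ref = image.lower() in stages
        if not is_stage_ref and image.lower() != "scratch" and image not in bases:
            bases.append(image)
        if stage:
            stages.add(stage.lower())
    return bases


example = """\
FROM --platform=$BUILDPLATFORM golang:1.22 AS build
RUN go build -o /app ./cmd/app
FROM build AS test
RUN go test ./...
FROM gcr.io/distroless/static:nonroot
COPY --from=build /app /app
"""
print(discover_base_images(example))  # ['golang:1.22', 'gcr.io/distroless/static:nonroot']
```

Each image returned here would start out as an external base image and get a Warehouse until someone adds source info for it to images.yaml.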

FR3: Base Image Monitoring

Monitors upstream base images for digest changes
Detects new versions in registries
Tracks update history
No custom polling: registry discovery is delegated to Kargo Warehouses, and downstream steps react to the Freight they produce
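One way to track update history is to compare the digest in each new piece of Freight with the digest already recorded in the image's state. The sketch below assumes a hypothetical state layout (`current` plus a `history` list) and a hypothetical `record_update` helper; how the Freight reaches this code is left out.

```python
from datetime import datetime, timezone


def record_update(state: dict, tag: str, digest: str, seen_at: datetime) -> bool:
    """Record a version delivered by a Warehouse (e.g. via new Freight) in state.

    Returns True when the digest differs from the one currently tracked. Comparing
    digests rather than tags also catches a tag such as `3.20` being re-pushed.
    """
    if state.get("current", {}).get("digest") == digest:
        return False  # same content we already know about; nothing to record
    entry = {"tag": tag, "digest": digest, "seen_at": seen_at.isoformat()}
    state.setdefault("history", []).append(entry)
    state["current"] = dict(entry)  # copy so history entries are not aliased
    return True


state: dict = {}
t0 = datetime(2024, 5, 1, tzinfo=timezone.utc)
record_update(state, "3.20", "sha256:aaa", t0)  # True: first sighting
record_update(state, "3.20", "sha256:aaa", t0)  # False: unchanged
record_update(state, "3.20", "sha256:bbb", t0)  # True: tag re-pushed with new content
```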

FR4: Rebuild Orchestration

Waits configurable delay period after base image update
Triggers rebuilds of dependent images
Cascades through dependency chain
Updates state with rebuild attempts and results
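The delay check and the cascade can be expressed as two small functions, sketched here under the assumption that the dependency graph from FR2 is available as a dict mapping each managed image to its base images. `rebuild_due` and `cascade_order` are hypothetical names; the cascade uses Kahn's topological sort so that each affected image is rebuilt once, after all of its affected parents.

```python
from collections import deque
from datetime import datetime, timedelta


def rebuild_due(updated_at: datetime, now: datetime, delay_days: int = 7) -> bool:
    """True once the configurable waiting period (default 7 days) has elapsed."""
    return now - updated_at >= timedelta(days=delay_days)


def cascade_order(base: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Images to rebuild after `base` changes, every parent before its children.

    `depends_on` maps each managed image to the images in its FROM lines. This is
    Kahn's topological sort restricted to images downstream of `base`.
    """
    # Invert the edges: parent -> children.
    children: dict[str, list[str]] = {}
    for image, parents in depends_on.items():
        for parent in parents:
            children.setdefault(parent, []).append(image)

    # Breadth-first walk to find everything downstream of the updated base.
    affected: set[str] = set()
    queue = deque([base])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in affected:
                affected.add(child)
                queue.append(child)

    # Number of affected parents each image must wait for (base itself is done).
    pending = {img: sum(p in affected for p in depends_on[img]) for img in affected}
    ready = deque(sorted(img for img, n in pending.items() if n == 0))
    order: list[str] = []
    while ready:
        img = ready.popleft()
        order.append(img)
        for child in sorted(children.get(img, [])):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(affected):
        raise ValueError("dependency cycle among: " + ", ".join(sorted(affected - set(order))))
    return order


deps = {
    "app-a": ["python:3.12"],
    "app-b": ["app-a"],
    "app-c": ["app-a", "app-b"],
    "tool": ["alpine:3.20"],
}
print(cascade_order("python:3.12", deps))  # ['app-a', 'app-b', 'app-c']
```

Sorting the ready queue keeps the order deterministic, which makes the result easy to assert on in tests; images that become ready at the same time could equally be rebuilt in parallel.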

FR5: State Management

Maintains state files for all images and base images
Tracks current versions, digests, and update history
Preserves runtime data across updates
Configuration (images.yaml) takes precedence over state files
State files contain warehouse configuration (repoURL, allowTags) for CDK8s
Analysis tool output must align with CDK8s input requirements
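The precedence rule can be sketched as a merge in which images.yaml overrides state key by key, except for runtime keys that only the tool writes. The `RUNTIME_KEYS` set, the `merge_state` name, and the key names are assumptions about the state-file schema. Deleting keys that were removed from images.yaml (for example, a managed image losing its source) is deliberately out of scope here.

```python
import copy

# Hypothetical split between keys the analysis tool writes at runtime and
# keys that come from images.yaml; adjust to the real state-file schema.
RUNTIME_KEYS = {"current", "history", "rebuilds"}


def merge_state(config: dict, state: dict) -> dict:
    """Merge one image's images.yaml entry over its state file.

    Configuration wins for every non-runtime key it sets; all other keys in the
    existing state (runtime data, discovered warehouse settings) are preserved.
    Inputs are not mutated.
    """
    merged = copy.deepcopy(state)
    for key, value in config.items():
        if key in RUNTIME_KEYS:
            continue  # images.yaml must never overwrite observed runtime data
        merged[key] = copy.deepcopy(value)
    return merged


state = {
    "repoURL": "docker.io/library/python",
    "allowTags": r"^3\.12\.\d+$",
    "current": {"digest": "sha256:old"},
    "history": [{"digest": "sha256:old"}],
}
config = {"allowTags": r"^3\.13\.\d+$", "current": {"digest": "ignored"}}
merged = merge_state(config, state)
```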

FR6: Manifest Generation

CDK8s app reads both images.yaml and state files
Generates Kargo Warehouse resources for:
- Base images (discovered from Dockerfiles)
- External images (enrolled without source)
- Managed images that become external (source removed)
Does NOT generate warehouses for managed images (they’re built, not monitored)
Merges images.yaml with state (images.yaml takes precedence)
Output must be valid Kargo Warehouse YAML
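A sketch of the Warehouse generation logic, written with plain dicts rather than the cdk8s API so that it runs without extra dependencies. It assumes the `kargo.akuity.io/v1alpha1` Warehouse API with one image subscription per Warehouse and the `repoURL`/`allowTags` fields named in this document; check the subscription fields against the Kargo version in use. `k8s_name`, `warehouse`, `generate_warehouses`, and the `source_repo` key are hypothetical.

```python
import re
from typing import Optional


def k8s_name(image: str) -> str:
    """Turn an image reference into a DNS-1123-style resource name (sketch)."""
    name = re.sub(r"[^a-z0-9-]+", "-", image.lower()).strip("-")
    return name[:63].rstrip("-")


def warehouse(name: str, namespace: str, repo_url: str, allow_tags: Optional[str]) -> dict:
    """A Kargo Warehouse with a single image subscription, as a plain dict."""
    image_sub = {"repoURL": repo_url}
    if allow_tags:
        image_sub["allowTags"] = allow_tags
    return {
        "apiVersion": "kargo.akuity.io/v1alpha1",
        "kind": "Warehouse",
        "metadata": {"name": k8s_name(name), "namespace": namespace},
        "spec": {"subscriptions": [{"image": image_sub}]},
    }


def generate_warehouses(images: dict[str, dict], namespace: str) -> list[dict]:
    """`images` is the merged view (images.yaml over state), keyed by image name."""
    manifests = []
    for name, img in sorted(images.items()):
        if img.get("source_repo"):
            continue  # managed image: we build it, so there is nothing to monitor
        manifests.append(warehouse(name, namespace, img["repoURL"], img.get("allowTags")))
    return manifests


images = {
    "docker.io/library/python": {"repoURL": "docker.io/library/python",
                                 "allowTags": r"^3\.12\.\d+$"},
    "acme/api": {"source_repo": "https://github.com/acme/api", "repoURL": "ghcr.io/acme/api"},
    "quay.io/prometheus/node-exporter": {"repoURL": "quay.io/prometheus/node-exporter"},
}
manifests = generate_warehouses(images, "image-factory")
```

Name sanitization can map two different images to the same Warehouse name (for example, `a/b` and `a-b`), so a real implementation would need a collision check or a hash suffix.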

FR7: GitOps Integration

All configuration in git
State changes committed to git
Kargo resources generated from config and state
ArgoCD applies resources automatically

Non-Functional Requirements

NFR1: Event-Driven Architecture

No custom polling loops or CronJobs
React to Kargo Freight creation
Efficient resource usage

NFR2: Pure Kargo Implementation

All monitoring through Kargo Warehouses
All analysis through Kargo AnalysisTemplates
All orchestration through Kargo Stages
Unified UI and monitoring

NFR3: Security

Credentials stored in Kubernetes Secrets
Minimal permission scopes
Audit trail in git
Support for image signature verification

NFR4: Scalability

Add images by editing configuration
No infrastructure to manage beyond Kargo and ArgoCD
Distributed across repos/teams

NFR5: Testability

Unit tests for analysis tool (apps/image-factory/test_app.py)
Unit tests for CDK8s manifest generation (cdk8s/image-factory/test_main.py)
Integration tests verifying tool → state → CDK8s workflow (image-factory/test_integration.py)
Tests verify data alignment between tool output and CDK8s input
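The alignment requirement can be expressed as a contract check shared by the tool's tests and the CDK8s tests. The contract below (`repoURL` required, `allowTags` optional, managed images exempt) is an assumption drawn from FR5 and FR6, and `contract_violations` is a hypothetical helper; an integration test would feed it state files produced by actually running the analysis tool.

```python
import unittest

# Hypothetical contract: fields the CDK8s app reads from each non-managed state file.
CDK8S_REQUIRED = {"repoURL": str}
CDK8S_OPTIONAL = {"allowTags": str}


def contract_violations(state: dict) -> list[str]:
    """Check one state file (already loaded as a dict) against the CDK8s input contract."""
    if state.get("source_repo"):
        return []  # managed images get no Warehouse, so nothing to check
    problems = []
    for key, typ in CDK8S_REQUIRED.items():
        if key not in state:
            problems.append(f"missing {key}")
        elif not isinstance(state[key], typ):
            problems.append(f"{key} should be {typ.__name__}")
    for key, typ in CDK8S_OPTIONAL.items():
        if key in state and not isinstance(state[key], typ):
            problems.append(f"{key} should be {typ.__name__}")
    return problems


class TestToolOutputMatchesCdk8sInput(unittest.TestCase):
    def test_base_image_state_is_complete(self):
        # In a real integration test this dict would come from running the analysis tool.
        state = {"repoURL": "docker.io/library/alpine", "allowTags": r"^3\.20\.\d+$"}
        self.assertEqual(contract_violations(state), [])

    def test_missing_repo_url_is_reported(self):
        self.assertEqual(contract_violations({"allowTags": "^1"}), ["missing repoURL"])
```

pytest also collects `unittest.TestCase` classes, so the same test runs under either `python -m unittest` or pytest.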

Open Questions

How do we handle breaking changes in base images?
- Should we test images before promoting?
- Need a rollback mechanism?
What if a base image has a critical CVE?
- Should we rebuild immediately (skip waiting period)?
- How do we get notified of CVEs?
How do we handle multi-stage builds with multiple base images?
- Track all FROM statements?
- Prioritize by stage?
How do we handle rate limiting?
- Docker Hub has strict rate limits
- Need caching strategy?
- Should we use a pull-through cache?
How do we handle image lifecycle transitions?
- External → Managed: Add source info to images.yaml
- Managed → External: Remove source info from images.yaml
- Base image promoted to managed: Add to images.yaml with source
- How do we clean up old state files?
