Decision Record: Ad-Hoc Load Testing Framework

Date: 2026-02-04
Status: Proposed
Category: Infrastructure
Decision Makers: Platform Engineering Team

Context

We run an integration platform that federates APIs, allowing producers to surface their APIs for self-service consumption. The platform operates large sandbox instances, and we need a flexible, easy-to-configure system for running ad-hoc load tests.

Current State

  • Infrastructure: ArgoCD and GitLab available
  • Test Generation: Can run in GitLab CI or sandbox environment
  • Target System Under Test (SUT): Different sandbox environment from test generation
  • Scale: Need to test multiple federated APIs with varying load profiles
  • Use Cases:
    • Ad-hoc performance validation
    • Pre-production load testing
    • API capacity planning
    • Performance regression detection

Requirements

  1. Flexibility: Easy to configure different test scenarios and targets
  2. Self-Service: Teams should be able to trigger tests with minimal friction
  3. Isolation: Test generation and SUT should be separate environments
  4. Observability: Clear metrics and reporting
  5. Reproducibility: Tests should be version-controlled and repeatable
  6. Resource Efficiency: Don’t consume unnecessary sandbox resources

Decision

We will implement a k6-based load testing framework with the following architecture:

Tool Selection: k6 over Gatling

Chosen: k6
Alternatives Considered: Gatling, Locust, JMeter

Rationale:

  • Kubernetes-native: k6 operator enables distributed testing in K8s clusters
  • Lightweight: Smaller container footprint suitable for sandbox constraints
  • Developer-friendly: JavaScript/TypeScript tests are easier to write and maintain
  • GitLab Integration: Excellent CI/CD support with native performance reporting
  • Flexible Execution: CLI, K8s operator, or cloud-based execution modes
  • Modern Metrics: Built-in Prometheus/InfluxDB support

Trade-offs:

  • Gatling has better GUI for test recording (not critical for API testing)
  • k6 JavaScript runtime has learning curve for Java/JVM teams (acceptable given broader JS adoption)
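
For illustration, a minimal k6 test showing the JavaScript style referenced above — checks plus thresholds that fail the run (and hence the CI job) when breached. The endpoint URL and threshold values are placeholders, not part of this decision:

```javascript
// Minimal k6 script: 50 VUs for 2 minutes against a placeholder endpoint.
// Runs under k6 ("k6 run test.js"), not Node.js.
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: 50,
  duration: "2m",
  thresholds: {
    http_req_duration: ["p(95)<500"], // fail the run if p95 >= 500ms
    http_req_failed: ["rate<0.01"],   // fail if more than 1% of requests error
  },
};

export default function () {
  const res = http.get("https://sut.sandbox.example.com/api/health"); // placeholder SUT URL
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```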

Operator vs Non-Operator Deployment Comparison

A critical decision in implementing load testing is whether to use a Kubernetes operator, K8s Jobs, or simpler container-based execution. This affects architecture, scalability, operational complexity, and time-to-value.

Our Approach: We plan to use K8s Jobs triggered from GitLab CI, running on a separate cluster from the SUT. This offloads work from GitLab servers and avoids potential network bottlenecks between GitLab and the SUT.

Priority Concerns: Decision Matrix

These dimensions are critical to our decision-making process.

| Priority Dimension | k6 + Operator | k6 + K8s Job | k6 + GitLab Runner | Gatling + Operator | Gatling + K8s Job | Gatling + GitLab Runner |
|---|---|---|---|---|---|---|
| Quality of Reporting (Out-of-Box) | ⭐⭐⭐ JSON/text summary (needs Grafana for visual) | ⭐⭐⭐ JSON/text summary (needs Grafana for visual) | ⭐⭐⭐ JSON/text summary (needs Grafana for visual) | ⭐⭐⭐⭐⭐ Rich HTML reports built-in, detailed drill-downs, charts | ⭐⭐⭐⭐⭐ Rich HTML reports built-in, detailed drill-downs | ⭐⭐⭐⭐⭐ Rich HTML reports built-in |
| Quality with Tooling | ⭐⭐⭐⭐⭐ Excellent with Grafana/InfluxDB | ⭐⭐⭐⭐⭐ Excellent with Grafana/InfluxDB | ⭐⭐⭐⭐ Good with Grafana/InfluxDB | ⭐⭐⭐⭐⭐ Built-in + optional Grafana | ⭐⭐⭐⭐⭐ Built-in + optional Grafana | ⭐⭐⭐⭐⭐ Built-in + optional Grafana |
| Ease of Reporting | ⭐⭐⭐⭐ Automated via CRD, requires Grafana setup | ⭐⭐⭐⭐ Simple artifact collection, requires Grafana/report gen | ⭐⭐⭐⭐ GitLab artifacts, requires Grafana/report gen | ⭐⭐⭐⭐ Custom collection, HTML ready | ⭐⭐⭐⭐⭐ HTML reports work immediately | ⭐⭐⭐⭐⭐ HTML reports work immediately |
| Time to First Test | 1-2 days | 2-4 hours | 1-2 hours | 2-3 days | 3-5 hours | 1-2 hours |
| Time to MVP | 1-2 weeks | 1-3 days | 1-2 days | 2-3 weeks | 3-5 days | 1-2 days |
| Maturity | ⭐⭐⭐⭐⭐ Official Grafana Labs operator, production-ready | ⭐⭐⭐⭐⭐ Standard K8s Job pattern, rock solid | ⭐⭐⭐⭐⭐ Standard Docker execution | ⭐⭐⭐ Community operators, less mature | ⭐⭐⭐⭐⭐ Standard K8s Job pattern | ⭐⭐⭐⭐⭐ Standard Docker execution |
| Ease of Use | ⭐⭐⭐ Requires CRD knowledge, K8s expertise | ⭐⭐⭐⭐ Standard K8s Job, familiar to teams | ⭐⭐⭐⭐⭐ Simple Docker run command | ⭐⭐ Custom CRDs or complex Helm charts | ⭐⭐⭐⭐ Standard K8s Job | ⭐⭐⭐⭐⭐ Simple Docker run |
| Ease of Horizontal Scaling | ⭐⭐⭐⭐⭐ Built-in parallelism parameter | ⭐⭐⭐⭐ Job completions: N + manual coordination | ⭐⭐ Manual multi-runner orchestration | ⭐⭐⭐⭐ Operator-managed or manual | ⭐⭐⭐⭐ Job completions: N + coordination scripts | ⭐⭐ Manual orchestration |

Rating Key: ⭐⭐⭐⭐⭐ Excellent | ⭐⭐⭐⭐ Good | ⭐⭐⭐ Acceptable | ⭐⭐ Limited | ⭐ Poor

Reporting Deep Dive: k6 vs Gatling

This is a critical differentiator that deserves detailed explanation.

Gatling Reporting (Out-of-the-Box Winner)

What You Get Immediately:

  • Rich HTML Reports: Beautiful, interactive reports generated automatically after each test
  • Visual Charts: Response time distribution, requests/second, response time percentiles over time
  • Drill-Down Capability: Click into specific requests, see detailed stats per endpoint
  • Statistical Analysis: Min/max/mean/percentiles, standard deviation
  • Error Analysis: Detailed breakdown of failures with counts and percentages
  • Self-Contained: Single HTML file (or folder) you can share, no server required

Example Gatling Report Sections:

1. Global Information: Total requests, OK/KO counts, min/max/mean/percentiles
2. Statistics Table: Per-request breakdown with all metrics
3. Active Users Over Time: Graph showing VU ramp-up/down
4. Response Time Distribution: Histogram of latencies
5. Response Time Percentiles: P50/P75/P95/P99 over time
6. Requests Per Second: Throughput over time
7. Responses Per Second: Success/failure rates

Artifact Collection:

# Gatling generates to target/gatling/<timestamp>/
# Contains: index.html + js/ + style/ folders
kubectl cp <pod>:/results/gatling ./gatling-report
# Open index.html in browser - fully functional report

Verdict: ⭐⭐⭐⭐⭐ Production-ready reports with zero additional tooling

Gatling → Grafana Integration Options:

Since you already have Prometheus and JMX monitoring infrastructure, there are several options for getting Gatling metrics into your stack:

Option 1: Prometheus + JMX Exporter ⭐⭐⭐⭐ (Best for your setup)

  • How: Gatling exposes JMX metrics → JMX Exporter → Prometheus → Grafana
  • Setup:
    1. Run Gatling with JMX enabled: -Dgatling.jmx.enabled=true
    2. Deploy JMX Exporter as sidecar in K8s Job pod
    3. Configure Prometheus to scrape JMX Exporter endpoint
  • Pros:
    • ✅ Leverages your existing Prometheus infrastructure
    • ✅ Same pattern as other Java apps you monitor
    • ✅ Real-time metrics during test execution
  • Cons:
    • ⚠️ JMX Exporter sidecar adds complexity to Job manifest
    • ⚠️ Need to configure JMX metric mappings
    • ⚠️ Community Grafana dashboards (not official)
  • Setup Time: 2-3 hours (sidecar config + Prometheus ServiceMonitor + dashboard)

Option 2: Prometheus Pushgateway ⭐⭐⭐⭐

  • How: Gatling pushes metrics to Pushgateway → Prometheus scrapes → Grafana
  • Plugin: Use gatling-prometheus plugin
    // build.sbt
    libraryDependencies += "com.github.lkishalmi.gatling" % "gatling-prometheus" % "3.11.1"
    # gatling.conf
    data {
      writers = [console, file, prometheus]
    }
    prometheus {
      pushgateway {
        url = "http://pushgateway.monitoring:9091"
      }
    }
  • Pros:
    • ✅ Works well for batch jobs (like K8s Jobs)
    • ✅ Simpler than JMX Exporter (no sidecar)
    • ✅ Designed for short-lived processes
  • Cons:
    • ⚠️ Requires plugin installation (not built-in)
    • ⚠️ Pushgateway required (may already have it)
    • ⚠️ Metrics persist in Pushgateway after test (need cleanup)
  • Setup Time: 1-2 hours (plugin + pushgateway config)

Option 3: InfluxDB Export ⭐⭐⭐

  • How: Gatling → InfluxDB → Grafana
  • Plugin: gatling-influxdb
    libraryDependencies += "com.github.gatling" % "gatling-influxdb" % "1.1.4"
  • Pros:
    • ✅ Direct time-series storage
    • ✅ Good for historical trending
  • Cons:
    • ⚠️ Requires InfluxDB (if you don’t have it)
    • ⚠️ Separate from your Prometheus infrastructure
    • ⚠️ Plugin required
  • Setup Time: 2-4 hours (deploy InfluxDB if needed + plugin config)

Option 4: Graphite Export ⭐⭐

  • How: Built-in Gatling Graphite support → Grafana Graphite datasource
  • Configuration: Built into Gatling (no plugin)
    data {
      writers = [console, file, graphite]
    }
    graphite {
      host = "graphite.monitoring"
      port = 2003
    }
  • Pros:
    • ✅ No plugin required
    • ✅ Built-in support
  • Cons:
    • ⚠️ Requires Graphite (probably don’t have it)
    • ⚠️ Less common than Prometheus
  • Setup Time: 2-4 hours (deploy Graphite + configure)

Detailed Setup: Option 1 (JMX Exporter - Recommended for you)

K8s Job manifest with JMX Exporter sidecar:

apiVersion: batch/v1
kind: Job
metadata:
  name: gatling-test-${CI_PIPELINE_ID}
spec:
  template:
    metadata:
      labels:
        job-type: load-test  # matched by the PodMonitor selector below
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9404"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      # Main Gatling container
      - name: gatling
        image: denvazh/gatling:latest
        command:
          - /opt/gatling/bin/gatling.sh
          - -sf=/simulations
          - -s=com.example.ApiSimulation
        env:
        - name: JAVA_OPTS
          value: "-Dgatling.jmx.enabled=true -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
        resources:
          requests:
            memory: 2Gi
            cpu: 1
          limits:
            memory: 4Gi
            cpu: 2
      
      # JMX Exporter sidecar
      - name: jmx-exporter
        image: bitnami/jmx-exporter:latest
        ports:
        - containerPort: 9404
          name: metrics
        volumeMounts:
        - name: jmx-config
          mountPath: /etc/jmx-exporter
        command:
        - java
        - -jar
        - /opt/bitnami/jmx-exporter/jmx_prometheus_httpserver.jar
        - "9404"
        - /etc/jmx-exporter/config.yaml
        resources:
          requests:
            memory: 128Mi
            cpu: 100m
      
      volumes:
      - name: jmx-config
        configMap:
          name: gatling-jmx-config

JMX Exporter config:

# ConfigMap: gatling-jmx-config
apiVersion: v1
kind: ConfigMap
metadata:
  name: gatling-jmx-config
data:
  config.yaml: |
    hostPort: localhost:1099
    rules:
    - pattern: "io.gatling.core<type=AllRequests><>(.+)"
      name: gatling_all_requests_$1
    - pattern: "io.gatling.core<type=Simulation><>(.+)"
      name: gatling_simulation_$1
    - pattern: "io.gatling.core<type=Request, name=(.+)><>(.+)"
      name: gatling_request_$2
      labels:
        request: "$1"

Prometheus PodMonitor:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: gatling-tests
  namespace: load-testing
spec:
  selector:
    matchLabels:
      job-type: load-test
  podMetricsEndpoints:
  - port: metrics
    interval: 10s

Setup Time Breakdown:

  • JMX Exporter sidecar config: 30 minutes
  • JMX metric mapping config: 1 hour
  • Prometheus PodMonitor: 15 minutes
  • Grafana dashboard: 1 hour
  • Total: ~2-3 hours

Detailed Setup: Option 2 (Pushgateway - Simpler)

Gatling with Prometheus plugin:

apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
      - name: gatling
        image: custom-gatling-with-prometheus-plugin:latest  # Custom image with plugin
        command:
          - /opt/gatling/bin/gatling.sh
          - -sf=/simulations
          - -s=com.example.ApiSimulation
        env:
        - name: PUSHGATEWAY_URL
          value: "http://pushgateway.monitoring:9091"
        resources:
          requests:
            memory: 2Gi
            cpu: 1

Gatling config (baked into custom image):

# gatling.conf
data {
  writers = [console, file, prometheus]
}
 
prometheus {
  pushgateway {
    url = ${?PUSHGATEWAY_URL}
    jobName = "gatling-load-test"
  }
}

Setup Time Breakdown:

  • Build custom image with plugin: 1 hour
  • Pushgateway deployment (if needed): 30 minutes
  • Prometheus scrape config: 15 minutes
  • Grafana dashboard: 1 hour
  • Total: ~2-3 hours (first time), 30 min (subsequent)

Comparison for Your Infrastructure:

| Method | Fits Existing Setup | Real-time | Setup Time | Complexity |
|---|---|---|---|---|
| JMX Exporter | ⭐⭐⭐⭐⭐ Uses existing Prometheus + JMX pattern | ✅ Yes | 2-3 hours | ⭐⭐⭐ Moderate |
| Pushgateway | ⭐⭐⭐⭐⭐ Uses existing Prometheus | ✅ Yes | 1-2 hours | ⭐⭐⭐⭐ Simpler |
| InfluxDB | ⭐⭐ Requires separate stack | ✅ Yes | 2-4 hours | ⭐⭐⭐ Moderate |
| Graphite | ⭐ Requires Graphite | ✅ Yes | 2-4 hours | ⭐⭐⭐ Moderate |

Recommendation for Your Setup:

  • Prometheus Pushgateway (if you have it) - simplest
  • JMX Exporter (if you don’t) - uses familiar pattern

Result: ⭐⭐⭐⭐ Gatling can integrate with your Prometheus/Grafana stack, but requires 1-3 hours additional setup vs k6’s native support

Comparison for Your Use Case (Existing Prometheus/Grafana):

| Aspect | k6 | Gatling |
|---|---|---|
| Prometheus Export | ⭐⭐⭐⭐⭐ Built-in experimental, or easy plugin | ⭐⭐⭐⭐ Via Pushgateway plugin or JMX Exporter |
| InfluxDB Export | ⭐⭐⭐⭐⭐ Built-in `--out influxdb=...` | ⭐⭐⭐ Requires plugin |
| Setup Effort (Prometheus) | 1 flag or small config | Plugin + custom image OR JMX sidecar (2-3 hours) |
| Setup Effort (InfluxDB) | 1 line in command (if you have InfluxDB) | Plugin + config file |
| Grafana Dashboards | ⭐⭐⭐⭐⭐ Official, well-maintained | ⭐⭐⭐ Community, requires customization |
| Dashboard Availability | Multiple official options | Limited community options |
| Data Schema | Standardized, well-documented | Less standardized, varies by export method |
| Real-time Monitoring | ⭐⭐⭐⭐⭐ Seamless | ⭐⭐⭐⭐ Works with setup |
| JMX Pattern Fit | N/A (not JVM) | ⭐⭐⭐⭐⭐ Perfect fit (you already monitor JMX) |

Since you already have Prometheus/Grafana:

  • ✅ k6’s advantage remains strong (simpler Prometheus export)
  • ✅ Official k6 Grafana dashboards work out-of-box
  • ✅ Gatling can integrate via JMX Exporter (familiar pattern for your Java apps)
  • ⚠️ Gatling requires 1-3 hours additional setup vs k6’s minutes
  • ⚠️ Gatling dashboards are community-maintained (less polished)

k6 Reporting (Trivial with Existing Grafana)

What You Get Immediately:

  • Text Summary: Console output with basic stats

    execution: local
    script: test.js
    output: -
    
    scenarios: (100.00%) 1 scenario, 100 max VUs, 5m30s max duration
    
    ✓ status is 200
    ✓ response time OK
    
    checks.........................: 100.00% ✓ 50000 ✗ 0
    data_received..................: 150 MB  500 kB/s
    data_sent......................: 5.0 MB  17 kB/s
    http_req_blocked...............: avg=1ms    min=0s   med=1ms  max=10ms  p(90)=2ms   p(95)=3ms
    http_req_duration..............: avg=100ms  min=50ms med=95ms max=500ms p(90)=150ms p(95)=200ms
    http_reqs......................: 50000   166.666667/s
    
  • JSON Output: Machine-readable metrics for parsing

    {
      "metrics": {
        "http_req_duration": {
          "type": "trend",
          "contains": "time",
          "values": {
            "min": 50.123,
            "max": 500.456,
            "avg": 100.789,
            "med": 95.234,
            "p(90)": 150.567,
            "p(95)": 200.890,
            "p(99)": 450.123
          }
        }
      }
    }
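
The JSON summary makes CI gating straightforward. A minimal Node.js sketch that enforces a latency budget against the metric shape shown above (the budget value and sample data are illustrative):

```javascript
// Parse a k6 JSON summary and enforce a p(95) latency budget.
// Metric names follow k6's summary JSON shown above.
function checkLatencyBudget(summary, p95BudgetMs) {
  const values = summary.metrics.http_req_duration.values;
  const p95 = values["p(95)"];
  return { p95, ok: p95 <= p95BudgetMs };
}

// Example with the values from the sample summary:
const sample = {
  metrics: {
    http_req_duration: {
      values: { "p(95)": 200.89, "p(99)": 450.123 },
    },
  },
};

const result = checkLatencyBudget(sample, 300);
console.log(result.ok ? "PASS" : "FAIL", `p(95)=${result.p95}ms`);
```

In a pipeline this would read the exported file with `fs.readFileSync` and call `process.exit(1)` on failure to fail the job.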

What You DON’T Get:

  • ❌ No visual charts/graphs
  • ❌ No drill-down HTML interface
  • ❌ No time-series graphs (response time over duration)
  • ❌ No distribution histograms

Options to Get Visual Reports:

Option 1: Grafana + InfluxDB (Best, Production-Grade)

  • Export metrics: k6 run --out influxdb=http://influxdb:8086/k6 test.js
  • Real-time dashboards during test execution
  • Historical trending across test runs
  • Requires: InfluxDB deployed, Grafana dashboards configured
  • Setup Time: 2-4 hours for first-time setup
  • Result: ⭐⭐⭐⭐⭐ Production-grade observability

Option 2: k6 HTML Report Generator (Third-Party)

  • Tool: k6-reporter (e.g. benc-uk/k6-reporter) or k6-html-reporter
  • Generates HTML at the end of the run via k6’s handleSummary() hook, rather than a separate CLI step
  • Creates basic HTML page with charts
  • Requires: Node.js, external package
  • Result: ⭐⭐⭐ Basic HTML, not as rich as Gatling
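
For reference, k6-reporter is wired in via k6’s handleSummary() hook inside the test script itself — a sketch (the remote-module URLs are the projects’ published bundles; verify them before use):

```javascript
// In the k6 test script: emit an HTML report at the end of the run.
// Runs under k6, which can import remote modules.
import { htmlReport } from "https://raw.githubusercontent.com/benc-uk/k6-reporter/main/dist/bundle.js";
import { textSummary } from "https://jslib.k6.io/k6-summary/0.0.1/index.js";

export function handleSummary(data) {
  return {
    "summary.html": htmlReport(data),           // HTML artifact for GitLab
    stdout: textSummary(data, { indent: " " }), // keep the console summary
  };
}
```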

Option 3: k6 Cloud (Commercial)

  • Export to Grafana Cloud k6
  • Beautiful reports, no infrastructure
  • Requires: Subscription, data egress to cloud
  • Result: ⭐⭐⭐⭐⭐ Excellent but costs $$

GitLab Performance Widget:

# k6 can output to GitLab's load-performance report format
artifacts:
  reports:
    load_performance: performance.json  # GitLab 13.2+ shows a trend graph

  • Shows basic trend line in merge requests
  • Result: ⭐⭐⭐ Useful for CI/CD gates, not detailed analysis

Verdict:

  • Out-of-box: ⭐⭐⭐ Text/JSON only, requires tooling for visuals
  • With Grafana: ⭐⭐⭐⭐⭐ Excellent real-time + historical analysis
  • Trade-off: Setup overhead vs immediate gratification

Recommendation Based on Your Priorities

Since “Quality and Ease of Reporting” is your Priority #1, consider:

IMPORTANT: You Already Have Grafana 🎯

This significantly changes the evaluation in k6’s favor:

k6 with Existing Grafana (⭐⭐⭐⭐⭐ Recommended):

  • Trivial setup: Add single flag --out influxdb=http://influxdb:8086/k6
  • Official dashboards: Import Grafana k6 dashboard in 5 minutes
  • Real-time monitoring: Watch tests execute live in Grafana
  • Unified observability: Monitor both load tests AND SUT in same Grafana instance
  • Setup time: ~30 minutes (vs 2-4 hours if deploying Grafana from scratch)
  • Result: Best of both worlds - HTML for ad-hoc sharing, Grafana for analysis

Gatling with Existing Grafana (⭐⭐⭐ Possible but more work):

  • ⚠️ Requires gatling-influxdb plugin (not built-in)
  • ⚠️ Community dashboards (less polished than k6’s official ones)
  • ⚠️ Additional build configuration (Maven/sbt dependency)
  • ⚠️ Still get HTML reports, but Grafana integration is secondary
  • ⚠️ Setup time: ~2-3 hours (plugin + dashboard customization)

Our Recommendation (With Existing Grafana):

Phase 1 (Day 1): k6 + text/JSON

  • Get first test working in 2-4 hours
  • Text output sufficient to validate approach

Phase 1.5 (Day 2): Connect to Grafana (30 minutes)

  • Add --out influxdb=... to Job manifest
  • Import official k6 dashboard to Grafana
  • Now have real-time monitoring + historical trending
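
The Phase 1.5 change is a one-line addition to the k6 command in the Job manifest. A sketch, assuming an in-cluster InfluxDB at a hypothetical address:

```yaml
# Excerpt of the k6 Job container — only the --out flag is new.
command:
  - k6
  - run
  - --out=influxdb=http://influxdb.monitoring:8086/k6  # hypothetical in-cluster InfluxDB
  - --vus=100
  - --duration=5m
  - /scripts/test.js
```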

Optional: k6-reporter for HTML reports

  • Use for sharing results with stakeholders who don’t have Grafana access
  • 2 hours setup time

Result:

  • ✅ Gatling’s advantage (HTML reports) becomes less critical
  • ✅ k6’s Grafana integration is simpler and better supported
  • ✅ You get best of both: Grafana for analysis, optionally HTML for sharing
  • ✅ All observability in one place (load tests + SUT metrics in same Grafana)

Why k6 wins with existing Grafana:

  1. Setup: 1-line config vs plugin installation
  2. Dashboard quality: Official vs community
  3. Unified monitoring: Load test + SUT metrics side-by-side
  4. Lower resources: 512Mi vs 2Gi memory
  5. Faster setup: 2-4 hrs vs 3-5 hrs total
  6. More accessible: JS vs Scala

Gatling only makes sense if:

  • Team is already JVM/Scala-proficient
  • Need Gatling-specific features (recorder, complex DSL)
  • HTML reports are critical and Grafana access is restricted
  • Willing to invest in plugin setup + custom dashboards

Decision Table: With Existing Prometheus/Grafana 🎯

| Factor | k6 | Gatling |
|---|---|---|
| Prometheus Integration | ⭐⭐⭐⭐⭐ Built-in experimental or xk6 plugin | ⭐⭐⭐⭐ Via Pushgateway plugin or JMX Exporter |
| InfluxDB Integration | ⭐⭐⭐⭐⭐ Built-in, 1-line flag | ⭐⭐⭐ Plugin required |
| Setup Time (Prometheus) | 30 min - 1 hour | 1-3 hours (JMX sidecar or custom image) |
| Setup Time (InfluxDB) | 30 minutes | 2-3 hours (plugin + config) |
| Dashboard Quality | ⭐⭐⭐⭐⭐ Official, well-maintained | ⭐⭐⭐ Community dashboards |
| HTML Reports | ⭐⭐⭐ Optional (k6-reporter) | ⭐⭐⭐⭐⭐ Built-in, excellent |
| Real-time Monitoring | ⭐⭐⭐⭐⭐ Seamless | ⭐⭐⭐⭐ Works with setup |
| Fits JMX Pattern | N/A (not JVM) | ⭐⭐⭐⭐⭐ Perfect (like your other Java apps) |
| Unified Monitoring | ⭐⭐⭐⭐⭐ Load tests + SUT in same Grafana | ⭐⭐⭐⭐⭐ Load tests + SUT in same Grafana |
| Total Setup (Day 1) | 2-4 hours (test) + 30-60 min (metrics) | 3-5 hours (test) + 1-3 hours (metrics) |
| Resource Footprint | 120MB image, 512Mi RAM | 500MB image, 2Gi RAM (+ JMX sidecar if used) |
| Language Accessibility | ⭐⭐⭐⭐⭐ JavaScript | ⭐⭐⭐ Scala/Java |
| Time to MVP with Observability | 1.5-2 days | 4-6 days |

Verdict with Existing Prometheus/Grafana: ⭐⭐⭐⭐⭐ k6 still wins

k6 Advantages:

  • Simpler Prometheus/InfluxDB integration (minutes vs hours)
  • Official Grafana dashboards work immediately
  • Lower resource footprint (no JMX sidecar needed)
  • More accessible language
  • Faster time to production-quality observability

Gatling Advantages:

  • HTML reports for sharing with non-Grafana users
  • JMX pattern matches your existing Java app monitoring (familiar)
  • Scala DSL if team is JVM-proficient

Key Insight: While Gatling can integrate with your Prometheus setup via JMX Exporter (same pattern as your other Java apps), the additional 1-3 hours of setup + community dashboards don’t offset k6’s speed and simplicity advantages.

Other Concerns: Supporting Dimensions

| Other Dimension | k6 + Operator | k6 + K8s Job | k6 + GitLab Runner | Gatling + Operator | Gatling + K8s Job | Gatling + GitLab Runner |
|---|---|---|---|---|---|---|
| Setup Complexity | ⭐⭐ Operator + CRD installation | ⭐⭐⭐⭐ Job manifest + kubectl | ⭐⭐⭐⭐⭐ Just Docker image | ⭐⭐ Custom operator or Helm | ⭐⭐⭐⭐ Job manifest + kubectl | ⭐⭐⭐⭐⭐ Just Docker image |
| Max Load Capacity | Very High (100k+ RPS) | High (50k+ RPS with multiple Jobs) | Medium (10k RPS per runner) | Very High (100k+ RPS) | High (50k+ RPS) | Medium (10k RPS per runner) |
| Resource Isolation | ⭐⭐⭐⭐⭐ Namespaces, quotas, limits | ⭐⭐⭐⭐⭐ Namespaces, quotas, limits | ⭐⭐⭐ Runner-level isolation | ⭐⭐⭐⭐⭐ Namespaces, quotas, limits | ⭐⭐⭐⭐⭐ Namespaces, quotas, limits | ⭐⭐⭐ Runner-level isolation |
| Network Policies | ⭐⭐⭐⭐⭐ Full NetworkPolicy support | ⭐⭐⭐⭐⭐ Full NetworkPolicy support | ⭐⭐ Limited to runner config | ⭐⭐⭐⭐⭐ Full NetworkPolicy support | ⭐⭐⭐⭐⭐ Full NetworkPolicy support | ⭐⭐ Limited to runner config |
| Network Bottleneck | ⭐⭐⭐⭐⭐ Separate cluster avoids GitLab bottleneck | ⭐⭐⭐⭐⭐ Separate cluster avoids GitLab bottleneck | ⭐⭐ Limited by GitLab network | ⭐⭐⭐⭐⭐ Separate cluster | ⭐⭐⭐⭐⭐ Separate cluster avoids bottleneck | ⭐⭐ Limited by GitLab network |
| Operational Overhead | ⭐⭐ Operator maintenance, upgrades | ⭐⭐⭐⭐ Minimal (standard Jobs) | ⭐⭐⭐⭐⭐ Minimal | ⭐⭐ Custom operator maintenance | ⭐⭐⭐⭐ Minimal (standard Jobs) | ⭐⭐⭐⭐⭐ Minimal |
| Observability | ⭐⭐⭐⭐⭐ K8s metrics, logs, events | ⭐⭐⭐⭐⭐ K8s logs, easy metric export | ⭐⭐⭐⭐ GitLab logs, artifacts | ⭐⭐⭐⭐ Custom dashboards | ⭐⭐⭐⭐⭐ K8s logs, metric export | ⭐⭐⭐⭐ GitLab logs, artifacts |
| Test Lifecycle | ⭐⭐⭐⭐⭐ Declarative, auto-cleanup | ⭐⭐⭐⭐ TTL for cleanup, simple | ⭐⭐⭐ Script-based management | ⭐⭐⭐ Custom scripts | ⭐⭐⭐⭐ TTL for cleanup | ⭐⭐⭐ Script-based |
| Multi-tenancy | ⭐⭐⭐⭐⭐ Namespace isolation, RBAC | ⭐⭐⭐⭐⭐ Namespace isolation, RBAC | ⭐⭐ Shared runner pool | ⭐⭐⭐⭐⭐ Namespace isolation | ⭐⭐⭐⭐⭐ Namespace isolation, RBAC | ⭐⭐ Shared runner pool |
| Community Support | ⭐⭐⭐⭐⭐ Active Grafana Labs | ⭐⭐⭐⭐⭐ Well-documented pattern | ⭐⭐⭐⭐⭐ Well documented | ⭐⭐⭐ Limited operator support | ⭐⭐⭐⭐⭐ Well-documented | ⭐⭐⭐⭐⭐ Well documented |
| ArgoCD Integration | ⭐⭐⭐⭐⭐ Native GitOps | ⭐⭐⭐⭐ CronJob or manual trigger | N/A (ephemeral) | ⭐⭐⭐⭐ Custom ArgoCD app | ⭐⭐⭐⭐ CronJob or manual | N/A (ephemeral) |
| Debugging | ⭐⭐⭐ K8s pod logs, exec | ⭐⭐⭐⭐ kubectl logs, local Docker test | ⭐⭐⭐⭐⭐ Local Docker run | ⭐⭐⭐ K8s pod logs, exec | ⭐⭐⭐⭐ kubectl logs, local test | ⭐⭐⭐⭐⭐ Local Docker run |
| Image Size | ~120MB | ~120MB | ~120MB | ~500MB (JVM) | ~500MB (JVM) | ~500MB (JVM) |
| Language/Ecosystem | JavaScript/TypeScript | JavaScript/TypeScript | JavaScript/TypeScript | Scala/Java | Scala/Java | Scala/Java |
| Best For | Production-grade, multi-team, high-scale, GitOps | Our use case: offload from GitLab, avoid network bottleneck, good balance | Quick start, low-scale, simple tests | JVM shops, high-scale | JVM shops, offload from GitLab | JVM shops, quick start |

Detailed Comparison

k6 with K8s Job (Our Approach)

Architecture: GitLab CI triggers K8s Job on separate test cluster → Job executes k6 → Collect results

Strengths:

  • Network Isolation: Runs on separate cluster from GitLab, avoiding network bottlenecks to SUT
  • Resource Offloading: GitLab server doesn’t bear the load generation workload
  • Standard K8s Pattern: Jobs are well-understood, mature, and widely used
  • Fast Setup: Standard K8s manifest + kubectl, no operator installation required (2-4 hours to first test)
  • Clean Reporting: k6’s excellent JSON/summary output easily collected as artifacts
  • Horizontal Scaling: Use completions: N with coordination for distributed load
  • Resource Management: Full K8s quotas, limits, and NetworkPolicy support
  • Debugging: Can test Jobs locally with same Docker image
  • Low Overhead: No operator to maintain, Jobs auto-cleanup with TTL

Weaknesses:

  • Manual Coordination: For distributed tests, need custom coordination logic (vs operator’s built-in parallelism)
  • Less Declarative: Requires scripting for test lifecycle (vs operator CRDs)
  • No GitOps: Jobs are ephemeral, not continuously reconciled by ArgoCD

Example Usage:

# GitLab CI triggers this Job
apiVersion: batch/v1
kind: Job
metadata:
  name: load-test-${CI_PIPELINE_ID}
  namespace: load-testing
spec:
  ttlSecondsAfterFinished: 3600  # Auto-cleanup after 1 hour
  completionMode: Indexed  # Exposes JOB_COMPLETION_INDEX, needed for VU splitting
  completions: 5  # Run 5 pods for distributed load
  parallelism: 5
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: k6
        image: grafana/k6:latest
        command:
          - k6
          - run
          - --out=json=/results/output.json
          - --vus=100
          - --duration=5m
          - /scripts/test.js
        volumeMounts:
        - name: test-script
          mountPath: /scripts
        - name: results
          mountPath: /results
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 1
            memory: 1Gi
      volumes:
      - name: test-script
        configMap:
          name: test-script-${CI_PIPELINE_ID}
      - name: results
        emptyDir: {}

GitLab CI Integration:

execute-load-test:
  stage: test
  image: bitnami/kubectl:latest
  script:
    # Create ConfigMap with test script
    - kubectl create configmap test-script-${CI_PIPELINE_ID} 
        --from-file=test.js -n load-testing
    
    # Create and run Job
    - envsubst < k8s/job-template.yaml | kubectl apply -f -
    
    # Wait for completion
    - kubectl wait --for=condition=complete --timeout=30m 
        job/load-test-${CI_PIPELINE_ID} -n load-testing
    
    # Collect results from Job pods (kubectl logs job/... returns a single pod;
    # with completions > 1, use -l job-name=load-test-${CI_PIPELINE_ID} --prefix)
    - kubectl logs job/load-test-${CI_PIPELINE_ID} -n load-testing > results.log
    
    # Cleanup
    - kubectl delete configmap test-script-${CI_PIPELINE_ID} -n load-testing
  artifacts:
    paths:
      - results.log
    reports:
      load_performance: performance.json

Distributed Load Pattern:

# For distributed load, split VUs across instances via env vars.
# Requires an Indexed Job (completionMode: Indexed), which sets
# JOB_COMPLETION_INDEX (0..N-1) in each pod's environment.
 
export INSTANCE_INDEX=$JOB_COMPLETION_INDEX
export TOTAL_INSTANCES=5
export TOTAL_VUS=500
export VUS_PER_INSTANCE=$((TOTAL_VUS / TOTAL_INSTANCES))
 
k6 run \
  --vus=${VUS_PER_INSTANCE} \
  --duration=5m \
  --out=json=/results/output-${INSTANCE_INDEX}.json \
  test.js
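
Note the shell arithmetic above drops the remainder when TOTAL_VUS isn’t divisible by the instance count (500/5 is exact, but 500/3 would silently run 498 VUs). A remainder-aware split, sketched in JavaScript:

```javascript
// Split totalVus across n instances; the first (totalVus % n)
// instances take one extra VU so the parts sum exactly to totalVus.
function vusForInstance(totalVus, instances, index) {
  const base = Math.floor(totalVus / instances);
  const extra = index < totalVus % instances ? 1 : 0;
  return base + extra;
}

// 500 VUs over 3 instances -> 167, 167, 166 (sums to 500)
const split = [0, 1, 2].map((i) => vusForInstance(500, 3, i));
console.log(split);
```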

When to Choose:

  • Our use case: Need to offload from GitLab, avoid network bottlenecks
  • Want K8s benefits (isolation, quotas, NetworkPolicies) without operator complexity
  • Need faster MVP (days vs weeks)
  • Team comfortable with K8s but wants simpler lifecycle than operator
  • Don’t need GitOps continuous reconciliation

k6 with Operator

Strengths:

  • Official Support: Grafana Labs maintains the operator, ensuring compatibility and updates
  • Declarative: Define tests as Kubernetes CRDs (TestRun resources)
  • Horizontal Scaling: Set parallelism: 10 to distribute load across 10 pods automatically
  • Resource Management: Leverage K8s resource quotas, limits, and autoscaling
  • Network Control: Fine-grained NetworkPolicies to restrict test traffic
  • GitOps Ready: Deploy via ArgoCD alongside application infrastructure
  • Cloud Native: Integrates with service meshes, observability stacks

Weaknesses:

  • Setup Time: Requires operator installation, namespace setup, RBAC configuration
  • Learning Curve: Team needs to understand CRDs, K8s resource management
  • Debugging Complexity: Failures require K8s troubleshooting skills
  • Overhead: Operator consumes cluster resources even when idle

Example Usage:

apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: api-load-test
spec:
  parallelism: 5  # 5 distributed pods
  script:
    configMap:
      name: test-script
  runner:
    resources:
      limits:
        cpu: 1
        memory: 1Gi

When to Choose:

  • Running tests regularly (daily/weekly regression tests)
  • Need to generate >50k RPS
  • Multiple teams using shared infrastructure
  • Strong K8s skills in team
  • Security/isolation requirements (NetworkPolicies)

k6 with Docker (GitLab Runner)

Strengths:

  • Simplicity: Just run docker run -v "$PWD:/scripts" grafana/k6:latest run /scripts/test.js
  • Fast Setup: Working in under an hour
  • Easy Debugging: Run tests locally with same Docker image
  • Low Overhead: No persistent cluster resources
  • Familiar: Standard GitLab CI patterns

Weaknesses:

  • Scale Limits: Single runner caps at ~10k RPS (CPU bound)
  • No Distribution: Can’t easily split load across multiple executors
  • Resource Contention: Shares resources with other CI jobs
  • Limited Isolation: Relies on runner network configuration
  • Manual Orchestration: Need custom scripts for distributed tests

Example Usage:

# .gitlab-ci.yml
load-test:
  image:
    name: grafana/k6:latest
    entrypoint: [""]  # the k6 image's default entrypoint is `k6`; reset it so `script:` works
  script:
    - k6 run --vus 100 --duration 5m --summary-export=summary.json test.js
  artifacts:
    reports:
      load_performance: summary.json

When to Choose:

  • Getting started quickly (proof of concept)
  • Infrequent ad-hoc testing
  • Low-to-medium load requirements (<10k RPS)
  • Small team with limited K8s expertise
  • Want to validate approach before operator investment

Gatling with K8s Job

Architecture: GitLab CI triggers K8s Job → Job executes Gatling simulation → Collect HTML reports

Strengths:

  • Network Isolation: Same benefits as k6 - separate cluster from GitLab
  • Rich Reports: Gatling’s HTML reports are comprehensive and visual
  • Standard Pattern: K8s Jobs are well-understood
  • JVM Performance: Excellent for very high load scenarios
  • Full Feature Set: All Gatling features available (feeders, checks, DSL)

Weaknesses:

  • Larger Images: ~500MB JVM-based images (vs k6’s 120MB)
  • Slower Startup: JVM warmup time adds latency
  • Resource Intensive: Requires more memory per pod (typically 2Gi vs k6’s 512Mi)
  • Coordination Complexity: Distributed Gatling requires Gatling Enterprise or custom scripts
  • Language Barrier: Scala/Java less accessible than JavaScript

Example Usage:

apiVersion: batch/v1
kind: Job
metadata:
  name: gatling-test-${CI_PIPELINE_ID}
spec:
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: gatling
        image: denvazh/gatling:latest
        command:
          - /opt/gatling/bin/gatling.sh
          - -sf
          - /simulations
          - -s
          - com.example.ApiSimulation
          - -rf
          - /results
        volumeMounts:
        - name: simulations
          mountPath: /simulations
        - name: results
          mountPath: /results
        resources:
          requests:
            cpu: 1
            memory: 2Gi
          limits:
            cpu: 2
            memory: 4Gi
      volumes:
      - name: simulations
        configMap:
          name: gatling-simulation-${CI_PIPELINE_ID}
      - name: results
        emptyDir: {}

When to Choose:

  • JVM/Scala-based team
  • Need Gatling-specific features (recorder, advanced DSL)
  • Want to offload from GitLab with JVM tooling
  • Very high load requirements (>100k RPS)
  • Willing to accept larger resource footprint

Gatling with Operator

Strengths:

  • High Performance: JVM-based, excellent for very high loads
  • Scala DSL: Powerful test scripting for complex scenarios
  • Detailed Reports: Rich HTML reports with drill-down metrics
  • Enterprise Features: Commercial support available

Weaknesses:

  • Less Mature Operators: No official operator; community solutions vary in quality
  • Setup Complexity: May require custom Helm charts or operator development
  • Larger Footprint: JVM + dependencies = ~500MB images
  • JVM Overhead: Longer startup times, higher memory usage
  • Smaller Community: Less K8s-native ecosystem than k6

Example Custom CRD:

apiVersion: loadtest.io/v1
kind: GatlingTest
metadata:
  name: api-test
spec:
  simulation: com.example.ApiSimulation
  replicas: 5
  resources:
    requests:
      memory: 2Gi
      cpu: 1

When to Choose:

  • Team has strong JVM/Scala skills
  • Need Gatling’s advanced features (feeders, checks, protocols)
  • Willing to maintain custom operator
  • Very high scale requirements (>100k RPS)

Gatling with Docker (GitLab Runner)

Strengths:

  • Standard Approach: Well-documented Docker execution
  • Quick Start: Run without operator complexity
  • Flexible: Easy to customize with scripts
  • Powerful: Full Gatling feature set available

Weaknesses:

  • Large Images: 500MB+ (vs k6’s 120MB)
  • Resource Intensive: JVM requires more memory
  • Slower Startup: JVM warmup time
  • Scala/Java Required: Higher barrier to entry for non-JVM teams
  • Manual Scaling: Hard to distribute load

Example Usage:

load-test:
  image: denvazh/gatling:latest
  script:
    - gatling.sh -s com.example.ApiSimulation
  artifacts:
    paths:
      - target/gatling/

When to Choose:

  • JVM-based organization
  • Need Gatling-specific features
  • Ad-hoc testing without operator investment
  • Small-to-medium scale (<20k RPS)

┌─────────────────────────────────────────────────┐
│ Start: Need load testing framework              │
└──────────────┬──────────────────────────────────┘
               │
               ▼
        ┌──────────────────────────┐
        │ Need to offload from     │ ──Yes──▶ K8s Job (k6 or Gatling)
        │ GitLab + avoid network   │          ↓
        │ bottlenecks?             │          Best balance: speed + isolation
        └──────┬───────────────────┘
               │
               No
               │
               ▼
        ┌──────────────────────────┐
        │ Quick PoC only?          │ ──Yes──▶ GitLab Runner (k6 or Gatling)
        │ (<1 day setup)           │          ↓
        └──────┬───────────────────┘          Fastest start, limited scale
               │
               No
               │
               ▼
        ┌───────────────────────────┐
        │ Need GitOps reconciliation│ ──Yes──▶ k6 Operator
        │ + max automation?         │          ↓
        └──────┬────────────────────┘          Production-grade, most overhead
               │
               No
               │
               ▼
        ┌──────────────────────────┐
        │ JVM-based organization?  │ ──Yes──▶ Gatling + K8s Job
        └──────┬───────────────────┘
               │
               No
               │
               ▼
        k6 + K8s Job (recommended default)

Our Decision: k6 + K8s Job

Selected Approach: k6 with Kubernetes Jobs

Implementation:

  • GitLab CI orchestrates K8s Jobs on separate test cluster
  • Jobs execute k6 load tests against SUT in different cluster
  • Results collected as GitLab artifacts and exported to InfluxDB
  • Horizontal scaling via Job completions parameter with coordination

Rationale Based on Priority Concerns:

  1. Quality & Ease of Reporting (Priority 1):

    • 🎯 GAME CHANGER: We already have Grafana deployed for SUT monitoring
    • k6 wins decisively with existing Grafana:
      • Built-in InfluxDB export: --out influxdb=... (1-line config)
      • Official Grafana dashboards: Import in 5 minutes
      • Real-time monitoring during test execution
      • Historical trending across test runs
      • Unified observability: Monitor load tests AND SUT in same Grafana instance
    • ⚠️ Gatling HTML reports still superior for ad-hoc sharing, BUT:
      • Requires plugin for Grafana integration (not built-in)
      • Community dashboards (less mature than k6’s official ones)
      • Setup time: ~2-3 hours vs k6’s ~30 minutes
    • Implementation Plan:
      • Phase 1 (Day 1): k6 text/JSON (2-4 hours to first test)
      • Phase 1.5 (Day 2): Connect to existing Grafana (30 minutes)
      • Optional: Add k6-reporter for HTML sharing (2 hours)
    • Result: Best of both worlds - Grafana for analysis, optionally HTML for sharing
  2. Speed to MVP (Priority 2):

    • 2-4 hours to first test (vs 1-2 days for operator)
    • 1-3 days to MVP (vs 1-2 weeks for operator)
    • ✅ Standard K8s pattern, no operator installation required
    • ✅ Team already familiar with K8s Jobs
  3. Maturity (Priority 3):

    • ✅ K8s Jobs are rock-solid, production-proven pattern
    • ✅ k6 is mature, well-supported by Grafana Labs
    • ✅ No reliance on less mature operator code paths
  4. Ease of Use (Priority 4):

    • ✅ Standard K8s Job manifests (familiar to team)
    • ✅ Simple kubectl commands for management
    • ✅ JavaScript test scripts (accessible to team)
    • ⚠️ Slight complexity for distributed coordination (acceptable trade-off)
  5. Ease of Horizontal Scaling (Priority 5):

    • ✅ Job completions: N for parallel execution
    • ✅ Coordination via environment variables (JOB_COMPLETION_INDEX)
    • ⚠️ Not as seamless as operator’s parallelism but sufficient for needs

Additional Benefits:

  • Network Isolation: Separate cluster avoids GitLab→SUT network bottleneck
  • Resource Offloading: GitLab servers don’t bear load generation workload
  • Cost-Effective: No operator overhead, Jobs auto-cleanup with TTL
  • Security: Full NetworkPolicy support for SUT access control

Why Not Operator?:

  • Operator setup takes 1-2 weeks vs 1-3 days for Jobs
  • We don’t need GitOps continuous reconciliation (tests are ephemeral)
  • Jobs provide 80% of benefits with 20% of complexity
  • Can migrate to operator later if needs evolve

Why Not Gatling Despite Better Out-of-Box Reports?:

  • Report quality alone doesn’t offset other factors:
    • Larger images (~500MB vs 120MB) → slower startup, more cluster resources
    • Higher resource requirements (2Gi+ memory vs 512Mi) → higher costs
    • Scala/Java less accessible than JavaScript for our team
    • Slower setup time (3-5 hours vs 2-4 hours)
  • k6’s reporting story is better long-term:
    • Grafana/InfluxDB provides real-time monitoring (not just post-test reports)
    • Historical trending across test runs
    • Integration with existing observability platform
    • Gatling’s HTML reports are static snapshots
  • Mitigation for initial reporting gap:
    • k6 text/JSON output sufficient for MVP validation
    • Optional: k6-reporter for basic HTML reports (2 hours setup)
    • Phase 3: Grafana deployment provides production-grade observability

Why Not GitLab Runner Only?:

  • Network bottleneck between GitLab and SUT
  • Runner resource constraints limit scale (~10k RPS)
  • No K8s NetworkPolicy support for SUT isolation

Architecture Overview

┌─────────────────────┐      ┌──────────────────────┐      ┌──────────────────────┐
│   GitLab CI/CD      │      │  Test Cluster        │      │  SUT Cluster         │
│   (Orchestration)   │      │  (Separate from GL)  │      │  (Sandbox SUT)       │
│                     │      │                      │      │                      │
│  - Test Repo        │ ───▶ │  K8s Jobs (k6)       │ ───▶ │  Federated APIs      │
│  - Generation       │      │  Distributed Pods    │      │  (Target System)     │
│  - kubectl trigger  │      │  TTL Cleanup         │      │                      │
│  - Artifact collect │      │                      │      │                      │
└─────────────────────┘      └──────────────────────┘      └──────────────────────┘
         │                            │                              │
         │                            │                              │
         ▼                            ▼                              ▼
┌─────────────────────┐      ┌──────────────────────┐      ┌──────────────────────┐
│  Test Library       │      │  K8s Resources       │      │  API Catalog         │
│  - Scenarios        │      │  - Namespace         │      │  - Service Registry  │
│  - Templates        │      │  - Resource Quotas   │      │  - SLO Definitions   │
│  - Helpers          │      │  - Network Policies  │      │  - Auth Configs      │
│  - Job manifests    │      │  - ConfigMaps        │      │                      │
└─────────────────────┘      └──────────────────────┘      └──────────────────────┘

Key Benefits:
✓ Offloads load from GitLab servers
✓ Avoids GitLab→SUT network bottleneck
✓ Full K8s isolation and quotas
✓ Fast setup (2-4 hours to first test)

Key Architectural Decisions

1. Test Generation Location: GitLab CI

Decision: Generate tests in GitLab CI pipeline
Alternative: In-cluster Job in sandbox environment

Rationale:

  • Better secrets management (GitLab CI variables)
  • No cluster resource consumption during generation
  • Easier debugging and iteration
  • Native GitLab artifact management
  • Clear separation of concerns

Trade-offs:

  • Requires GitLab runner with kubectl/API access
  • Less suitable for very large test suite generation (acceptable for our scale)

2. Test Execution: K8s Jobs on Separate Cluster

Decision: Use Kubernetes Jobs on dedicated test cluster (separate from GitLab and SUT)
Alternatives: GitLab Runner execution, k6 Operator

Rationale:

  • Network Isolation: Avoids GitLab→SUT network bottleneck by running tests in separate cluster
  • Resource Offloading: GitLab servers don’t bear load generation workload
  • Fast Setup: Standard K8s Jobs require 2-4 hours vs 1-2 days for operator
  • Maturity: K8s Jobs are production-proven, no operator dependencies
  • Simplicity: Familiar pattern for team, less operational overhead
  • Scaling: Job completions parameter enables horizontal scaling
  • Security: Full NetworkPolicy support for SUT access control

Trade-offs:

  • Manual coordination for distributed tests (vs operator’s built-in parallelism)
  • No GitOps continuous reconciliation (tests are ephemeral anyway)
  • Mitigation: Coordination via JOB_COMPLETION_INDEX environment variable is straightforward
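The index-based coordination amounts to a few lines of arithmetic. A sketch (the function name is illustrative; in the Job itself this runs as shell):

```javascript
// Split the configured total VUs across N indexed Job pods.
// The last pod absorbs the remainder so no VUs are dropped.
function vusForInstance(totalVus, instances, index) {
  const share = Math.floor(totalVus / instances);
  const remainder = totalVus % instances;
  return index === instances - 1 ? share + remainder : share;
}

// Each pod reads its own index from the JOB_COMPLETION_INDEX
// environment variable that Kubernetes sets for Indexed Jobs.
```

With 100 VUs over 3 instances, pods 0 and 1 run 33 VUs each and pod 2 runs 34, preserving the configured total.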

3. Test Storage: Hybrid Model

Decision:

  • Reusable Components: GitLab repository (version-controlled)
  • Generated Tests: Dynamic generation from API catalog
  • Results:
    • Short-term: GitLab artifacts (30 days)
    • Long-term: InfluxDB for trending
    • Reports: S3/MinIO for historical analysis

Rationale:

  • Version control for test logic and scenarios
  • Dynamic generation reduces maintenance burden
  • Multiple retention strategies optimize cost and utility

4. Self-Service Pattern: GitLab CI Variables

Decision: Use GitLab CI manual triggers with pipeline variables

Variables:

TEST_SUITE: "api-federation"      # Which test suite
TARGET_ENVIRONMENT: "sandbox-sut-1" # Target SUT
VIRTUAL_USERS: "100"               # Concurrent users
TEST_DURATION: "5m"                # Test duration
RAMP_UP_TIME: "30s"               # Ramp-up period
TEST_PROFILE: "load"              # smoke|load|stress|spike

Rationale:

  • No custom UI required
  • GitLab’s existing RBAC and audit logging
  • Easy to trigger via UI, API, or CLI
  • Pipeline history provides audit trail

Trade-offs:

  • Less user-friendly than dedicated UI
  • Mitigation: Good documentation + optional wrapper API for non-technical users

5. Network Isolation

Decision: Enforce network policies restricting k6 pods to SUT environment only

NetworkPolicy:
  - Allow: DNS resolution
  - Allow: Traffic to sandbox-sut namespace only
  - Deny: All other egress

Rationale:

  • Prevent accidental load testing of non-target systems
  • Security isolation between sandbox environments
  • Clear blast radius containment

Implementation Plan

Phase 1: Foundation (1-3 days)

Goal: Basic working load test with K8s Jobs

Deliverables:

  1. K8s namespace configured on test cluster with resource quotas
  2. Basic GitLab CI pipeline triggering K8s Jobs
  3. Simple parameterized k6 test example
  4. Documentation for running first test

Tasks:

  • Create load-testing namespace on test cluster with resource quotas and NetworkPolicies
  • Configure GitLab runner with kubectl access to test cluster
  • Create K8s Job manifest template for k6
  • Create GitLab CI pipeline that triggers Jobs via kubectl
  • Write example k6 test script
  • Implement Job result collection (logs → GitLab artifacts)
  • (Optional) Set up k6-reporter for basic HTML reports (2 hours)
  • Document execution workflow and reporting options
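Result collection from pod logs can be sketched as below (this assumes the k6 wrapper script echoes the end-of-test summary JSON between marker lines; the marker strings are illustrative):

```javascript
// Recover the k6 end-of-test summary from collected pod log text.
// Returns the parsed summary object, or null if no markers are found.
function extractSummary(log) {
  const lines = log.split('\n');
  const start = lines.indexOf('=== SUMMARY JSON START ===');
  const end = lines.indexOf('=== SUMMARY JSON END ===');
  if (start === -1 || end === -1 || end <= start) return null;
  return JSON.parse(lines.slice(start + 1, end).join('\n'));
}
```

Printing the summary to stdout matters because the Job's emptyDir volume is lost when the pod is garbage-collected; pod logs are the only artifact channel back to GitLab.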

Acceptance Criteria:

  • Team member can trigger load test via GitLab UI
  • K8s Job executes on test cluster (not GitLab runner)
  • Test targets sandbox SUT successfully
  • Results collected in GitLab artifacts as JSON/text summary
  • (Optionally) Basic HTML report generated
  • Job auto-cleans up via TTL (1 hour after completion)
  • Documentation explains reporting trade-offs and future Grafana setup

Phase 2: Self-Service & Generation (1-2 weeks)

Goal: Flexible, catalog-driven test generation

Deliverables:

  1. GitLab CI variables for test customization
  2. Test generation scripts (template-based)
  3. Integration with API catalog for dynamic test creation
  4. Multiple test profiles (smoke, load, stress, spike)

Tasks:

  • Implement test generation scripts
  • Create test scenario library
  • Integrate API catalog discovery
  • Add test profile configurations
  • Create test templates

Acceptance Criteria:

  • Tests can be generated from API catalog
  • Multiple test profiles selectable
  • No code changes required for common scenarios
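Catalog-driven generation can be sketched as a mapping from catalog entries to k6 scenario definitions (the catalog schema with `name`, `path`, and optional `weight` fields is an assumption to validate against the real API catalog):

```javascript
// Turn API catalog entries into per-API k6 scenario stubs,
// distributing the total VUs proportionally to each API's weight.
function scenariosFromCatalog(catalog, totalVus) {
  const totalWeight = catalog.reduce((sum, api) => sum + (api.weight || 1), 0);
  const scenarios = {};
  for (const api of catalog) {
    const weight = api.weight || 1;
    scenarios[api.name] = {
      executor: 'constant-vus',
      vus: Math.max(1, Math.round((totalVus * weight) / totalWeight)),
      duration: '5m',
      env: { API_PATH: api.path },   // consumed by the shared test function
      exec: 'callApi',               // hypothetical shared exported function
    };
  }
  return scenarios;
}
```

This keeps per-API load shaping in data (the catalog) rather than in code, which is what makes "no code changes for common scenarios" achievable.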

Phase 3: Enhanced Observability & Alerting (3-5 days)

Goal: Leverage existing Grafana for production-grade observability

Deliverables:

  1. InfluxDB integration (already exists for SUT monitoring)
  2. k6 → existing Grafana integration (30 minutes)
  3. Enhanced Grafana dashboards with custom views
  4. Alerting and notification system
  5. Baseline comparison and regression detection

Tasks:

  • Verify the existing InfluxDB instance (no new deployment required)
  • Configure k6 → existing InfluxDB in Job manifest (--out influxdb=...)
  • Import official k6 Grafana dashboard (5 minutes)
  • Customize dashboard for our API federation use case
  • Create unified dashboard showing load test + SUT metrics side-by-side
  • Set up GitLab performance reports for merge request widgets
  • Configure Grafana alerts for test failures or SLO breaches
  • Implement notification webhooks (Slack/email via Grafana alerting)
  • Create baseline metrics storage for regression detection

Acceptance Criteria:

  • Real-time metrics visible in existing Grafana during test (not just post-test like Gatling)
  • Historical trend data available in existing InfluxDB across multiple test runs
  • Grafana dashboards show P50/P75/P95/P99 latencies, throughput, error rates
  • Unified view: Load test metrics AND SUT metrics in same dashboard
  • GitLab shows performance regression indicators in merge requests
  • Grafana alerts team of test failures or performance degradations
  • Reporting quality now exceeds Gatling (dynamic vs static, real-time vs post-test, unified observability)

Phase 4: Advanced Features (2-3 weeks)

Goal: Production-ready testing framework

Deliverables:

  1. Multi-scenario testing (mixed workloads)
  2. Baseline comparison and regression detection
  3. Scheduled regression test suite
  4. SLO-based pass/fail criteria
  5. Advanced reporting and analytics

Tasks:

  • Implement multi-scenario orchestration
  • Build baseline metrics storage
  • Create regression detection logic
  • Set up scheduled test pipelines
  • Implement SLO validation
  • Build comprehensive report generator

Acceptance Criteria:

  • Can run mixed workload tests (multiple APIs concurrently)
  • Automatic detection of performance regressions
  • Scheduled tests run nightly against main branch
  • Tests pass/fail based on SLO thresholds
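Baseline comparison can be sketched as a small pure function (the tolerance values are illustrative defaults; the field names follow k6's `--summary-export` format and should be verified against our exported data):

```javascript
// Compare a current k6 summary against a stored baseline; flag a
// regression when p95 latency or error rate degrades beyond tolerance.
function detectRegression(baseline, current, tolerance = 0.10) {
  const findings = [];
  const p95Base = baseline.metrics.http_req_duration['p(95)'];
  const p95Now = current.metrics.http_req_duration['p(95)'];
  if (p95Now > p95Base * (1 + tolerance)) {
    findings.push(`p95 latency regressed: ${p95Base}ms -> ${p95Now}ms`);
  }
  // Rate metrics expose their ratio as 'value' in the summary export
  const errBase = baseline.metrics.http_req_failed.value;
  const errNow = current.metrics.http_req_failed.value;
  if (errNow > errBase + 0.01) {
    findings.push(`error rate regressed: ${errBase} -> ${errNow}`);
  }
  return { regressed: findings.length > 0, findings };
}
```

Running this in the report stage against the previous main-branch baseline gives the pass/fail signal for the scheduled regression suite.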

Technical Specifications

Repository Structure

load-testing-framework/
├── .gitlab-ci.yml                 # Main CI/CD pipeline
├── README.md                      # User documentation
├── scripts/
│   ├── generate-test.sh           # Test generation from templates
│   ├── generate-from-catalog.js   # API catalog integration
│   ├── wait-for-completion.sh     # Test monitoring
│   ├── generate-report.sh         # Results processing
│   └── validate-config.sh         # Configuration validation
├── templates/
│   ├── test-template.js.tpl       # k6 test template
│   ├── k6-job.yaml                # K8s Job manifest template
│   └── scenarios.yaml.tpl         # Scenario configurations
├── tests/
│   ├── scenarios/                 # Pre-built test scenarios
│   │   ├── smoke-test.js          # Quick sanity check
│   │   ├── load-test.js           # Sustained load
│   │   ├── stress-test.js         # Breaking point
│   │   └── spike-test.js          # Sudden traffic spike
│   ├── helpers/
│   │   ├── auth.js                # Authentication helpers
│   │   ├── checks.js              # Common assertions
│   │   └── utils.js               # Utilities
│   └── api-catalog.json           # API definitions (generated)
├── config/
│   ├── environments.yaml          # Environment configurations
│   ├── test-profiles.yaml         # Load profiles (VUs, duration, etc.)
│   └── slo-thresholds.yaml        # Performance SLO definitions
├── k8s/
│   ├── namespace.yaml              # load-testing namespace
│   ├── resource-quota.yaml         # Resource limits
│   ├── network-policy.yaml         # Network isolation
│   └── job-template.yaml           # K8s Job template (with envsubst vars)
└── monitoring/
    ├── grafana-dashboards/        # Grafana dashboard JSON
    └── alerting-rules.yaml        # Prometheus alerting rules

K8s Job Configuration

Namespace Configuration:

apiVersion: v1
kind: Namespace
metadata:
  name: load-testing
  labels:
    environment: test-cluster
    purpose: performance-testing
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: load-testing-quota
  namespace: load-testing
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"  # Allow multiple concurrent test Jobs
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: k6-job-egress-restriction
  namespace: load-testing
spec:
  podSelector:
    matchLabels:
      job-type: load-test  # Applied to all k6 Job pods
  policyTypes:
    - Egress
  egress:
    # Allow DNS
    - to:
      - namespaceSelector:
          matchLabels:
            name: kube-system
      ports:
      - protocol: UDP
        port: 53
    # Allow InfluxDB for metrics export
    - to:
      - namespaceSelector:
          matchLabels:
            name: monitoring
      ports:
      - protocol: TCP
        port: 8086
    # Allow ONLY egress to the SUT cluster
    # Note: an empty podSelector in a "to" rule matches pods in this
    # policy's own namespace, NOT an external cluster - use an ipBlock
    # covering the SUT's ingress/LoadBalancer range instead
    - to:
      - ipBlock:
          cidr: 203.0.113.0/24  # Placeholder: SUT ingress range (adjust)
      ports:
      - protocol: TCP
        port: 443
      - protocol: TCP
        port: 80
    # Note: Adjust based on actual SUT cluster connectivity pattern
    # (LoadBalancer IP, cross-cluster mesh, etc.)

K8s Job Template:

# templates/k6-job.yaml
# Rendered with envsubst: only pipeline variables use ${VAR} placeholders.
# Runtime shell variables in the container script must be protected by
# passing an explicit variable list to envsubst.
apiVersion: batch/v1
kind: Job
metadata:
  name: load-test-${CI_PIPELINE_ID}
  namespace: load-testing
  labels:
    app: k6
    job-type: load-test
    pipeline-id: "${CI_PIPELINE_ID}"
    test-suite: "${TEST_SUITE}"
spec:
  ttlSecondsAfterFinished: 3600  # Cleanup after 1 hour
  completions: ${PARALLELISM}  # Number of pod completions (one per instance)
  parallelism: ${PARALLELISM}
  completionMode: Indexed  # Required so pods receive JOB_COMPLETION_INDEX
  backoffLimit: 0  # Don't retry failed tests
  template:
    metadata:
      labels:
        app: k6
        job-type: load-test
        pipeline-id: "${CI_PIPELINE_ID}"
    spec:
      restartPolicy: Never
      containers:
      - name: k6
        image: grafana/k6:0.48.0  # Pin version for reproducibility
        command:
          - sh
          - -c
          - |
            # Calculate this instance's share of VUs
            TOTAL_VUS=${VIRTUAL_USERS}
            INSTANCE_INDEX=${JOB_COMPLETION_INDEX:-0}
            TOTAL_INSTANCES=${PARALLELISM}
            VUS_PER_INSTANCE=$((TOTAL_VUS / TOTAL_INSTANCES))
            
            # Run k6 with this instance's VUs
            # --out json streams raw metric points; --summary-export
            # writes the end-of-test summary consumed by the report stage
            k6 run \
              --vus=${VUS_PER_INSTANCE} \
              --duration=${TEST_DURATION} \
              --out json=/results/metrics.json \
              --summary-export=/results/summary.json \
              --out influxdb=http://influxdb.monitoring:8086/k6 \
              --tag testrun=${CI_PIPELINE_ID} \
              --tag instance=${INSTANCE_INDEX} \
              /scripts/test.js
            
            # Print the summary between markers so the pipeline can
            # recover it from pod logs (the emptyDir dies with the pod)
            echo "=== Test Instance ${INSTANCE_INDEX} Complete ==="
            echo "=== SUMMARY JSON START ==="
            cat /results/summary.json
            echo "=== SUMMARY JSON END ==="
        env:
        - name: TARGET_BASE_URL
          value: "${TARGET_BASE_URL}"
        - name: TEST_PROFILE
          value: "${TEST_PROFILE}"
        - name: VIRTUAL_USERS
          value: "${VIRTUAL_USERS}"
        - name: TEST_DURATION
          value: "${TEST_DURATION}"
        - name: RAMP_UP_TIME
          value: "${RAMP_UP_TIME}"
        volumeMounts:
        - name: test-script
          mountPath: /scripts
          readOnly: true
        - name: results
          mountPath: /results
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 1
            memory: 1Gi
      volumes:
      - name: test-script
        configMap:
          name: test-script-${CI_PIPELINE_ID}
      - name: results
        emptyDir: {}
GitLab CI Pipeline

Core Pipeline (.gitlab-ci.yml):

stages:
  - validate
  - generate
  - execute
  - report
 
# Single variables block - duplicate top-level keys are invalid YAML
variables:
  K6_NAMESPACE: load-testing
  K8S_CLUSTER: test-cluster  # Separate cluster from GitLab
  TARGET_SUT_BASE_URL: "https://api.sandbox-sut-1.example.com"
  # Pipeline variables for self-service
  TEST_SUITE:
    value: "api-federation"
    description: "Test suite to run"
  TARGET_ENVIRONMENT:
    value: "sandbox-sut-1"
    description: "Target SUT environment"
  VIRTUAL_USERS:
    value: "100"
    description: "Total virtual users across all instances"
  PARALLELISM:
    value: "1"
    description: "Number of parallel Job instances (for distributed load)"
  TEST_DURATION:
    value: "5m"
    description: "Test duration (30s, 5m, 1h)"
  RAMP_UP_TIME:
    value: "30s"
    description: "Ramp-up duration"
  TEST_PROFILE:
    value: "load"
    description: "Test profile: smoke|load|stress|spike"
 
# Allow manual triggers with parameters
workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "web"
    - if: $CI_PIPELINE_SOURCE == "schedule"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
 
# Validate configuration
validate:
  stage: validate
  image: grafana/k6:latest
  script:
    - echo "Validating test configuration..."
    - ./scripts/validate-config.sh
    - k6 inspect tests/scenarios/${TEST_PROFILE}-test.js
  rules:
    - if: $CI_PIPELINE_SOURCE == "web" || $CI_PIPELINE_SOURCE == "schedule"
 
# Generate test manifests
generate:
  stage: generate
  image: alpine:latest
  before_script:
    - apk add --no-cache curl jq gettext
  script:
    - echo "Generating tests for ${TEST_SUITE}..."
    - mkdir -p generated
    - |
      # Fetch API catalog
      curl -s https://api-catalog.example.com/apis \
        -H "Authorization: Bearer ${API_CATALOG_TOKEN}" \
        > tests/api-catalog.json
    
    - |
      # Export environment variables for template substitution
      export TEST_NAME="load-test-${CI_PIPELINE_ID}"
      export TARGET_BASE_URL="https://api.${TARGET_ENVIRONMENT}.example.com"
      export TIMESTAMP=$(date +%s)
    
    - |
      # Generate k6 test script; restrict envsubst to pipeline variables
      # so JavaScript template literals (${...}) are left untouched
      envsubst '${TEST_SUITE} ${TARGET_ENVIRONMENT} ${CI_PIPELINE_ID}' \
        < templates/test-template.js.tpl > generated/test.js
    
    - |
      # Generate K8s Job manifest (applied later by the execute stage)
      envsubst '${CI_PIPELINE_ID} ${TEST_SUITE} ${PARALLELISM} ${VIRTUAL_USERS} ${TEST_DURATION} ${RAMP_UP_TIME} ${TARGET_BASE_URL} ${TEST_PROFILE}' \
        < templates/k6-job.yaml > generated/k6-job.yaml
    
    - echo "Generated test configuration:"
    - cat generated/k6-job.yaml
  artifacts:
    paths:
      - generated/
      - tests/api-catalog.json
    expire_in: 7 days
 
# Execute load test via K8s Job
execute:
  stage: execute
  image: bitnami/kubectl:latest
  before_script:
    # Configure kubectl to access test cluster (separate from GitLab)
    - kubectl config use-context ${K8S_CLUSTER}
  script:
    - echo "Creating k6 test resources on test cluster..."
    - echo "Job will generate load from ${K8S_CLUSTER} targeting ${TARGET_ENVIRONMENT}"
    
    - |
      # Create ConfigMap with test script and API catalog
      kubectl create configmap test-script-${CI_PIPELINE_ID} \
        --from-file=test.js=generated/test.js \
        --from-file=api-catalog.json=tests/api-catalog.json \
        -n ${K6_NAMESPACE} \
        --dry-run=client -o yaml | kubectl apply -f -
    
    - |
      # Render Job manifest; restrict envsubst to pipeline variables so
      # runtime shell variables inside the container script survive
      export TARGET_BASE_URL="https://api.${TARGET_ENVIRONMENT}.example.com"
      envsubst '${CI_PIPELINE_ID} ${TEST_SUITE} ${PARALLELISM} ${VIRTUAL_USERS} ${TEST_DURATION} ${RAMP_UP_TIME} ${TARGET_BASE_URL} ${TEST_PROFILE}' \
        < templates/k6-job.yaml | kubectl apply -f - -n ${K6_NAMESPACE}
    
    - echo "K8s Job 'load-test-${CI_PIPELINE_ID}' created with ${PARALLELISM} parallel instances"
    - echo "Each instance will run ${VIRTUAL_USERS}/${PARALLELISM} virtual users"
    
    - |
      # Wait for Job completion (all pods must succeed)
      echo "Waiting for test completion (timeout: 30m)..."
      kubectl wait --for=condition=complete \
        --timeout=30m \
        job/load-test-${CI_PIPELINE_ID} \
        -n ${K6_NAMESPACE}
    
    - echo "Collecting test results from all Job instances..."
    - mkdir -p results
    
    - |
      # Collect logs from all Job pods
      kubectl logs \
        -l job-name=load-test-${CI_PIPELINE_ID} \
        -n ${K6_NAMESPACE} \
        --all-containers=true \
        --prefix=true \
        > results/k6-full-output.log
    
    - |
      # Extract summary from each pod
      for pod in $(kubectl get pods -l job-name=load-test-${CI_PIPELINE_ID} -n ${K6_NAMESPACE} -o name); do
        echo "=== Results from $pod ===" >> results/k6-summary.log
        kubectl logs $pod -n ${K6_NAMESPACE} | grep -A 50 "execution:" >> results/k6-summary.log || true
      done
    
    - |
      # Recover the end-of-test summary JSON for the report stage
      # (assumes the Job script prints it between marker lines)
      first_pod=$(kubectl get pods -l job-name=load-test-${CI_PIPELINE_ID} -n ${K6_NAMESPACE} -o name | head -n 1)
      kubectl logs ${first_pod} -n ${K6_NAMESPACE} \
        | sed -n '/=== SUMMARY JSON START ===/,/=== SUMMARY JSON END ===/p' \
        | sed '1d;$d' > results/summary.json || true
    
    - echo "Test execution complete. Results collected."
    
  after_script:
    # Note: Job will auto-cleanup via ttlSecondsAfterFinished (1 hour)
    # ConfigMap cleanup manual for immediate cleanup
    - kubectl delete configmap test-script-${CI_PIPELINE_ID} -n ${K6_NAMESPACE} || true
    
  artifacts:
    when: always
    paths:
      - results/
    expire_in: 30 days
  environment:
    name: test-cluster
    url: https://api.${TARGET_ENVIRONMENT}.example.com
  timeout: 35m  # Slightly longer than Job wait timeout
 
# Generate and publish reports
report:
  stage: report
  image: python:3.11-slim
  script:
    - echo "Generating performance reports..."
    - ./scripts/generate-report.sh results/k6-full-output.log
    
    - |
      # Parse summary for GitLab performance widget
      python -c "
      import json
      import sys
      
      # Parse k6 summary and convert to GitLab format
      with open('results/summary.json', 'r') as f:
          data = json.load(f)
      
      gitlab_perf = {
          'metrics': [
              {'name': 'http_req_duration_p95', 'value': data['metrics']['http_req_duration']['p(95)']},
              {'name': 'http_req_duration_p99', 'value': data['metrics']['http_req_duration']['p(99)']},
              # Rate metrics expose their ratio as 'value' in the summary export
              {'name': 'http_req_failed_rate', 'value': data['metrics']['http_req_failed']['value']},
              {'name': 'http_reqs_total', 'value': data['metrics']['http_reqs']['count']},
              {'name': 'vus_max', 'value': data['metrics']['vus_max']['value']},
          ]
      }
      
      with open('performance.json', 'w') as f:
          json.dump(gitlab_perf, f, indent=2)
      "
    
    - echo "Performance summary:"
    - cat performance.json
  artifacts:
    when: always
    reports:
      performance: performance.json
    paths:
      - results/report.html
      - results/summary.json
      - performance.json
    expire_in: 30 days
  dependencies:
    - execute

Test Script Template

Template (templates/test-template.js.tpl):

import http from 'k6/http';
import { check, group, sleep } from 'k6';
import { Rate, Trend, Counter } from 'k6/metrics';
import { SharedArray } from 'k6/data';
 
// Custom metrics
const errorRate = new Rate('errors');
const apiDuration = new Trend('api_duration');
const apiCalls = new Counter('api_calls');
 
// Load API catalog
const apis = new SharedArray('apis', function() {
  return JSON.parse(open('./api-catalog.json'));
});
 
// Configuration from environment
const BASE_URL = __ENV.TARGET_BASE_URL || 'https://api.sandbox-sut-1.example.com';
const VUS = parseInt(__ENV.VIRTUAL_USERS || '10');
const DURATION = __ENV.TEST_DURATION || '1m';
const RAMP_UP = __ENV.RAMP_UP_TIME || '30s';
const PROFILE = __ENV.TEST_PROFILE || 'load';
 
// Load profile configurations
const profiles = {
  smoke: {
    stages: [
      { duration: '1m', target: 5 },
    ],
    thresholds: {
      'http_req_duration': ['p(95)<1000'],
      'http_req_failed': ['rate<0.05'],
    },
  },
  load: {
    stages: [
      { duration: RAMP_UP, target: VUS * 0.5 },
      { duration: DURATION, target: VUS },
      { duration: '30s', target: 0 },
    ],
    thresholds: {
      'http_req_duration': ['p(95)<500', 'p(99)<1000'],
      'http_req_failed': ['rate<0.01'],
    },
  },
  stress: {
    stages: [
      { duration: '2m', target: VUS },
      { duration: '5m', target: VUS * 2 },
      { duration: '2m', target: VUS * 3 },
      { duration: '5m', target: VUS },
      { duration: '2m', target: 0 },
    ],
    thresholds: {
      'http_req_duration': ['p(95)<1000', 'p(99)<2000'],
      'http_req_failed': ['rate<0.05'],
    },
  },
  spike: {
    stages: [
      { duration: '1m', target: VUS },
      { duration: '10s', target: VUS * 5 },  // Spike
      { duration: '1m', target: VUS },
      { duration: '10s', target: VUS * 5 },  // Second spike
      { duration: '1m', target: 0 },
    ],
    thresholds: {
      'http_req_duration': ['p(95)<1500', 'p(99)<3000'],
      'http_req_failed': ['rate<0.10'],
    },
  },
};
 
// Apply selected profile
export const options = {
  ...profiles[PROFILE],
  // Include p(99) so it appears in the exported summary
  // (k6's default summaryTrendStats stop at p(95))
  summaryTrendStats: ['avg', 'min', 'med', 'max', 'p(90)', 'p(95)', 'p(99)'],
  tags: {
    test_suite: '${TEST_SUITE}',
    environment: '${TARGET_ENVIRONMENT}',
    pipeline_id: '${CI_PIPELINE_ID}',
  },
  noConnectionReuse: false,
  userAgent: 'k6-load-test/${CI_PIPELINE_ID}',
};
 
// Setup function (runs once before the test, not once per VU)
export function setup() {
  console.log(`Starting ${PROFILE} test with ${VUS} VUs for ${DURATION}`);
  console.log(`Target: ${BASE_URL}`);
  console.log(`APIs under test: ${apis.length}`);
  
  return {
    apis: apis,
    baseUrl: BASE_URL,
  };
}
 
// Main test function
export default function(data) {
  const api = data.apis[Math.floor(Math.random() * data.apis.length)];
  
  group(`API: ${api.name}`, () => {
    const url = `${data.baseUrl}${api.path}`;
    const params = {
      headers: {
        'Content-Type': 'application/json',
        'X-Test-Pipeline': '${CI_PIPELINE_ID}',
        ...(api.headers || {}),
      },
      tags: {
        api_name: api.name,
        api_path: api.path,
      },
      timeout: api.timeout_ms || '30s',
    };
    
    const response = http.get(url, params);
    
    // Record metrics
    apiCalls.add(1);
    apiDuration.add(response.timings.duration, { api: api.name });
    
    // Validate response
    const checkResults = check(response, {
      'status is 200': (r) => r.status === 200,
      'response time OK': (r) => r.timings.duration < (api.slo_ms || 500),
      'has valid body': (r) => r.body && r.body.length > 0,
      'no errors in response': (r) => !r.json('error'),
    });
    
    errorRate.add(!checkResults);
    
    // Log failures
    if (!checkResults) {
      console.error(`API ${api.name} failed: status=${response.status}, duration=${response.timings.duration}ms`);
    }
  });
  
  // Think time
  sleep(Math.random() * 2 + 1);
}
 
// Teardown function
export function teardown(data) {
  console.log('Test completed');
}
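
The main test function above picks an API uniformly at random. If catalog entries carry a traffic weight (a hypothetical `weight` field, not part of the catalog schema shown here), a proportional pick would match the production traffic mix more closely. A minimal sketch:

```javascript
// Weighted API selection: pick proportionally to a (hypothetical) per-API
// `weight` field, falling back to weight 1 when the field is absent.
function pickWeighted(apis) {
  const total = apis.reduce((sum, a) => sum + (a.weight || 1), 0);
  let r = Math.random() * total;
  for (const api of apis) {
    r -= api.weight || 1;
    if (r < 0) return api;
  }
  return apis[apis.length - 1]; // guard against floating-point edge cases
}

// Example: "orders" should receive roughly 3x the traffic of "users".
const apis = [
  { name: 'users', path: '/v1/users', weight: 1 },
  { name: 'orders', path: '/v1/orders', weight: 3 },
];
const picked = pickWeighted(apis);
console.log(picked.name); // one of "users" or "orders"
```

Inside the script this would replace the `Math.random()` index lookup in the default function.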

Test Profile Configurations

File: config/test-profiles.yaml

profiles:
  smoke:
    description: "Quick sanity check with minimal load"
    virtualUsers: 5
    duration: 1m
    rampUp: 10s
    thresholds:
      p95: 1000ms
      p99: 2000ms
      errorRate: 5%
    
  load:
    description: "Sustained load test at expected traffic levels"
    virtualUsers: 100
    duration: 5m
    rampUp: 30s
    thresholds:
      p95: 500ms
      p99: 1000ms
      errorRate: 1%
    
  stress:
    description: "Push beyond normal load to find breaking point"
    virtualUsers: 200
    duration: 10m
    rampUp: 2m
    stages:
      - duration: 2m
        target: 100
      - duration: 5m
        target: 200
      - duration: 2m
        target: 300
      - duration: 1m
        target: 0
    thresholds:
      p95: 1000ms
      p99: 2000ms
      errorRate: 5%
    
  spike:
    description: "Sudden traffic spikes to test auto-scaling"
    virtualUsers: 150
    duration: 5m
    stages:
      - duration: 1m
        target: 50
      - duration: 10s
        target: 500  # Spike
      - duration: 1m
        target: 50
      - duration: 10s
        target: 500  # Second spike
      - duration: 1m
        target: 0
    thresholds:
      p95: 1500ms
      p99: 3000ms
      errorRate: 10%
    
  soak:
    description: "Extended duration test for stability and memory leaks"
    virtualUsers: 50
    duration: 2h
    rampUp: 5m
    thresholds:
      p95: 500ms
      p99: 1000ms
      errorRate: 1%
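
These YAML profiles have to be translated into k6 `options` at test-generation time. A minimal sketch of that mapping, assuming the YAML has already been parsed into a plain object (field names as in the profiles above; the helper itself is illustrative):

```javascript
// Convert one parsed test-profile entry (from config/test-profiles.yaml)
// into a k6 options object. Assumes YAML parsing happens in a CI
// pre-processing step; only the mapping is shown here.
function profileToK6Options(profile) {
  const ms = (s) => parseInt(String(s), 10);       // "500ms" -> 500
  const rate = (s) => parseFloat(String(s)) / 100; // "1%"    -> 0.01

  const options = {
    thresholds: {
      http_req_duration: [
        `p(95)<${ms(profile.thresholds.p95)}`,
        `p(99)<${ms(profile.thresholds.p99)}`,
      ],
      http_req_failed: [`rate<${rate(profile.thresholds.errorRate)}`],
    },
  };

  if (profile.stages) {
    // Stress/spike profiles define explicit stages.
    options.stages = profile.stages;
  } else {
    // Simple profiles: ramp up, hold, ramp down.
    options.stages = [
      { duration: profile.rampUp, target: profile.virtualUsers },
      { duration: profile.duration, target: profile.virtualUsers },
      { duration: '30s', target: 0 },
    ];
  }
  return options;
}

// Example: the "load" profile from the YAML above.
const load = {
  virtualUsers: 100,
  duration: '5m',
  rampUp: '30s',
  thresholds: { p95: '500ms', p99: '1000ms', errorRate: '1%' },
};
console.log(JSON.stringify(profileToK6Options(load), null, 2));
```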

Monitoring Integration

InfluxDB Export Configuration:

apiVersion: k6.io/v1alpha1
kind: TestRun
spec:
  script:
    configMap:
      name: test-script
  # ${...} placeholders are substituted by the CI pipeline (e.g. envsubst) before apply
  arguments: |
    --out influxdb=http://influxdb.monitoring:8086/k6
    --tag testrun=${CI_PIPELINE_ID}
    --tag suite=${TEST_SUITE}
    --tag environment=${TARGET_ENVIRONMENT}
    --tag branch=${CI_COMMIT_BRANCH}
  runner:
    env:
      - name: K6_INFLUXDB_INSECURE
        value: "false"
      - name: K6_INFLUXDB_USERNAME
        valueFrom:
          secretKeyRef:
            name: influxdb-credentials
            key: username
      - name: K6_INFLUXDB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: influxdb-credentials
            key: password

Grafana Dashboard JSON (excerpt):

{
  "dashboard": {
    "title": "k6 Load Test Dashboard",
    "panels": [
      {
        "title": "HTTP Request Duration (p95/p99)",
        "type": "graph",
        "targets": [
          {
            "query": "SELECT percentile(\"value\", 95) FROM \"http_req_duration\" WHERE \"testrun\"='$testrun' GROUP BY time(10s)"
          },
          {
            "query": "SELECT percentile(\"value\", 99) FROM \"http_req_duration\" WHERE \"testrun\"='$testrun' GROUP BY time(10s)"
          }
        ]
      },
      {
        "title": "Requests Per Second",
        "type": "graph",
        "targets": [
          {
            "query": "SELECT derivative(mean(\"value\"), 1s) FROM \"http_reqs\" WHERE \"testrun\"='$testrun' GROUP BY time(10s)"
          }
        ]
      },
      {
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          {
            "query": "SELECT mean(\"value\") FROM \"http_req_failed\" WHERE \"testrun\"='$testrun' GROUP BY time(10s)"
          }
        ]
      },
      {
        "title": "Virtual Users",
        "type": "graph",
        "targets": [
          {
            "query": "SELECT max(\"value\") FROM \"vus\" WHERE \"testrun\"='$testrun' GROUP BY time(10s)"
          }
        ]
      }
    ],
    "templating": {
      "list": [
        {
          "name": "testrun",
          "type": "query",
          "query": "SHOW TAG VALUES WITH KEY = \"testrun\"",
          "current": {
            "text": "",
            "value": ""
          }
        }
      ]
    }
  }
}
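
Beyond dashboards, threshold outcomes can also gate the CI job. A hypothetical gate over a k6 summary export — the exact shape of the exported JSON varies by k6 version, so the per-threshold `ok` flag assumed here must be adapted to the format actually produced:

```javascript
// Hypothetical CI gate (summary shape is an assumption): collect every
// threshold in the exported summary that was breached, so the pipeline
// can fail when the list is non-empty.
function breachedThresholds(summary) {
  const breached = [];
  for (const [metric, data] of Object.entries(summary.metrics || {})) {
    for (const [expr, result] of Object.entries(data.thresholds || {})) {
      if (result.ok === false) breached.push(`${metric}: ${expr}`);
    }
  }
  return breached;
}

// Example with a fabricated summary object:
const summary = {
  metrics: {
    http_req_duration: {
      thresholds: { 'p(95)<500': { ok: true }, 'p(99)<1000': { ok: false } },
    },
    http_req_failed: { thresholds: { 'rate<0.01': { ok: true } } },
  },
};
const breached = breachedThresholds(summary);
console.log(breached); // [ 'http_req_duration: p(99)<1000' ]
```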

Alternative Approaches Considered

Note: See the “Operator vs Non-Operator Deployment Comparison” section above for a comprehensive decision matrix comparing all execution approaches.

Alternative 1: Simple Docker-Based Execution

Approach: Run k6 directly in GitLab runner containers without K8s operator

Pros:

  • Simpler initial setup (no operator required)
  • Faster to implement (1-2 hours vs 1-2 days)
  • Lower operational overhead (no operator maintenance)
  • Easy local testing and debugging

Cons:

  • Limited scaling (~10k RPS per runner, CPU bound)
  • Less resource isolation (shared runner resources)
  • No distributed load generation (manual orchestration required)
  • Harder to implement network policies (runner-level only)

Decision: Use this as Phase 0 quick-start, then migrate to operator for scale

Rationale: Provides immediate value for validation while full operator infrastructure is being established. See decision matrix above for detailed comparison.
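
For the Phase 0 quick-start, a job along these lines would run k6 straight from the official image in a runner container (job name, variables, and script filename are illustrative):

```yaml
# Hypothetical .gitlab-ci.yml job: run k6 directly in a runner container,
# no operator involved. Names and variables are placeholders.
load-test:
  stage: test
  image: grafana/k6:latest
  variables:
    PROFILE: smoke
  script:
    - k6 run --env PROFILE=$PROFILE --env BASE_URL=$TARGET_BASE_URL load-test.js
  artifacts:
    when: always
    paths:
      - summary.json
  rules:
    - when: manual   # ad-hoc: triggered by hand from the pipeline UI
```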

Alternative 2: Locust (Python-based)

Approach: Use Locust for Python-native load testing

Pros:

  • Python-friendly (good for teams with Python expertise)
  • Web UI for monitoring
  • Distributed mode available

Cons:

  • Less Kubernetes-native
  • Heavier resource footprint
  • Less modern metrics/observability
  • Smaller community compared to k6

Decision: Rejected in favor of k6’s better K8s integration

Alternative 3: Managed Service (k6 Cloud, Grafana Cloud)

Approach: Use commercial k6 Cloud service

Pros:

  • Zero infrastructure management
  • Excellent reporting and analytics
  • Global load generation locations

Cons:

  • Cost per test run
  • External dependency
  • Data egress concerns (API catalog, secrets)
  • Less control over execution environment

Decision: Rejected for initial implementation; revisit for global load testing needs

Alternative 4: On-Demand REST API Wrapper

Approach: Build REST API service that wraps k6 execution

Pros:

  • More user-friendly than GitLab UI
  • Custom UI possibilities
  • Better programmatic integration

Cons:

  • Additional service to maintain
  • Reinvents GitLab’s workflow orchestration
  • Requires authentication/authorization implementation

Decision: Defer to Phase 5 if self-service adoption is insufficient

Success Metrics

Adoption Metrics

  • Target: 80% of teams use load testing before production deployments
  • Measure: GitLab pipeline executions, unique user count

Performance Metrics

  • Test Execution Time: <10 minutes for standard load tests
  • Test Setup Time: <5 minutes from trigger to execution start
  • Resource Utilization: <50% of sandbox-test cluster capacity

Quality Metrics

  • Test Reliability: >95% successful test runs (not counting legitimate failures)
  • False Positive Rate: <5% of test failures are infrastructure-related

Efficiency Metrics

  • Time to Create New Test: <30 minutes for catalog-based tests
  • Test Maintenance Burden: <2 hours/week team-wide

Risk Assessment

Technical Risks

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| K8s Job failures | Medium | Low | Standard pattern, use backoffLimit: 0, log all failures |
| Test cluster resource exhaustion | High | Medium | Strict resource quotas, Job TTL cleanup, monitoring |
| Network bottleneck (test cluster → SUT) | Medium | Low | Use separate cluster, monitor bandwidth, tune parallelism |
| Network policy misconfiguration | High | Low | Thorough testing, clear documentation, dry-run validation |
| Test generation failures | Medium | Medium | Validation stage, dry-run mode, schema validation |
| Metric collection failures | Medium | Low | Multiple collection methods (logs + InfluxDB), retry logic |
| Job coordination errors (distributed tests) | Medium | Low | Test coordination logic thoroughly, use JOB_COMPLETION_INDEX |

Operational Risks

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| Accidental production testing | Critical | Low | Network policies, namespace restrictions, clear naming |
| Test maintenance burden | Medium | High | Catalog-driven generation, reusable components |
| Low adoption | Medium | Medium | Good documentation, training, easy onboarding |
| Cost overrun (compute resources) | Medium | Low | Resource quotas, time limits, monitoring |

Security Risks

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| Credential exposure in tests | High | Medium | GitLab secrets, vault integration, no hardcoded secrets |
| Unauthorized access to SUT | High | Low | GitLab RBAC, K8s RBAC, audit logging |
| DDoS-like impact on SUT | Medium | Medium | Rate limiting, circuit breakers, clear communication |

Open Questions

  1. InfluxDB: Do we have an existing InfluxDB instance, or do we need to deploy one?

    • Action: Check with platform team
  2. API Catalog Integration: What format is the API catalog in? REST API, config file, service mesh?

    • Action: Review API catalog documentation
  3. Authentication: How should tests authenticate to federated APIs? OAuth2, API keys, mTLS?

    • Action: Align with security team on test account strategy
  4. Scheduled Tests: Should we run nightly regression tests? Which APIs?

    • Action: Define with product team
  5. SLO Definitions: Do we have formal SLOs for federated APIs?

    • Action: Work with API producers to define/document
  6. Cross-Sandbox Communication: Are there existing network policies between sandbox environments?

    • Action: Review with network team
  7. Cost Allocation: Should we track and charge back load testing costs per team?

    • Action: Discuss with finance/platform teams

References

Documentation

Internal Resources

  • .ai/steering/argocd-development-workflow.md - ArgoCD patterns
  • .ai/steering/docker-image-workflow.md - Container build patterns
  • .ai/steering/testing-standards.md - Testing guidelines
  • API Catalog documentation (TBD)
  • Sandbox environment inventory (TBD)

Example Projects

  • (TBD)

Next Steps

  1. Immediate (This Week):

    • Review and approve this decision record
    • Answer open questions
    • Assign owner for implementation
  2. Short Term (Next Sprint):

    • Create implementation project (BMAD or Codev format)
    • Set up development environment
    • Begin Phase 1 implementation
  3. Medium Term (Next Month):

    • Complete Phase 1 foundation
    • Conduct pilot with 2-3 teams
    • Gather feedback and iterate
  4. Long Term (Next Quarter):

    • Complete all phases
    • Full team rollout
    • Integration with CI/CD pipeline standards

Approval

Proposed By: Platform Engineering Team
Date: 2026-02-04

Reviewers:

  • Platform Architecture Lead
  • API Federation Team Lead
  • Security Team
  • SRE Team

Status: Awaiting Review


Last Updated: 2026-02-04
Version: 1.0