Decision Record: Ad-Hoc Load Testing Framework
Date: 2026-02-04
Status: Proposed
Category: Infrastructure
Decision Makers: Platform Engineering Team
Context
We run an integration platform that federates APIs, allowing producers to surface their APIs for self-service consumption. The platform operates large sandbox instances, and we need a flexible, easy-to-configure system for running ad-hoc load tests.
Current State
- Infrastructure: ArgoCD and GitLab available
- Test Generation: Can run in GitLab CI or sandbox environment
- Target System Under Test (SUT): Different sandbox environment from test generation
- Scale: Need to test multiple federated APIs with varying load profiles
- Use Cases:
- Ad-hoc performance validation
- Pre-production load testing
- API capacity planning
- Performance regression detection
Requirements
- Flexibility: Easy to configure different test scenarios and targets
- Self-Service: Teams should be able to trigger tests with minimal friction
- Isolation: Test generation and SUT should be separate environments
- Observability: Clear metrics and reporting
- Reproducibility: Tests should be version-controlled and repeatable
- Resource Efficiency: Don’t consume unnecessary sandbox resources
Decision
We will implement a k6-based load testing framework with the following architecture:
Tool Selection: k6 over Gatling
Chosen: k6
Alternatives Considered: Gatling, Locust, JMeter
Rationale:
- Kubernetes-native: k6 operator enables distributed testing in K8s clusters
- Lightweight: Smaller container footprint suitable for sandbox constraints
- Developer-friendly: JavaScript/TypeScript tests are easier to write and maintain
- GitLab Integration: Excellent CI/CD support with native performance reporting
- Flexible Execution: CLI, K8s operator, or cloud-based execution modes
- Modern Metrics: Built-in Prometheus/InfluxDB support
Trade-offs:
- Gatling has better GUI for test recording (not critical for API testing)
- k6's JavaScript runtime has a learning curve for Java/JVM teams (acceptable given broader JS adoption)
Operator vs Non-Operator Deployment Comparison
A critical decision in implementing load testing is whether to use a Kubernetes operator, K8s Jobs, or simpler container-based execution. This affects architecture, scalability, operational complexity, and time-to-value.
Our Approach: We plan to use K8s Jobs triggered from GitLab CI, running on a separate cluster from the SUT. This offloads work from GitLab servers and avoids potential network bottlenecks between GitLab and the SUT.
Priority Concerns: Decision Matrix
These dimensions are critical to our decision-making process.
| Priority Dimension | k6 + Operator | k6 + K8s Job | k6 + GitLab Runner | Gatling + Operator | Gatling + K8s Job | Gatling + GitLab Runner |
|---|---|---|---|---|---|---|
| Quality of Reporting (Out-of-Box) | ⭐⭐⭐ JSON/text summary (needs Grafana for visual) | ⭐⭐⭐ JSON/text summary (needs Grafana for visual) | ⭐⭐⭐ JSON/text summary (needs Grafana for visual) | ⭐⭐⭐⭐⭐ Rich HTML reports built-in, detailed drill-downs, charts | ⭐⭐⭐⭐⭐ Rich HTML reports built-in, detailed drill-downs | ⭐⭐⭐⭐⭐ Rich HTML reports built-in |
| Quality with Tooling | ⭐⭐⭐⭐⭐ Excellent with Grafana/InfluxDB | ⭐⭐⭐⭐⭐ Excellent with Grafana/InfluxDB | ⭐⭐⭐⭐ Good with Grafana/InfluxDB | ⭐⭐⭐⭐⭐ Built-in + optional Grafana | ⭐⭐⭐⭐⭐ Built-in + optional Grafana | ⭐⭐⭐⭐⭐ Built-in + optional Grafana |
| Ease of Reporting | ⭐⭐⭐⭐ Automated via CRD, requires Grafana setup | ⭐⭐⭐⭐ Simple artifact collection, requires Grafana/report gen | ⭐⭐⭐⭐ GitLab artifacts, requires Grafana/report gen | ⭐⭐⭐⭐ Custom collection, HTML ready | ⭐⭐⭐⭐⭐ HTML reports work immediately | ⭐⭐⭐⭐⭐ HTML reports work immediately |
| Time to First Test | 1-2 days | 2-4 hours | 1-2 hours | 2-3 days | 3-5 hours | 1-2 hours |
| Time to MVP | 1-2 weeks | 1-3 days | 1-2 days | 2-3 weeks | 3-5 days | 1-2 days |
| Maturity | ⭐⭐⭐⭐⭐ Official Grafana Labs operator, production-ready | ⭐⭐⭐⭐⭐ Standard K8s Job pattern, rock solid | ⭐⭐⭐⭐⭐ Standard Docker execution | ⭐⭐⭐ Community operators, less mature | ⭐⭐⭐⭐⭐ Standard K8s Job pattern | ⭐⭐⭐⭐⭐ Standard Docker execution |
| Ease of Use | ⭐⭐⭐ Requires CRD knowledge, K8s expertise | ⭐⭐⭐⭐ Standard K8s Job, familiar to teams | ⭐⭐⭐⭐⭐ Simple Docker run command | ⭐⭐ Custom CRDs or complex Helm charts | ⭐⭐⭐⭐ Standard K8s Job | ⭐⭐⭐⭐⭐ Simple Docker run |
| Ease of Horizontal Scaling | ⭐⭐⭐⭐⭐ Built-in parallelism parameter | ⭐⭐⭐⭐ Job completions: N + manual coordination | ⭐⭐ Manual multi-runner orchestration | ⭐⭐⭐⭐ Operator-managed or manual | ⭐⭐⭐⭐ Job completions: N + coordination scripts | ⭐⭐ Manual orchestration |
Rating Key: ⭐⭐⭐⭐⭐ Excellent | ⭐⭐⭐⭐ Good | ⭐⭐⭐ Acceptable | ⭐⭐ Limited | ⭐ Poor
Reporting Deep Dive: k6 vs Gatling
This is a critical differentiator that deserves detailed explanation.
Gatling Reporting (Out-of-the-Box Winner)
What You Get Immediately:
- Rich HTML Reports: Beautiful, interactive reports generated automatically after each test
- Visual Charts: Response time distribution, requests/second, response time percentiles over time
- Drill-Down Capability: Click into specific requests, see detailed stats per endpoint
- Statistical Analysis: Min/max/mean/percentiles, standard deviation
- Error Analysis: Detailed breakdown of failures with counts and percentages
- Self-Contained: Single HTML file (or folder) you can share, no server required
Example Gatling Report Sections:
1. Global Information: Total requests, OK/KO counts, min/max/mean/percentiles
2. Statistics Table: Per-request breakdown with all metrics
3. Active Users Over Time: Graph showing VU ramp-up/down
4. Response Time Distribution: Histogram of latencies
5. Response Time Percentiles: P50/P75/P95/P99 over time
6. Requests Per Second: Throughput over time
7. Responses Per Second: Success/failure rates
Artifact Collection:
```sh
# Gatling generates reports to target/gatling/<timestamp>/
# Contains: index.html + js/ + style/ folders
kubectl cp <pod>:/results/gatling ./gatling-report
# Open index.html in a browser - fully functional report
```
Verdict: ⭐⭐⭐⭐⭐ Production-ready reports with zero additional tooling
Gatling → Grafana Integration Options:
Since you already have Prometheus and JMX monitoring infrastructure, Gatling has several options:
Option 1: Prometheus + JMX Exporter ⭐⭐⭐⭐ (Best for your setup)
- How: Gatling exposes JMX metrics → JMX Exporter → Prometheus → Grafana
- Setup:
  - Run Gatling with JMX enabled: `-Dgatling.jmx.enabled=true`
  - Deploy JMX Exporter as a sidecar in the K8s Job pod
  - Configure Prometheus to scrape the JMX Exporter endpoint
- Pros:
- ✅ Leverages your existing Prometheus infrastructure
- ✅ Same pattern as other Java apps you monitor
- ✅ Real-time metrics during test execution
- Cons:
- ⚠️ JMX Exporter sidecar adds complexity to Job manifest
- ⚠️ Need to configure JMX metric mappings
- ⚠️ Community Grafana dashboards (not official)
- Setup Time: 2-3 hours (sidecar config + Prometheus PodMonitor + dashboard)
Option 2: Prometheus Pushgateway ⭐⭐⭐⭐
- How: Gatling pushes metrics to Pushgateway → Prometheus scrapes → Grafana
- Plugin: use the `gatling-prometheus` plugin:

```scala
// build.sbt
libraryDependencies += "com.github.lkishalmi.gatling" % "gatling-prometheus" % "3.11.1"
```

```hocon
# gatling.conf
data { writers = [console, file, prometheus] }
prometheus { pushgateway { url = "http://pushgateway.monitoring:9091" } }
```

- Pros:
- ✅ Works well for batch jobs (like K8s Jobs)
- ✅ Simpler than JMX Exporter (no sidecar)
- ✅ Designed for short-lived processes
- Cons:
- ⚠️ Requires plugin installation (not built-in)
- ⚠️ Pushgateway required (may already have it)
- ⚠️ Metrics persist in Pushgateway after test (need cleanup)
- Setup Time: 1-2 hours (plugin + pushgateway config)
Option 3: InfluxDB Export ⭐⭐⭐
- How: Gatling → InfluxDB → Grafana
- Plugin: `gatling-influxdb`:

```scala
// build.sbt
libraryDependencies += "com.github.gatling" % "gatling-influxdb" % "1.1.4"
```

- Pros:
- ✅ Direct time-series storage
- ✅ Good for historical trending
- Cons:
- ⚠️ Requires InfluxDB (if you don’t have it)
- ⚠️ Separate from your Prometheus infrastructure
- ⚠️ Plugin required
- Setup Time: 2-4 hours (deploy InfluxDB if needed + plugin config)
Option 4: Graphite Export ⭐⭐
- How: Built-in Gatling Graphite support → Grafana Graphite datasource
- Configuration: built into Gatling (no plugin required):

```hocon
# gatling.conf
data { writers = [console, file, graphite] }
graphite { host = "graphite.monitoring" port = 2003 }
```

- Pros:
- ✅ No plugin required
- ✅ Built-in support
- Cons:
- ⚠️ Requires Graphite (probably don’t have it)
- ⚠️ Less common than Prometheus
- Setup Time: 2-4 hours (deploy Graphite + configure)
Detailed Setup: Option 1 (JMX Exporter - Recommended for you)
K8s Job manifest with JMX Exporter sidecar:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gatling-test-${CI_PIPELINE_ID}
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9404"
        prometheus.io/path: "/metrics"
    spec:
      restartPolicy: Never
      containers:
        # Main Gatling container
        - name: gatling
          image: denvazh/gatling:latest
          command:
            - /opt/gatling/bin/gatling.sh
            - -sf=/simulations
            - -s=com.example.ApiSimulation
          env:
            - name: JAVA_OPTS
              value: "-Dgatling.jmx.enabled=true -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
          resources:
            requests:
              memory: 2Gi
              cpu: 1
            limits:
              memory: 4Gi
              cpu: 2
        # JMX Exporter sidecar
        - name: jmx-exporter
          image: bitnami/jmx-exporter:latest
          ports:
            - containerPort: 9404
              name: metrics
          volumeMounts:
            - name: jmx-config
              mountPath: /etc/jmx-exporter
          command:
            - java
            - -jar
            - /opt/bitnami/jmx-exporter/jmx_prometheus_httpserver.jar
            - "9404"
            - /etc/jmx-exporter/config.yaml
          resources:
            requests:
              memory: 128Mi
              cpu: 100m
      volumes:
        - name: jmx-config
          configMap:
            name: gatling-jmx-config
```

JMX Exporter config:
```yaml
# ConfigMap: gatling-jmx-config
apiVersion: v1
kind: ConfigMap
metadata:
  name: gatling-jmx-config
data:
  config.yaml: |
    hostPort: localhost:1099
    rules:
      - pattern: "io.gatling.core<type=AllRequests><>(.+)"
        name: gatling_all_requests_$1
      - pattern: "io.gatling.core<type=Simulation><>(.+)"
        name: gatling_simulation_$1
      - pattern: "io.gatling.core<type=Request, name=(.+)><>(.+)"
        name: gatling_request_$2
        labels:
          request: "$1"
```

Prometheus PodMonitor:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: gatling-tests
  namespace: load-testing
spec:
  selector:
    matchLabels:
      job-type: load-test
  podMetricsEndpoints:
    - port: metrics
      interval: 10s
```

Setup Time Breakdown:
- JMX Exporter sidecar config: 30 minutes
- JMX metric mapping config: 1 hour
- Prometheus PodMonitor: 15 minutes
- Grafana dashboard: 1 hour
- Total: ~2-3 hours
Detailed Setup: Option 2 (Pushgateway - Simpler)
Gatling with Prometheus plugin:
```yaml
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: gatling
          image: custom-gatling-with-prometheus-plugin:latest  # Custom image with plugin
          command:
            - /opt/gatling/bin/gatling.sh
            - -sf=/simulations
            - -s=com.example.ApiSimulation
          env:
            - name: PUSHGATEWAY_URL
              value: "http://pushgateway.monitoring:9091"
          resources:
            requests:
              memory: 2Gi
              cpu: 1
```

Gatling config (baked into the custom image):
```hocon
# gatling.conf
data {
  writers = [console, file, prometheus]
}
prometheus {
  pushgateway {
    url = ${?PUSHGATEWAY_URL}
    jobName = "gatling-load-test"
  }
}
```

Setup Time Breakdown:
- Build custom image with plugin: 1 hour
- Pushgateway deployment (if needed): 30 minutes
- Prometheus scrape config: 15 minutes
- Grafana dashboard: 1 hour
- Total: ~2-3 hours (first time), 30 min (subsequent)
Comparison for Your Infrastructure:
| Method | Fits Existing Setup | Real-time | Setup Time | Complexity |
|---|---|---|---|---|
| JMX Exporter | ⭐⭐⭐⭐⭐ Uses existing Prometheus + JMX pattern | ✅ Yes | 2-3 hours | ⭐⭐⭐ Moderate |
| Pushgateway | ⭐⭐⭐⭐⭐ Uses existing Prometheus | ✅ Yes | 1-2 hours | ⭐⭐⭐⭐ Simpler |
| InfluxDB | ⭐⭐ Requires separate stack | ✅ Yes | 2-4 hours | ⭐⭐⭐ Moderate |
| Graphite | ⭐ Requires Graphite | ✅ Yes | 2-4 hours | ⭐⭐⭐ Moderate |
Recommendation for Your Setup:
- Prometheus Pushgateway (if you have it) - simplest
- JMX Exporter (if you don’t) - uses familiar pattern
Result: ⭐⭐⭐⭐ Gatling can integrate with your Prometheus/Grafana stack, but requires 1-3 hours additional setup vs k6’s native support
Comparison for Your Use Case (Existing Prometheus/Grafana):
| Aspect | k6 | Gatling |
|---|---|---|
| Prometheus Export | ⭐⭐⭐⭐⭐ Built-in experimental, or easy plugin | ⭐⭐⭐⭐ Via Pushgateway plugin or JMX Exporter |
| InfluxDB Export | ⭐⭐⭐⭐⭐ Built-in --out influxdb=... | ⭐⭐⭐ Requires plugin |
| Setup Effort (Prometheus) | 1 flag or small config | Plugin + custom image OR JMX sidecar (2-3 hours) |
| Setup Effort (InfluxDB) | 1 line in command (if you have InfluxDB) | Plugin + config file |
| Grafana Dashboards | ⭐⭐⭐⭐⭐ Official, well-maintained | ⭐⭐⭐ Community, requires customization |
| Dashboard Availability | Multiple official options | Limited community options |
| Data Schema | Standardized, well-documented | Less standardized, varies by export method |
| Real-time Monitoring | ⭐⭐⭐⭐⭐ Seamless | ⭐⭐⭐⭐ Works with setup |
| JMX Pattern Fit | N/A (not JVM) | ⭐⭐⭐⭐⭐ Perfect fit (you already monitor JMX) |
Since you already have Prometheus/Grafana:
- ✅ k6’s advantage remains strong (simpler Prometheus export)
- ✅ Official k6 Grafana dashboards work out-of-box
- ✅ Gatling can integrate via JMX Exporter (familiar pattern for your Java apps)
- ⚠️ Gatling requires 1-3 hours additional setup vs k6’s minutes
- ⚠️ Gatling dashboards are community-maintained (less polished)
k6 Reporting (Trivial with Existing Grafana)
What You Get Immediately:
- Text Summary: console output with basic stats:

```text
execution: local
   script: test.js
   output: -

scenarios: (100.00%) 1 scenario, 100 max VUs, 5m30s max duration

✓ status is 200
✓ response time OK

checks.........................: 100.00% ✓ 50000 ✗ 0
data_received..................: 150 MB  500 kB/s
data_sent......................: 5.0 MB  17 kB/s
http_req_blocked...............: avg=1ms   min=0s   med=1ms  max=10ms  p(90)=2ms   p(95)=3ms
http_req_duration..............: avg=100ms min=50ms med=95ms max=500ms p(90)=150ms p(95)=200ms
http_reqs......................: 50000   166.666667/s
```
- JSON Output: machine-readable metrics for parsing:

```json
{
  "metrics": {
    "http_req_duration": {
      "type": "trend",
      "contains": "time",
      "values": {
        "min": 50.123,
        "max": 500.456,
        "avg": 100.789,
        "med": 95.234,
        "p(90)": 150.567,
        "p(95)": 200.89,
        "p(99)": 450.123
      }
    }
  }
}
```
What You DON’T Get:
- ❌ No visual charts/graphs
- ❌ No drill-down HTML interface
- ❌ No time-series graphs (response time over duration)
- ❌ No distribution histograms
Options to Get Visual Reports:
Option 1: Grafana + InfluxDB (Best, Production-Grade)
- Export metrics: `k6 run --out influxdb=http://influxdb:8086/k6 test.js`
- Real-time dashboards during test execution
- Historical trending across test runs
- Requires: InfluxDB deployed, Grafana dashboards configured
- Setup Time: 2-4 hours for first-time setup
- Result: ⭐⭐⭐⭐⭐ Production-grade observability
Option 2: k6 HTML Report Generator (Third-Party)
- Tool: `k6-reporter` (npm package) or `k6-html-reporter`
- Generate HTML from JSON: `k6-reporter summary.json`
- Creates a basic HTML page with charts
- Requires: Node.js, external package
- Result: ⭐⭐⭐ Basic HTML, not as rich as Gatling
Option 3: k6 Cloud (Commercial)
- Export to Grafana Cloud k6
- Beautiful reports, no infrastructure
- Requires: Subscription, data egress to cloud
- Result: ⭐⭐⭐⭐⭐ Excellent but costs $$
GitLab Performance Widget:
```yaml
# k6 can output to GitLab's performance format
artifacts:
  reports:
    performance: performance.json  # GitLab shows a trend graph
```
- Shows a basic trend line in merge requests
- Result: ⭐⭐⭐ Useful for CI/CD gates, not detailed analysis
Verdict:
- Out-of-box: ⭐⭐⭐ Text/JSON only, requires tooling for visuals
- With Grafana: ⭐⭐⭐⭐⭐ Excellent real-time + historical analysis
- Trade-off: Setup overhead vs immediate gratification
Recommendation Based on Your Priorities
Since “Quality and Ease of Reporting” is your Priority #1, consider:
IMPORTANT: You Already Have Grafana 🎯
This significantly changes the evaluation in k6’s favor:
k6 with Existing Grafana (⭐⭐⭐⭐⭐ Recommended):
- ✅ Trivial setup: add a single flag, `--out influxdb=http://influxdb:8086/k6`
- ✅ Official dashboards: import the Grafana k6 dashboard in 5 minutes
- ✅ Real-time monitoring: Watch tests execute live in Grafana
- ✅ Unified observability: Monitor both load tests AND SUT in same Grafana instance
- ✅ Setup time: ~30 minutes (vs 2-4 hours if deploying Grafana from scratch)
- ✅ Result: Best of both worlds - HTML for ad-hoc sharing, Grafana for analysis
Gatling with Existing Grafana (⭐⭐⭐ Possible but more work):
- ⚠️ Requires the `gatling-influxdb` plugin (not built-in)
- ⚠️ Community dashboards (less polished than k6’s official ones)
- ⚠️ Additional build configuration (Maven/sbt dependency)
- ⚠️ Still get HTML reports, but Grafana integration is secondary
- ⚠️ Setup time: ~2-3 hours (plugin + dashboard customization)
Our Recommendation (With Existing Grafana):
Phase 1 (Day 1): k6 + text/JSON
- Get first test working in 2-4 hours
- Text output sufficient to validate approach
Phase 1.5 (Day 2): Connect to Grafana (30 minutes)
- Add `--out influxdb=...` to the Job manifest
- Import the official k6 dashboard into Grafana
- Now have real-time monitoring + historical trending
Optional: k6-reporter for HTML reports
- Use for sharing results with stakeholders who don’t have Grafana access
- 2 hours setup time
Result:
- ✅ Gatling’s advantage (HTML reports) becomes less critical
- ✅ k6’s Grafana integration is simpler and better supported
- ✅ You get best of both: Grafana for analysis, optionally HTML for sharing
- ✅ All observability in one place (load tests + SUT metrics in same Grafana)
Why k6 wins with existing Grafana:
- Setup: 1-line config vs plugin installation
- Dashboard quality: Official vs community
- Unified monitoring: Load test + SUT metrics side-by-side
- Lower resources: 512Mi vs 2Gi memory
- Faster setup: 2-4 hrs vs 3-5 hrs total
- More accessible: JS vs Scala
Gatling only makes sense if:
- Team is already JVM/Scala-proficient
- Need Gatling-specific features (recorder, complex DSL)
- HTML reports are critical and Grafana access is restricted
- Willing to invest in plugin setup + custom dashboards
Decision Table: With Existing Prometheus/Grafana 🎯
| Factor | k6 | Gatling |
|---|---|---|
| Prometheus Integration | ⭐⭐⭐⭐⭐ Built-in experimental or xk6 plugin | ⭐⭐⭐⭐ Via Pushgateway plugin or JMX Exporter |
| InfluxDB Integration | ⭐⭐⭐⭐⭐ Built-in, 1-line flag | ⭐⭐⭐ Plugin required |
| Setup Time (Prometheus) | 30 min - 1 hour | 1-3 hours (JMX sidecar or custom image) |
| Setup Time (InfluxDB) | 30 minutes | 2-3 hours (plugin + config) |
| Dashboard Quality | ⭐⭐⭐⭐⭐ Official, well-maintained | ⭐⭐⭐ Community dashboards |
| HTML Reports | ⭐⭐⭐ Optional (k6-reporter) | ⭐⭐⭐⭐⭐ Built-in, excellent |
| Real-time Monitoring | ⭐⭐⭐⭐⭐ Seamless | ⭐⭐⭐⭐ Works with setup |
| Fits JMX Pattern | N/A (not JVM) | ⭐⭐⭐⭐⭐ Perfect (like your other Java apps) |
| Unified Monitoring | ⭐⭐⭐⭐⭐ Load tests + SUT in same Grafana | ⭐⭐⭐⭐⭐ Load tests + SUT in same Grafana |
| Total Setup (Day 1) | 2-4 hours (test) + 30-60 min (metrics) | 3-5 hours (test) + 1-3 hours (metrics) |
| Resource Footprint | 120MB image, 512Mi RAM | 500MB image, 2Gi RAM (+ JMX sidecar if used) |
| Language Accessibility | ⭐⭐⭐⭐⭐ JavaScript | ⭐⭐⭐ Scala/Java |
| Time to MVP with Observability | 1.5-2 days | 4-6 days |
Verdict with Existing Prometheus/Grafana: ⭐⭐⭐⭐⭐ k6 still wins
k6 Advantages:
- Simpler Prometheus/InfluxDB integration (minutes vs hours)
- Official Grafana dashboards work immediately
- Lower resource footprint (no JMX sidecar needed)
- More accessible language
- Faster time to production-quality observability
Gatling Advantages:
- HTML reports for sharing with non-Grafana users
- JMX pattern matches your existing Java app monitoring (familiar)
- Scala DSL if team is JVM-proficient
Key Insight: While Gatling can integrate with your Prometheus setup via JMX Exporter (same pattern as your other Java apps), the additional 1-3 hours of setup + community dashboards don’t offset k6’s speed and simplicity advantages.
Other Concerns: Supporting Dimensions
| Other Dimension | k6 + Operator | k6 + K8s Job | k6 + GitLab Runner | Gatling + Operator | Gatling + K8s Job | Gatling + GitLab Runner |
|---|---|---|---|---|---|---|
| Setup Complexity | ⭐⭐ Operator + CRD installation | ⭐⭐⭐⭐ Job manifest + kubectl | ⭐⭐⭐⭐⭐ Just Docker image | ⭐⭐ Custom operator or Helm | ⭐⭐⭐⭐ Job manifest + kubectl | ⭐⭐⭐⭐⭐ Just Docker image |
| Max Load Capacity | Very High (100k+ RPS) | High (50k+ RPS with multiple Jobs) | Medium (10k RPS per runner) | Very High (100k+ RPS) | High (50k+ RPS) | Medium (10k RPS per runner) |
| Resource Isolation | ⭐⭐⭐⭐⭐ Namespaces, quotas, limits | ⭐⭐⭐⭐⭐ Namespaces, quotas, limits | ⭐⭐⭐ Runner-level isolation | ⭐⭐⭐⭐⭐ Namespaces, quotas, limits | ⭐⭐⭐⭐⭐ Namespaces, quotas, limits | ⭐⭐⭐ Runner-level isolation |
| Network Policies | ⭐⭐⭐⭐⭐ Full NetworkPolicy support | ⭐⭐⭐⭐⭐ Full NetworkPolicy support | ⭐⭐ Limited to runner config | ⭐⭐⭐⭐⭐ Full NetworkPolicy support | ⭐⭐⭐⭐⭐ Full NetworkPolicy support | ⭐⭐ Limited to runner config |
| Network Bottleneck | ⭐⭐⭐⭐⭐ Separate cluster avoids GitLab bottleneck | ⭐⭐⭐⭐⭐ Separate cluster avoids GitLab bottleneck | ⭐⭐ Limited by GitLab network | ⭐⭐⭐⭐⭐ Separate cluster | ⭐⭐⭐⭐⭐ Separate cluster avoids bottleneck | ⭐⭐ Limited by GitLab network |
| Operational Overhead | ⭐⭐ Operator maintenance, upgrades | ⭐⭐⭐⭐ Minimal (standard Jobs) | ⭐⭐⭐⭐⭐ Minimal | ⭐⭐ Custom operator maintenance | ⭐⭐⭐⭐ Minimal (standard Jobs) | ⭐⭐⭐⭐⭐ Minimal |
| Observability | ⭐⭐⭐⭐⭐ K8s metrics, logs, events | ⭐⭐⭐⭐⭐ K8s logs, easy metric export | ⭐⭐⭐⭐ GitLab logs, artifacts | ⭐⭐⭐⭐ Custom dashboards | ⭐⭐⭐⭐⭐ K8s logs, metric export | ⭐⭐⭐⭐ GitLab logs, artifacts |
| Test Lifecycle | ⭐⭐⭐⭐⭐ Declarative, auto-cleanup | ⭐⭐⭐⭐ TTL for cleanup, simple | ⭐⭐⭐ Script-based management | ⭐⭐⭐ Custom scripts | ⭐⭐⭐⭐ TTL for cleanup | ⭐⭐⭐ Script-based |
| Multi-tenancy | ⭐⭐⭐⭐⭐ Namespace isolation, RBAC | ⭐⭐⭐⭐⭐ Namespace isolation, RBAC | ⭐⭐ Shared runner pool | ⭐⭐⭐⭐⭐ Namespace isolation | ⭐⭐⭐⭐⭐ Namespace isolation, RBAC | ⭐⭐ Shared runner pool |
| Community Support | ⭐⭐⭐⭐⭐ Active Grafana Labs | ⭐⭐⭐⭐⭐ Well-documented pattern | ⭐⭐⭐⭐⭐ Well documented | ⭐⭐⭐ Limited operator support | ⭐⭐⭐⭐⭐ Well-documented | ⭐⭐⭐⭐⭐ Well documented |
| ArgoCD Integration | ⭐⭐⭐⭐⭐ Native GitOps | ⭐⭐⭐⭐ CronJob or manual trigger | N/A (ephemeral) | ⭐⭐⭐⭐ Custom ArgoCD app | ⭐⭐⭐⭐ CronJob or manual | N/A (ephemeral) |
| Debugging | ⭐⭐⭐ K8s pod logs, exec | ⭐⭐⭐⭐ kubectl logs, local Docker test | ⭐⭐⭐⭐⭐ Local Docker run | ⭐⭐⭐ K8s pod logs, exec | ⭐⭐⭐⭐ kubectl logs, local test | ⭐⭐⭐⭐⭐ Local Docker run |
| Image Size | ~120MB | ~120MB | ~120MB | ~500MB (JVM) | ~500MB (JVM) | ~500MB (JVM) |
| Language/Ecosystem | JavaScript/TypeScript | JavaScript/TypeScript | JavaScript/TypeScript | Scala/Java | Scala/Java | Scala/Java |
| Best For | Production-grade, multi-team, high-scale, GitOps | Our use case: Offload from GitLab, avoid network bottleneck, good balance | Quick start, low-scale, simple tests | JVM shops, high-scale | JVM shops, offload from GitLab | JVM shops, quick start |
Detailed Comparison
k6 with K8s Job ⭐ RECOMMENDED
Architecture: GitLab CI triggers K8s Job on separate test cluster → Job executes k6 → Collect results
Strengths:
- Network Isolation: Runs on separate cluster from GitLab, avoiding network bottlenecks to SUT
- Resource Offloading: GitLab server doesn’t bear the load generation workload
- Standard K8s Pattern: Jobs are well-understood, mature, and widely used
- Fast Setup: Standard K8s manifest + kubectl, no operator installation required (2-4 hours to first test)
- Clean Reporting: k6’s excellent JSON/summary output easily collected as artifacts
- Horizontal Scaling: use `completions: N` with coordination for distributed load
- Resource Management: full K8s quotas, limits, and NetworkPolicy support
- Debugging: Can test Jobs locally with same Docker image
- Low Overhead: No operator to maintain, Jobs auto-cleanup with TTL
Weaknesses:
- Manual Coordination: distributed tests need custom coordination logic (vs the operator’s built-in `parallelism`)
- Less Declarative: requires scripting for the test lifecycle (vs operator CRDs)
- No GitOps: Jobs are ephemeral, not continuously reconciled by ArgoCD
Example Usage:
```yaml
# GitLab CI triggers this Job
apiVersion: batch/v1
kind: Job
metadata:
  name: load-test-${CI_PIPELINE_ID}
  namespace: load-testing
spec:
  ttlSecondsAfterFinished: 3600  # Auto-cleanup after 1 hour
  completions: 5                 # Run 5 parallel pods for distributed load
  parallelism: 5
  completionMode: Indexed        # Exposes JOB_COMPLETION_INDEX to each pod
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: k6
          image: grafana/k6:latest
          command:
            - k6
            - run
            - --out=json=/results/output.json
            - --vus=100
            - --duration=5m
            - /scripts/test.js
          volumeMounts:
            - name: test-script
              mountPath: /scripts
            - name: results
              mountPath: /results
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1
              memory: 1Gi
      volumes:
        - name: test-script
          configMap:
            name: test-script-${CI_PIPELINE_ID}
        - name: results
          emptyDir: {}
```

GitLab CI Integration:
```yaml
execute-load-test:
  stage: test
  image: bitnami/kubectl:latest
  script:
    # Create ConfigMap with the test script
    - kubectl create configmap test-script-${CI_PIPELINE_ID}
      --from-file=test.js -n load-testing
    # Create and run the Job
    - envsubst < k8s/job-template.yaml | kubectl apply -f -
    # Wait for completion
    - kubectl wait --for=condition=complete --timeout=30m
      job/load-test-${CI_PIPELINE_ID} -n load-testing
    # Collect results from the Job pods
    - kubectl logs job/load-test-${CI_PIPELINE_ID} -n load-testing > results.log
    # Cleanup
    - kubectl delete configmap test-script-${CI_PIPELINE_ID} -n load-testing
  artifacts:
    paths:
      - results.log
    reports:
      performance: performance.json
```

Distributed Load Pattern:
```sh
# For distributed load, coordinate via env vars.
# With an Indexed Job, each pod receives JOB_COMPLETION_INDEX: 0, 1, 2, 3, 4
export INSTANCE_INDEX=$JOB_COMPLETION_INDEX
export TOTAL_INSTANCES=5
export TOTAL_VUS=500
export VUS_PER_INSTANCE=$((TOTAL_VUS / TOTAL_INSTANCES))
# Give any remainder VUs to the last instance so the totals add up
if [ "$INSTANCE_INDEX" -eq $((TOTAL_INSTANCES - 1)) ]; then
  VUS_PER_INSTANCE=$((VUS_PER_INSTANCE + TOTAL_VUS % TOTAL_INSTANCES))
fi

k6 run \
  --vus=${VUS_PER_INSTANCE} \
  --duration=5m \
  --out=json=/results/output-${INSTANCE_INDEX}.json \
  test.js
```

When to Choose:
- Our use case: Need to offload from GitLab, avoid network bottlenecks
- Want K8s benefits (isolation, quotas, NetworkPolicies) without operator complexity
- Need faster MVP (days vs weeks)
- Team comfortable with K8s but wants simpler lifecycle than operator
- Don’t need GitOps continuous reconciliation
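The distributed pattern leaves one result file per instance (`output-0.json` … `output-4.json`). If each instance exports a JSON summary (shaped like the JSON example earlier; the file names and metric fields are assumptions), the files can be merged with `jq`. A sketch using fabricated sample data:

```sh
# Fabricated per-instance summaries, standing in for real k6 exports
echo '{"metrics":{"http_reqs":{"values":{"count":25000}}}}' > output-0.json
echo '{"metrics":{"http_reqs":{"values":{"count":25000}}}}' > output-1.json

# Sum request counts across all instances
TOTAL=$(jq -s 'map(.metrics.http_reqs.values.count) | add' output-*.json)
echo "total http_reqs: ${TOTAL}"
```

Note that raw `--out json` output is a stream of metric points rather than a summary, so in practice aggregation is often easier via InfluxDB/Grafana than via file merging.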
k6 with Operator
Strengths:
- Official Support: Grafana Labs maintains the operator, ensuring compatibility and updates
- Declarative: define tests as Kubernetes CRDs (`TestRun` resources)
- Horizontal Scaling: set `parallelism: 10` to distribute load across 10 pods automatically
- Resource Management: leverage K8s resource quotas, limits, and autoscaling
- Network Control: Fine-grained NetworkPolicies to restrict test traffic
- GitOps Ready: Deploy via ArgoCD alongside application infrastructure
- Cloud Native: Integrates with service meshes, observability stacks
Weaknesses:
- Setup Time: Requires operator installation, namespace setup, RBAC configuration
- Learning Curve: Team needs to understand CRDs, K8s resource management
- Debugging Complexity: Failures require K8s troubleshooting skills
- Overhead: Operator consumes cluster resources even when idle
Example Usage:
```yaml
apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: api-load-test
spec:
  parallelism: 5  # 5 distributed pods
  script:
    configMap:
      name: test-script
  runner:
    resources:
      limits:
        cpu: 1
        memory: 1Gi
```

When to Choose:
- Running tests regularly (daily/weekly regression tests)
- Need to generate >50k RPS
- Multiple teams using shared infrastructure
- Strong K8s skills in team
- Security/isolation requirements (NetworkPolicies)
k6 with Docker (GitLab Runner)
Strengths:
- Simplicity: just run `docker run grafana/k6:latest run test.js`
- Fast Setup: working in under an hour
- Easy Debugging: Run tests locally with same Docker image
- Low Overhead: No persistent cluster resources
- Familiar: Standard GitLab CI patterns
Weaknesses:
- Scale Limits: Single runner caps at ~10k RPS (CPU bound)
- No Distribution: Can’t easily split load across multiple executors
- Resource Contention: Shares resources with other CI jobs
- Limited Isolation: Relies on runner network configuration
- Manual Orchestration: Need custom scripts for distributed tests
Example Usage:
```yaml
# .gitlab-ci.yml
load-test:
  image: grafana/k6:latest
  script:
    - k6 run --vus 100 --duration 5m test.js
  artifacts:
    reports:
      performance: summary.json
```

When to Choose:
- Getting started quickly (proof of concept)
- Infrequent ad-hoc testing
- Low-to-medium load requirements (<10k RPS)
- Small team with limited K8s expertise
- Want to validate approach before operator investment
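The manifests and CI jobs above all mount or reference a `test.js`, but no example script is shown. A minimal k6 script might look like the sketch below; the target URL, stage shape, and threshold values are placeholders, and the script runs only under the k6 runtime (not Node.js):

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 100 },  // ramp up to 100 VUs
    { duration: '3m', target: 100 },  // hold
    { duration: '1m', target: 0 },    // ramp down
  ],
  thresholds: {
    // Fail the run if p(95) latency exceeds 200ms or checks drop below 99%
    http_req_duration: ['p(95)<200'],
    checks: ['rate>0.99'],
  },
};

export default function () {
  // Placeholder endpoint; replace with the federated API under test
  const res = http.get('https://sut.sandbox.example.com/api/v1/health');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time OK': (r) => r.timings.duration < 500,
  });
  sleep(1);
}
```

The `thresholds` block is what makes the Job exit non-zero on regressions, which is how the K8s Job and GitLab CI patterns above turn a load test into a pass/fail gate.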
Gatling with K8s Job
Architecture: GitLab CI triggers K8s Job → Job executes Gatling simulation → Collect HTML reports
Strengths:
- Network Isolation: Same benefits as k6 - separate cluster from GitLab
- Rich Reports: Gatling’s HTML reports are comprehensive and visual
- Standard Pattern: K8s Jobs are well-understood
- JVM Performance: Excellent for very high load scenarios
- Full Feature Set: All Gatling features available (feeders, checks, DSL)
Weaknesses:
- Larger Images: ~500MB JVM-based images (vs k6’s 120MB)
- Slower Startup: JVM warmup time adds latency
- Resource Intensive: Requires more memory per pod (typically 2Gi vs k6’s 512Mi)
- Coordination Complexity: Distributed Gatling requires Gatling Enterprise or custom scripts
- Language Barrier: Scala/Java less accessible than JavaScript
Example Usage:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gatling-test-${CI_PIPELINE_ID}
spec:
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: gatling
          image: denvazh/gatling:latest
          command:
            - /opt/gatling/bin/gatling.sh
            - -sf
            - /simulations
            - -s
            - com.example.ApiSimulation
            - -rf
            - /results
          volumeMounts:
            - name: simulations
              mountPath: /simulations
            - name: results
              mountPath: /results
          resources:
            requests:
              cpu: 1
              memory: 2Gi
            limits:
              cpu: 2
              memory: 4Gi
      volumes:
        - name: simulations
          configMap:
            name: gatling-simulation-${CI_PIPELINE_ID}
        - name: results
          emptyDir: {}
```

When to Choose:
- JVM/Scala-based team
- Need Gatling-specific features (recorder, advanced DSL)
- Want to offload from GitLab with JVM tooling
- Very high load requirements (>100k RPS)
- Willing to accept larger resource footprint
Gatling with Operator
Strengths:
- High Performance: JVM-based, excellent for very high loads
- Scala DSL: Powerful test scripting for complex scenarios
- Detailed Reports: Rich HTML reports with drill-down metrics
- Enterprise Features: Commercial support available
Weaknesses:
- Less Mature Operators: No official operator; community solutions vary in quality
- Setup Complexity: May require custom Helm charts or operator development
- Larger Footprint: JVM + dependencies = ~500MB images
- JVM Overhead: Longer startup times, higher memory usage
- Smaller Community: Less K8s-native ecosystem than k6
Example Custom CRD:
```yaml
apiVersion: loadtest.io/v1
kind: GatlingTest
metadata:
  name: api-test
spec:
  simulation: com.example.ApiSimulation
  replicas: 5
  resources:
    requests:
      memory: 2Gi
      cpu: 1
```

When to Choose:
- Team has strong JVM/Scala skills
- Need Gatling’s advanced features (feeders, checks, protocols)
- Willing to maintain custom operator
- Very high scale requirements (>100k RPS)
Gatling with Docker (GitLab Runner)
Strengths:
- Standard Approach: Well-documented Docker execution
- Quick Start: Run without operator complexity
- Flexible: Easy to customize with scripts
- Powerful: Full Gatling feature set available
Weaknesses:
- Large Images: 500MB+ (vs k6’s 120MB)
- Resource Intensive: JVM requires more memory
- Slower Startup: JVM warmup time
- Scala/Java Required: Higher barrier to entry for non-JVM teams
- Manual Scaling: Hard to distribute load
Example Usage:
```yaml
load-test:
  image: denvazh/gatling:latest
  script:
    - gatling.sh -s com.example.ApiSimulation
  artifacts:
    paths:
      - target/gatling/
```

When to Choose:
- JVM-based organization
- Need Gatling-specific features
- Ad-hoc testing without operator investment
- Small-to-medium scale (<20k RPS)
Recommended Decision Path
┌─────────────────────────────────────────────────┐
│ Start: Need load testing framework │
└──────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────┐
│ Need to offload from │ ──Yes──▶ K8s Job (k6 or Gatling)
│ GitLab + avoid network │ ↓
│ bottlenecks? │ Best balance: speed + isolation
└──────┬───────────────────┘
│
No
│
▼
┌──────────────────────────┐
│ Quick PoC only? │ ──Yes──▶ GitLab Runner (k6 or Gatling)
│ (<1 day setup) │ ↓
└──────┬───────────────────┘ Fastest start, limited scale
│
No
│
▼
┌──────────────────────────┐
│ Need GitOps reconciliation│ ──Yes──▶ k6 Operator
│ + max automation? │ ↓
└──────┬───────────────────┘ Production-grade, most overhead
│
No
│
▼
┌──────────────────────────┐
│ JVM-based organization? │ ──Yes──▶ Gatling + K8s Job
└──────┬───────────────────┘
│
No
│
▼
k6 + K8s Job (recommended default)
Our Decision: k6 + K8s Job
Selected Approach: k6 with Kubernetes Jobs
Implementation:
- GitLab CI orchestrates K8s Jobs on separate test cluster
- Jobs execute k6 load tests against SUT in different cluster
- Results collected as GitLab artifacts and exported to InfluxDB
- Horizontal scaling via the Job `completions` parameter with coordination
Rationale Based on Priority Concerns:
- Quality & Ease of Reporting (Priority 1):
- 🎯 GAME CHANGER: We already have Grafana deployed for SUT monitoring
- ✅ k6 wins decisively with existing Grafana:
  - Built-in InfluxDB export: `--out influxdb=...` (one-line config)
  - Official Grafana dashboards: import in 5 minutes
  - Real-time monitoring during test execution
  - Historical trending across test runs
  - Unified observability: monitor load tests AND SUT in the same Grafana instance
- ⚠️ Gatling HTML reports still superior for ad-hoc sharing, BUT:
- Requires plugin for Grafana integration (not built-in)
- Community dashboards (less mature than k6’s official ones)
- Setup time: ~2-3 hours vs k6’s ~30 minutes
- ✅ Implementation Plan:
- Phase 1 (Day 1): k6 text/JSON (2-4 hours to first test)
- Phase 1.5 (Day 2): Connect to existing Grafana (30 minutes)
- Optional: Add k6-reporter for HTML sharing (2 hours)
- ✅ Result: Best of both worlds - Grafana for analysis, optionally HTML for sharing
- Speed to MVP (Priority 2):
- ✅ 2-4 hours to first test (vs 1-2 days for operator)
- ✅ 1-3 days to MVP (vs 1-2 weeks for operator)
- ✅ Standard K8s pattern, no operator installation required
- ✅ Team already familiar with K8s Jobs
- Maturity (Priority 3):
- ✅ K8s Jobs are rock-solid, production-proven pattern
- ✅ k6 is mature, well-supported by Grafana Labs
- ✅ No reliance on less mature operator code paths
- Ease of Use (Priority 4):
- ✅ Standard K8s Job manifests (familiar to team)
- ✅ Simple kubectl commands for management
- ✅ JavaScript test scripts (accessible to team)
- ⚠️ Slight complexity for distributed coordination (acceptable trade-off)
- Ease of Horizontal Scaling (Priority 5):
- ✅ Job `completions: N` for parallel execution
- ✅ Coordination via environment variables (JOB_COMPLETION_INDEX)
- ⚠️ Not as seamless as the operator's `parallelism`, but sufficient for our needs
Additional Benefits:
- Network Isolation: Separate cluster avoids GitLab→SUT network bottleneck
- Resource Offloading: GitLab servers don’t bear load generation workload
- Cost-Effective: No operator overhead, Jobs auto-cleanup with TTL
- Security: Full NetworkPolicy support for SUT access control
Why Not Operator?:
- Operator setup takes 1-2 weeks vs 1-3 days for Jobs
- We don’t need GitOps continuous reconciliation (tests are ephemeral)
- Jobs provide 80% of benefits with 20% of complexity
- Can migrate to operator later if needs evolve
Why Not Gatling Despite Better Out-of-Box Reports?:
- Report quality alone doesn’t offset other factors:
- Larger images (~500MB vs 120MB) → slower startup, more cluster resources
- Higher resource requirements (2Gi+ memory vs 512Mi) → higher costs
- Scala/Java less accessible than JavaScript for our team
- Slower setup time (3-5 hours vs 2-4 hours)
- k6’s reporting story is better long-term:
- Grafana/InfluxDB provides real-time monitoring (not just post-test reports)
- Historical trending across test runs
- Integration with existing observability platform
- Gatling’s HTML reports are static snapshots
- Mitigation for initial reporting gap:
- k6 text/JSON output sufficient for MVP validation
- Optional: k6-reporter for basic HTML reports (2 hours setup)
- Phase 3: Grafana deployment provides production-grade observability
Why Not GitLab Runner Only?:
- Network bottleneck between GitLab and SUT
- Runner resource constraints limit scale (~10k RPS)
- No K8s NetworkPolicy support for SUT isolation
Architecture Overview
┌─────────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐
│ GitLab CI/CD │ │ Test Cluster │ │ SUT Cluster │
│ (Orchestration) │ │ (Separate from GL) │ │ (Sandbox SUT) │
│ │ │ │ │ │
│ - Test Repo │ ───▶ │ K8s Jobs (k6) │ ───▶ │ Federated APIs │
│ - Generation │ │ Distributed Pods │ │ (Target System) │
│ - kubectl trigger │ │ TTL Cleanup │ │ │
│ - Artifact collect │ │ │ │ │
└─────────────────────┘ └──────────────────────┘ └──────────────────────┘
│ │ │
│ │ │
▼ ▼ ▼
┌─────────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐
│ Test Library │ │ K8s Resources │ │ API Catalog │
│ - Scenarios │ │ - Namespace │ │ - Service Registry │
│ - Templates │ │ - Resource Quotas │ │ - SLO Definitions │
│ - Helpers │ │ - Network Policies │ │ - Auth Configs │
│ - Job manifests │ │ - ConfigMaps │ │ │
└─────────────────────┘ └──────────────────────┘ └──────────────────────┘
Key Benefits:
✓ Offloads load from GitLab servers
✓ Avoids GitLab→SUT network bottleneck
✓ Full K8s isolation and quotas
✓ Fast setup (2-4 hours to first test)
Key Architectural Decisions
1. Test Generation Location: GitLab CI
Decision: Generate tests in GitLab CI pipeline
Alternative: In-cluster Job in sandbox environment
Rationale:
- Better secrets management (GitLab CI variables)
- No cluster resource consumption during generation
- Easier debugging and iteration
- Native GitLab artifact management
- Clear separation of concerns
Trade-offs:
- Requires GitLab runner with kubectl/API access
- Less suitable for very large test suite generation (acceptable for our scale)
2. Test Execution: K8s Jobs on Separate Cluster
Decision: Use Kubernetes Jobs on a dedicated test cluster (separate from GitLab and the SUT)
Alternatives: GitLab Runner execution, k6 Operator
Rationale:
- Network Isolation: Avoids GitLab→SUT network bottleneck by running tests in separate cluster
- Resource Offloading: GitLab servers don’t bear load generation workload
- Fast Setup: Standard K8s Jobs require 2-4 hours vs 1-2 days for operator
- Maturity: K8s Jobs are production-proven, no operator dependencies
- Simplicity: Familiar pattern for team, less operational overhead
- Scaling: Job `completions` parameter enables horizontal scaling
- Security: Full NetworkPolicy support for SUT access control
Trade-offs:
- Manual coordination for distributed tests (vs the operator's built-in `parallelism`)
- No GitOps continuous reconciliation (tests are ephemeral anyway)
- Mitigation: coordination via the JOB_COMPLETION_INDEX environment variable is straightforward
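The coordination logic amounts to simple index arithmetic. A minimal sketch (Python for illustration only; the Job template itself does this in shell, and the function name is ours):

```python
import os

def split_vus(total_vus: int, instances: int) -> list[int]:
    """Distribute total VUs across parallel Job pods.

    Plain integer division drops the remainder (100 VUs / 3 pods covers
    only 99), so the first `total_vus % instances` pods take one extra VU.
    """
    base, remainder = divmod(total_vus, instances)
    return [base + 1 if i < remainder else base for i in range(instances)]

# Each pod picks its own share using the index Kubernetes injects
# (requires completionMode: Indexed on the Job):
index = int(os.environ.get("JOB_COMPLETION_INDEX", "0"))
my_vus = split_vus(100, 3)[index]
```

Note that JOB_COMPLETION_INDEX is only injected when the Job runs with `completionMode: Indexed`.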
3. Test Storage: Hybrid Model
Decision:
- Reusable Components: GitLab repository (version-controlled)
- Generated Tests: Dynamic generation from API catalog
- Results:
- Short-term: GitLab artifacts (30 days)
- Long-term: InfluxDB for trending
- Reports: S3/MinIO for historical analysis
Rationale:
- Version control for test logic and scenarios
- Dynamic generation reduces maintenance burden
- Multiple retention strategies optimize cost and utility
4. Self-Service Pattern: GitLab CI Variables
Decision: Use GitLab CI manual triggers with pipeline variables
Variables:
TEST_SUITE: "api-federation"         # Which test suite
TARGET_ENVIRONMENT: "sandbox-sut-1"  # Target SUT
VIRTUAL_USERS: "100"                 # Concurrent users
TEST_DURATION: "5m"                  # Test duration
RAMP_UP_TIME: "30s"                  # Ramp-up period
TEST_PROFILE: "load"                 # smoke|load|stress|spike

Rationale:
- No custom UI required
- GitLab’s existing RBAC and audit logging
- Easy to trigger via UI, API, or CLI
- Pipeline history provides audit trail
Trade-offs:
- Less user-friendly than dedicated UI
- Mitigation: Good documentation + optional wrapper API for non-technical users
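Triggering "via UI, API, or CLI" means a thin wrapper can reuse GitLab's create-pipeline endpoint with the same variables. A sketch (host, project ID, and token are placeholders, not real values):

```python
import json
import urllib.request

GITLAB_HOST = "https://gitlab.example.com"  # placeholder
PROJECT_ID = 1234                           # placeholder
TOKEN = "..."                               # supply via env var / CI secret in practice

def build_trigger_request(ref: str, variables: dict) -> urllib.request.Request:
    """Build a POST to GitLab's create-pipeline API, passing pipeline variables."""
    payload = {
        "ref": ref,
        "variables": [{"key": k, "value": v} for k, v in variables.items()],
    }
    return urllib.request.Request(
        f"{GITLAB_HOST}/api/v4/projects/{PROJECT_ID}/pipeline",
        data=json.dumps(payload).encode(),
        headers={"PRIVATE-TOKEN": TOKEN, "Content-Type": "application/json"},
        method="POST",
    )

req = build_trigger_request("main", {"TEST_PROFILE": "smoke", "VIRTUAL_USERS": "10"})
# urllib.request.urlopen(req) would start the pipeline
```

This keeps GitLab's RBAC and audit trail in the loop even for programmatic callers.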
5. Network Isolation
Decision: Enforce network policies restricting k6 pods to SUT environment only
NetworkPolicy:
- Allow: DNS resolution
- Allow: Traffic to sandbox-sut namespace only
- Deny: All other egress

Rationale:
- Prevent accidental load testing of non-target systems
- Security isolation between sandbox environments
- Clear blast radius containment
Implementation Plan
Phase 1: Foundation (1-3 days)
Goal: Basic working load test with K8s Jobs
Deliverables:
- K8s namespace configured on test cluster with resource quotas
- Basic GitLab CI pipeline triggering K8s Jobs
- Simple parameterized k6 test example
- Documentation for running first test
Tasks:
- Create `load-testing` namespace on test cluster with resource quotas and NetworkPolicies
- Configure GitLab runner with kubectl access to test cluster
- Create K8s Job manifest template for k6
- Create GitLab CI pipeline that triggers Jobs via kubectl
- Write example k6 test script
- Implement Job result collection (logs → GitLab artifacts)
- (Optional) Set up k6-reporter for basic HTML reports (2 hours)
- Document execution workflow and reporting options
Acceptance Criteria:
- Team member can trigger load test via GitLab UI
- K8s Job executes on test cluster (not GitLab runner)
- Test targets sandbox SUT successfully
- Results collected in GitLab artifacts as JSON/text summary
- (Optionally) Basic HTML report generated
- Job auto-cleans up via TTL (1 hour after completion)
- Documentation explains reporting trade-offs and future Grafana setup
Phase 2: Self-Service & Generation (1-2 weeks)
Goal: Flexible, catalog-driven test generation
Deliverables:
- GitLab CI variables for test customization
- Test generation scripts (template-based)
- Integration with API catalog for dynamic test creation
- Multiple test profiles (smoke, load, stress, spike)
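When defining staged profiles, a quick sanity check is to derive peak VUs and total wall-clock time from the stage list (helper name is ours; the stage values mirror the spike profile defined later in this document):

```python
def profile_summary(stages: list[dict]) -> tuple[int, int]:
    """Return (peak target VUs, total duration in seconds) for k6-style stages."""
    def to_seconds(d: str) -> int:
        units = {"s": 1, "m": 60, "h": 3600}
        return int(d[:-1]) * units[d[-1]]
    peak = max(s["target"] for s in stages)
    total = sum(to_seconds(s["duration"]) for s in stages)
    return peak, total

spike_stages = [
    {"duration": "1m", "target": 50},
    {"duration": "10s", "target": 500},  # spike
    {"duration": "1m", "target": 50},
    {"duration": "10s", "target": 500},  # second spike
    {"duration": "1m", "target": 0},
]
print(profile_summary(spike_stages))  # → (500, 200)
```

Peak VUs feeds the cluster resource-quota sizing; total duration feeds the pipeline timeout.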
Tasks:
- Implement test generation scripts
- Create test scenario library
- Integrate API catalog discovery
- Add test profile configurations
- Create test templates
Acceptance Criteria:
- Tests can be generated from API catalog
- Multiple test profiles selectable
- No code changes required for common scenarios
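Catalog-driven generation can start as a simple mapping from each catalog entry's SLO to a k6 threshold on the tagged sub-metric (the catalog shape and helper name are illustrative assumptions; the `metric{tag:value}` threshold syntax is k6's):

```python
import json

# Hypothetical catalog entries matching what the test template tags requests with
catalog = [
    {"name": "orders", "path": "/v1/orders", "slo_ms": 300},
    {"name": "catalog", "path": "/v1/catalog", "slo_ms": 500},
]

def generate_thresholds(apis: list[dict]) -> dict:
    """Emit one k6 threshold per API: p(95) of http_req_duration, filtered
    by the api_name tag, must stay below that API's SLO."""
    return {
        f"http_req_duration{{api_name:{api['name']}}}": [f"p(95)<{api['slo_ms']}"]
        for api in apis
    }

print(json.dumps(generate_thresholds(catalog), indent=2))
```

The generated dict drops straight into the `thresholds` field of k6's `options`, so adding an API to the catalog adds its SLO check with no test-code change.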
Phase 3: Enhanced Observability & Alerting (3-5 days)
Goal: Leverage existing Grafana for production-grade observability
Deliverables:
- ~~InfluxDB integration~~ (already exists for SUT monitoring)
- k6 → existing Grafana integration (30 minutes)
- Enhanced Grafana dashboards with custom views
- Alerting and notification system
- Baseline comparison and regression detection
Tasks:
- ~~Deploy InfluxDB~~ (already exists)
- Configure k6 → existing InfluxDB in Job manifest (`--out influxdb=...`)
- Import official k6 Grafana dashboard (5 minutes)
- Customize dashboard for your API federation use case
- Create unified dashboard showing load test + SUT metrics side-by-side
- Set up GitLab performance reports for merge request widgets
- Configure Grafana alerts for test failures or SLO breaches
- Implement notification webhooks (Slack/email via Grafana alerting)
- Create baseline metrics storage for regression detection
Acceptance Criteria:
- Real-time metrics visible in existing Grafana during test (not just post-test like Gatling)
- Historical trend data available in existing InfluxDB across multiple test runs
- Grafana dashboards show P50/P75/P95/P99 latencies, throughput, error rates
- Unified view: Load test metrics AND SUT metrics in same dashboard
- GitLab shows performance regression indicators in merge requests
- Grafana alerts team of test failures or performance degradations
- Reporting quality now exceeds Gatling (dynamic vs static, real-time vs post-test, unified observability)
Phase 4: Advanced Features (2-3 weeks)
Goal: Production-ready testing framework
Deliverables:
- Multi-scenario testing (mixed workloads)
- Baseline comparison and regression detection
- Scheduled regression test suite
- SLO-based pass/fail criteria
- Advanced reporting and analytics
Tasks:
- Implement multi-scenario orchestration
- Build baseline metrics storage
- Create regression detection logic
- Set up scheduled test pipelines
- Implement SLO validation
- Build comprehensive report generator
Acceptance Criteria:
- Can run mixed workload tests (multiple APIs concurrently)
- Automatic detection of performance regressions
- Scheduled tests run nightly against main branch
- Tests pass/fail based on SLO thresholds
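The regression-detection logic can start as a tolerance comparison against stored baseline metrics (a sketch; metric names and the 10% tolerance are illustrative choices, not part of the pipeline yet):

```python
def detect_regression(baseline: dict, current: dict, tolerance: float = 0.10) -> list[str]:
    """Flag metrics that degraded more than `tolerance` relative to baseline.

    For latencies and error rates alike, higher is worse, so a single
    'current > baseline * (1 + tolerance)' check covers both.
    """
    regressions = []
    for metric, base_value in baseline.items():
        cur = current.get(metric)
        if cur is not None and cur > base_value * (1 + tolerance):
            regressions.append(f"{metric}: {base_value} -> {cur}")
    return regressions

baseline = {"p95_ms": 420.0, "p99_ms": 880.0, "error_rate": 0.004}
current = {"p95_ms": 510.0, "p99_ms": 900.0, "error_rate": 0.003}
print(detect_regression(baseline, current))  # only p95_ms is >10% worse
```

Run against the InfluxDB-stored results of the previous main-branch run, this is enough to gate a scheduled pipeline pass/fail.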
Technical Specifications
Repository Structure
load-testing-framework/
├── .gitlab-ci.yml # Main CI/CD pipeline
├── README.md # User documentation
├── scripts/
│ ├── generate-test.sh # Test generation from templates
│ ├── generate-from-catalog.js # API catalog integration
│ ├── wait-for-completion.sh # Test monitoring
│ ├── generate-report.sh # Results processing
│ └── validate-config.sh # Configuration validation
├── templates/
│ ├── test-template.js.tpl # k6 test template
│ ├── k6-job.yaml # K8s Job manifest template
│ └── scenarios.yaml.tpl # Scenario configurations
├── tests/
│ ├── scenarios/ # Pre-built test scenarios
│ │ ├── smoke-test.js # Quick sanity check
│ │ ├── load-test.js # Sustained load
│ │ ├── stress-test.js # Breaking point
│ │ └── spike-test.js # Sudden traffic spike
│ ├── helpers/
│ │ ├── auth.js # Authentication helpers
│ │ ├── checks.js # Common assertions
│ │ └── utils.js # Utilities
│ └── api-catalog.json # API definitions (generated)
├── config/
│ ├── environments.yaml # Environment configurations
│ ├── test-profiles.yaml # Load profiles (VUs, duration, etc.)
│ └── slo-thresholds.yaml # Performance SLO definitions
├── k8s/
│ ├── namespace.yaml # load-testing namespace
│ ├── resource-quota.yaml # Resource limits
│ ├── network-policy.yaml # Network isolation
│ └── job-template.yaml # K8s Job template (with envsubst vars)
└── monitoring/
├── grafana-dashboards/ # Grafana dashboard JSON
└── alerting-rules.yaml # Prometheus alerting rules
K8s Job Configuration
Namespace Configuration:
apiVersion: v1
kind: Namespace
metadata:
  name: load-testing
  labels:
    environment: test-cluster
    purpose: performance-testing
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: load-testing-quota
  namespace: load-testing
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"  # Allow multiple concurrent test Jobs
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: k6-job-egress-restriction
  namespace: load-testing
spec:
  podSelector:
    matchLabels:
      job-type: load-test  # Applied to all k6 Job pods
  policyTypes:
    - Egress
  egress:
    # Allow DNS
    - to:
        - namespaceSelector:
            matchLabels:
              name: kube-system
      ports:
        - protocol: UDP
          port: 53
    # Allow InfluxDB for metrics export
    - to:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 8086
    # Allow ONLY SUT traffic. An empty podSelector matches pods in THIS
    # namespace only; for a SUT in a different cluster, replace with an
    # ipBlock covering the SUT's LoadBalancer/ingress IPs.
    - to:
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 80
# Note: Adjust based on actual SUT cluster connectivity pattern
# (LoadBalancer IP, cross-cluster mesh, etc.)

K8s Job Template:
# templates/k6-job.yaml
# Uses envsubst-style ${VAR} placeholders to match the pipeline's envsubst step.
# Substitute with an explicit variable list (see the execute stage) so runtime
# shell expressions such as ${JOB_COMPLETION_INDEX:-0} survive untouched.
apiVersion: batch/v1
kind: Job
metadata:
  name: load-test-${CI_PIPELINE_ID}
  namespace: load-testing
  labels:
    app: k6
    job-type: load-test
    pipeline-id: "${CI_PIPELINE_ID}"
    test-suite: "${TEST_SUITE}"
spec:
  ttlSecondsAfterFinished: 3600  # Cleanup after 1 hour
  completions: ${PARALLELISM}    # Number of parallel pods (default from CI variable)
  parallelism: ${PARALLELISM}
  completionMode: Indexed        # Required for pods to receive JOB_COMPLETION_INDEX
  backoffLimit: 0                # Don't retry failed tests
  template:
    metadata:
      labels:
        app: k6
        job-type: load-test
        pipeline-id: "${CI_PIPELINE_ID}"
    spec:
      restartPolicy: Never
      containers:
        - name: k6
          image: grafana/k6:0.48.0  # Pin version for reproducibility
          command:
            - sh
            - -c
            - |
              # Calculate this instance's share of VUs
              TOTAL_VUS=${VIRTUAL_USERS}
              INSTANCE_INDEX=${JOB_COMPLETION_INDEX:-0}
              TOTAL_INSTANCES=${PARALLELISM}
              VUS_PER_INSTANCE=$((TOTAL_VUS / TOTAL_INSTANCES))
              # Pass VUs/duration via --env so the profile stages defined in the
              # script stay in control (CLI --vus/--duration would override them).
              # --summary-export writes the end-of-test summary the report stage parses.
              k6 run \
                --env VIRTUAL_USERS=${VUS_PER_INSTANCE} \
                --env TEST_DURATION=${TEST_DURATION} \
                --env RAMP_UP_TIME=${RAMP_UP_TIME} \
                --summary-export=/results/summary.json \
                --out influxdb=http://influxdb.monitoring:8086/k6 \
                --tag testrun=${CI_PIPELINE_ID} \
                --tag instance=${INSTANCE_INDEX} \
                /scripts/test.js
              # Emit the summary to stdout (emptyDir contents vanish with the pod,
              # so the pipeline recovers results via kubectl logs)
              echo "=== Test Instance ${INSTANCE_INDEX} Complete ==="
              echo "===SUMMARY==="
              cat /results/summary.json
          env:
            - name: TARGET_BASE_URL
              value: "${TARGET_BASE_URL}"
            - name: TEST_PROFILE
              value: "${TEST_PROFILE}"
          volumeMounts:
            - name: test-script
              mountPath: /scripts
              readOnly: true
            - name: results
              mountPath: /results
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1
              memory: 1Gi
      volumes:
        - name: test-script
          configMap:
            name: test-script-${CI_PIPELINE_ID}
        - name: results
          emptyDir: {}

GitLab CI Pipeline
Core Pipeline (.gitlab-ci.yml):
stages:
  - validate
  - generate
  - execute
  - report

variables:
  K6_NAMESPACE: load-testing
  K8S_CLUSTER: test-cluster  # Separate cluster from GitLab
  TARGET_SUT_BASE_URL: "https://api.sandbox-sut-1.example.com"
  # Pipeline variables for self-service
  TEST_SUITE:
    value: "api-federation"
    description: "Test suite to run"
  TARGET_ENVIRONMENT:
    value: "sandbox-sut-1"
    description: "Target SUT environment"
  VIRTUAL_USERS:
    value: "100"
    description: "Total virtual users across all instances"
  PARALLELISM:
    value: "1"
    description: "Number of parallel Job instances (for distributed load)"
  TEST_DURATION:
    value: "5m"
    description: "Test duration (30s, 5m, 1h)"
  RAMP_UP_TIME:
    value: "30s"
    description: "Ramp-up duration"
  TEST_PROFILE:
    value: "load"
    description: "Test profile: smoke|load|stress|spike"

# Allow manual triggers with parameters
workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "web"
    - if: $CI_PIPELINE_SOURCE == "schedule"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

# Validate configuration
validate:
  stage: validate
  image: grafana/k6:latest
  script:
    - echo "Validating test configuration..."
    - ./scripts/validate-config.sh
    - k6 inspect tests/scenarios/${TEST_PROFILE}-test.js
  rules:
    - if: $CI_PIPELINE_SOURCE == "web" || $CI_PIPELINE_SOURCE == "schedule"

# Generate test script and Job manifest
generate:
  stage: generate
  image: alpine:latest
  before_script:
    - apk add --no-cache curl jq gettext
  script:
    - echo "Generating tests for ${TEST_SUITE}..."
    - mkdir -p generated
    - |
      # Fetch API catalog
      curl -s https://api-catalog.example.com/apis \
        -H "Authorization: Bearer ${API_CATALOG_TOKEN}" \
        > tests/api-catalog.json
    - |
      # Export environment variables for template substitution
      export TEST_NAME="load-test-${CI_PIPELINE_ID}"
      export TARGET_BASE_URL="https://api.${TARGET_ENVIRONMENT}.example.com"
      export TIMESTAMP=$(date +%s)
    - |
      # Generate k6 test script. Pass an explicit variable list so JS template
      # literals such as `${api.name}` inside the script are left untouched.
      envsubst '${TEST_SUITE} ${TARGET_ENVIRONMENT} ${CI_PIPELINE_ID}' \
        < templates/test-template.js.tpl > generated/test.js
    - |
      # Generate the K8s Job manifest (Jobs, not the operator's TestRun CRD)
      envsubst '${CI_PIPELINE_ID} ${TEST_SUITE} ${PARALLELISM} ${VIRTUAL_USERS} ${TEST_DURATION} ${RAMP_UP_TIME} ${TARGET_BASE_URL} ${TEST_PROFILE}' \
        < templates/k6-job.yaml > generated/k6-job.yaml
    - echo "Generated test configuration:"
    - cat generated/k6-job.yaml
  artifacts:
    paths:
      - generated/
      - tests/api-catalog.json
    expire_in: 7 days

# Execute load test via K8s Job
execute:
  stage: execute
  image: bitnami/kubectl:latest
  before_script:
    # Configure kubectl to access test cluster (separate from GitLab)
    - kubectl config use-context ${K8S_CLUSTER}
  script:
    - echo "Creating k6 test resources on test cluster..."
    - echo "Job will generate load from ${K8S_CLUSTER} targeting ${TARGET_ENVIRONMENT}"
    - |
      # Create ConfigMap with test script and API catalog
      kubectl create configmap test-script-${CI_PIPELINE_ID} \
        --from-file=test.js=generated/test.js \
        --from-file=api-catalog.json=tests/api-catalog.json \
        -n ${K6_NAMESPACE} \
        --dry-run=client -o yaml | kubectl apply -f -
    - |
      # Apply the Job manifest rendered in the generate stage
      kubectl apply -f generated/k6-job.yaml -n ${K6_NAMESPACE}
    - echo "K8s Job 'load-test-${CI_PIPELINE_ID}' created with ${PARALLELISM} parallel instances"
    - echo "Each instance will run ${VIRTUAL_USERS}/${PARALLELISM} virtual users"
    - |
      # Wait for Job completion (all pods must succeed)
      echo "Waiting for test completion (timeout: 30m)..."
      kubectl wait --for=condition=complete \
        --timeout=30m \
        job/load-test-${CI_PIPELINE_ID} \
        -n ${K6_NAMESPACE}
    - echo "Collecting test results from all Job instances..."
    - mkdir -p results
    - |
      # Collect logs from all Job pods
      kubectl logs \
        -l job-name=load-test-${CI_PIPELINE_ID} \
        -n ${K6_NAMESPACE} \
        --all-containers=true \
        --prefix=true \
        > results/k6-full-output.log
    - |
      # Extract the human-readable summary from each pod, and recover the first
      # pod's summary JSON (printed after a ===SUMMARY=== marker by the Job)
      # for the report stage
      for pod in $(kubectl get pods -l job-name=load-test-${CI_PIPELINE_ID} -n ${K6_NAMESPACE} -o name); do
        echo "=== Results from $pod ===" >> results/k6-summary.log
        kubectl logs $pod -n ${K6_NAMESPACE} | grep -A 50 "execution:" >> results/k6-summary.log || true
      done
      first_pod=$(kubectl get pods -l job-name=load-test-${CI_PIPELINE_ID} -n ${K6_NAMESPACE} -o name | head -n1)
      kubectl logs $first_pod -n ${K6_NAMESPACE} | sed -n '/^===SUMMARY===$/,$p' | tail -n +2 > results/summary.json || true
    - echo "Test execution complete. Results collected."
  after_script:
    # Job auto-cleans via ttlSecondsAfterFinished (1 hour); remove the ConfigMap now
    - kubectl delete configmap test-script-${CI_PIPELINE_ID} -n ${K6_NAMESPACE} || true
  artifacts:
    when: always
    paths:
      - results/
    expire_in: 30 days
  environment:
    name: test-cluster
    url: https://api.${TARGET_ENVIRONMENT}.example.com
  timeout: 35m  # Slightly longer than the Job wait timeout

# Generate and publish reports
report:
  stage: report
  image: python:3.11-slim
  before_script:
    - pip install -q k6-report-generator
  script:
    - echo "Generating performance reports..."
    - ./scripts/generate-report.sh results/k6-full-output.log
    - |
      # Parse the k6 --summary-export JSON into GitLab's performance-report format
      python -c "
      import json
      with open('results/summary.json') as f:
          data = json.load(f)
      m = data['metrics']
      gitlab_perf = {
          'metrics': [
              {'name': 'http_req_duration_p95', 'value': m['http_req_duration']['p(95)']},
              # p(99) is present only if summaryTrendStats includes it in the k6 options
              {'name': 'http_req_duration_p99', 'value': m['http_req_duration'].get('p(99)', 0)},
              {'name': 'http_req_failed_rate', 'value': m['http_req_failed']['value']},
              {'name': 'http_reqs_total', 'value': m['http_reqs']['count']},
              {'name': 'vus_max', 'value': m['vus_max']['value']},
          ]
      }
      with open('performance.json', 'w') as f:
          json.dump(gitlab_perf, f, indent=2)
      "
    - echo "Performance summary:"
    - cat performance.json
  artifacts:
    when: always
    reports:
      performance: performance.json
    paths:
      - results/report.html
      - results/summary.json
      - performance.json
    expire_in: 30 days
  dependencies:
    - execute

Test Script Template
Template (templates/test-template.js.tpl):
import http from 'k6/http';
import { check, group, sleep } from 'k6';
import { Rate, Trend, Counter } from 'k6/metrics';
import { SharedArray } from 'k6/data';

// NOTE: only ${TEST_SUITE}, ${TARGET_ENVIRONMENT} and ${CI_PIPELINE_ID} are
// replaced by envsubst at generation time (explicit variable list), so JS
// template literals elsewhere in this file are left untouched.

// Custom metrics
const errorRate = new Rate('errors');
const apiDuration = new Trend('api_duration');
const apiCalls = new Counter('api_calls');

// Load API catalog (open() is only available in the init context)
const apis = new SharedArray('apis', function () {
  return JSON.parse(open('./api-catalog.json'));
});

// Configuration from environment
const BASE_URL = __ENV.TARGET_BASE_URL || 'https://api.sandbox-sut-1.example.com';
const VUS = parseInt(__ENV.VIRTUAL_USERS || '10');
const DURATION = __ENV.TEST_DURATION || '1m';
const RAMP_UP = __ENV.RAMP_UP_TIME || '30s';
const PROFILE = __ENV.TEST_PROFILE || 'load';

// Load profile configurations
const profiles = {
  smoke: {
    stages: [
      { duration: '1m', target: 5 },
    ],
    thresholds: {
      'http_req_duration': ['p(95)<1000'],
      'http_req_failed': ['rate<0.05'],
    },
  },
  load: {
    stages: [
      { duration: RAMP_UP, target: VUS * 0.5 },
      { duration: DURATION, target: VUS },
      { duration: '30s', target: 0 },
    ],
    thresholds: {
      'http_req_duration': ['p(95)<500', 'p(99)<1000'],
      'http_req_failed': ['rate<0.01'],
    },
  },
  stress: {
    stages: [
      { duration: '2m', target: VUS },
      { duration: '5m', target: VUS * 2 },
      { duration: '2m', target: VUS * 3 },
      { duration: '5m', target: VUS },
      { duration: '2m', target: 0 },
    ],
    thresholds: {
      'http_req_duration': ['p(95)<1000', 'p(99)<2000'],
      'http_req_failed': ['rate<0.05'],
    },
  },
  spike: {
    stages: [
      { duration: '1m', target: VUS },
      { duration: '10s', target: VUS * 5 }, // Spike
      { duration: '1m', target: VUS },
      { duration: '10s', target: VUS * 5 }, // Second spike
      { duration: '1m', target: 0 },
    ],
    thresholds: {
      'http_req_duration': ['p(95)<1500', 'p(99)<3000'],
      'http_req_failed': ['rate<0.10'],
    },
  },
};

// Apply selected profile
export const options = {
  ...profiles[PROFILE],
  tags: {
    test_suite: '${TEST_SUITE}',
    environment: '${TARGET_ENVIRONMENT}',
    pipeline_id: '${CI_PIPELINE_ID}',
  },
  noConnectionReuse: false,
  userAgent: 'k6-load-test/${CI_PIPELINE_ID}',
};

// Setup function (runs once before the test starts, not once per VU)
export function setup() {
  console.log(`Starting ${PROFILE} test with ${VUS} VUs for ${DURATION}`);
  console.log(`Target: ${BASE_URL}`);
  console.log(`APIs under test: ${apis.length}`);
  return {
    apis: apis,
    baseUrl: BASE_URL,
  };
}

// Main test function (one iteration per VU loop)
export default function (data) {
  const api = data.apis[Math.floor(Math.random() * data.apis.length)];

  group(`API: ${api.name}`, () => {
    const url = `${data.baseUrl}${api.path}`;
    const params = {
      headers: {
        'Content-Type': 'application/json',
        'X-Test-Pipeline': '${CI_PIPELINE_ID}',
        ...(api.headers || {}),
      },
      tags: {
        api_name: api.name,
        api_path: api.path,
      },
      timeout: api.timeout_ms || '30s',
    };

    const response = http.get(url, params);

    // Record metrics
    apiCalls.add(1);
    apiDuration.add(response.timings.duration, { api: api.name });

    // Validate response
    const checkResults = check(response, {
      'status is 200': (r) => r.status === 200,
      'response time OK': (r) => r.timings.duration < (api.slo_ms || 500),
      'has valid body': (r) => r.body && r.body.length > 0,
      'no errors in response': (r) => !r.json('error'),
    });
    errorRate.add(!checkResults);

    // Log failures
    if (!checkResults) {
      console.error(`API ${api.name} failed: status=${response.status}, duration=${response.timings.duration}ms`);
    }
  });

  // Think time
  sleep(Math.random() * 2 + 1);
}

// Teardown function (runs once after the test)
export function teardown(data) {
  console.log('Test completed');
}

Test Profile Configurations
File: config/test-profiles.yaml
profiles:
  smoke:
    description: "Quick sanity check with minimal load"
    virtualUsers: 5
    duration: 1m
    rampUp: 10s
    thresholds:
      p95: 1000ms
      p99: 2000ms
      errorRate: 5%
  load:
    description: "Sustained load test at expected traffic levels"
    virtualUsers: 100
    duration: 5m
    rampUp: 30s
    thresholds:
      p95: 500ms
      p99: 1000ms
      errorRate: 1%
  stress:
    description: "Push beyond normal load to find breaking point"
    virtualUsers: 200
    duration: 10m
    rampUp: 2m
    stages:
      - duration: 2m
        target: 100
      - duration: 5m
        target: 200
      - duration: 2m
        target: 300
      - duration: 1m
        target: 0
    thresholds:
      p95: 1000ms
      p99: 2000ms
      errorRate: 5%
  spike:
    description: "Sudden traffic spikes to test auto-scaling"
    virtualUsers: 150
    duration: 5m
    stages:
      - duration: 1m
        target: 50
      - duration: 10s
        target: 500  # Spike
      - duration: 1m
        target: 50
      - duration: 10s
        target: 500  # Second spike
      - duration: 1m
        target: 0
    thresholds:
      p95: 1500ms
      p99: 3000ms
      errorRate: 10%
  soak:
    description: "Extended duration test for stability and memory leaks"
    virtualUsers: 50
    duration: 2h
    rampUp: 5m
    thresholds:
      p95: 500ms
      p99: 1000ms
      errorRate: 1%

Monitoring Integration
InfluxDB Export Configuration (shown as the k6 operator's TestRun CRD for reference; with our chosen K8s Job approach, the same `--out` arguments and `K6_INFLUXDB_*` env vars go directly on the Job container):
apiVersion: k6.io/v1alpha1
kind: TestRun
spec:
  script:
    configMap:
      name: test-script
  arguments: |
    --out influxdb=http://influxdb.monitoring:8086/k6
    --tag testrun=${CI_PIPELINE_ID}
    --tag suite=${TEST_SUITE}
    --tag environment=${TARGET_ENVIRONMENT}
    --tag branch=${CI_COMMIT_BRANCH}
  runner:
    env:
      - name: K6_INFLUXDB_INSECURE
        value: "false"
      - name: K6_INFLUXDB_USERNAME
        valueFrom:
          secretKeyRef:
            name: influxdb-credentials
            key: username
      - name: K6_INFLUXDB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: influxdb-credentials
            key: password

Grafana Dashboard JSON (excerpt):
{
  "dashboard": {
    "title": "k6 Load Test Dashboard",
    "panels": [
      {
        "title": "HTTP Request Duration (p95/p99)",
        "type": "graph",
        "targets": [
          { "query": "SELECT percentile(\"value\", 95) FROM \"http_req_duration\" WHERE \"testrun\"='$testrun' GROUP BY time(10s)" },
          { "query": "SELECT percentile(\"value\", 99) FROM \"http_req_duration\" WHERE \"testrun\"='$testrun' GROUP BY time(10s)" }
        ]
      },
      {
        "title": "Requests Per Second",
        "type": "graph",
        "targets": [
          { "query": "SELECT derivative(mean(\"value\"), 1s) FROM \"http_reqs\" WHERE \"testrun\"='$testrun' GROUP BY time(10s)" }
        ]
      },
      {
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          { "query": "SELECT mean(\"value\") FROM \"http_req_failed\" WHERE \"testrun\"='$testrun' GROUP BY time(10s)" }
        ]
      },
      {
        "title": "Virtual Users",
        "type": "graph",
        "targets": [
          { "query": "SELECT max(\"value\") FROM \"vus\" WHERE \"testrun\"='$testrun' GROUP BY time(10s)" }
        ]
      }
    ],
    "templating": {
      "list": [
        {
          "name": "testrun",
          "type": "query",
          "query": "SHOW TAG VALUES WITH KEY = \"testrun\"",
          "current": {
            "text": "auto",
            "value": "$__auto_interval_testrun"
          }
        }
      ]
    }
  }
}

Alternative Approaches Considered
Note: See the “Operator vs Non-Operator Deployment Comparison” section above for a comprehensive decision matrix comparing all execution approaches.
Alternative 1: Simple Docker-Based Execution
Approach: Run k6 directly in GitLab runner containers without K8s operator
Pros:
- Simpler initial setup (no operator required)
- Faster to implement (1-2 hours vs 1-2 days)
- Lower operational overhead (no operator maintenance)
- Easy local testing and debugging
Cons:
- Limited scaling (~10k RPS per runner, CPU bound)
- Less resource isolation (shared runner resources)
- No distributed load generation (manual orchestration required)
- Harder to implement network policies (runner-level only)
Decision: Use this as an optional Phase 0 quick-start, then move to K8s Jobs on the separate test cluster (our selected approach) for scale
Rationale: Provides immediate value for validation while the test-cluster infrastructure is being established. See the decision matrix above for a detailed comparison.
Alternative 2: Locust (Python-based)
Approach: Use Locust for Python-native load testing
Pros:
- Python-friendly (good for teams with Python expertise)
- Web UI for monitoring
- Distributed mode available
Cons:
- Less Kubernetes-native
- Heavier resource footprint
- Less modern metrics/observability
- Smaller community compared to k6
Decision: Rejected in favor of k6’s better K8s integration
Alternative 3: Managed Service (k6 Cloud, Grafana Cloud)
Approach: Use commercial k6 Cloud service
Pros:
- Zero infrastructure management
- Excellent reporting and analytics
- Global load generation locations
Cons:
- Cost per test run
- External dependency
- Data egress concerns (API catalog, secrets)
- Less control over execution environment
Decision: Rejected for initial implementation; revisit for global load testing needs
Alternative 4: On-Demand REST API Wrapper
Approach: Build REST API service that wraps k6 execution
Pros:
- More user-friendly than GitLab UI
- Custom UI possibilities
- Better programmatic integration
Cons:
- Additional service to maintain
- Reinvents GitLab’s workflow orchestration
- Requires authentication/authorization implementation
Decision: Defer to Phase 5 if self-service adoption is insufficient
Success Metrics
Adoption Metrics
- Target: 80% of teams use load testing before production deployments
- Measure: GitLab pipeline executions, unique user count
Performance Metrics
- Test Execution Time: <10 minutes for standard load tests
- Test Setup Time: <5 minutes from trigger to execution start
- Resource Utilization: <50% of sandbox-test cluster capacity
Quality Metrics
- Test Reliability: >95% successful test runs (not counting legitimate failures)
- False Positive Rate: <5% of test failures are infrastructure-related
Efficiency Metrics
- Time to Create New Test: <30 minutes for catalog-based tests
- Test Maintenance Burden: <2 hours/week team-wide
Risk Assessment
Technical Risks
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| K8s Job failures | Medium | Low | Standard pattern, use backoffLimit: 0, log all failures |
| Test cluster resource exhaustion | High | Medium | Strict resource quotas, Job TTL cleanup, monitoring |
| Network bottleneck (test cluster → SUT) | Medium | Low | Use separate cluster, monitor bandwidth, tune parallelism |
| Network policy misconfiguration | High | Low | Thorough testing, clear documentation, dry-run validation |
| Test generation failures | Medium | Medium | Validation stage, dry-run mode, schema validation |
| Metric collection failures | Medium | Low | Multiple collection methods (logs + InfluxDB), retry logic |
| Job coordination errors (distributed tests) | Medium | Low | Test coordination logic thoroughly, use JOB_COMPLETION_INDEX |
Operational Risks
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| Accidental production testing | Critical | Low | Network policies, namespace restrictions, clear naming |
| Test maintenance burden | Medium | High | Catalog-driven generation, reusable components |
| Low adoption | Medium | Medium | Good documentation, training, easy onboarding |
| Cost overrun (compute resources) | Medium | Low | Resource quotas, time limits, monitoring |
Security Risks
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| Credential exposure in tests | High | Medium | GitLab secrets, vault integration, no hardcoded secrets |
| Unauthorized access to SUT | High | Low | GitLab RBAC, K8s RBAC, audit logging |
| DDoS-like impact on SUT | Medium | Medium | Rate limiting, circuit breakers, clear communication |
Open Questions
- InfluxDB: Do we have an existing InfluxDB instance, or do we need to deploy one?
  - Action: Check with platform team
- API Catalog Integration: What format is the API catalog in? REST API, config file, service mesh?
  - Action: Review API catalog documentation
- Authentication: How should tests authenticate to federated APIs? OAuth2, API keys, mTLS?
  - Action: Align with security team on test account strategy
- Scheduled Tests: Should we run nightly regression tests? Which APIs?
  - Action: Define with product team
- SLO Definitions: Do we have formal SLOs for federated APIs?
  - Action: Work with API producers to define/document
- Cross-Sandbox Communication: Are there existing network policies between sandbox environments?
  - Action: Review with network team
- Cost Allocation: Should we track and charge back load testing costs per team?
  - Action: Discuss with finance/platform teams
References
Documentation
Internal Resources
- .ai/steering/argocd-development-workflow.md - ArgoCD patterns
- .ai/steering/docker-image-workflow.md - Container build patterns
- .ai/steering/testing-standards.md - Testing guidelines
- API Catalog documentation (TBD)
- Sandbox environment inventory (TBD)
Example Projects (TBD)
Next Steps
- Immediate (This Week):
  - Review and approve this decision record
  - Answer open questions
  - Assign owner for implementation
- Short Term (Next Sprint):
  - Create implementation project (BMAD or Codev format)
  - Set up development environment
  - Begin Phase 1 implementation
- Medium Term (Next Month):
  - Complete Phase 1 foundation
  - Conduct pilot with 2-3 teams
  - Gather feedback and iterate
- Long Term (Next Quarter):
  - Complete all phases
  - Full team rollout
  - Integration with CI/CD pipeline standards
Approval
Proposed By: Platform Engineering Team
Date: 2026-02-04
Reviewers:
- Platform Architecture Lead
- API Federation Team Lead
- Security Team
- SRE Team
Status: Awaiting Review
Last Updated: 2026-02-04
Version: 1.0