AWS API Gateway - POC Engagement Brief

Platform: HIP (Enterprise Integration Platform)
Date: 2026-03-09


What We’re Building

We are evaluating AWS API Gateway (we expect to use HTTP APIs rather than REST APIs) as the API gateway for our enterprise integration platform. The platform runs on AWS EKS (eu-west-2) and serves 20+ API producer teams across 10+ API categories.

We want to validate the approach through a proof of concept before committing to a full implementation.

Platform Overview

  • EKS-based platform, single region (eu-west-2), multi-AZ
  • 20+ API producer teams providing OpenAPI specifications
  • Centrally operated gateway - a platform team owns all routing, authentication, authorisation, and rate limiting configuration
  • Producer teams do not interact with Kubernetes or the gateway directly - they work through a platform abstraction layer
  • GitOps delivery via GitLab and Argo CD
  • Authentication handled by Keycloak (OIDC/OAuth2 and basic shared token flows)
  • Existing observability stack: Prometheus, Grafana, Jaeger
  • Critical National Infrastructure designation

What We Want to Achieve

Use AWS API Gateway (HTTP APIs) as the entry point for API traffic, managed from Kubernetes using AWS Controllers for Kubernetes (ACK) to preserve our GitOps workflow.

The target architecture:

  1. API Gateway resources defined as Kubernetes CRDs
  2. Argo CD applies CRDs to the cluster
  3. ACK controller reconciles CRDs into API Gateway resources
  4. Traffic flows via VPC Link to a Network Load Balancer, then to EKS services
  5. AZ affinity maintained (cross-zone load balancing disabled on NLB)

What We Need to Validate

1. Keycloak Integration

Our authentication is handled by Keycloak. We need to validate:

  • JWT authorizer integration with Keycloak-issued tokens (OIDC/OAuth2 flows)
  • Support for our basic shared token authentication pattern
  • Centralised auth configuration (platform-managed, not per-API)
  • Whether the JWT authorizer flexibility is sufficient for our auth requirements, or whether we need a Lambda authorizer
  • How to support the OAuth2 client credentials flow alongside our current basic/bearer token flow, which uses long-lived credentials
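
To make the JWT authorizer question concrete, here is a minimal Python sketch of the parameters we would expect an HTTP API JWT authorizer to need for Keycloak-issued tokens. It assumes the realm issuer URL has the form `https://<host>/realms/<realm>` (older Keycloak versions add an `/auth` prefix) and that the resulting dictionary would be passed to the API Gateway v2 `create_authorizer` operation, for example via boto3. The helper name, host, realm, API ID, and audience are all hypothetical.

```python
def keycloak_jwt_authorizer_params(api_id, keycloak_host, realm, audiences):
    """Build keyword arguments for an HTTP API JWT authorizer.

    Hypothetical helper: every value passed in is a placeholder, and the
    issuer URL format is an assumption about the Keycloak deployment.
    """
    issuer = f"https://{keycloak_host}/realms/{realm}"
    return {
        "ApiId": api_id,
        "Name": f"keycloak-{realm}",
        "AuthorizerType": "JWT",
        # Read the bearer token from the Authorization header.
        "IdentitySource": ["$request.header.Authorization"],
        "JwtConfiguration": {
            "Issuer": issuer,
            "Audience": list(audiences),
        },
    }


params = keycloak_jwt_authorizer_params(
    api_id="abc123",                       # placeholder API ID
    keycloak_host="keycloak.example.com",  # placeholder host
    realm="hip",                           # placeholder realm
    audiences=["hip-api"],                 # placeholder audience
)
```

One point to check in the POC: the authorizer compares the token audience against the configured list, so Keycloak may need an audience mapper for the `aud` claim to match. Our shared tokens are not JWTs, so we would expect that pattern to need a Lambda authorizer instead; both points are assumptions to confirm.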

2. Rate Limiting

We recently built identity-aware, per-client rate limiting, although we’ve not yet onboarded the first API that needs it. We need to understand:

  • What throttling and quota capabilities are available on HTTP APIs
  • Whether per-client / per-principal rate limiting is achievable
  • How rate limiting behaves under load (predictability, failure modes)
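
To make the per-client requirement concrete, the following is a minimal Python sketch of the behaviour we mean: one token bucket per client identity, so that one client exhausting its allowance does not affect another. It illustrates the requirement only and is not a description of how API Gateway implements throttling. The class names, rates, and client IDs are hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate` tokens/second."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock
        self.tokens = float(burst)  # start with a full bucket
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keep an independent bucket for each client identity."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()


# Demonstration with a fake clock so the outcome is deterministic.
fake_now = [0.0]
limiter = PerClientLimiter(rate=2, burst=3, clock=lambda: fake_now[0])
results_a = [limiter.allow("client-a") for _ in range(4)]  # fourth call is rejected
result_b = limiter.allow("client-b")                       # unaffected by client-a
fake_now[0] = 1.0                                          # one second later
result_a_later = limiter.allow("client-a")                 # tokens have refilled
```

In the sketch the client identity would come from the authenticated principal (for example a token claim), which is why this requirement is tied to the authorizer choice above.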

3. ACK Controller for API Gateway

  • Maturity and reliability of the ACK API Gateway controller
  • CRD coverage - can we manage all required resources (APIs, routes, stages, authorizers, VPC Links) via CRDs?
  • Reconciliation behaviour - how does ACK handle drift, failures, and eventual consistency?
  • Do we get enterprise support for the ACK element?
  • Is Crossplane a better fit for our use case?

4. OpenAPI Integration

  • Importing OpenAPI specifications into API Gateway
  • Handling of AWS-specific extensions
  • Workflow for updating APIs from OAS changes (incremental updates vs full replacement)
  • We currently validate requests inside our microservices - could/should request validation be offloaded to API Gateway?
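
One workflow we would like to test is having the platform layer inject the AWS-specific extension, so producer specifications stay vendor-neutral. The Python sketch below adds an `x-amazon-apigateway-integration` block for a VPC Link private integration to every operation in an OpenAPI document. The field values follow our understanding of the HTTP API extension format and should be confirmed; the function name, VPC Link ID, listener ARN, and sample specification are hypothetical.

```python
import copy

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def add_vpc_link_integration(spec, vpc_link_id, listener_arn):
    """Return a copy of `spec` with a private integration on each operation.

    The producer's original document is left unmodified.
    """
    result = copy.deepcopy(spec)
    for path_item in result.get("paths", {}).values():
        for method, operation in path_item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            operation["x-amazon-apigateway-integration"] = {
                "type": "http_proxy",
                "httpMethod": method.upper(),
                "uri": listener_arn,           # NLB listener ARN (placeholder)
                "connectionType": "VPC_LINK",
                "connectionId": vpc_link_id,   # VPC Link ID (placeholder)
                "payloadFormatVersion": "1.0",
            }
    return result


producer_spec = {
    "openapi": "3.0.1",
    "info": {"title": "orders", "version": "1.0.0"},
    "paths": {
        "/orders": {
            "parameters": [],
            "get": {"responses": {"200": {"description": "OK"}}},
        }
    },
}

gateway_spec = add_vpc_link_integration(
    producer_spec,
    vpc_link_id="vpclink-placeholder",
    listener_arn="arn:aws:elasticloadbalancing:eu-west-2:111122223333:listener/net/example/0/0",
)
```

Because the injection is a pure function of the producer specification, re-running it on every OAS change would give a full replacement document; whether API Gateway can apply that incrementally is the open question above.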

5. Observability

Our existing stack is Prometheus, Grafana, and Jaeger. We need to understand the best approach for:

  • Getting API Gateway metrics into Prometheus (CloudWatch exporter, or alternative)
  • Aggregating API Gateway logs (CloudWatch Logs) into our existing logging stack
  • Distributed tracing integration (X-Ray to Jaeger bridging, or alternative)
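
As an illustration of the metrics path, the Python sketch below builds the query structure we would expect to send to the CloudWatch `GetMetricData` API for one HTTP API stage, and derives Prometheus-style metric names from the results. It assumes the `AWS/ApiGateway` namespace, the metric names `Count`, `Latency`, `IntegrationLatency`, `4xx`, and `5xx`, and the `ApiId` and `Stage` dimensions; these should be confirmed against the HTTP API metrics documentation. The naming scheme loosely mirrors what CloudWatch exporters produce and is our own assumption, as are the function names and sample values.

```python
import re

# Assumed HTTP API metric names (to be confirmed).
HTTP_API_METRICS = ["Count", "Latency", "IntegrationLatency", "4xx", "5xx"]
COUNTER_LIKE = {"Count", "4xx", "5xx"}


def build_metric_queries(api_id, stage, period_seconds=60):
    """Build a MetricDataQueries list for one API stage."""
    queries = []
    for index, name in enumerate(HTTP_API_METRICS):
        # Sum request counts; average the latency metrics.
        stat = "Sum" if name in COUNTER_LIKE else "Average"
        queries.append({
            "Id": f"m{index}",  # query IDs must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApiGateway",
                    "MetricName": name,
                    "Dimensions": [
                        {"Name": "ApiId", "Value": api_id},
                        {"Name": "Stage", "Value": stage},
                    ],
                },
                "Period": period_seconds,
                "Stat": stat,
            },
        })
    return queries


def prometheus_name(metric_name, stat):
    """Convert a CloudWatch metric name and statistic to snake_case."""
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", metric_name).lower()
    return f"aws_apigateway_{snake}_{stat.lower()}"


queries = build_metric_queries(api_id="abc123", stage="prod")  # placeholders
names = [
    prometheus_name(q["MetricStat"]["Metric"]["MetricName"], q["MetricStat"]["Stat"])
    for q in queries
]
```

An off-the-shelf CloudWatch exporter would replace this code in practice; the sketch only shows which metrics and dimensions we expect to need.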

6. Networking and Performance

  • VPC Link configuration best practices for our scale
  • NLB configuration with AZ affinity (cross-zone load balancing disabled)
  • Expected latency overhead (API Gateway → VPC Link → NLB → EKS, compared to direct in-cluster routing)
  • Behaviour at scale - we estimate peak traffic of ~20k requests/sec

Traffic Profile

Estimated annual volume: ~84 billion requests, distributed as:

Period                   Requests/Sec   Duration
Night (9pm - 6am)        1,000          Daily
Peak                     20,000         ~20 days/year
Busy                     10,000         ~20 days/year
Steady state (daytime)   5,000          Remaining days

Peak traffic concentrates in a single calendar month.

Note: this is the expected target scale. We do not expect to reach it for many years, and we expect around 10-20% of it during our peak period in January 2027.
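
The short Python sketch below works through the figures above: the request rate implied by 10-20% of target peak, and which traffic tiers would exceed an account-level throttle. The 10,000 requests/sec figure is our assumption about the default regional account quota and needs confirming; it is not a value taken from this brief. The variable and function names are illustrative.

```python
# Request rates from the traffic profile (requests/sec).
TIERS = {
    "Night": 1_000,
    "Steady state": 5_000,
    "Busy": 10_000,
    "Peak": 20_000,
}

# ASSUMPTION: default account-level steady-state throttle, to be confirmed.
ASSUMED_ACCOUNT_QUOTA_RPS = 10_000


def scaled_rps(target_rps, percent):
    """Return `percent` of the target rate using integer arithmetic."""
    return target_rps * percent // 100


# Expected January 2027 peak: 10-20% of the 20,000 requests/sec target.
jan_2027_low = scaled_rps(TIERS["Peak"], 10)
jan_2027_high = scaled_rps(TIERS["Peak"], 20)

# Tiers that would need a quota increase under the assumed default.
over_quota = [name for name, rps in TIERS.items() if rps > ASSUMED_ACCOUNT_QUOTA_RPS]

print(f"January 2027 peak: {jan_2027_low}-{jan_2027_high} requests/sec")
print(f"Tiers above assumed quota: {over_quota}")
```

Under these assumptions the January 2027 peak is 2,000-4,000 requests/sec, which sits inside the assumed quota, while the 20,000 requests/sec target peak would need a quota increase.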

Hard Limits We’re Aware Of

  • 10 MB maximum payload size
  • 30-second maximum integration timeout

These align with our existing platform requirements. We’d like to confirm there are no other limits that could affect us at our traffic volumes.

What We’d Like From AWS

  • Guidance on the architecture pattern (API Gateway + ACK + VPC Link + NLB for EKS integration)
  • Review of our Keycloak/JWT authorizer integration approach
  • Advice on rate limiting capabilities and whether they meet our requirements
  • ACK vs Crossplane (or a different tech) recommendation for our use case
  • Observability integration best practices
  • Any relevant reference architectures or case studies at similar scale