API Gateway Replacement Proposal

Platform: HIP (Enterprise Integration Platform)
Date: 2026-03-09
Status: Proposal - subject to POC validation
Author: APIM Platform Team


Executive Summary

We propose replacing Kong Enterprise API Gateway with AWS API Gateway (HTTP APIs), subject to a technical proof of concept.

Kong Enterprise costs £600k per year in licensing. The value we derive is concentrated in a small number of features: authentication/authorisation and rate limiting. The licence cost is fixed regardless of usage, and represents a significant operational expense relative to the value delivered.

Recommended option: AWS API Gateway at ~£54k/year recurring cost, saving ~£556k annually (91% reduction). Migration is estimated at £290-427k one-time cost, recovered within the first year.

Why AWS API Gateway over open-source alternatives: Enterprise support is a strong requirement for this platform, which carries a Critical National Infrastructure (CNI) designation. AWS API Gateway is covered by existing AWS Enterprise Support at no incremental cost. Open-source alternatives (Istio, Envoy Gateway) would require separate enterprise support contracts at £75-385k/year, narrowing or eliminating the cost advantage.

What we’re asking for: Approval to proceed with a technical POC to validate Keycloak integration, rate limiting capabilities, and GitOps workflow before committing to migration.

Option                       | Annual Cost | 3-Year TCO (inc. migration) | Savings vs Kong
-----------------------------|-------------|-----------------------------|-----------------------
Kong Enterprise (current)    | £610-615k   | £1,830-1,845k               | -
AWS API Gateway              | £54k        | £451-589k                   | £1,241-1,394k (68-76%)
Istio + Solo.io support      | £85-245k    | £484-1,072k                 | £773-1,361k (42-74%)
Istio + Tetrate support      | £164-400k   | £721-1,537k                 | £308-1,124k (17-61%)
Istio/Envoy (self-supported) | £10-15k     | £259-382k                   | £1,463-1,571k (80-85%)

Self-supported open-source is the cheapest option but carries unacceptable operational risk for CNI workloads without SLA-backed vendor support.


1. Context

1.1 Current State

HIP runs on AWS EKS with Kong Enterprise as the API gateway. The platform serves 20+ API producer teams across 10+ API categories, using a centrally operated model:

  • The APIM team owns the gateway runtime, routing, authentication, authorisation, and rate limiting
  • Producer teams provide OpenAPI specifications and configuration only - they do not interact with Kong or Kubernetes directly
  • A “paved road” abstraction layer (Platform APIs) sits between producers and the gateway, handling GitOps and ClickOps workflows
  • Deployment is managed via GitLab + Argo CD

Kong runs on a dedicated EKS node group. Istio is already deployed in the platform. Authentication is handled by Keycloak (OIDC/OAuth2 and basic shared token).

1.2 Problem

Kong Enterprise costs approximately £600k per year in licensing, with an additional £10-15k in infrastructure costs, totalling ~£610-615k annually.

This cost is:

  • Fixed regardless of traffic volume or feature adoption
  • Concentrated in a small feature set (the Advanced Auth plugin and rate limiting)
  • Disproportionate to the value derived

Secondary concerns include dependence on a proprietary control plane, governance risk from vendor-led roadmaps, and the opportunity to align with upstream Kubernetes networking standards.

1.3 Constraints

  • Enterprise support required: The platform’s CNI designation creates a strong preference for SLA-backed vendor support with guaranteed incident response times and security patching
  • Federated delivery model retained: Producers must continue to interact only through OAS specifications and Platform APIs - no direct gateway or Kubernetes access
  • GitOps-first: All configuration must be declarative and managed through Argo CD
  • No big-bang rewrite: Migration must be incremental, environment-by-environment

2. Options Considered

2.1 AWS API Gateway (HTTP APIs)

A fully managed AWS service, configured from Kubernetes via AWS Controllers for Kubernetes (ACK).

How it works: API Gateway runs outside the cluster as a managed AWS service. ACK controllers in the cluster translate Kubernetes CRDs into AWS API Gateway resources. Traffic reaches EKS services via VPC Links and a Network Load Balancer. Argo CD manages the CRDs, preserving the GitOps workflow.
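As a rough illustration of the configuration flow described above, the sketch below derives HTTP API route keys (which take the form `METHOD /path`) from the `paths` section of an OpenAPI document and wraps each one in a Kubernetes-style manifest that Argo CD could sync. The CRD group/version, kind, and spec field names are assumptions for illustration and would need checking against the ACK controller's CRD reference during the POC; `orders-api` and the example specification are hypothetical.

```python
# Hypothetical sketch: OpenAPI paths -> HTTP API route keys -> Route manifests.
# The manifest shape (apiVersion, kind, spec fields) is an assumption to be
# verified against the ACK controller's CRD reference.

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def route_keys(oas: dict) -> list[str]:
    """Return one route key ("METHOD /path") per operation in the OAS document."""
    keys = []
    for path, path_item in oas.get("paths", {}).items():
        for field in path_item:
            # Path items also carry non-operation fields such as "parameters".
            if field.lower() in HTTP_METHODS:
                keys.append(f"{field.upper()} {path}")
    return sorted(keys)


def route_manifest(api_name: str, index: int, route_key: str) -> dict:
    """Wrap a route key in a Kubernetes-style manifest (field names assumed)."""
    return {
        "apiVersion": "apigatewayv2.services.k8s.aws/v1alpha1",  # assumed
        "kind": "Route",  # assumed
        "metadata": {"name": f"{api_name}-route-{index}"},
        "spec": {
            "apiRef": {"from": {"name": api_name}},  # assumed
            "routeKey": route_key,
        },
    }


# Hypothetical producer specification, reduced to the fields used here.
example_oas = {
    "paths": {
        "/orders": {"get": {}, "post": {}},
        "/orders/{id}": {"get": {}, "parameters": []},
    }
}

keys = route_keys(example_oas)
manifests = [route_manifest("orders-api", i, k) for i, k in enumerate(keys)]

for manifest in manifests:
    print(manifest["metadata"]["name"], "->", manifest["spec"]["routeKey"])
```

In this shape, producers would still supply only the OAS document; the generation step would live inside the Platform APIs layer.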

Strengths:

  • Fully managed - no gateway compute to operate
  • Covered by existing AWS Enterprise Support at no incremental cost
  • Usage-based pricing scales with actual traffic
  • Strong SLAs and proven at scale

Weaknesses:

  • Vendor lock-in to AWS (mitigated by existing AWS commitment across the platform)
  • GitOps integration requires an indirection layer (ACK/Crossplane), adding complexity
  • Keycloak integration via JWT authorizers is less flexible than in-cluster options
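  • Rate limiting is comparatively coarse: HTTP APIs provide account-, stage-, and route-level request throttling but no identity-aware limits (usage plans and per-client quotas are a REST API feature)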
  • Hard limits: 10MB maximum payload, 30-second integration timeout (aligns with existing platform NFRs but becomes enforced at gateway level rather than policy)
  • Additional network hops: API Gateway → VPC Link → NLB → EKS service
  • Observability requires bridging CloudWatch to existing Prometheus/Grafana/Jaeger stack

2.2 Istio Gateway (with enterprise support)

Istio is already deployed in the platform. Using it as the API gateway with a managed control plane from Tetrate or Solo.io would provide enterprise support while staying Kubernetes-native.

Strengths:

  • Already deployed and understood by the team
  • Native integration with existing observability stack (Prometheus, Jaeger, Grafana)
  • Strong OIDC/JWT integration with Keycloak
  • Mature traffic policy (rate limiting, retries, timeouts)
  • Kubernetes-native - no external control plane or VPC Links
  • Portable across clouds and on-premises

Weaknesses:

  • Enterprise support costs £75-385k/year depending on vendor, significantly narrowing the savings
  • Gateway and mesh concerns can become intertwined if not carefully managed
  • Support pricing is quote-based and varies widely by scale and tier

2.3 Istio/Envoy Gateway (self-supported)

Using open-source Istio or Envoy Gateway without vendor support.

Strengths:

  • Lowest total cost (~£10-15k/year, infrastructure only)
  • Maximum flexibility and portability

Weaknesses:

  • No SLA-backed support for CNI workloads - unacceptable operational risk
  • Relies entirely on internal expertise for incident response and security patching

2.4 Kubernetes Ingress (alternative controller)

Replacing Kong with another ingress-based solution (e.g., NGINX Ingress).

Strengths:

  • Stable, well-understood technology

Weaknesses:

  • Ingress API is feature-complete and frozen - no future development
  • Auth and rate limiting encoded in controller-specific annotations (no portability gain)
  • Reduces cost but does not materially improve long-term position

This option was evaluated and dismissed early - it improves cost but not architecture.


3. Recommendation

Proposed direction: Replace Kong Enterprise with AWS API Gateway (HTTP APIs), subject to successful POC.

Rationale:

  1. Cost: ~£54k/year vs £610-615k/year. Even including migration costs (£290-427k), the investment is recovered within the first year. Over 3 years, total cost of ownership is £451-589k vs £1,830-1,845k for Kong.

  2. Enterprise support: Covered by existing AWS Enterprise Support contract at no incremental cost. This is a significant differentiator - Istio enterprise support adds £75-385k/year, which substantially reduces savings.

  3. Operational model preserved: The paved road abstraction means producers are unaffected. Platform APIs generate gateway configuration from OAS specifications regardless of whether the underlying gateway is Kong, AWS API Gateway, or Istio. Migration is invisible to producer teams.

  4. Managed service: Eliminates gateway compute management. No node groups to size, no control plane upgrades, no data plane patching.

Trade-offs accepted:

  • Vendor lock-in to AWS is accepted given the existing AWS infrastructure commitment
  • Control plane externalisation is offset by reduced operational burden
  • Observability integration with CloudWatch requires one-time bridging work (£43-66k)
  • Rate limiting is less granular than Kong’s but adequate for current requirements
  • Hard enforcement of 10MB/30s limits (currently soft policy, would become hard constraint)

If the enterprise support requirement is relaxed, Istio with community support (~£10-15k/year) becomes the most cost-effective option by a wide margin, but carries operational risk inconsistent with CNI workloads.


4. Cost Analysis

4.1 Traffic Model

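Costs are modelled using a tiered traffic profile based on estimated peak volumes of 20k requests/sec. This is a strawman intended as a starting point - actual volumes should be validated with production data. The tiers cover 18 hours per day; the 6am-9am and 6pm-9pm shoulder periods are not modelled and would add to the totals.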

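Tier         | Time Window            | Requests/Sec | Days/Year | Annual Requests
-------------|------------------------|--------------|-----------|----------------
Night        | 9pm - 6am daily        | 1,000        | 365       | 11.8B
Peak         | 9am - 6pm, peak days   | 20,000       | 20        | 13.0B
Busy         | 9am - 6pm, busy days   | 10,000       | 20        | 6.5B
Steady State | 9am - 6pm, normal days | 5,000        | 325       | 52.7B
Total        |                        |              |           | 83.9B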

Peak traffic concentrates in one calendar month (10 peak days + 10 busy days). The peak month generates ~12.3B requests at a cost of ~$9,980 (~£7,677), approximately 1.74x the average monthly cost. This variability is acceptable and significantly more favourable than flat licensing.
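The tier arithmetic can be reproduced with a short sketch. Each window is nine hours long; the peak-month composition uses the 10 peak and 10 busy days stated above, and fills the rest of a 30-day month with 10 normal days and 30 nights, which is an assumption made here to arrive at the ~12.3B figure.

```python
SECONDS_PER_HOUR = 3_600

# tier name -> (hours per day, requests/sec, days per year), from the table above
TIERS = {
    "Night": (9, 1_000, 365),
    "Peak": (9, 20_000, 20),
    "Busy": (9, 10_000, 20),
    "Steady State": (9, 5_000, 325),
}


def daily_requests(tier: str) -> int:
    """Requests generated by one day's window for the given tier."""
    hours, rps, _ = TIERS[tier]
    return hours * SECONDS_PER_HOUR * rps


def annual_requests(tier: str) -> int:
    """Requests generated by the tier across all of its days in a year."""
    return daily_requests(tier) * TIERS[tier][2]


annual_total = sum(annual_requests(t) for t in TIERS)

# Peak month: 10 peak and 10 busy days (from the text), plus an assumed
# 10 normal days and 30 nights to fill a 30-day month.
peak_month = (
    10 * daily_requests("Peak")
    + 10 * daily_requests("Busy")
    + 10 * daily_requests("Steady State")
    + 30 * daily_requests("Night")
)

for tier in TIERS:
    print(f"{tier:<13} {annual_requests(tier) / 1e9:5.1f}B")
print(f"{'Total':<13} {annual_total / 1e9:5.1f}B")
print(f"{'Peak month':<13} {peak_month / 1e9:5.1f}B")
```

Swapping the tier values for measured production figures would give a like-for-like comparison once real data is available.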

4.2 Annual Cost Comparison

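Solution                     | Gateway/Licence | Infrastructure | Support         | Annual Total
-----------------------------|-----------------|----------------|-----------------|-------------
Kong Enterprise (current)    | £600,000        | £10-15k        | Included        | £610-615k
AWS API Gateway              | £52,800         | £1,000*        | Included in AWS | £53,800
Istio (Tetrate support)      | £0              | £10-15k        | £154-385k       | £164-400k
Istio (Solo.io support)      | £0              | £10-15k        | £75-230k        | £85-245k
Istio/Envoy (self-supported) | £0              | £10-15k        | £0              | £10-15k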

*VPC Link + NLB only; inter-AZ data transfer eliminated via AZ affinity (cross-zone load balancing disabled)

AWS Enterprise Support assumption: The £53,800 figure assumes HIP already has AWS Enterprise Support for EKS infrastructure. If not, add ~£138k/year, bringing the total to ~£192k and making Istio with Solo.io support potentially more competitive.

4.3 Enterprise Support Pricing Detail

Enterprise support costs for Kubernetes-native gateways are quote-based and vary significantly:

Vendor  | Product                | Estimated Annual Cost
--------|------------------------|---------------------------
Tetrate | Service Bridge (Istio) | $200-500k USD (~£154-385k)
Solo.io | Gloo Mesh (Istio)      | $100-300k USD (~£75-230k)
Red Hat | OpenShift Service Mesh | $50-75k USD/cluster
F5      | NGINX Enterprise Plus  | $30-100k USD

Red Hat is excluded from consideration as it requires migrating from EKS to OpenShift. Formal vendor engagement is needed for accurate quotes at our scale (84B requests/year, CNI workloads, 24/7 support).

4.4 Cost Sensitivity

Even with significant variance in assumptions, AWS API Gateway delivers substantial savings:

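Scenario                               | Annual Cost | Savings vs Kong
---------------------------------------|-------------|----------------
Base case                              | £54k        | 91%
Request volume +50%                    | £80k        | 87%
Request volume -50%                    | £27k        | 96%
No AZ affinity (cross-zone LB enabled) | £57k        | 91%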
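The scenarios above can be approximated with a simple model, sketched here under two assumptions: request charges scale linearly with volume (ignoring any volume pricing tiers), and infrastructure cost stays fixed. The £3,200 cross-zone figure is not stated in this document; it is inferred from the difference between the £57k row and the base case.

```python
KONG_ANNUAL_GBP = 610_000     # lower bound of the current Kong total
REQUEST_COST_GBP = 52_800     # modelled request charges at base volume
INFRA_COST_GBP = 1_000        # VPC Link + NLB with AZ affinity
CROSS_ZONE_EXTRA_GBP = 3_200  # inferred from the £57k row, not a quoted price


def scenario(volume_factor: float = 1.0, extra_infra: float = 0.0) -> tuple[float, float]:
    """Return (annual cost in GBP, percentage saving vs Kong)."""
    cost = REQUEST_COST_GBP * volume_factor + INFRA_COST_GBP + extra_infra
    saving_pct = (1 - cost / KONG_ANNUAL_GBP) * 100
    return cost, saving_pct


scenarios = {
    "Base case": scenario(),
    "Request volume +50%": scenario(1.5),
    "Request volume -50%": scenario(0.5),
    "No AZ affinity": scenario(extra_infra=CROSS_ZONE_EXTRA_GBP),
}

for name, (cost, saving) in scenarios.items():
    print(f"{name:<20} £{cost / 1000:.0f}k  {saving:.0f}%")
```

Because request charges dominate the total, the saving remains large across the modelled volume range.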

5. Migration

5.1 Migration Strategy

Migration is environment-by-environment (not API-by-API) due to the platform architecture: wildcard DNS and a single load balancer point at Kong per environment. Each environment is migrated as a unit.

The paved road abstraction significantly reduces migration scope. Producers do not interact with Kong directly - they work through Platform APIs. Migration work is confined to the platform team and the Platform APIs layer.

5.2 Migration Cost Summary

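Solution           | Direct Migration | Observability | Platform APIs | Total
-------------------|------------------|---------------|---------------|----------
AWS API Gateway    | £213-307k        | £43-66k       | £34-54k       | £290-427k
Istio (managed CP) | £188-272k        | £7-11k        | £34-54k       | £229-337k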

AWS API Gateway migration costs are £61-90k higher than Istio, primarily due to observability integration (bridging CloudWatch to existing Prometheus/Grafana/Jaeger stack). Istio integrates natively with the existing observability infrastructure.

5.3 Three-Year Total Cost of Ownership

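Solution        | Annual Recurring | Migration (one-time) | 3-Year TCO    | Annualised
----------------|------------------|----------------------|---------------|-----------
Kong Enterprise | £610-615k        | £0                   | £1,830-1,845k | £610-615k
AWS API Gateway | £54k             | £290-427k            | £451-589k     | £150-196k
Istio (Solo.io) | £85-245k         | £229-337k            | £484-1,072k   | £161-357k
Istio (Tetrate) | £164-400k        | £229-337k            | £721-1,537k   | £240-512k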

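Against Kong’s ~£610k annual cost, migration costs are recovered within the first year for AWS API Gateway and for Istio with Solo.io support. Istio with Tetrate support recovers them within a year only towards the lower end of its price range: at £400k/year recurring, annual savings of ~£210k would not cover a £229-337k migration until the second year.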
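The three-year figures and payback periods can be recomputed from the ranges in the tables, as in the sketch below. It pairs the low recurring cost with the low migration cost (best case) and high with high (worst case), which is a simplifying assumption; it also uses the unrounded £53.8k AWS figure, so the AWS upper bound can differ from the table by about £1k.

```python
KONG_ANNUAL = 610  # £k, lower bound of the current annual cost

# (low, high) ranges in £k, taken from the tables above
OPTIONS = {
    "AWS API Gateway": {"annual": (53.8, 53.8), "migration": (290, 427)},
    "Istio (Solo.io)": {"annual": (85, 245), "migration": (229, 337)},
    "Istio (Tetrate)": {"annual": (164, 400), "migration": (229, 337)},
}


def three_year_tco(option: dict) -> tuple[float, float]:
    """Three years of recurring cost plus one-time migration, as (low, high)."""
    (a_lo, a_hi), (m_lo, m_hi) = option["annual"], option["migration"]
    return 3 * a_lo + m_lo, 3 * a_hi + m_hi


def payback_months(option: dict) -> tuple[float, float]:
    """Months of saving vs Kong needed to repay migration, as (best, worst)."""
    (a_lo, a_hi), (m_lo, m_hi) = option["annual"], option["migration"]
    best = 12 * m_lo / (KONG_ANNUAL - a_lo)
    worst = 12 * m_hi / (KONG_ANNUAL - a_hi)
    return best, worst


for name, option in OPTIONS.items():
    tco_lo, tco_hi = three_year_tco(option)
    best, worst = payback_months(option)
    print(f"{name}: TCO £{tco_lo:.0f}-{tco_hi:.0f}k, payback {best:.1f}-{worst:.1f} months")
```

Re-running this with formal vendor quotes and the confirmed AWS support tier would tighten the ranges considerably.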

5.4 What Changes, What Doesn’t

Area              | Impact
------------------|-------------------------------------------------------------
Producer teams    | No change - paved road abstraction shields producers from gateway changes
Platform APIs     | Internal updates to generate AWS API Gateway config instead of Kong config (£34-54k)
Observability     | CloudWatch bridging to existing Prometheus/Grafana/Jaeger stack (£43-66k)
Authentication    | Keycloak integration via JWT authorizers instead of Kong auth plugin
Team skills       | AWS services and ACK expertise required (documentation and training included in migration costs)
Operational model | External control plane with multiple routing layers (API GW → VPC Link → NLB → EKS)

6. Risks and Open Questions

6.1 Risks

Risk                                                  | Severity | Mitigation
------------------------------------------------------|----------|-----------------------------------------
Keycloak/JWT authorizer integration proves inadequate | High     | Validate in POC before committing
Rate limiting too coarse for requirements             | Medium   | Test throttling and quota capabilities in POC; evaluate whether current granularity is actually needed
ACK controller maturity or reliability issues         | Medium   | Evaluate in POC; Crossplane is an alternative
10MB/30s hard limits block specific use cases         | Low      | Aligns with existing NFRs; survey current API catalogue for exceptions
AWS Enterprise Support assumption incorrect           | Medium   | Confirm current support tier before proceeding
Observability gap during migration                    | Low      | Build CloudWatch bridge before migration; maintain Kong dashboards during parallel run

6.2 Open Questions

  1. Confirm AWS Support tier: Is HIP already on AWS Enterprise Support? If not, the cost case changes significantly (+£138k/year)
  2. Validate traffic estimates: The 20k req/sec peak and 84B annual request figures are estimates - actual production data is needed
  3. Rate limiting requirements: Are current identity-aware rate limiting policies actually in use, or would coarser throttling/quotas suffice?
  4. Keycloak integration depth: What specific auth flows need to work through JWT authorizers? Basic token auth needs particular attention
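To make the fourth question concrete, the sketch below assembles the parameters a JWT authorizer needs: an issuer URL and one or more audiences. The parameter names follow boto3's `apigatewayv2` `create_authorizer` call; the host, realm, API ID, and audience are hypothetical, and the issuer path layout is an assumption that depends on the Keycloak version in use. The shared-token flow is not covered: a JWT authorizer validates signed JWTs only, so a static shared token would likely need a different mechanism (for example a Lambda authorizer), which the POC would need to confirm.

```python
def keycloak_issuer(base_url: str, realm: str) -> str:
    """Build the issuer URL for a realm (path layout assumed; older Keycloak
    releases served realms under an additional /auth prefix)."""
    return f"{base_url.rstrip('/')}/realms/{realm}"


def jwt_authorizer_params(api_id: str, name: str, issuer: str, audiences: list[str]) -> dict:
    """Keyword arguments for boto3's apigatewayv2 create_authorizer call."""
    return {
        "ApiId": api_id,
        "Name": name,
        "AuthorizerType": "JWT",
        # Where the gateway looks for the bearer token.
        "IdentitySource": ["$request.header.Authorization"],
        "JwtConfiguration": {"Issuer": issuer, "Audience": list(audiences)},
    }


# Hypothetical values for illustration only.
params = jwt_authorizer_params(
    api_id="a1b2c3d4e5",
    name="keycloak-hip",
    issuer=keycloak_issuer("https://keycloak.example.internal/", "hip"),
    audiences=["orders-api"],
)
print(params)
# With AWS credentials configured, the call would look like:
#   boto3.client("apigatewayv2").create_authorizer(**params)
```

The issuer must match the `iss` claim in tokens issued by Keycloak, and the audience must match the `aud` claim, so both are worth checking against real tokens early in the POC.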

7. Proposed Next Steps

Phase 1: Validation (4-6 weeks)

  1. Confirm AWS Enterprise Support tier
  2. Validate traffic estimates with production data
  3. Survey API catalogue for 10MB/30s limit exceptions

Phase 2: Technical POC (6-8 weeks)

  1. Deploy ACK API Gateway controller in non-production environment
  2. Configure VPC Link and NLB integration
  3. Test OAS → API Gateway configuration pipeline
  4. Validate:
    • Keycloak integration via JWT authorizers
    • Rate limiting capabilities (throttling and quotas)
    • OpenAPI import and extensions compatibility
  5. Assess observability integration options (CloudWatch → Prometheus bridge)

Phase 3: Decision

  • Go: Proceed with migration planning (environment-by-environment, starting with non-production)
  • No-Go: Re-evaluate Istio with enterprise support, or challenge the enterprise support constraint

Appendix

A. Detailed Analysis

The full working analysis is available in initial-thoughts.md, including:

  • Detailed traffic tier calculations and peak month billing impact
  • AWS API Gateway architecture with ACK (configuration flow, limitations)
  • Infrastructure cost component breakdown
  • Detailed migration effort estimates by activity
  • Observability and Platform APIs impact tables
  • Sensitivity analysis across multiple scenarios

B. Key Assumptions

  • Exchange rate: £1 = $1.30 (GBP/USD 1.30)
  • AWS API Gateway HTTP APIs pricing as of 2026 (eu-west-2 London region)
  • AZ affinity enabled (cross-zone load balancing disabled) to eliminate inter-AZ data transfer costs
  • Single VPC Link and NLB
  • No caching or WAF cost changes (WAF already exists)
  • Enterprise support pricing based on industry estimates; formal quotes required
  • Migration costs estimated at platform engineer day rates

C. Platform Architecture Reference

See .ai/projects/hip/SYSTEM-CONTEXT.md for full platform architecture, including EKS cluster topology, networking, and operational model.