API Gateway Replacement Proposal
Platform: HIP (Enterprise Integration Platform)
Date: 2026-03-09
Status: Proposal - subject to POC validation
Author: APIM Platform Team
Executive Summary
We propose replacing Kong Enterprise API Gateway with AWS API Gateway (HTTP APIs), subject to a technical proof of concept.
Kong Enterprise costs £600k per year in licensing. The value we derive is concentrated in a small number of features: authentication/authorisation and rate limiting. The licence cost is fixed regardless of usage, and represents a significant operational expense relative to the value delivered.
Recommended option: AWS API Gateway at ~£54k/year recurring cost, saving ~£556k annually (a 91% reduction against Kong's all-in cost of ~£610k/year, licensing plus infrastructure). Migration is estimated at a one-time cost of £290-427k, recovered within the first year.
Why AWS API Gateway over open-source alternatives: Enterprise support is a strong requirement for this platform, which carries a Critical National Infrastructure (CNI) designation. AWS API Gateway is covered by existing AWS Enterprise Support at no incremental cost. Open-source alternatives (Istio, Envoy Gateway) would require separate enterprise support contracts at £75-385k/year, narrowing or eliminating the cost advantage.
What we’re asking for: Approval to proceed with a technical POC to validate Keycloak integration, rate limiting capabilities, and GitOps workflow before committing to migration.
| Option | Annual Cost | 3-Year TCO (inc. migration) | 3-Year Savings vs Kong |
|---|---|---|---|
| Kong Enterprise (current) | £610-615k | £1,830-1,845k | - |
| AWS API Gateway | £54k | £451-589k | £1,241-1,394k (68-76%) |
| Istio + Solo.io support | £85-245k | £484-1,072k | £758-1,361k (41-74%) |
| Istio + Tetrate support | £164-400k | £721-1,537k | £293-1,124k (16-61%) |
| Istio/Envoy (self-supported) | £10-15k | £259-382k | £1,448-1,586k (79-86%) |
Self-supported open-source is the cheapest option but carries unacceptable operational risk for CNI workloads without SLA-backed vendor support.
1. Context
1.1 Current State
HIP runs on AWS EKS with Kong Enterprise as the API gateway. The platform serves 20+ API producer teams across 10+ API categories, using a centrally operated model:
- The APIM team owns the gateway runtime, routing, authentication, authorisation, and rate limiting
- Producer teams provide OpenAPI specifications and configuration only - they do not interact with Kong or Kubernetes directly
- A “paved road” abstraction layer (Platform APIs) sits between producers and the gateway, handling GitOps and ClickOps workflows
- Deployment is managed via GitLab + Argo CD
Kong runs on a dedicated EKS node group. Istio is already deployed in the platform. Authentication is handled by Keycloak (OIDC/OAuth2 and basic shared token).
1.2 Problem
Kong Enterprise costs approximately £600k per year in licensing, with an additional £10-15k in infrastructure costs, totalling ~£610-615k annually.
This cost is:
- Fixed regardless of traffic volume or feature adoption
- Concentrated in a small feature set (the Advanced Auth plugin and rate limiting)
- Disproportionate to the value derived
Secondary concerns include dependence on a proprietary control plane, governance risk from vendor-led roadmaps, and the opportunity to align with upstream Kubernetes networking standards.
1.3 Constraints
- Enterprise support required: The platform’s CNI designation creates a strong preference for SLA-backed vendor support with guaranteed incident response times and security patching
- Federated delivery model retained: Producers must continue to interact only through OAS specifications and Platform APIs - no direct gateway or Kubernetes access
- GitOps-first: All configuration must be declarative and managed through Argo CD
- No big-bang rewrite: Migration must be incremental, environment-by-environment
2. Options Considered
2.1 AWS API Gateway (HTTP APIs)
A fully managed AWS service, configured from Kubernetes via AWS Controllers for Kubernetes (ACK).
How it works: API Gateway runs outside the cluster as a managed AWS service. ACK controllers in the cluster translate Kubernetes CRDs into AWS API Gateway resources. Traffic reaches EKS services via VPC Links and a Network Load Balancer. Argo CD manages the CRDs, preserving the GitOps workflow.
Strengths:
- Fully managed - no gateway compute to operate
- Covered by existing AWS Enterprise Support at no incremental cost
- Usage-based pricing scales with actual traffic
- Strong SLAs and proven at scale
Weaknesses:
- Vendor lock-in to AWS (mitigated by existing AWS commitment across the platform)
- GitOps integration requires an indirection layer (ACK/Crossplane), adding complexity
- Keycloak integration via JWT authorizers is less flexible than in-cluster options
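- Rate limiting is comparatively coarse: HTTP APIs provide account-, stage-, and route-level throttling only, while usage plans, API keys, and per-client quotas are REST API features, so limits are not identity-aware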
- Hard limits: 10MB maximum payload, 30-second integration timeout (aligns with existing platform NFRs but becomes enforced at gateway level rather than policy)
- Additional network hops: API Gateway → VPC Link → NLB → EKS service
- Observability requires bridging CloudWatch to existing Prometheus/Grafana/Jaeger stack
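To make the rate limiting weakness concrete, the following is a minimal sketch of a token bucket, the algorithm AWS documents for API Gateway throttling. The class, the rate and burst values, and the two-client scenario are illustrative assumptions, not platform configuration.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Token bucket with a steady refill rate and a burst capacity."""
    rate: float                       # tokens added per second (steady-state limit)
    burst: int                        # bucket capacity (burst limit)
    tokens: float = field(init=False)
    last: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = now - self.last
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# ASSUMPTION: one bucket shared by every caller of a route (no per-client identity)
route_bucket = TokenBucket(rate=5, burst=10)

# A noisy client sends 10 requests at t=0 and drains the bucket...
noisy = [route_bucket.allow(now=0.0) for _ in range(10)]
# ...so a well-behaved client calling at the same instant is throttled
quiet_at_0 = route_bucket.allow(now=0.0)
# One second later, 5 tokens have been refilled and the quiet client gets through
quiet_at_1 = route_bucket.allow(now=1.0)
```

Because the bucket is keyed by route rather than by caller, one consumer can exhaust capacity for all others. Whether that matters depends on how the current identity-aware policies are actually used.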
2.2 Istio Gateway (with enterprise support)
Istio is already deployed in the platform. Using it as the API gateway with a managed control plane from Tetrate or Solo.io would provide enterprise support while staying Kubernetes-native.
Strengths:
- Already deployed and understood by the team
- Native integration with existing observability stack (Prometheus, Jaeger, Grafana)
- Strong OIDC/JWT integration with Keycloak
- Mature traffic policy (rate limiting, retries, timeouts)
- Kubernetes-native - no external control plane or VPC Links
- Portable across clouds and on-premises
Weaknesses:
- Enterprise support costs £75-385k/year depending on vendor, significantly narrowing the savings
- Gateway and mesh concerns can become intertwined if not carefully managed
- Support pricing is quote-based and varies widely by scale and tier
2.3 Istio/Envoy Gateway (self-supported)
Using open-source Istio or Envoy Gateway without vendor support.
Strengths:
- Lowest total cost (~£10-15k/year, infrastructure only)
- Maximum flexibility and portability
Weaknesses:
- No SLA-backed support for CNI workloads - unacceptable operational risk
- Relies entirely on internal expertise for incident response and security patching
2.4 Kubernetes Ingress (alternative controller)
Replacing Kong with another ingress-based solution (e.g., NGINX Ingress).
Strengths:
- Stable, well-understood technology
Weaknesses:
- Ingress API is feature-complete and frozen - no future development
- Auth and rate limiting encoded in controller-specific annotations (no portability gain)
- Reduces cost but does not materially improve long-term position
This option was evaluated and dismissed early - it improves cost but not architecture.
3. Recommendation
Proposed direction: Replace Kong Enterprise with AWS API Gateway (HTTP APIs), subject to successful POC.
Rationale:
- Cost: ~£54k/year vs £610-615k/year. Even including migration costs (£290-427k), the investment is recovered within the first year. Over 3 years, total cost of ownership is £451-589k vs £1,830-1,845k for Kong.
- Enterprise support: Covered by existing AWS Enterprise Support contract at no incremental cost. This is a significant differentiator - Istio enterprise support adds £75-385k/year, which substantially reduces savings.
- Operational model preserved: The paved road abstraction means producers are unaffected. Platform APIs generate gateway configuration from OAS specifications regardless of whether the underlying gateway is Kong, AWS API Gateway, or Istio. Migration is invisible to producer teams.
- Managed service: Eliminates gateway compute management. No node groups to size, no control plane upgrades, no data plane patching.
Trade-offs accepted:
- Vendor lock-in to AWS is accepted given the existing AWS infrastructure commitment
- Control plane externalisation is offset by reduced operational burden
- Observability integration with CloudWatch requires one-time bridging work (£43-66k)
- Rate limiting is less granular than Kong’s but adequate for current requirements
- Hard enforcement of 10MB/30s limits (currently soft policy, would become hard constraint)
If the enterprise support requirement is relaxed, Istio with community support (~£10-15k/year) becomes the most cost-effective option by a wide margin, but carries operational risk inconsistent with CNI workloads.
4. Cost Analysis
4.1 Traffic Model
Costs are modelled using a tiered traffic profile based on estimated peak volumes of 20k requests/sec. This is a strawman intended as a starting point - actual volumes should be validated with production data.
| Tier | Time Window | Requests/Sec | Days/Year | Annual Requests |
|---|---|---|---|---|
| Night | 9pm - 6am daily | 1,000 | 365 | 11.8B |
| Peak | 9am - 6pm, peak days | 20,000 | 20 | 13.0B |
| Busy | 9am - 6pm, busy days | 10,000 | 20 | 6.5B |
| Steady State | 9am - 6pm, normal days | 5,000 | 325 | 52.7B |
| Total | | | | 83.9B |
Peak traffic is concentrated rather than spread evenly: the modelled peak month contains 10 peak days and 10 busy days (half of the annual allocation of each in the table above). That month generates ~12.3B requests at a cost of ~$9,980 (~£7,677), approximately 1.74x the average monthly cost. This variability is acceptable and significantly more favourable than flat licensing.
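The tier arithmetic can be reproduced with a short script, which may be useful when the strawman is replaced with production data. The tier values are copied from the table; the 30-day month and the treatment of its remaining 10 daytime windows as steady state are assumptions made here to reproduce the ~12.3B peak-month figure.

```python
SECONDS_PER_HOUR = 3_600

# (tier, hours per day, requests per second, days per year), from the traffic table
TIERS = [
    ("Night", 9, 1_000, 365),
    ("Peak", 9, 20_000, 20),
    ("Busy", 9, 10_000, 20),
    ("Steady State", 9, 5_000, 325),
]


def requests_for(hours_per_day: int, rps: int, days: int) -> int:
    """Requests generated by one tier over the given number of days."""
    return hours_per_day * SECONDS_PER_HOUR * rps * days


def annual_requests() -> dict[str, int]:
    """Annual request count per tier."""
    return {name: requests_for(h, rps, days) for name, h, rps, days in TIERS}


def peak_month_requests(month_days: int = 30, peak_days: int = 10, busy_days: int = 10) -> int:
    """Requests in the modelled peak month.

    ASSUMPTION: a 30-day month in which every daytime window that is neither
    peak nor busy runs at the steady-state rate.
    """
    steady_days = month_days - peak_days - busy_days
    return (
        requests_for(9, 1_000, month_days)
        + requests_for(9, 20_000, peak_days)
        + requests_for(9, 10_000, busy_days)
        + requests_for(9, 5_000, steady_days)
    )


per_tier = annual_requests()
annual_total = sum(per_tier.values())
peak_month = peak_month_requests()

for name, count in per_tier.items():
    print(f"{name:<13} {count / 1e9:6.1f}B")
print(f"{'Total':<13} {annual_total / 1e9:6.1f}B")
print(f"{'Peak month':<13} {peak_month / 1e9:6.1f}B")
```

Each listed window spans nine hours, so swapping in measured per-hour volumes only requires changing the `TIERS` entries.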
4.2 Annual Cost Comparison
| Solution | Gateway/License | Infrastructure | Support | Annual Total |
|---|---|---|---|---|
| Kong Enterprise (current) | £600,000 | £10-15k | Included | £610-615k |
| AWS API Gateway | £52,800 | £1,000* | Included in AWS | £53,800 |
| Istio (Tetrate support) | £0 | £10-15k | £154-385k | £164-400k |
| Istio (Solo.io support) | £0 | £10-15k | £75-230k | £85-245k |
| Istio/Envoy (self-supported) | £0 | £10-15k | £0 | £10-15k |
*VPC Link + NLB only; inter-AZ data transfer eliminated via AZ affinity (cross-zone load balancing disabled)
AWS Enterprise Support assumption: The £53,800 figure assumes HIP already has AWS Enterprise Support for EKS infrastructure. If not, add ~£138k/year, bringing the total to ~£192k and making Istio with Solo.io support potentially more competitive.
4.3 Enterprise Support Pricing Detail
Enterprise support costs for Kubernetes-native gateways are quote-based and vary significantly:
| Vendor | Product | Estimated Annual Cost |
|---|---|---|
| Tetrate | Service Bridge (Istio) | $200-500k USD (~£154-385k) |
| Solo.io | Gloo Mesh (Istio) | $100-300k USD (~£75-230k) |
| Red Hat | OpenShift Service Mesh | $50-75k USD/cluster |
| F5 | NGINX Enterprise Plus | $30-100k USD |
Red Hat is excluded from consideration as it requires migrating from EKS to OpenShift. Formal vendor engagement is needed for accurate quotes at our scale (84B requests/year, CNI workloads, 24/7 support).
4.4 Cost Sensitivity
Even with significant variance in assumptions, AWS API Gateway delivers substantial savings:
| Scenario | Annual Cost | Savings vs Kong |
|---|---|---|
| Base case | £54k | 91% |
| Request volume +50% | £80k | 87% |
| Request volume -50% | £27k | 96% |
| No AZ affinity (cross-zone LB enabled) | £57k | 91% |
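The volume scenarios above can be re-derived with a small sketch. It assumes, as a simplification, that request charges scale linearly with volume and that the VPC Link and NLB cost stays fixed; tiered pricing and the cross-zone scenario are not modelled. Savings are measured against the lower bound of Kong's annual cost (£610k).

```python
KONG_ANNUAL_GBP = 610_000    # lower bound of Kong's all-in annual cost
GATEWAY_BASE_GBP = 52_800    # request charges at the modelled 83.9B requests/year
INFRA_GBP = 1_000            # VPC Link + NLB, treated as fixed


def annual_cost(volume_factor: float = 1.0, extra_support_gbp: float = 0.0) -> float:
    """Annual AWS API Gateway cost for a given multiple of the modelled volume.

    ASSUMPTION: request charges are linear in volume.
    """
    return GATEWAY_BASE_GBP * volume_factor + INFRA_GBP + extra_support_gbp


def saving_pct(cost_gbp: float) -> float:
    """Percentage saving against Kong's annual cost."""
    return (1 - cost_gbp / KONG_ANNUAL_GBP) * 100


scenarios = {
    "Base case": annual_cost(),
    "Request volume +50%": annual_cost(1.5),
    "Request volume -50%": annual_cost(0.5),
    # From section 4.2: AWS Enterprise Support not already in place
    "Base case + AWS support": annual_cost(extra_support_gbp=138_000),
}

for name, cost in scenarios.items():
    print(f"{name:<26} £{cost / 1000:6.1f}k  {saving_pct(cost):5.1f}%")
```

The last scenario shows how sensitive the case is to the AWS Enterprise Support assumption compared with traffic volume.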
5. Migration
5.1 Migration Strategy
Migration is environment-by-environment (not API-by-API) due to the platform architecture: wildcard DNS and a single load balancer point at Kong per environment. Each environment is migrated as a unit.
The paved road abstraction significantly reduces migration scope. Producers do not interact with Kong directly - they work through Platform APIs. Migration work is confined to the platform team and the Platform APIs layer.
5.2 Migration Cost Summary
| Solution | Direct Migration | Observability | Platform APIs | Total |
|---|---|---|---|---|
| AWS API Gateway | £213-307k | £43-66k | £34-54k | £290-427k |
| Istio (managed CP) | £188-272k | £7-11k | £34-54k | £229-337k |
AWS API Gateway migration costs are £61-90k higher than Istio, primarily due to observability integration (bridging CloudWatch to existing Prometheus/Grafana/Jaeger stack). Istio integrates natively with the existing observability infrastructure.
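As a check on the summary table, the following sketch sums the components and breaks the difference between the two options down by component. Figures are in £k and copied from the table; the dictionary keys are illustrative names.

```python
# Migration cost components in £k as (low, high), from the summary table
MIGRATION = {
    "AWS API Gateway": {
        "direct": (213, 307),
        "observability": (43, 66),
        "platform_apis": (34, 54),
    },
    "Istio (managed CP)": {
        "direct": (188, 272),
        "observability": (7, 11),
        "platform_apis": (34, 54),
    },
}


def total(option: str) -> tuple[int, int]:
    """Sum the low and high bounds across all components of one option."""
    parts = list(MIGRATION[option].values())
    return (sum(low for low, _ in parts), sum(high for _, high in parts))


def component_delta(component: str) -> tuple[int, int]:
    """AWS minus Istio for one component, bound by bound."""
    aws = MIGRATION["AWS API Gateway"][component]
    istio = MIGRATION["Istio (managed CP)"][component]
    return (aws[0] - istio[0], aws[1] - istio[1])


aws_total = total("AWS API Gateway")
istio_total = total("Istio (managed CP)")
overall_delta = (aws_total[0] - istio_total[0], aws_total[1] - istio_total[1])

print("AWS total:", aws_total)
print("Istio total:", istio_total)
print("Difference:", overall_delta)
for name in ("direct", "observability", "platform_apis"):
    print(f"  {name}: {component_delta(name)}")
```

Observability accounts for the larger share of the difference, with direct migration effort making up the remainder.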
5.3 Three-Year Total Cost of Ownership
| Solution | Annual Recurring | Migration (one-time) | 3-Year TCO | Annualised |
|---|---|---|---|---|
| Kong Enterprise | £610-615k | £0 | £1,830-1,845k | £610-615k |
| AWS API Gateway | £54k | £290-427k | £451-589k | £150-196k |
| Istio (Solo.io) | £85-245k | £229-337k | £484-1,072k | £161-357k |
| Istio (Tetrate) | £164-400k | £229-337k | £721-1,537k | £240-512k |
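Against Kong's ~£610k annual cost, migration costs are recovered in less than one year for AWS API Gateway and for Istio with Solo.io support. Istio with Tetrate support achieves this only towards the lower end of its pricing range: at £400k/year the annual saving falls to ~£210k, and payback takes roughly 13-19 months.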
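The TCO and payback arithmetic can be re-run as vendor quotes firm up. The sketch below works in £k with (low, high) ranges from the table. Payback pairs the best case (low migration cost, largest annual saving) with the worst case (high migration cost, smallest annual saving), which is an assumption about how to combine the ranges. Using the rounded £54k for AWS gives a lower TCO bound of £452k, £1k above the table's £451k, which follows from the unrounded £53.8k.

```python
KONG_ANNUAL = (610, 615)  # £k per year, (low, high)

# £k as (low, high): annual recurring cost and one-time migration cost
OPTIONS = {
    "Kong Enterprise": {"annual": KONG_ANNUAL, "migration": (0, 0)},
    "AWS API Gateway": {"annual": (54, 54), "migration": (290, 427)},
    "Istio (Solo.io)": {"annual": (85, 245), "migration": (229, 337)},
    "Istio (Tetrate)": {"annual": (164, 400), "migration": (229, 337)},
}


def tco(name: str, years: int = 3) -> tuple[int, int]:
    """Total cost of ownership over the period, as (low, high)."""
    annual, migration = OPTIONS[name]["annual"], OPTIONS[name]["migration"]
    return (annual[0] * years + migration[0], annual[1] * years + migration[1])


def payback_months(name: str) -> tuple[float, float]:
    """Months to recover migration cost from annual savings, as (best, worst)."""
    annual, migration = OPTIONS[name]["annual"], OPTIONS[name]["migration"]
    best = migration[0] / (KONG_ANNUAL[1] - annual[0]) * 12
    worst = migration[1] / (KONG_ANNUAL[0] - annual[1]) * 12
    return (best, worst)


for name in OPTIONS:
    low, high = tco(name)
    line = f"{name:<16} 3-year TCO £{low:,}k-£{high:,}k"
    if name != "Kong Enterprise":
        best, worst = payback_months(name)
        line += f", payback {best:.1f}-{worst:.1f} months"
    print(line)
```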
5.4 What Changes, What Doesn’t
| Area | Impact |
|---|---|
| Producer teams | No change - paved road abstraction shields producers from gateway changes |
| Platform APIs | Internal updates to generate AWS API Gateway config instead of Kong config (£34-54k) |
| Observability | CloudWatch bridging to existing Prometheus/Grafana/Jaeger stack (£43-66k) |
| Authentication | Keycloak integration via JWT authorizers instead of Kong auth plugin |
| Team skills | AWS services and ACK expertise required (documentation and training included in migration costs) |
| Operational model | External control plane with multiple routing layers (API GW → VPC Link → NLB → EKS) |
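To give a sense of the Platform APIs change, the sketch below shows one way a producer's OAS document could be decorated with AWS API Gateway OpenAPI extensions before import. It is an illustration under stated assumptions, not the Platform APIs implementation: the function name, the `keycloak-jwt` scheme name, and every ARN, ID, and URL are hypothetical, and the extension fields are our reading of the AWS OpenAPI extension documentation, to be confirmed in the POC.

```python
import copy

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def add_gateway_extensions(spec: dict, listener_arn: str, vpc_link_id: str,
                           issuer: str, audience: str) -> dict:
    """Return a copy of an OpenAPI document with gateway extensions attached."""
    out = copy.deepcopy(spec)  # never mutate the producer's specification

    for path_item in out.get("paths", {}).values():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            # ASSUMPTION: private integration through a VPC Link to the NLB listener
            operation["x-amazon-apigateway-integration"] = {
                "type": "http_proxy",
                "httpMethod": method.upper(),
                "uri": listener_arn,
                "connectionType": "VPC_LINK",
                "connectionId": vpc_link_id,
                "payloadFormatVersion": "1.0",
            }
            operation["security"] = [{"keycloak-jwt": []}]

    schemes = out.setdefault("components", {}).setdefault("securitySchemes", {})
    # ASSUMPTION: a JWT authorizer pointed at a Keycloak realm issuer
    schemes["keycloak-jwt"] = {
        "type": "oauth2",
        "x-amazon-apigateway-authorizer": {
            "type": "jwt",
            "identitySource": "$request.header.Authorization",
            "jwtConfiguration": {"issuer": issuer, "audience": [audience]},
        },
    }
    return out


producer_spec = {
    "openapi": "3.0.1",
    "info": {"title": "Orders API", "version": "1.0.0"},
    "paths": {"/orders": {"get": {"responses": {"200": {"description": "OK"}}}}},
}

gateway_spec = add_gateway_extensions(
    producer_spec,
    listener_arn="arn:aws:elasticloadbalancing:eu-west-2:111122223333:listener/net/example/0123456789abcdef/0123456789abcdef",
    vpc_link_id="abc123",
    issuer="https://keycloak.example.com/realms/hip",
    audience="hip-gateway",
)
```

Keeping the transformation a pure function of the producer's specification is what allows the gateway to change without producers noticing.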
6. Risks and Open Questions
6.1 Risks
| Risk | Severity | Mitigation |
|---|---|---|
| Keycloak/JWT authorizer integration proves inadequate | High | Validate in POC before committing |
| Rate limiting too coarse for requirements | Medium | Test stage- and route-level throttling in POC (usage-plan quotas are not available on HTTP APIs); evaluate whether current granularity is actually needed |
| ACK controller maturity or reliability issues | Medium | Evaluate in POC; Crossplane is an alternative |
| 10MB/30s hard limits block specific use cases | Low | Aligns with existing NFRs; survey current API catalogue for exceptions |
| AWS Enterprise Support assumption incorrect | Medium | Confirm current support tier before proceeding |
| Observability gap during migration | Low | Build CloudWatch bridge before migration; maintain Kong dashboards during parallel run |
6.2 Open Questions
- Confirm AWS Support tier: Is HIP already on AWS Enterprise Support? If not, the cost case changes significantly (+£138k/year)
- Validate traffic estimates: The 20k req/sec peak and 84B annual request figures are estimates - actual production data is needed
- Rate limiting requirements: Are current identity-aware rate limiting policies actually in use, or would coarser stage- and route-level throttling suffice?
- Keycloak integration depth: What specific auth flows need to work through JWT authorizers? Basic token auth needs particular attention
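One way to start answering the Keycloak question ahead of the POC is to inspect the claims in tokens the platform already issues. The sketch below decodes a token payload without verifying its signature and checks the claims a JWT authorizer is documented to validate (`iss`, `aud` or `client_id`, `exp`). The helper names, issuer URL, audience, and sample claims are hypothetical, and the precedence of `aud` over `client_id` is our reading of the AWS documentation, to be confirmed in the POC.

```python
import base64
import json
import time
from typing import Optional


def decode_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (inspection only)."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def claim_problems(claims: dict, issuer: str, audiences: set,
                   now: Optional[float] = None) -> list:
    """List reasons a JWT authorizer would be expected to reject these claims."""
    now = time.time() if now is None else now
    problems = []
    if claims.get("iss") != issuer:
        problems.append("iss does not match the configured issuer")
    # ASSUMPTION: aud is evaluated when present; client_id is the fallback
    if "aud" in claims:
        aud = claims["aud"]
        presented = {aud} if isinstance(aud, str) else set(aud)
    else:
        presented = {claims["client_id"]} if "client_id" in claims else set()
    if not presented & audiences:
        problems.append("neither aud nor client_id matches a configured audience")
    if claims.get("exp", 0) <= now:
        problems.append("token has expired")
    return problems


def make_unsigned_token(claims: dict) -> str:
    """Build an unsigned token for local experiments only (never for real auth)."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return ".".join([b64({"alg": "none", "typ": "JWT"}), b64(claims), ""])


ISSUER = "https://keycloak.example.com/realms/hip"  # hypothetical realm URL
sample = make_unsigned_token({
    "iss": ISSUER,
    "aud": "account",          # illustrative value; check real tokens
    "azp": "orders-client",
    "exp": 2_000_000_000,
})
problems = claim_problems(decode_claims(sample), ISSUER, {"hip-gateway"},
                          now=1_800_000_000)
print(problems)
```

An audience mismatch like the one in this sample would need resolving on the Keycloak side (for example with an audience mapper) rather than at the gateway. Shared-token flows fall outside this check entirely, because they do not present a JWT.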
7. Proposed Next Steps
Phase 1: Validation (4-6 weeks)
- Confirm AWS Enterprise Support tier
- Validate traffic estimates with production data
- Survey API catalogue for 10MB/30s limit exceptions
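The catalogue survey could be as simple as the following sketch, assuming per-API figures for largest observed payload and slowest observed response can be exported from existing metrics. The `ApiProfile` structure, the sample entries, and the interpretation of 10MB as 10 x 1024 x 1024 bytes are all assumptions for illustration.

```python
from dataclasses import dataclass

MAX_PAYLOAD_BYTES = 10 * 1024 * 1024  # ASSUMPTION: 10MB read as 10 MiB
MAX_TIMEOUT_SECONDS = 30


@dataclass(frozen=True)
class ApiProfile:
    """Observed worst-case behaviour of one API (hypothetical structure)."""
    name: str
    max_payload_bytes: int
    max_response_seconds: float


def limit_exceptions(catalogue: list) -> dict:
    """Map each API that would breach a hard limit to the limits it breaches."""
    findings = {}
    for api in catalogue:
        reasons = []
        if api.max_payload_bytes > MAX_PAYLOAD_BYTES:
            reasons.append("payload exceeds 10MB")
        if api.max_response_seconds > MAX_TIMEOUT_SECONDS:
            reasons.append("response time exceeds 30s")
        if reasons:
            findings[api.name] = reasons
    return findings


# Hypothetical sample data; real values would come from production metrics
catalogue = [
    ApiProfile("orders", 250_000, 1.2),
    ApiProfile("document-upload", 25 * 1024 * 1024, 8.0),
    ApiProfile("bulk-report", 2_000_000, 45.0),
]

exceptions = limit_exceptions(catalogue)
for name, reasons in exceptions.items():
    print(f"{name}: {', '.join(reasons)}")
```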
Phase 2: Technical POC (6-8 weeks)
- Deploy ACK API Gateway controller in non-production environment
- Configure VPC Link and NLB integration
- Test OAS → API Gateway configuration pipeline
- Validate:
  - Keycloak integration via JWT authorizers
  - Rate limiting capabilities (stage- and route-level throttling)
  - OpenAPI import and extensions compatibility
- Assess observability integration options (CloudWatch → Prometheus bridge)
Phase 3: Decision
- Go: Proceed with migration planning (environment-by-environment, starting with non-production)
- No-Go: Re-evaluate Istio with enterprise support, or challenge the enterprise support constraint
Appendix
A. Detailed Analysis
The full working analysis is available in initial-thoughts.md, including:
- Detailed traffic tier calculations and peak month billing impact
- AWS API Gateway architecture with ACK (configuration flow, limitations)
- Infrastructure cost component breakdown
- Detailed migration effort estimates by activity
- Observability and Platform APIs impact tables
- Sensitivity analysis across multiple scenarios
B. Key Assumptions
- Exchange rate: £1 = $1.30 (GBP/USD 1.30)
- AWS API Gateway HTTP APIs pricing as of 2026 (eu-west-2 London region)
- AZ affinity enabled (cross-zone load balancing disabled) to eliminate inter-AZ data transfer costs
- Single VPC Link and NLB
- No caching or WAF cost changes (WAF already exists)
- Enterprise support pricing based on industry estimates; formal quotes required
- Migration costs estimated at platform engineer day rates
C. Platform Architecture Reference
See .ai/projects/hip/SYSTEM-CONTEXT.md for full platform architecture, including EKS cluster topology, networking, and operational model.