Should We Support Webhook Callbacks in OAS 3.1?
Date: 2026-02-03
Status: Decision Required
Primary Question: Should HIP support OAS 3.1 webhooks for async API callbacks?
Executive Summary
Question: Should HIP support OAS 3.1 webhooks for async API callbacks?
Answer: HOLD pending validation of three critical blockers.
Webhooks provide genuine value for simple 1:1 async notifications (enterprise pattern 5a), but implementation is complex and depends on factors outside the platform team's control. Do not proceed without validating all three blockers.
What Are Webhooks?
Definition: OAS 3.1 adds a native, top-level webhooks field to the API specification. It describes async HTTP callbacks that the API sends TO the consumer (API → Consumer direction).
Key Distinction: Webhooks are NOT about consuming data from consumers. They’re about the API calling back to the consumer when async processing completes.
Business Value:
- Eliminates polling overhead for consumers
- Enables real-time event processing
- Enterprise async request/callback pattern (5a)
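To make the definition concrete, here is a minimal sketch of an OAS 3.1 document that declares one webhook, written as a Python dictionary and printed as JSON. The webhook name (submissionValidated), the payload fields, and the API title are hypothetical and chosen only for illustration; they do not describe an existing HIP API.

```python
import json

# Hypothetical example: the webhook name, payload schema, and API title are
# illustrative assumptions, not taken from any existing HIP specification.
spec = {
    "openapi": "3.1.0",
    "info": {"title": "Example Submission API", "version": "1.0.0"},
    # Regular request/response operations would be listed under "paths".
    "paths": {},
    # "webhooks" is the top-level field introduced in OAS 3.1.
    "webhooks": {
        "submissionValidated": {
            # This operation describes the request the API sends TO the consumer.
            "post": {
                "requestBody": {
                    "required": True,
                    "content": {
                        "application/json": {
                            "schema": {
                                "type": "object",
                                "required": ["submissionId", "status"],
                                "properties": {
                                    "submissionId": {"type": "string"},
                                    "status": {
                                        "type": "string",
                                        "enum": ["VALIDATED", "REJECTED"],
                                    },
                                },
                            }
                        }
                    },
                },
                # The response the consumer is expected to return.
                "responses": {
                    "200": {"description": "Consumer acknowledges receipt"}
                },
            }
        }
    },
}


def webhook_names(document: dict) -> list:
    """Return the names of all webhooks declared in an OAS 3.1 document."""
    return sorted(document.get("webhooks", {}))


print(json.dumps(spec, indent=2))
```

Note that the callback URL itself does not appear in the specification. The document describes only the shape of the callback; the consumer supplies the URL separately.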
Webhooks vs EDA: When to Use Each
| Aspect | Webhooks (OAS 3.1) | EDA (Kafka/EventBridge) |
|---|---|---|
| Callback registration | Consumer provides URL during API subscription (out-of-band) | Consumer subscribes to topics/event types in event catalog |
| Delivery model | Point-to-point HTTP callbacks (1:1, platform → consumer) | Pub/sub (1:many, multiple consumers can subscribe) |
| Consumer implementation | Expose HTTPS endpoint with HMAC verification | Connect to event bus (requires credentials, network access) |
| Security | HMAC-SHA256 signature verification | mTLS + Kafka ACLs or IAM policies |
| Failure handling | Platform retries with exponential backoff, DLQ after N attempts | Consumer controls offset, can replay from event history |
| Discovery | Webhooks defined in API spec | Events defined in event catalog |
| Best for | 1:1 relationships, simple callbacks, no consumer infrastructure | 1:many, complex event-driven workflows, event replay, multiple consumers |
HIP Strategy: Use both patterns based on use case:
- Webhooks: Simple API-level callbacks (e.g., “your submission is validated”)
- EDA: Complex domain events (e.g., “trader-registered”) where multiple consumers need the same event
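The HMAC-SHA256 verification listed in the comparison table could look roughly like the sketch below, which uses only the Python standard library. It assumes a shared secret agreed between platform and consumer at registration time and a hex-encoded signature. How the signature is transported (for example, which HTTP header) is not specified in this analysis and is left out.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Platform side: compute a hex-encoded HMAC-SHA256 over the raw body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Consumer side: recompute the signature and compare in constant time."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), received_signature.encode())


# Usage with made-up values (the secret and payload are assumptions).
secret = b"example-shared-secret"
body = b'{"submissionId": "abc-123", "status": "VALIDATED"}'
signature = sign_payload(secret, body)
print(verify_signature(secret, body, signature))
```

The signature should be computed over the exact bytes received, before any JSON parsing, because re-serialising the payload can change whitespace or key order and invalidate the signature.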
Critical Blockers (MUST Validate Before Investment)
Blocker 1: HOD Backend Event Publishing Capability
What We Need: HOD backend services must be able to publish internal events to an event bus (Kafka/EventBridge) when async processing completes.
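Why It’s Critical: The proposed design requires an event bus - the webhook dispatcher service consumes events from the bus and delivers them via HTTP. Without event publishing from HOD backends, this design is not viable (the only alternative is direct HTTP calls from HOD backends to consumers, which is not recommended; see FAQ).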
Validation Required:
- Can current HOD backends publish to the event bus?
- What changes would be needed to add event publishing?
- Timeline: When could this capability be added if not present?
- Are there existing events already published that could be reused?
Owner: HOD teams (each domain API team lead)
Status: ❓ Unknown
Blocker Level: 🔴 CRITICAL - If HOD teams cannot/will not publish to bus, webhooks are not viable
Blocker 2: Consumer Demand Validation
What We Need: Validate that >30% of consumers would actually use webhook callbacks vs polling.
Why It’s Critical: Webhooks require significant platform investment. If consumer demand is low, the ROI doesn’t justify the effort.
Validation Required:
- Survey: “Would you use webhook callbacks instead of polling?”
- Survey: “Can you expose public HTTPS endpoints for webhooks?”
- Survey: “What async operations would benefit most from push notifications?”
- Analysis: Calculate the percentage of respondents who would use webhooks and compare it against the >30% threshold
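The threshold analysis is simple arithmetic, sketched below. One assumption to flag: the denominator here is the number of survey respondents, whereas the stated need is ">30% of consumers". Whether non-respondents count towards the total is a decision this analysis does not make.

```python
def meets_interest_threshold(
    interested: int, respondents: int, threshold_percent: int = 30
) -> bool:
    """Return True when strictly more than threshold_percent of respondents are interested.

    Integer arithmetic is used so that a result of exactly 30% is not
    affected by floating-point rounding.
    """
    if respondents <= 0:
        raise ValueError("respondents must be a positive number")
    if not 0 <= interested <= respondents:
        raise ValueError("interested must be between 0 and respondents")
    return interested * 100 > threshold_percent * respondents


# Made-up numbers for illustration only.
print(meets_interest_threshold(interested=31, respondents=100))
print(meets_interest_threshold(interested=30, respondents=100))
```

Because the criterion is written as ">30%", the comparison is strict: exactly 30% does not clear the blocker in this sketch.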
Owner: Product/API Management (consumer survey)
Status: ❓ Unknown
Blocker Level: 🟡 SIGNIFICANT - If interest <30%, defer webhooks
Blocker 3: Platform Team Capacity
What We Need: Confirm platform team has capacity for 6-8 weeks initial build plus ongoing operational overhead.
Why It’s Critical: Webhooks require a new webhook dispatcher service. If platform team doesn’t have capacity, project will slip or be deprioritized.
Validation Required:
- Do we have 6-8 weeks available for MVP webhook dispatcher build?
- What’s the ongoing operational overhead (monitoring, scaling, maintenance)?
- Are there competing higher-priority initiatives?
Owner: Platform Engineering Lead
Status: ❓ Unknown
Blocker Level: 🟡 SIGNIFICANT - If capacity unavailable, defer webhooks
Decision Framework
IF All Three Blockers Are CLEARED ✅
Proceed to implementation with:
- Phase 1: Foundation (weeks 1-4) - infrastructure setup
- Phase 2: Webhooks (weeks 5-12) - dispatcher service build
- Phase 3: Growth (week 13 onwards) - adoption and monitoring
IF Any Single Blocker FAILS ❌
Defer webhooks and:
- Document learnings and assumptions
- Revisit in next planning cycle
- Continue with EDA as primary async pattern
Architectural Context
Webhooks are complementary to EDA, not competitive:
- Webhooks enable HTTP callbacks for simple 1:1 notifications
- EDA provides pub/sub for multi-consumer domain events
- Both can coexist in HIP platform
Key insight: Webhooks ARE an EDA pattern (HTTP delivery mechanism). The webhook dispatcher is simply another EDA consumer that happens to deliver events via HTTP callbacks instead of the bus.
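The dispatcher's delivery loop (HTTP delivery, retries with exponential backoff, dead-letter queue after N attempts) can be sketched as follows. This is a simplified illustration, not a proposed implementation: the class and parameter names are hypothetical, the HTTP call is injected as a function so the sketch stays independent of any HTTP library, and the dead-letter queue is an in-memory list standing in for a real queue.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class WebhookDispatcher:
    # send(url, body) performs the HTTP POST and returns the status code.
    send: Callable[[str, bytes], int]
    max_attempts: int = 5
    base_delay_seconds: float = 1.0
    # Injected so tests can avoid real waiting.
    sleep: Callable[[float], None] = time.sleep
    # Stand-in for a real dead-letter queue.
    dead_letters: List[Tuple[str, bytes]] = field(default_factory=list)

    def deliver(self, callback_url: str, body: bytes) -> bool:
        """Try to deliver one event; return True on a 2xx response."""
        for attempt in range(self.max_attempts):
            try:
                status = self.send(callback_url, body)
                if 200 <= status < 300:
                    return True
            except OSError:
                # Network-level failure: treated as retryable in this sketch.
                pass
            if attempt < self.max_attempts - 1:
                # Exponential backoff: base, 2x base, 4x base, ...
                self.sleep(self.base_delay_seconds * 2 ** attempt)
        # All attempts failed: park the event for manual or later handling.
        self.dead_letters.append((callback_url, body))
        return False
```

In this sketch every non-2xx response is retried. A real dispatcher would probably treat some 4xx responses as permanent failures and add jitter to the backoff delays. Both choices are outside the scope of this analysis.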
Comparison with Alternatives
Alternative 1: Consumer Callback API Pattern
Current Workaround: Consumers could publish their own callback API to HIP catalog. HOD backend calls consumer’s API directly.
Problems:
- Consumer becomes API producer (governance overhead)
- HOD directly coupled to consumer API lifecycle
- No standardization (each API defines callbacks differently)
- No HMRC/HIP ownership of delivery reliability
Verdict: Adequate workaround but not scalable long-term
Alternative 2: Event-Driven Architecture Only
Strategy: Use EDA exclusively for all async patterns (no webhooks).
Pros:
- Single async pattern
- More powerful (pub/sub, replay, and exactly-once processing where the chosen bus supports it, e.g. Kafka transactions)
Cons:
- Requires consumer to connect to event bus (network access, credentials)
- Higher barrier to entry for simple use cases
- Not suitable for external consumers outside HMRC network
Verdict: EDA is better for internal services, webhooks are better for external consumers
Recommendations
Q1 2026: Validation Phase (BEFORE any investment)
Action 1: Survey Consumer Community
- Deploy survey to API consumers
- Questions: Would you use webhooks? Can you expose endpoints? What use cases?
- Target: >30% interest threshold
- Timeline: 2 weeks
Action 2: Engage HOD Teams (CRITICAL PRIORITY)
- Schedule working sessions with each HOD team lead
- Key question: “Can your backends publish to the event bus?”
- Understand: What’s required to add event publishing capability?
- Document: Current state vs. required state
- Timeline: 2-3 weeks
Action 3: Assess Platform Capacity
- Platform Engineering: Estimate effort for webhook dispatcher MVP (6-8 weeks?)
- Roadmap: Is capacity available in next planning cycle?
- Scope: What’s included in MVP vs. Phase 2?
- Timeline: 1 week
Decision Point (End of Q1 2026)
GO/NO-GO Criteria:
- ✅ HOD teams confirm event bus publishing capability OR clear path to add it
- ✅ Consumer survey shows >30% interest
- ✅ Platform team confirms 6-8 week capacity available
IF all three: PROCEED → Begin Phase 1 (Foundation)
IF any fail: HOLD → Document learnings, revisit next cycle
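The decision rule above can be written down as a small function so that the criteria are unambiguous. The field names are hypothetical. The sketch assumes the consumer interest figure is expressed as a percentage and that the comparison is strict, matching ">30%".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResults:
    # Blocker 1: HOD teams can publish to the bus, or have a clear path to add it.
    hod_can_publish_or_has_clear_path: bool
    # Blocker 2: share of surveyed consumers interested in webhooks, in percent.
    consumer_interest_percent: float
    # Blocker 3: platform team confirms 6-8 weeks of capacity.
    platform_capacity_confirmed: bool


def decide(results: ValidationResults, interest_threshold_percent: float = 30.0) -> str:
    """Return "PROCEED" only when all three blockers are cleared, else "HOLD"."""
    cleared = (
        results.hod_can_publish_or_has_clear_path
        and results.consumer_interest_percent > interest_threshold_percent
        and results.platform_capacity_confirmed
    )
    return "PROCEED" if cleared else "HOLD"


# Made-up inputs for illustration only.
print(decide(ValidationResults(True, 42.0, True)))
print(decide(ValidationResults(True, 42.0, False)))
```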
Key Insights
- Webhooks are valuable for simple 1:1 async API callbacks, avoiding polling overhead.
- Complexity is real - requires webhook dispatcher service, event bus integration, HMAC signing, retry logic, DLQ management.
- Blockers are external - success depends on HOD team capabilities and consumer demand, not just platform team execution.
- Validation is essential - do not invest in infrastructure without first validating that all three blockers will be cleared.
- EDA is the foundation - webhooks only work if HOD backends are already publishing events to the bus. Webhook support amplifies existing EDA investment.
FAQ
Q: Why not just build webhooks anyway?
A: There is a high risk of building something consumers don’t want or HOD teams can’t support. Validate demand and feasibility first.
Q: What if HOD teams can’t publish to the bus?
A: The proposed webhook design isn’t viable without event publishing from HOD backends. That capability would need to be added as a separate effort first, which would delay webhooks.
Q: Could we do webhooks without the event bus?
A: Possible but not recommended - HOD backends would need to make direct HTTP calls to consumers, which creates tight coupling and operational risk.
Q: When will we know the answer?
A: End of Q1 2026, after completing all three validation streams.
Related Documents
- OAS-3.1-SUPPORT.md - High-level OAS 3.1 support summary (references this analysis)