Should We Support Webhook Callbacks in OAS 3.1?

Date: 2026-02-03
Status: Decision Required
Primary Question: Should HIP support OAS 3.1 webhooks for async API callbacks?


Executive Summary

Question: Should HIP support OAS 3.1 webhooks for async API callbacks?

Answer: HOLD pending validation of three critical blockers.

Webhooks provide genuine value for simple 1:1 async notifications (enterprise pattern 5a), but implementation is complex and depends on factors outside platform team control. Do not proceed without validating all three blockers.


What Are Webhooks?

Definition: OAS 3.1 adds a top-level webhooks field that describes asynchronous requests the API sends to the consumer (API → Consumer direction).

Key Distinction: Webhooks are not a way for the API to pull data from consumers. They describe the API calling back to the consumer when async processing completes.
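
To make the definition concrete, the sketch below builds a minimal OAS 3.1 document with a single webhook, expressed as a Python dictionary. The API title, the webhook name (submissionValidated) and the payload fields are illustrative assumptions, not part of any existing HIP specification.

```python
# Hypothetical OAS 3.1 fragment. The API title, webhook name and payload
# fields are assumptions made for illustration only.
openapi_doc = {
    "openapi": "3.1.0",
    "info": {"title": "Example Submissions API", "version": "1.0.0"},
    "webhooks": {
        "submissionValidated": {
            # This operation describes the request the API sends TO the consumer.
            "post": {
                "requestBody": {
                    "required": True,
                    "content": {
                        "application/json": {
                            "schema": {
                                "type": "object",
                                "required": ["submissionId", "status"],
                                "properties": {
                                    "submissionId": {"type": "string"},
                                    "status": {"type": "string"},
                                },
                            }
                        }
                    },
                },
                "responses": {
                    "200": {"description": "Consumer acknowledged the notification"}
                },
            }
        }
    },
}


def webhook_names(doc: dict) -> list[str]:
    """Return the names of the webhooks declared in an OAS document."""
    return sorted(doc.get("webhooks", {}))
```

Note that the spec only describes the shape of the callback. Where the consumer's URL comes from is outside the document, which matches the out-of-band registration model discussed in this paper.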

Business Value:

  • Eliminates polling overhead for consumers
  • Enables real-time event processing
  • Enterprise async request/callback pattern (5a)

Webhooks vs EDA: When to Use Each

Aspect | Webhooks (OAS 3.1) | EDA (Kafka/EventBridge)
Callback registration | Consumer provides URL during API subscription (out-of-band) | Consumer subscribes to topics/event types in event catalog
Delivery model | Point-to-point HTTP callbacks (1:1, platform → consumer) | Pub/sub (1:many, multiple consumers can subscribe)
Consumer implementation | Expose HTTPS endpoint with HMAC verification | Connect to event bus (requires credentials, network access)
Security | HMAC-SHA256 signature verification | mTLS + Kafka ACLs or IAM policies
Failure handling | Platform retries with exponential backoff, DLQ after N attempts | Consumer controls offset, can replay from event history
Discovery | Webhooks defined in API spec | Events defined in event catalog
Best for | 1:1 relationships, simple callbacks, no consumer infrastructure | 1:many, complex event-driven workflows, event replay, multiple consumers
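
The HMAC-SHA256 verification listed for webhooks could look roughly like the sketch below. It assumes a shared secret agreed at subscription time and a hex-encoded signature sent alongside the request body. The function names and the encoding are assumptions, since the signing scheme has not been designed yet.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Platform side: compute a hex HMAC-SHA256 signature over the raw body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Consumer side: recompute the signature and compare in constant time."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

The consumer must verify against the raw bytes it received, not a re-serialized copy of the JSON, because any change in whitespace or key order changes the signature.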

HIP Strategy: Use both patterns based on use case:

  • Webhooks: Simple API-level callbacks (e.g., “your submission is validated”)
  • EDA: Complex domain events (e.g., “trader-registered”) where multiple consumers need the same event

Critical Blockers (MUST Validate Before Investment)

Blocker 1: HOD Backend Event Publishing Capability

What We Need: HOD backend services must be able to publish internal events to an event bus (Kafka/EventBridge) when async processing completes.

Why It’s Critical: Webhooks require an event bus - the webhook dispatcher service consumes events from the bus and delivers them via HTTP. Without event publishing from HOD backends, webhooks are not viable.
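
A minimal sketch of that dispatcher role follows, assuming an in-memory queue stands in for the event bus and the HTTP call is injected as a function. The Event fields, the subscription lookup by consumer ID and the payload envelope are all hypothetical.

```python
import json
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    """Hypothetical internal event published by an HOD backend."""
    event_type: str
    consumer_id: str
    payload: dict


# send(url, body) returns True when the consumer acknowledged (e.g. HTTP 2xx).
SendFn = Callable[[str, bytes], bool]


def dispatch_pending(
    bus: "queue.Queue[Event]",
    subscriptions: dict[str, str],
    send: SendFn,
) -> list[Event]:
    """Drain the bus and deliver each event to its consumer's callback URL.

    Returns the events that could not be delivered, either because the
    consumer has no registered URL or because the send was not acknowledged.
    """
    undelivered: list[Event] = []
    while True:
        try:
            event = bus.get_nowait()
        except queue.Empty:
            break
        url = subscriptions.get(event.consumer_id)
        body = json.dumps({"type": event.event_type, "data": event.payload}).encode()
        if url is None or not send(url, body):
            undelivered.append(event)
    return undelivered
```

The sketch shows why the bus is a hard dependency: the dispatcher has nothing to deliver unless HOD backends put events on it first.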

Validation Required:

  • Can current HOD backends publish to the event bus?
  • What changes would be needed to add event publishing?
  • Timeline: When could this capability be added if not present?
  • Are there existing events already published that could be reused?

Owner: HOD teams (each domain API team lead)

Status: ❓ Unknown

Blocker Level: 🔴 CRITICAL - If HOD teams cannot/will not publish to bus, webhooks are not viable


Blocker 2: Consumer Demand Validation

What We Need: Validate that more than 30% of surveyed consumers would actually use webhook callbacks instead of polling.

Why It’s Critical: Webhooks require significant platform investment. If consumer demand is low, the ROI doesn’t justify the effort.

Validation Required:

  • Survey: “Would you use webhook callbacks instead of polling?”
  • Survey: “Can you expose public HTTPS endpoints for webhooks?”
  • Survey: “What async operations would benefit most from push notifications?”
  • Analysis: Calculate the percentage of respondents who are interested and compare it with the >30% threshold
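
The threshold check is simple arithmetic, sketched below. It assumes the percentage is taken over survey respondents rather than over all registered consumers. The source does not fix the denominator, so that choice should be agreed before the survey runs.

```python
def interest_share(interested: int, respondents: int) -> float:
    """Fraction of survey respondents who said they would use webhooks."""
    if respondents <= 0:
        raise ValueError("respondents must be positive")
    if not 0 <= interested <= respondents:
        raise ValueError("interested must be between 0 and respondents")
    return interested / respondents


def clears_demand_blocker(
    interested: int, respondents: int, threshold: float = 0.30
) -> bool:
    """True only when interest is strictly above the threshold (>30%)."""
    return interest_share(interested, respondents) > threshold
```

For example, 31 interested out of 100 respondents clears the blocker, while exactly 30 out of 100 does not, because the criterion is strictly greater than 30%.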

Owner: Product/API Management (consumer survey)

Status: ❓ Unknown

Blocker Level: 🟡 SIGNIFICANT - If interest is 30% or lower, defer webhooks


Blocker 3: Platform Team Capacity

What We Need: Confirm the platform team has capacity for a 6-8 week initial build plus ongoing operational overhead.

Why It’s Critical: Webhooks require a new webhook dispatcher service. If the platform team doesn’t have capacity, the project will slip or be deprioritized.

Validation Required:

  • Do we have 6-8 weeks available for MVP webhook dispatcher build?
  • What’s the ongoing operational overhead (monitoring, scaling, maintenance)?
  • Are there competing higher-priority initiatives?

Owner: Platform Engineering Lead

Status: ❓ Unknown

Blocker Level: 🟡 SIGNIFICANT - If capacity unavailable, defer webhooks


Decision Framework

IF All Three Blockers Are CLEARED ✅

Proceed to implementation with:

  • Phase 1: Foundation (weeks 1-4) - infrastructure setup
  • Phase 2: Webhooks (weeks 5-12) - dispatcher service build
  • Phase 3: Growth (week 13 onward) - adoption and monitoring

IF Any Single Blocker FAILS ❌

Defer webhooks and:

  • Document learnings and assumptions
  • Revisit in next planning cycle
  • Continue with EDA as primary async pattern

Architectural Context

Webhooks are complementary to EDA, not competitive:

  • Webhooks enable HTTP callbacks for simple 1:1 notifications
  • EDA provides pub/sub for multi-consumer domain events
  • Both can coexist in the HIP platform

Key insight: Webhooks ARE an EDA pattern (HTTP delivery mechanism). The webhook dispatcher is simply another EDA consumer that happens to deliver events via HTTP callbacks instead of the bus.


Comparison with Alternatives

Alternative 1: Consumer Callback API Pattern

Current Workaround: Consumers could publish their own callback API to the HIP catalog. The HOD backend then calls the consumer’s API directly.

Problems:

  • Consumer becomes API producer (governance overhead)
  • HOD directly coupled to consumer API lifecycle
  • No standardization (each API defines callbacks differently)
  • No HMRC/HIP ownership of delivery reliability

Verdict: Adequate workaround but not scalable long-term

Alternative 2: Event-Driven Architecture Only

Strategy: Use EDA exclusively for all async patterns (no webhooks).

Pros:

  • Single async pattern
  • More powerful (pub/sub, replay, and exactly-once processing where the bus supports it)

Cons:

  • Requires consumer to connect to event bus (network access, credentials)
  • Higher barrier to entry for simple use cases
  • Not suitable for external consumers outside HMRC network

Verdict: EDA is better for internal services; webhooks are better for external consumers.


Recommendations

Q1 2026: Validation Phase (BEFORE any investment)

Action 1: Survey Consumer Community

  • Deploy survey to API consumers
  • Questions: Would you use webhooks? Can you expose endpoints? What use cases?
  • Target: >30% interest threshold
  • Timeline: 2 weeks

Action 2: Engage HOD Teams (CRITICAL PRIORITY)

  • Schedule working sessions with each HOD team lead
  • Key question: “Can your backends publish to the event bus?”
  • Understand: What’s required to add event publishing capability?
  • Document: Current state vs. required state
  • Timeline: 2-3 weeks

Action 3: Assess Platform Capacity

  • Platform Engineering: Estimate effort for webhook dispatcher MVP (6-8 weeks?)
  • Roadmap: Is capacity available in next planning cycle?
  • Scope: What’s included in MVP vs. Phase 2?
  • Timeline: 1 week

Decision Point (End of Q1 2026)

GO/NO-GO Criteria:

  • ✅ HOD teams confirm event bus publishing capability OR clear path to add it
  • ✅ Consumer survey shows >30% interest
  • ✅ Platform team confirms 6-8 week capacity available

IF all three: PROCEED → Begin Phase 1 (Foundation)
IF any fail: HOLD → Document learnings, revisit next cycle
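
The GO/NO-GO rule above can be written down as a small function so that the outcome and the failed criteria are recorded together. The field names are hypothetical, and the 30% threshold is the one stated in the criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResults:
    """Outcome of the three validation streams (field names are illustrative)."""
    hod_can_publish_or_has_path: bool
    consumer_interest_share: float  # fraction between 0.0 and 1.0
    platform_capacity_confirmed: bool


def decide(
    results: ValidationResults, interest_threshold: float = 0.30
) -> tuple[str, list[str]]:
    """Return ("PROCEED", []) only if all three criteria pass, else ("HOLD", failures)."""
    failed: list[str] = []
    if not results.hod_can_publish_or_has_path:
        failed.append("HOD event publishing")
    if not results.consumer_interest_share > interest_threshold:
        failed.append("consumer demand")
    if not results.platform_capacity_confirmed:
        failed.append("platform capacity")
    return ("PROCEED" if not failed else "HOLD", failed)
```

Returning the list of failed criteria alongside the decision supports the "document learnings" step when the outcome is HOLD.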


Key Insights

  1. Webhooks are valuable for simple 1:1 async API callbacks, avoiding polling overhead.

  2. Complexity is real - requires webhook dispatcher service, event bus integration, HMAC signing, retry logic, DLQ management.

  3. Blockers are external - success depends on HOD team capabilities and consumer demand, not just platform team execution.

  4. Validation is essential - do not invest in infrastructure without first validating that all three blockers will be cleared.

  5. EDA is the foundation - webhooks only work if HOD backends are already publishing events to the bus. Webhook support amplifies existing EDA investment.
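
Insight 2 names retry logic and DLQ management as part of the complexity. A minimal sketch of retry with exponential backoff and a dead-letter list is shown below. The base delay, the cap and the attempt count are assumed values, and the DLQ is modelled as a plain list rather than a real queue.

```python
import time
from typing import Any, Callable


def backoff_delays(
    max_attempts: int, base_seconds: float = 1.0, cap_seconds: float = 60.0
) -> list[float]:
    """Delay before each retry: base * 2**n, capped. The first attempt has no delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return [min(cap_seconds, base_seconds * 2 ** n) for n in range(max_attempts - 1)]


def deliver_with_retry(
    send: Callable[[], bool],
    item: Any,
    dead_letter: list,
    max_attempts: int = 5,
    base_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to deliver up to max_attempts times, then park the item in the DLQ."""
    delays = backoff_delays(max_attempts, base_seconds)
    for attempt in range(max_attempts):
        if send():
            return True
        # Wait before the next attempt, but not after the final one.
        if attempt < len(delays):
            sleep(delays[attempt])
    dead_letter.append(item)
    return False
```

A production dispatcher would likely add random jitter to the delays and persist retry state across restarts. Both are left out here to keep the sketch short.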


FAQ

Q: Why not just build webhooks anyway?
A: There is a high risk of building something consumers don’t want or HOD teams can’t support. Validate demand and feasibility first.

Q: What if HOD teams can’t publish to the bus?
A: Webhooks aren’t viable without event publishing from HOD backends. That capability would have to be added as a separate effort, which would delay webhooks.

Q: Could we do webhooks without the event bus?
A: It is possible but not recommended. HOD backends would need to make direct HTTP calls to consumers, which creates tight coupling and operational risk.

Q: When will we know the answer?
A: At the end of Q1 2026, after all three validation streams are complete.