Webhook Monitoring: How to Know When Your Webhooks Stop Working
Webhooks are one of the most elegant patterns in API design. Instead of polling for updates, third-party services push events to your application as they happen. Stripe fires a payment_intent.succeeded event. GitHub fires a push event. Twilio fires a message.delivered event.
The problem: webhooks are invisible.
When an HTTP request arrives at your webhook endpoint, is processed successfully, and triggers the right business logic, nothing remarkable happens. It just works. But when webhooks stop arriving, or arrive with a changed payload structure, your application silently loses events and nothing alerts you.
This guide is about making webhooks visible: how to monitor delivery, validate payloads, and detect schema drift in webhook events.
Why Webhook Failures Are Hard to Detect
Standard monitoring doesn't cover webhooks well because of how webhooks work:
Webhooks are inbound requests you don't initiate. Most monitoring tools check whether your requests to external APIs succeed. Webhooks go the other direction — the external service calls you. If the external service stops calling you (or starts delivering malformed payloads), your existing monitoring won't catch it.
Webhook failures often look like silence. A dead webhook endpoint doesn't throw errors. Your application just doesn't receive events. Order confirmations don't process. Payment success hooks don't fire. Subscription cancellations aren't handled. Everything looks "up" in your monitoring dashboard.
Webhook payload schemas change. Third-party services update webhook payload structures when they add features, rename fields, or restructure events. Handler code that worked last month can then break silently while still returning 200 OK: you received the webhook successfully, you just couldn't parse it correctly.
The Four Webhook Failure Modes
1. Delivery Failure
The webhook never arrives. Causes:
- Your endpoint is down
- Firewall rules block the request
- DNS misconfiguration
- The external service has a delivery outage
Detection: Your endpoint's inbound request rate drops to zero. This is the easiest failure to detect if you're monitoring your webhook endpoint.
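One way to turn this silence into a signal is to record when each provider last delivered an event and compare that against how long you are willing to go without hearing from it. The sketch below assumes you already store a last-received timestamp per provider. The function name, the data shapes, and the thresholds are all hypothetical.

```javascript
// Hypothetical helper: returns the providers that have been silent for
// longer than their allowed window.
// lastReceivedAt: { [provider]: epoch milliseconds of the last webhook }
// maxSilenceMs:   { [provider]: longest acceptable gap in milliseconds }
function findStaleProviders(lastReceivedAt, maxSilenceMs, now = Date.now()) {
  const stale = [];
  for (const [provider, limit] of Object.entries(maxSilenceMs)) {
    const last = lastReceivedAt[provider];
    // A provider we have never heard from counts as silent forever.
    const silentForMs = last === undefined ? Infinity : now - last;
    if (silentForMs > limit) {
      stale.push({ provider, silentForMs });
    }
  }
  return stale;
}

// Example: Stripe last seen 3 hours ago, allowed gap of 30 minutes.
const now = Date.now();
const stale = findStaleProviders(
  { stripe: now - 3 * 60 * 60 * 1000, github: now - 60 * 1000 },
  { stripe: 30 * 60 * 1000, github: 60 * 60 * 1000 },
  now,
);
console.log(stale.map((s) => s.provider));
```

Run on a schedule, a check like this alerts on silence itself rather than waiting for an error that never comes.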
2. Processing Failure
The webhook arrives but your handler fails. Causes:
- Code bugs triggered by edge-case payloads
- Database or downstream service unavailability
- Timeout in async processing
- Unhandled event types
Detection: Your webhook endpoint returns 5xx errors, or your application logs show exceptions. Most third-party services will retry on 5xx responses and log delivery failures in their dashboard.
3. Payload Schema Drift
The webhook arrives and your handler accepts it (returns 200), but the payload structure has changed. Your handler parses the payload using the old schema, misses fields, or crashes silently. Causes:
- The third-party service updated their webhook format
- A new event version was introduced (e.g., Stripe's v2 events)
- Field names changed without a major version bump
Detection: This is the hardest failure to catch, because every status code involved looks healthy. Payload logging and schema drift monitoring are what surface it.
4. Signature Validation Failure
The webhook arrives but your HMAC signature validation rejects it. Causes:
- Webhook secret rotation without updating your configuration
- Clock skew in timestamp validation
- The third-party service changed its signing algorithm
Detection: Your handler returns 401/403, and the external service logs delivery failures.
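When signature checks start failing, it helps to know which of these causes you are looking at. The sketch below assumes a generic scheme in which the provider signs `${timestamp}.${rawBody}` with HMAC-SHA256 and sends the result hex-encoded. Real providers differ in the details, so treat the function name, parameters, and reason strings as hypothetical and prefer the provider's official SDK in production.

```javascript
const crypto = require('node:crypto');

// Hypothetical verifier that reports *why* a webhook was rejected, so that
// clock skew and a wrong or rotated secret show up as different log reasons.
function checkSignature({ rawBody, timestamp, signature, secret, nowSeconds, toleranceSeconds = 300 }) {
  // Reject timestamps too far from our clock (replay protection).
  if (!Number.isFinite(timestamp) || Math.abs(nowSeconds - timestamp) > toleranceSeconds) {
    return { ok: false, reason: 'timestamp_outside_tolerance' };
  }
  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();
  const received = Buffer.from(signature, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length || !crypto.timingSafeEqual(received, expected)) {
    return { ok: false, reason: 'signature_mismatch' };
  }
  return { ok: true, reason: null };
}
```

Logging the `reason` alongside the provider name makes a secret rotation (a sudden run of `signature_mismatch`) easy to tell apart from clock drift.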
Monitoring Your Webhook Endpoints
1. Track Inbound Request Volume
The most fundamental webhook metric: how many webhooks are you receiving, and at what rate?
For each webhook endpoint, track:
- Requests per minute/hour — establish a baseline, alert on significant drops
- Response code distribution — what % are 2xx vs 4xx vs 5xx?
- Response latency — webhook endpoints should respond quickly (< 500ms)
- Last received at — when was the last event received?
"Last received at" is particularly useful. If your production system typically receives Stripe webhooks every few minutes and the last one arrived 3 hours ago, something is probably wrong.
2. Instrument Your Handler Code
Add structured logging to your webhook handlers:
// Assumed environment variable name for the endpoint's signing secret
const secret = process.env.STRIPE_WEBHOOK_SECRET;

app.post('/webhooks/stripe', async (c) => {
  const start = Date.now();
  const signature = c.req.header('stripe-signature');
  try {
    // Signature verification needs the raw, unparsed request body
    const body = await c.req.text();
    // Validate signature
    const event = stripe.webhooks.constructEvent(body, signature, secret);
    // Log receipt
    logger.info({
      type: 'webhook.received',
      provider: 'stripe',
      event_type: event.type,
      event_id: event.id,
    });
    // Process event
    await handleStripeEvent(event);
    logger.info({
      type: 'webhook.processed',
      provider: 'stripe',
      event_type: event.type,
      event_id: event.id,
      duration_ms: Date.now() - start,
    });
    return c.json({ received: true });
  } catch (err) {
    // Covers both signature failures and handler exceptions
    logger.error({
      type: 'webhook.failed',
      provider: 'stripe',
      error: err.message,
      duration_ms: Date.now() - start,
    });
    return c.json({ error: 'Webhook processing failed' }, 500);
  }
});
With structured logging, you can:
- Query for failed webhook events
- Track processing latency by event type
- Detect missing event types (events that should arrive but haven't)
- Build dashboards on webhook health
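As a rough illustration of the queries in this list, the sketch below scans a batch of log entries shaped like the ones emitted by the handler and reports failure counts plus any expected event types that never arrived. In practice you would run this kind of query in your log platform. The function name and the list of expected event types are assumptions.

```javascript
// Hypothetical summary over structured webhook log entries.
// entries: objects with `type` ('webhook.received' | 'webhook.processed' |
//          'webhook.failed') and, where available, `event_type`.
function summarizeWebhookLogs(entries, expectedEventTypes) {
  const seen = new Set();
  const failuresByType = {};
  for (const entry of entries) {
    if (entry.type === 'webhook.received') {
      seen.add(entry.event_type);
    } else if (entry.type === 'webhook.failed') {
      // Failures logged before the event was parsed have no event_type.
      const key = entry.event_type ?? 'unknown';
      failuresByType[key] = (failuresByType[key] ?? 0) + 1;
    }
  }
  const missing = expectedEventTypes.filter((t) => !seen.has(t));
  return { missing, failuresByType };
}

const summary = summarizeWebhookLogs(
  [
    { type: 'webhook.received', event_type: 'payment_intent.succeeded' },
    { type: 'webhook.failed', error: 'boom' },
  ],
  ['payment_intent.succeeded', 'customer.subscription.deleted'],
);
console.log(summary);
```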
3. Use the Provider's Delivery Dashboard
Most major webhook providers offer delivery monitoring dashboards:
- Stripe: Dashboard → Developers → Webhooks → View event deliveries
- GitHub: Repository Settings → Webhooks → Recent Deliveries
- Twilio: Console → Monitor → Logs → Debugger
- SendGrid: Mail Settings → Event Webhook → Test Your Integration
These dashboards show whether events were delivered and whether your endpoint returned success. Check them when investigating incidents. Set up provider-side alerts where available (Stripe offers email alerts for repeated webhook failures).
4. Set Up a Webhook Heartbeat
For critical business webhooks, implement a heartbeat pattern:
- Create a "test event" or use your provider's test mode to fire a known event
- Verify your handler processed it within the expected time window
- Alert if the test event doesn't show up in your processing log
Stripe, for example, lets you send test events directly from their dashboard. A daily automated test fire + processing verification gives you a webhook end-to-end health check.
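The verification half of that heartbeat could look something like the sketch below. It assumes your processing log entries carry the `event_id` from the handler plus a `timestamp` in epoch milliseconds, which the handler shown earlier does not add by itself. The function name and return shape are hypothetical.

```javascript
// Hypothetical heartbeat check: did the test event we fired at `sentAt`
// show up as processed within `windowMs`?
function checkHeartbeat({ testEventId, sentAt, windowMs, processedEntries, now = Date.now() }) {
  const hit = processedEntries.find(
    (e) => e.type === 'webhook.processed' && e.event_id === testEventId,
  );
  if (hit) {
    return { status: 'ok', latencyMs: hit.timestamp - sentAt };
  }
  // Still inside the window: not a failure yet.
  if (now - sentAt <= windowMs) {
    return { status: 'pending' };
  }
  return { status: 'missed' };
}
```

A scheduled job can call this a few minutes after firing the test event and page someone when the status is `missed`.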
Detecting Webhook Payload Schema Drift
Schema drift in webhook payloads is insidious because:
- Your handler accepts the webhook (200 OK)
- The external service thinks delivery succeeded
- Your code fails silently when trying to read a field that moved or was renamed
- Business logic doesn't execute
Approach 1: Log and Validate Every Payload
At your webhook endpoint, before parsing:
// Log the raw payload for debugging
logger.debug({
  type: 'webhook.payload',
  provider: 'stripe',
  raw: JSON.stringify(body),
});
// Validate against your expected schema
const validation = validateStripeEventSchema(body);
if (!validation.valid) {
  logger.warn({
    type: 'webhook.schema_mismatch',
    provider: 'stripe',
    issues: validation.issues,
  });
  // Don't crash — try to handle gracefully
  // but alert the team
}
Strict schema validation with structured logging gives you immediate visibility when a payload structure changes.
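The `validateStripeEventSchema` call above is left undefined in this guide. A minimal sketch of one way to write it, assuming you only check a handful of fields you depend on, might look like this. The `EXPECTED_FIELDS` list and the `getPath` helper are illustrative assumptions, not Stripe's full event schema, and a schema library would do the same job with less code.

```javascript
// Fields this (hypothetical) handler relies on, with their expected types.
const EXPECTED_FIELDS = {
  id: 'string',
  type: 'string',
  created: 'number',
  'data.object': 'object',
};

// Walk a dotted path such as "data.object" without throwing on gaps.
function getPath(obj, path) {
  return path
    .split('.')
    .reduce((value, key) => (value !== null && typeof value === 'object' ? value[key] : undefined), obj);
}

function validateStripeEventSchema(body) {
  const issues = [];
  for (const [path, expectedType] of Object.entries(EXPECTED_FIELDS)) {
    const value = getPath(body, path);
    if (value === undefined) {
      issues.push({ path, problem: 'missing' });
    } else if (value === null || typeof value !== expectedType) {
      const actual = value === null ? 'null' : typeof value;
      issues.push({ path, problem: `expected ${expectedType}, got ${actual}` });
    }
  }
  return { valid: issues.length === 0, issues };
}
```

Keeping the field list next to the handler that uses those fields makes it more likely the two stay in sync.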
Approach 2: Monitor Your Webhook Endpoint with Schema Drift Detection
A webhook monitoring tool like Rumbliq can monitor your webhook endpoint by:
- Capturing the schema of payloads that arrive at your endpoint
- Comparing new payloads against the schema baseline
- Alerting when payload structure changes
This works for any HTTP endpoint that receives POST requests with JSON bodies.
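To make the baseline-and-compare idea concrete, here is a generic sketch of structural diffing for JSON payloads. It is not a description of how Rumbliq or any other tool implements it. The function names are hypothetical, and arrays are summarized by their first element only, which is a simplifying assumption.

```javascript
// Flatten a JSON value into a map of path -> type, e.g.
// { "data.object.amount": "number" }.
function inferShape(value, prefix = '', out = {}) {
  if (Array.isArray(value)) {
    out[prefix || '$'] = 'array';
    // Simplification: describe arrays by their first element.
    if (value.length > 0) inferShape(value[0], `${prefix}[]`, out);
  } else if (value !== null && typeof value === 'object') {
    if (prefix) out[prefix] = 'object';
    for (const [key, child] of Object.entries(value)) {
      inferShape(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix || '$'] = value === null ? 'null' : typeof value;
  }
  return out;
}

// Compare a stored baseline shape with the shape of a new payload.
function diffShapes(baseline, current) {
  const added = Object.keys(current).filter((p) => !(p in baseline));
  const removed = Object.keys(baseline).filter((p) => !(p in current));
  const changed = Object.keys(baseline).filter((p) => p in current && baseline[p] !== current[p]);
  return { added, removed, changed };
}

const baseline = inferShape({ id: 'evt_1', data: { object: { amount: 100 } } });
const current = inferShape({ id: 'evt_2', data: { object: { amount_total: 100 } } });
console.log(diffShapes(baseline, current));
```

Optional fields will show up as added or removed from one payload to the next, so a real baseline usually needs to be built from many samples before alerting on differences.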
Approach 3: Track Field Access Errors
In your handler code, use optional chaining and track when expected fields are absent:
const amount = event.data?.object?.amount;
if (amount === undefined) {
  logger.warn({
    type: 'webhook.missing_field',
    provider: 'stripe',
    event_type: event.type,
    field: 'data.object.amount',
  });
  metrics.increment('webhook.field_missing', { field: 'amount' });
}
A sudden spike in webhook.field_missing metrics for a specific field is a strong signal that the provider changed their payload schema.
Monitoring Outbound Webhooks (Webhooks You Send)
If your application sends webhooks to customers (many SaaS products do), you have a different monitoring challenge: ensuring your webhooks are delivered reliably.
Track delivery success rates. For each webhook delivery attempt, record whether the recipient's endpoint returned 2xx. Track your overall delivery success rate.
Implement retry logic with exponential backoff. Recipient endpoints go down. Your webhook delivery system should retry failed deliveries:
Attempt 1: Immediately
Attempt 2: 5 seconds later
Attempt 3: 30 seconds later
Attempt 4: 5 minutes later
Attempt 5: 30 minutes later
Attempt 6: 2 hours later
...
Abandon after 72 hours
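A schedule like this can be computed rather than hard-coded. The sketch below uses a generic capped exponential formula, so it reproduces the first few delays above but not the exact table. The base delay, growth factor, and cap are assumptions to tune for your own system, and real implementations usually add random jitter as well.

```javascript
const MAX_DELIVERY_AGE_MS = 72 * 60 * 60 * 1000; // abandon after 72 hours

// Delay before the given attempt (1-based). Attempt 1 is immediate;
// later attempts grow geometrically until they hit the cap.
function nextRetryDelayMs(attempt, { baseMs = 5000, factor = 6, capMs = 2 * 60 * 60 * 1000 } = {}) {
  if (attempt <= 1) return 0;
  return Math.min(baseMs * factor ** (attempt - 2), capMs);
}

// True once a delivery has been retried for the full 72-hour window.
function shouldAbandon(firstAttemptAt, now = Date.now()) {
  return now - firstAttemptAt >= MAX_DELIVERY_AGE_MS;
}

for (let attempt = 1; attempt <= 6; attempt += 1) {
  console.log(`attempt ${attempt}: wait ${nextRetryDelayMs(attempt) / 1000}s`);
}
```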
Maintain a delivery log. Every webhook delivery attempt — success or failure — should be logged with timestamp, target URL, response status, and payload hash. This is essential for debugging customer reports of "I didn't receive the event."
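A delivery log entry with those four pieces of information might be built along these lines. The field names and the `buildDeliveryLogEntry` helper are hypothetical. Storing a SHA-256 hash rather than the payload itself lets you confirm what was sent without keeping customer data in the log.

```javascript
const crypto = require('node:crypto');

// Hypothetical log record for a single delivery attempt.
// responseStatus is null when no HTTP response was received at all.
function buildDeliveryLogEntry({ targetUrl, payload, responseStatus, attempt, now = new Date() }) {
  const payloadHash = crypto.createHash('sha256').update(payload).digest('hex');
  return {
    timestamp: now.toISOString(),
    target_url: targetUrl,
    attempt,
    response_status: responseStatus,
    success: responseStatus !== null && responseStatus >= 200 && responseStatus < 300,
    payload_sha256: payloadHash,
  };
}
```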
Monitor your delivery queue. If you process webhooks via a job queue (BullMQ, Sidekiq, etc.), monitor:
- Queue depth (how many are waiting?)
- Job age (how long has the oldest job been waiting?)
- Failed job rate
A growing queue often indicates a systemic delivery problem before end-users notice.
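Those three signals can be folded into one health check. The sketch below assumes you can already read depth, oldest job age, and completed/failed counts from your queue. How you obtain them depends on the queue library, and the function name and threshold names here are hypothetical.

```javascript
// Hypothetical health check over queue statistics gathered elsewhere.
function evaluateQueueHealth(stats, limits) {
  const alerts = [];
  if (stats.depth > limits.maxDepth) {
    alerts.push(`queue depth ${stats.depth} exceeds ${limits.maxDepth}`);
  }
  if (stats.oldestJobAgeMs > limits.maxJobAgeMs) {
    alerts.push(`oldest job has waited ${stats.oldestJobAgeMs}ms`);
  }
  const finished = stats.completed + stats.failed;
  // Avoid dividing by zero when nothing has finished yet.
  const failureRate = finished === 0 ? 0 : stats.failed / finished;
  if (failureRate > limits.maxFailureRate) {
    alerts.push(`failure rate ${(failureRate * 100).toFixed(1)}% is too high`);
  }
  return { failureRate, alerts };
}
```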
Implement webhook signatures. Sign your outbound webhooks with HMAC-SHA256. This lets recipients verify that an event really came from you, and it lets you demonstrate the authenticity of a delivery in disputes.
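On the sending side, signing takes only a few lines with Node's built-in crypto module. The header names and the `${timestamp}.${rawBody}` signed-string format below are assumptions for illustration. Whatever format you choose, document it so recipients can reproduce it.

```javascript
const crypto = require('node:crypto');

// Hypothetical signer: returns headers to attach to an outbound webhook.
// rawBody must be the exact string that is sent as the request body.
function signWebhook(rawBody, secret, timestamp = Math.floor(Date.now() / 1000)) {
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`)
    .digest('hex');
  return {
    'X-Webhook-Timestamp': String(timestamp),
    'X-Webhook-Signature': `sha256=${signature}`,
  };
}

const headers = signWebhook(JSON.stringify({ event: 'order.created' }), 'customer-specific-secret');
console.log(headers);
```

Including the timestamp in the signed string lets recipients reject old, replayed requests, and using a separate secret per customer limits the impact of any single leaked secret.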
Webhook Monitoring Checklist
For inbound webhooks you receive:
- Structured logging on every webhook handler (received, processed, failed)
- Inbound request volume tracking — alert on significant drops
- "Last received at" monitoring for critical webhook types
- Provider delivery dashboard monitoring
- Schema validation on incoming payloads
- HMAC signature validation
- Error tracking for handler failures
- Periodic test event to validate end-to-end delivery
For outbound webhooks you send:
- Delivery attempt logging (success/failure/response code)
- Retry queue with exponential backoff
- Dead letter handling for permanently failed deliveries
- Queue depth monitoring
- HMAC signing on all outbound events
- Customer-facing delivery log (for debugging)
FAQ
How do you know if your webhooks have stopped working?
The most reliable signal is inbound request volume to your webhook endpoint dropping to zero or significantly below baseline. Set up logging on your webhook handler to record every received event, and alert when the rate drops unexpectedly. For provider-sent webhooks (Stripe, GitHub, Twilio), also monitor your endpoint's URL availability — if it returns errors, the provider will retry and eventually disable delivery.
What are the four main webhook failure modes?
- Delivery failure — the webhook never arrives (endpoint down, firewall, provider outage)
- Processing failure — the webhook arrives but your handler fails or times out
- Schema drift — the payload structure changes and your code breaks silently
- Signature validation failure — your HMAC check rejects legitimate events
How do you handle webhook failures gracefully?
Combine idempotent event handlers, a dead letter queue for events that exhaust retries, reconciliation jobs that re-sync state from the provider's API, and structured logging with event IDs. For high-value events (payment confirmations, subscription changes), implement reconciliation as a safety net regardless of webhook reliability.
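Idempotency matters because providers retry, so the same event can arrive more than once. A minimal sketch of the idea wraps a handler so that a repeated event ID is skipped. The in-memory `Set` is an assumption for illustration only. A real system would record processed IDs in a database, ideally with a unique constraint, so that deduplication survives restarts and works across instances.

```javascript
// Hypothetical wrapper that makes a webhook handler idempotent by event ID.
function createIdempotentHandler(handler, processedIds = new Set()) {
  return async function handle(event) {
    if (processedIds.has(event.id)) {
      return { status: 'duplicate' };
    }
    await handler(event);
    // Mark as processed only after the handler succeeds, so a failed
    // attempt is still eligible for the provider's retry.
    processedIds.add(event.id);
    return { status: 'processed' };
  };
}
```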
Rumbliq and Webhook Monitoring
Rumbliq supports webhook endpoint monitoring as a native monitor type. You can:
- Monitor your webhook endpoint URL — Rumbliq sends test POST requests to verify the endpoint responds correctly
- Track response time — webhook endpoints should respond quickly; alerts if latency spikes
- Schema drift detection — capture the expected webhook payload structure and alert when it changes
For your most critical webhook integrations (payment processors, billing systems, CRM sync), Rumbliq provides a layer of automated monitoring that catches payload drift before it silently breaks your business logic.
Summary
Webhooks are powerful but invisible by default. Monitoring them requires deliberate instrumentation:
- Log every webhook — inbound receipt, processing success, processing failure
- Track delivery volume — baseline it, alert on drops
- Validate payload schemas — detect changes before they break business logic
- Monitor your webhook endpoints — uptime, response time, and payload drift
- For outbound webhooks — delivery logging, retry queues, queue depth monitoring
The combination of structured logging, provider dashboards, and schema drift monitoring covers the failure modes that most teams discover the hard way — through customer complaints about unprocessed events.