Webhook Monitoring Best Practices
Webhooks are elegant: instead of polling an API every few minutes, the API calls you when something happens. You get real-time updates without wasted requests.
The problem is that webhooks fail silently. When your endpoint is unreachable, when signature validation rejects a payload, when your handler crashes — events get dropped and nobody gets an alert. The webhook provider retried three times, got no acknowledgment, and moved on. Your application missed every event.
Why Webhook Monitoring Is Different
REST API monitoring is straightforward: poll an endpoint, check the response. Webhooks are inbound — you can't poll them. This inverts the monitoring problem:
- You don't initiate the request — the provider does, on their schedule
- Failures are invisible from the outside — your endpoint returning 500 looks the same as returning 200 to anyone not watching your server logs
- Events can be missed entirely — unlike polling, there's no retry you control
- Schema changes happen at the source — if a provider adds a required field to their webhook payload, your handler might crash before acknowledging receipt
This means webhook monitoring requires a different set of checks than REST API monitoring.
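Because inbound traffic cannot be polled, one option is to watch for silence instead. The following is a minimal sketch, not a definitive implementation: the `SilenceDetector` name and the threshold are hypothetical, and a production version would likely keep this state in a shared store rather than in memory.

```typescript
// Hypothetical helper: flags a provider that has gone quiet for too long.
class SilenceDetector {
  private lastSeen = new Map<string, number>();
  private maxSilenceMs: number;

  constructor(maxSilenceMs: number) {
    this.maxSilenceMs = maxSilenceMs;
  }

  // Call from the webhook handler each time an event arrives.
  recordEvent(provider: string, nowMs: number = Date.now()): void {
    this.lastSeen.set(provider, nowMs);
  }

  // Returns providers whose most recent event is older than the threshold.
  // Providers that have never sent an event are not reported.
  silentProviders(nowMs: number = Date.now()): string[] {
    const silent: string[] = [];
    this.lastSeen.forEach((seenAt, provider) => {
      if (nowMs - seenAt > this.maxSilenceMs) silent.push(provider);
    });
    return silent;
  }
}
```

A scheduled job could call `silentProviders()` every minute and raise an alert for each name returned. The right threshold depends on how often a given provider normally sends events.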
The Core Monitoring Layers
1. Endpoint Availability
Your webhook receiver endpoint needs to be reachable. Monitor it the same way you'd monitor any API endpoint — with regular uptime checks that verify it returns a valid response.
# Minimal webhook health check
curl -s -o /dev/null -w "%{http_code}" \
-X POST https://yourapp.com/webhooks/stripe \
-H "Content-Type: application/json" \
-d '{"type": "health_check"}'
If your provider doesn't send a health-check payload, set up your own uptime monitor to hit the endpoint with a synthetic request. The goal is to know immediately if the URL becomes unreachable.
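One subtlety with synthetic requests is that an unsigned payload will normally be rejected by signature verification, so a 4xx response can still prove the route is reachable. The sketch below shows one possible way to classify probe results. The `ProbeResult` shape, the `classifyProbe` name, and the 2-second slowness threshold are all assumptions for illustration.

```typescript
// Outcome of one synthetic request; status is null when no HTTP response arrived.
interface ProbeResult {
  status: number | null;
  latencyMs: number;
}

type ProbeVerdict = 'up' | 'degraded' | 'down';

// Hypothetical classification rules for an uptime check on a webhook URL.
function classifyProbe(result: ProbeResult, slowMs: number = 2000): ProbeVerdict {
  // No response at all (DNS failure, refused connection, timeout).
  if (result.status === null) return 'down';
  // 5xx: the server answered, but the handler or its infrastructure is failing.
  if (result.status >= 500) return 'down';
  // 404 suggests the route is missing, which real deliveries would also hit.
  if (result.status === 404) return 'down';
  // Any other response, including a 400 for an unsigned synthetic payload,
  // shows the route is deployed and answering.
  if (result.latencyMs > slowMs) return 'degraded';
  return 'up';
}
```

Treating a 400 as "up" is a deliberate choice here: the check measures reachability, while signature and processing failures are tracked by the later layers.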
2. Signature Verification
All major webhook providers (Stripe, GitHub, Shopify, Twilio) sign their payloads with a shared secret. Signature verification proves the request came from the actual provider.
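import crypto from 'crypto';

function verifyStripeWebhook(
  rawBody: string,
  signature: string,
  secret: string
): boolean {
  // Header format: t=<unix seconds>,v1=<hex signature>[,v1=<hex signature>...]
  const parts = signature.split(',').map(p => p.trim().split('='));
  const timestamp = parts.find(([key]) => key === 't')?.[1];
  const v1Sigs = parts
    .filter(([key, value]) => key === 'v1' && value)
    .map(([, value]) => value);
  if (!timestamp || !/^\d+$/.test(timestamp) || v1Sigs.length === 0) {
    throw new Error('Malformed Stripe-Signature header');
  }

  // Prevent replay attacks
  const fiveMinutesAgo = Math.floor(Date.now() / 1000) - 300;
  if (parseInt(timestamp, 10) < fiveMinutesAgo) {
    throw new Error('Webhook timestamp too old: possible replay attack');
  }

  const signedPayload = `${timestamp}.${rawBody}`;
  const expectedSig = Buffer.from(
    crypto.createHmac('sha256', secret).update(signedPayload).digest('hex')
  );

  // timingSafeEqual throws on a length mismatch, so compare lengths first
  return v1Sigs.some(sig => {
    const candidate = Buffer.from(sig);
    return candidate.length === expectedSig.length &&
      crypto.timingSafeEqual(candidate, expectedSig);
  });
}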
Monitor signature verification failures. A spike usually means one of two things: someone is sending forged or malformed requests to your webhook endpoint, or the shared secret no longer matches on both sides (for example, after a key rotation).
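To turn individual failures into a spike signal, a sliding-window counter is one simple approach. This is a sketch under stated assumptions: `createFailureWindow`, the window length, and the threshold are illustrative names and values, and the state lives in memory for a single process.

```typescript
// Sliding-window counter for signature failures (names and thresholds are illustrative).
function createFailureWindow(windowMs: number, threshold: number) {
  let timestamps: number[] = [];
  return {
    // Records one failure and returns true when the window
    // holds at least `threshold` failures.
    recordFailure(nowMs: number = Date.now()): boolean {
      timestamps.push(nowMs);
      // Drop failures that have aged out of the window.
      timestamps = timestamps.filter(t => nowMs - t < windowMs);
      return timestamps.length >= threshold;
    }
  };
}
```

The handler's signature-failure branch could call `recordFailure()` and raise an alert when it returns true. A single failure stays quiet, while a burst within the window does not.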
3. Acknowledgment Tracking
Webhook providers expect a 2xx response within a timeout window (typically 5-30 seconds). If they don't get one, they retry.
Track acknowledgment latency in your logs:
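app.post('/webhooks/stripe', async (req, res) => {
  const startTime = Date.now();

  try {
    // Verify signature first. req.rawBody assumes middleware that
    // preserves the unparsed request body.
    const sig = req.headers['stripe-signature'] as string;
    if (!verifyStripeWebhook(req.rawBody, sig, process.env.STRIPE_WEBHOOK_SECRET!)) {
      throw new Error('Signature mismatch');
    }
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    logger.error('webhook_rejected', {
      provider: 'stripe',
      error: message,
      latencyMs: Date.now() - startTime
    });
    // Known bad request: reject with 400 so it shows up in the provider's delivery log
    res.status(400).json({ error: message });
    return;
  }

  try {
    // Enqueue for async processing before acknowledging,
    // so a failed enqueue is never reported as received
    await queue.add('stripe-webhook', {
      type: req.body.type,
      data: req.body.data,
      receivedAt: new Date().toISOString()
    });

    // Acknowledge as soon as the event is queued
    res.json({ received: true });

    // Log acknowledgment latency
    logger.info('webhook_acknowledged', {
      provider: 'stripe',
      eventType: req.body.type,
      latencyMs: Date.now() - startTime
    });
  } catch (err) {
    logger.error('webhook_failed', {
      provider: 'stripe',
      error: err instanceof Error ? err.message : String(err),
      latencyMs: Date.now() - startTime
    });
    // A non-2xx response signals the provider to retry delivery
    if (!res.headersSent) res.status(500).json({ error: 'Failed to enqueue event' });
  }
});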
Critical pattern: Acknowledge receipt immediately, then process asynchronously. If processing is slow or fails, you've already told the provider you got the event. Handle dead letter queues separately.
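Logged latencies become more useful once they are summarized. As a rough sketch, the helper below computes a nearest-rank percentile over recent `latencyMs` values and compares the 95th percentile against a budget. The function names and the 2-second default are assumptions chosen for illustration.

```typescript
// Returns the p-th percentile of the samples using the nearest-rank method.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// True when the 95th-percentile acknowledgment latency exceeds the budget.
function ackLatencyBreached(latenciesMs: number[], budgetMs: number = 2000): boolean {
  return percentile(latenciesMs, 95) > budgetMs;
}
```

Using a percentile rather than an average keeps one slow outlier from being hidden by many fast acknowledgments.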
4. Payload Schema Monitoring
Webhook payloads change. Providers add fields, restructure nested objects, or silently change field formats across events. If your handler expects event.data.object.customer and the provider restructures to event.data.customer, your handler silently extracts undefined.
Monitor the schema of incoming webhook payloads:
import { z } from 'zod';
const StripePaymentIntentSchema = z.object({
id: z.string(),
type: z.string(),
created: z.number(),
data: z.object({
object: z.object({
id: z.string(),
amount: z.number(),
currency: z.string(),
customer: z.string().nullable(),
status: z.enum([
'requires_payment_method',
'requires_confirmation',
'requires_action',
'processing',
'requires_capture',
'canceled',
'succeeded'
]),
metadata: z.record(z.string()).optional()
})
})
});
app.post('/webhooks/stripe', async (req, res) => {
  // ... signature verification ...
  if (req.body.type === 'payment_intent.succeeded') {
    const result = StripePaymentIntentSchema.safeParse(req.body);
    if (!result.success) {
      // Don't reject the webhook: acknowledge and alert
      res.json({ received: true });
      await alertOnSchemaChange('stripe', 'payment_intent', result.error);
      return;
    }
    await processPaymentIntent(result.data);
  }
  // Acknowledge every event type, including ones this handler ignores,
  // so the request never hangs until the provider's timeout
  res.json({ received: true });
});
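The `alertOnSchemaChange` call above is left undefined, so here is one hedged sketch of the message-building part. It assumes each validation issue carries a `path` array and a `message`, which is the shape of the entries in Zod's `error.issues`. The `SchemaIssue` type and `summarizeSchemaIssues` name are hypothetical, and how the message is delivered is left open.

```typescript
// Minimal shape of a validation issue: a path into the payload plus a message.
interface SchemaIssue {
  path: (string | number)[];
  message: string;
}

// Collapses repeated issues on the same field into one line of an alert message.
function summarizeSchemaIssues(
  provider: string,
  eventType: string,
  issues: SchemaIssue[]
): string {
  const seenFields = new Set<string>();
  const lines: string[] = [];
  issues.forEach(issue => {
    const field = issue.path.length > 0 ? issue.path.join('.') : '(root)';
    if (!seenFields.has(field)) {
      seenFields.add(field);
      lines.push(`${field}: ${issue.message}`);
    }
  });
  return `[${provider}] schema drift in ${eventType}\n` + lines.join('\n');
}
```

Naming the exact field path in the alert tells whoever is on call which part of the payload moved, without having to dig through raw logs.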
Event Gap Detection
One of the hardest webhook monitoring problems: how do you know you missed an event?
If Stripe sends 100 payment events and your endpoint is down for 10 minutes, you might receive 70 of them after retries. The other 30 are gone. Your order count is wrong and nobody knows.
Strategies for detecting gaps:
Sequence numbers. Some providers include event sequence numbers or monotonically increasing IDs. Store the last-seen sequence and detect gaps.
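For a provider that does supply integer sequence numbers, gap tracking can be sketched as follows. This assumes sequences increase by one per event and that late or retried events may arrive out of order. `createGapTracker` is a hypothetical helper with in-memory state.

```typescript
// Tracks the highest sequence seen and the sequence numbers still missing.
function createGapTracker(initialSeq: number) {
  let highest = initialSeq;
  const gaps: number[] = [];
  return {
    observe(seq: number): void {
      if (seq > highest) {
        // Everything between the previous high-water mark and this event is missing.
        for (let missing = highest + 1; missing < seq; missing++) gaps.push(missing);
        highest = seq;
      } else {
        // A late or retried event fills a previously recorded gap.
        const index = gaps.indexOf(seq);
        if (index !== -1) gaps.splice(index, 1);
      }
    },
    // Sequence numbers that have been skipped and not yet filled.
    openGaps(): number[] {
      return gaps.slice();
    }
  };
}
```

Because retries can arrive minutes later, it is usually safer to alert only on gaps that stay open past the provider's retry window.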
Reconciliation polling. Pair webhook receipt with periodic polling of the source API. Every hour, fetch the past hour's events from Stripe (paginating if there are more than 100) and reconcile them against your database. Any event in the poll but not in your database means you missed it. Any event in your database but not in the poll usually means it falls outside the poll's time window or type filter, so check the query before suspecting the data.
async function reconcileStripeEvents(lookbackHours = 1) {
  const since = Math.floor(Date.now() / 1000) - (lookbackHours * 3600);
  // limit sets the page size; for await auto-paginates, so windows with
  // more than 100 events are still fully covered
  const stripeEvents = stripe.events.list({
    created: { gte: since },
    type: 'payment_intent.succeeded',
    limit: 100
  });
  for await (const event of stripeEvents) {
    const exists = await db.webhookEvent.findUnique({
      where: { stripeEventId: event.id }
    });
    if (!exists) {
      logger.warn('missed_webhook_event', {
        eventId: event.id,
        type: event.type,
        created: event.created
      });
      // Reprocess the missed event
      await processStripeEvent(event);
    }
  }
}
Idempotency. Design your webhook handlers to be idempotent — processing the same event twice produces the same result. This makes reconciliation safe.
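One common way to get idempotency is to key on the provider's event ID and skip anything already processed. The sketch below uses an in-memory set purely for illustration. `createIdempotentHandler` is a hypothetical helper, and a real deployment would more likely rely on a database table with a unique constraint on the event ID.

```typescript
// Wraps a processing function so each event ID is processed at most once.
// The in-memory set is a stand-in for durable storage.
function createIdempotentHandler<T extends { id: string }>(
  processEvent: (event: T) => void
) {
  const processedIds = new Set<string>();
  return function handle(event: T): 'processed' | 'duplicate' {
    if (processedIds.has(event.id)) return 'duplicate';
    processEvent(event);
    // Mark as processed only after success, so a thrown error allows a retry.
    processedIds.add(event.id);
    return 'processed';
  };
}
```

With a wrapper like this, the live webhook path and the reconciliation job can both hand events to the same handler without double-counting.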
Monitoring Webhook Delivery with Rumbliq
Rumbliq approaches webhook monitoring from both sides:
Inbound webhook endpoint monitoring — set up an uptime check on your webhook receiver URL. Rumbliq polls it on your configured interval and alerts you if it becomes unreachable. This means you know about availability problems before your webhook provider's delivery failures pile up.
Outbound schema drift monitoring — if you consume a third-party API (like Stripe's event API) via polling for reconciliation, Rumbliq monitors that endpoint for schema changes. When the structure of the reconciliation endpoint changes, you're alerted immediately.
The combination catches the two failure modes that matter most: your endpoint being down, and the payload schema changing out from under your handlers.
Alerting on Webhook Problems
Configure alerts for these specific conditions:
| Condition | Alert Severity | Action |
|---|---|---|
| Webhook endpoint returns non-2xx | Critical | Page on-call immediately |
| Signature verification failure spike | High | Investigate — possible key rotation or attack |
| Acknowledgment latency > 10s | High | Handler is slow or blocking — check for deadlocks |
| Schema validation failures | High | Provider changed payload structure |
| Event gap detected in reconciliation | Medium | Check delivery logs, reprocess missed events |
| Retry queue depth growing | Medium | Processing is falling behind |
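The table can also be expressed as code, so that the same rules drive whatever alerting tool is in use. This is only a sketch: the `WebhookMetrics` fields and condition names are hypothetical, and the severities and the 10-second latency threshold are copied from the table.

```typescript
type Severity = 'critical' | 'high' | 'medium';

// Hypothetical snapshot of the signals described in the table.
interface WebhookMetrics {
  non2xxResponses: number;
  signatureFailureSpike: boolean;
  ackLatencyMs: number;
  schemaValidationFailures: number;
  reconciliationGaps: number;
  retryQueueDepthGrowing: boolean;
}

interface Alert {
  condition: string;
  severity: Severity;
}

// Maps a metrics snapshot to the alerts listed in the table.
function evaluateAlerts(m: WebhookMetrics): Alert[] {
  const alerts: Alert[] = [];
  if (m.non2xxResponses > 0) alerts.push({ condition: 'endpoint_non_2xx', severity: 'critical' });
  if (m.signatureFailureSpike) alerts.push({ condition: 'signature_failure_spike', severity: 'high' });
  if (m.ackLatencyMs > 10000) alerts.push({ condition: 'ack_latency_over_10s', severity: 'high' });
  if (m.schemaValidationFailures > 0) alerts.push({ condition: 'schema_validation_failure', severity: 'high' });
  if (m.reconciliationGaps > 0) alerts.push({ condition: 'reconciliation_gap', severity: 'medium' });
  if (m.retryQueueDepthGrowing) alerts.push({ condition: 'retry_queue_growing', severity: 'medium' });
  return alerts;
}
```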
Quick Setup Checklist
- Webhook receiver URL is monitored with uptime checks (1-minute or 5-minute polling)
- Signature verification is implemented and failures are logged with counters
- Webhook handlers acknowledge immediately (< 2s) and process asynchronously
- Schema validation is in place for each event type you handle
- Reconciliation job runs regularly for critical event types
- Handlers are idempotent — safe to replay
- Dead letter queue captures failed processing jobs for investigation
FAQ
What are the best practices for monitoring webhooks?
Monitor endpoint availability with uptime checks. Validate signatures on every inbound webhook. Track inbound event volume and alert when it drops below baseline. Log every event with a unique ID and processing status. Monitor payload schemas for drift — when a provider changes their webhook structure, your handler may break silently. Implement idempotency and a dead letter queue so failed events can be retried.
How is webhook monitoring different from REST API monitoring?
REST API monitoring is outbound — you initiate requests and check responses. Webhook monitoring is inbound — you receive requests from providers and must verify they're arriving correctly. You can't poll a webhook; you have to observe inbound volume and alert when delivery stops. Failures are also invisible from the outside: your endpoint returning 500 looks the same as 200 to anyone not watching your server logs.
Should I use a queue for webhook processing?
Yes. Enqueue the event and acknowledge the webhook with a 2xx response right away, leaving the actual processing to a background worker. This prevents delivery failures from slow processing, keeps your endpoint resilient under load, and provides a dead letter queue for events that fail processing.
The Bottom Line
Webhooks are powerful but require active monitoring. The most dangerous failures — missed events, schema changes, silent processing errors — are invisible without instrumentation.
Start with the basics: monitor your webhook endpoint URL for availability, implement signature verification, and add schema validation for your most critical event types. Add reconciliation for any event type where a missed event has real business consequences.