What To Do When a Third-Party API Breaks Your Production App

It's 2am. Your on-call phone rings. Users can't check out. The payment flow is returning errors. Your logs show a cascade of 500s — but nothing in your code changed.

A third-party API just broke your production app.

This happens to every team eventually. Stripe silently changes a webhook payload structure. GitHub deprecates an endpoint you've been calling for years. Twilio renames a field without a version bump. The API returns a valid HTTP 200, but the response body is different from what your code expects — and somewhere downstream, a critical flow is silently failing.

This guide covers what to do right now (during the incident), what to do next (to recover faster), and what to do long-term (so this doesn't blindside you again).


Step 1: Confirm It's Actually an API Breaking Change

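The first 10 minutes of any incident are about narrowing the hypothesis space. API breaking changes look a lot like other failures.

Signs it might be an API breaking change:

  - Errors in response parsing code
  - Unexpected null values or missing fields
  - HTTP 200 responses that carry incorrect data
  - Failures isolated to a specific integration path, with no recent changes on your end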

How to quickly confirm:

  1. Pull a raw API response from production logs and compare it against your integration's expected schema
  2. Check the third-party provider's status page and changelog
  3. Test the API call directly with curl or a tool like Postman. Does the live response match your schema expectations?

# Quick schema comparison: what are you actually getting?
curl -s -H "Authorization: Bearer $API_KEY" \
  "https://api.thirdparty.com/v1/resource/123" | jq 'keys'

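If the response keys don't match what your code expects, you've confirmed the breaking change. If they do match, compare field types and nested objects next, because a type change will not show up in a key listing.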
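
The same comparison can be scripted so it is repeatable during an incident. This is a minimal sketch rather than a definitive implementation: `diffKeys` and the sample field names are hypothetical, and it only inspects top-level keys.

```javascript
// Hypothetical helper: compare the top-level keys your code expects
// against the keys present in a live response body.
function diffKeys(expectedKeys, actualBody) {
  const actualKeys = Object.keys(actualBody);
  return {
    // Keys your code reads that the API no longer sends
    missing: expectedKeys.filter((key) => !actualKeys.includes(key)),
    // Keys the API sends that your code does not know about
    unexpected: actualKeys.filter((key) => !expectedKeys.includes(key)),
  };
}

// Illustrative data only: a rename shows up as one missing and one unexpected key.
const expected = ["id", "email", "billing_plan"];
const live = { id: "123", email: "a@example.com", subscription_tier: "pro" };

const keyDiff = diffKeys(expected, live);
console.log(keyDiff);
```

A field that appears in `missing` alongside a similarly named field in `unexpected` is a strong hint of a rename.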


Step 2: Contain the Blast Radius

While you're confirming the root cause, start containing damage in parallel:

Disable affected integrations if possible. If the broken API powers a non-critical feature (a notification enrichment, an optional analytics event), turn it off while you investigate. Graceful degradation is better than a full outage.

Enable fallback mode if you have it. Well-designed integrations have circuit breakers. If your circuit breaker trips on repeated failures, that's working as intended — let it. If you don't have a circuit breaker, this is a good moment to note why you need one.

Alert your support team. Customer-facing impact is already happening. Your support team should know what's broken and what the status is before users start filing tickets. A short Slack message ("Payment processing impacted, third-party API change detected, investigating") buys goodwill.
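
For teams that do not yet have a circuit breaker, the pattern mentioned above can be sketched in a few lines. Everything here is an assumption made for illustration: the class name, the default threshold of 5 failures, and the 30 second cooldown are not taken from any particular library.

```javascript
// Minimal circuit breaker sketch. All names and thresholds are assumptions.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, which makes the breaker easy to test
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  // True when a request may be attempted.
  canRequest() {
    if (this.openedAt === null) return true;
    // After the cooldown, let trial requests through (half-open).
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = this.now(); // open (or re-open) the circuit
    }
  }

  // Run an async call through the breaker, using the fallback while open
  // and whenever the call itself fails.
  async call(fn, fallback) {
    if (!this.canRequest()) return fallback();
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      return fallback();
    }
  }
}
```

Usage might look like `await breaker.call(() => fetchEnrichment(id), () => null)`, where `fetchEnrichment` is a placeholder for your own integration call and the fallback returns whatever degraded value the feature can tolerate.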


Step 3: Diagnose the Exact Schema Mismatch

Now slow down and diagnose the specific change. Third-party API breaking changes usually fall into a few categories:

Field removed or renamed. The most common cause. user.billing_plan became user.subscription_tier. Your code is looking for a key that no longer exists.

Type change. A field that was an integer is now a string. A boolean flag became an enum. A flat field became a nested object. Your parsing code succeeds but produces wrong values.

New required field in requests. You're sending an API request that's missing a newly required parameter. The API now returns a validation error where it used to return data.

Enum expansion. A status field that used to have active, inactive values now also has pending_verification. Your switch statement's default case silently swallows the new value.

Pagination or envelope structure change. The array you were reading from response.data is now at response.items. Everything appears to work until you hit a paginated result.

Once you've identified the specific mismatch, write it down. You'll need this for the postmortem.
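
The enum expansion case is the easiest to miss, because the code keeps running. One way to make it visible is to report unknown values from the default branch instead of ignoring them. This is a sketch under assumptions: `canAccessAccount` and the choice to deny access for unknown statuses are hypothetical, and the right fallback depends on your product.

```javascript
// Hypothetical mapping from a provider's status field to an internal decision.
function canAccessAccount(status, reportUnknown = console.warn) {
  switch (status) {
    case "active":
      return true;
    case "inactive":
      return false;
    default:
      // Surface the new value instead of silently swallowing it,
      // then fall back to the most conservative behavior.
      reportUnknown(`Unrecognized account status: ${String(status)}`);
      return false;
  }
}

// A value added by the provider now produces a visible warning.
canAccessAccount("pending_verification");
```

Passing the reporter in as a parameter lets you route unknown values to your error tracker in production and capture them in tests.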


Step 4: Write the Fix

With a diagnosed schema mismatch, the fix is usually straightforward:

  1. Update your integration code to handle the new schema
  2. Add backward compatibility if the old schema is still sometimes returned (during rollout windows)
  3. Test against the live API — not just mocks
  4. Add explicit tests that validate the new schema shape

// Before: brittle field access
const planName = response.user.billing_plan;

// After: handle both old and new schema during transition
const planName = response.user.subscription_tier ?? response.user.billing_plan;

If the change is a new required request field, check the provider's documentation for what the correct value should be and add it to your request payload.
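
For the fourth item in the list above, a shape test might look like the following. It is a sketch under assumptions: `readPlanName` and `check` are hypothetical helpers, and the assertions are written without a test framework so the snippet runs on its own.

```javascript
// Hypothetical accessor that supports both schemas and fails loudly on neither.
function readPlanName(response) {
  const user = response?.user ?? {};
  const planName = user.subscription_tier ?? user.billing_plan;
  if (typeof planName !== "string") {
    throw new TypeError("Plan name missing: expected user.subscription_tier or user.billing_plan");
  }
  return planName;
}

// Plain assertion helper so the checks run without a test framework.
function check(condition, message) {
  if (!condition) throw new Error(message);
}

check(readPlanName({ user: { subscription_tier: "pro" } }) === "pro", "new schema");
check(readPlanName({ user: { billing_plan: "basic" } }) === "basic", "old schema");

// A response with neither field should raise instead of yielding undefined.
let threw = false;
try {
  readPlanName({ user: {} });
} catch (err) {
  threw = err instanceof TypeError;
}
check(threw, "missing field should throw");
```

Throwing on a missing field turns a silent downstream failure into an error at the integration boundary, which is usually easier to trace.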


Step 5: Deploy and Verify

Don't just deploy and hope. After pushing the fix:


Step 6: Write the Postmortem

Every third-party API breaking change deserves a brief postmortem. The goal is not to assign blame but to extract lessons.

The most important question: how long did it take you to detect the breaking change?

If the answer is "users reported it" or "monitoring alerted after 30 minutes of errors," that's your baseline. The next question is: how do you reduce that detection time to under 5 minutes?
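
To make that baseline concrete, detection time can be computed from two timestamps in your incident record. The function name and the timestamps below are illustrative assumptions, not data from a real incident.

```javascript
// Detection lag in whole minutes between the first failing request and the first alert.
function detectionLagMinutes(firstFailureIso, detectedIso) {
  const lagMs = Date.parse(detectedIso) - Date.parse(firstFailureIso);
  if (Number.isNaN(lagMs) || lagMs < 0) {
    throw new RangeError("Timestamps must be valid and in chronological order");
  }
  return Math.round(lagMs / 60000);
}

// Illustrative timestamps only.
const lag = detectionLagMinutes("2024-01-01T02:00:00Z", "2024-01-01T02:30:00Z");
console.log(`Detection lag: ${lag} minutes`); // prints "Detection lag: 30 minutes"
```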


The Prevention Layer: Monitoring Third-Party APIs for Schema Drift

The fundamental problem with third-party API breaking changes is discovery lag. The API changed. You found out later — possibly much later.

The solution is active monitoring that detects schema drift the moment it happens.

Traditional uptime monitoring asks: "Is the API returning 200?" It won't catch a breaking change because the API still returns 200 — just with a different response body.

Schema drift monitoring asks: "Is the API returning the structure I expect?" It captures a baseline of exactly what the API returns (field names, types, nesting, enums) and alerts you as soon as a later check finds a deviation.
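
Independent of any particular tool, the core idea can be sketched as a recursive comparison between a stored baseline response and a fresh one. This is an illustration under assumptions, not how any specific product works: `diffShape` is a hypothetical helper, it compares only the first element of each array, and it does not track enum values.

```javascript
// Describe a value by type, distinguishing null and arrays from plain objects.
function typeOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical helper: walk a baseline and a current response and list differences.
function diffShape(baseline, current, path = "") {
  const changes = [];
  const baseType = typeOf(baseline);
  const currType = typeOf(current);

  if (baseType !== currType) {
    changes.push(`${path || "(root)"}: type changed from ${baseType} to ${currType}`);
    return changes;
  }
  if (baseType === "array") {
    // Compare only the first elements, assuming homogeneous arrays.
    if (baseline.length > 0 && current.length > 0) {
      changes.push(...diffShape(baseline[0], current[0], `${path}[]`));
    }
    return changes;
  }
  if (baseType !== "object") return changes;

  for (const key of Object.keys(baseline)) {
    const childPath = path ? `${path}.${key}` : key;
    if (!has(current, key)) {
      changes.push(`${childPath}: field removed`);
    } else {
      changes.push(...diffShape(baseline[key], current[key], childPath));
    }
  }
  for (const key of Object.keys(current)) {
    const childPath = path ? `${path}.${key}` : key;
    if (!has(baseline, key)) changes.push(`${childPath}: field added`);
  }
  return changes;
}

// Illustrative payloads only.
const baselineBody = { id: "cus_1", balance: 0, address: { city: "Oslo" }, tags: ["a"] };
const currentBody = { id: "cus_1", balance: "0", address: { town: "Oslo" }, tags: ["a"] };
const drift = diffShape(baselineBody, currentBody);
console.log(drift);
```

An empty result means the shape is unchanged. A non-empty result is what would trigger an alert, even though both responses would have arrived with HTTP 200.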

With Rumbliq, you point a monitor at any third-party API endpoint:

GET https://api.stripe.com/v1/customers/{id}
Authorization: Bearer sk_live_...

Rumbliq captures the response schema baseline on first run. Every subsequent check validates that the response matches. If a field disappears, a type changes, or a nested structure shifts, you get an alert within minutes — not hours, not "when users report it."

This monitoring works for REST endpoints, webhooks, GraphQL schemas, and health check endpoints. For teams with dozens of third-party integrations, it catches the breaking changes you didn't even know to watch for.


FAQ

How do I know if a third-party API breaking change caused my outage?

Signs of a third-party API breaking change include: errors in response parsing code, unexpected null values or missing fields, 200 HTTP responses but incorrect data, and failures isolated to a specific integration path with no recent changes on your end. Confirm by comparing a raw API response against your expected schema, and checking the provider's status page and changelog.

How quickly should I respond to a third-party API breaking change?

Immediately. The moment you suspect a third-party API breaking change, start containing the blast radius (disable or degrade affected features) while diagnosing in parallel. Customer-facing impact is already happening. Aim to have a fix deployed within 1–2 hours for critical path failures. Alert your support team right away so they can communicate status to users.

Do third-party APIs have to notify you before making breaking changes?

There's no universal requirement. Many established API providers version their APIs and announce deprecation periods for changes they consider breaking. However, changes the provider considers backward-compatible, such as adding new fields, expanding enums, or reshaping nested structures, often ship without notification, even though they can break tightly coupled integrations. Active monitoring is the most reliable way to catch these changes quickly.

What's the difference between an API outage and an API breaking change?

An API outage means the service is unavailable — requests fail with 5xx errors or timeouts. Standard uptime monitoring catches this. An API breaking change means the service is available and returning 200, but the response structure changed in a way that breaks your integration code. Uptime monitoring won't catch breaking changes — you need schema drift monitoring that validates response structure, not just availability.

How can I prevent third-party API breaking changes from affecting production?

The most effective prevention is schema drift monitoring: automated checks that capture your expected API response structure and alert you when it changes. This shortens detection from hours to minutes, so you can start fixing your integration before most users are affected. Additionally: pin API versions when supported, add integration tests against live APIs, implement circuit breakers in your integration code, and build in graceful degradation for non-critical third-party features.


Rumbliq monitors your third-party API integrations for schema drift, detecting breaking changes before users do.