GraphQL API Monitoring: Catching Schema Changes Before They Break Your App
GraphQL promised to solve the versioning problem. Instead of releasing v2 and v3 of a REST API, you evolve your schema incrementally — add fields, deprecate old ones, never break clients that don't ask for removed fields.
In practice, GraphQL APIs break clients all the time.
Deprecated fields get removed before clients migrate. Type changes slip through schema reviews. A nullable field becomes non-nullable. A union type gains a new member that a client's exhaustive switch statement doesn't handle. The query that worked last week returns a resolver error today because an underlying data source changed.
REST monitoring tools weren't designed for any of this. This guide covers what GraphQL API monitoring actually requires — and how to build a monitoring strategy that catches real problems before users do.
How GraphQL Breaks Differently from REST
REST APIs break in ways most monitoring tools understand: HTTP 4xx and 5xx status codes, timeouts, missing endpoints. GraphQL breaks differently.
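GraphQL almost always returns HTTP 200, even for errors. This is by convention rather than by accident. The core GraphQL spec is transport-agnostic, and most servers follow the practice (codified for application/json responses in the GraphQL over HTTP spec) of returning 200 whenever the server understood the request and produced a response. Errors live inside the response body, in the errors array.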
This means a request that completely failed looks like this over the wire:
HTTP/1.1 200 OK
Content-Type: application/json
{
"data": null,
"errors": [
{
"message": "Cannot query field 'email' on type 'User'.",
"locations": [{ "line": 3, "column": 5 }],
"path": ["user", "email"]
}
]
}
An uptime monitor that checks for HTTP 200 just reported this as healthy.
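A health check therefore has to look at the body, not just the status line. The sketch below shows one way to do that in Python. The function name `classify_graphql_response` and the three labels it returns are illustrative assumptions, not part of any library.

```python
import json


def classify_graphql_response(status_code: int, body: str) -> str:
    """Return 'ok', 'partial', or 'failed' for a GraphQL-over-HTTP response.

    The labels are hypothetical; map them to whatever your alerting uses.
    """
    if status_code != 200:
        return "failed"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "failed"  # e.g. an HTML error page from a proxy
    if not isinstance(payload, dict):
        return "failed"

    errors = payload.get("errors") or []
    data = payload.get("data")

    if errors and data is None:
        return "failed"   # nothing usable came back
    if errors:
        return "partial"  # some fields resolved, others errored
    if data is None:
        return "failed"   # no errors reported, but no data either
    return "ok"
```

Fed the response shown above, this returns "failed" even though the status code is 200.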
Beyond error handling, GraphQL has its own class of breaking changes that don't exist in REST:
- Removing a field from a type (clients querying it get a validation error, and the whole request fails)
- Changing a field's type (String → Int, or adding/removing non-null)
- Removing a type from a union or interface
- Renaming a type or enum value
- Changing argument requirements (adding a new required argument, or making a previously optional argument required)
- Deprecation removal — the two-step that bites teams: deprecate, assume clients migrate, remove
Few of these reliably show up as HTTP errors. They show up as validation failures inside the response body, partial data, unexpected nulls, or application logic failures in clients.
The Four Layers of GraphQL Monitoring
1. Schema Change Detection
The most important thing you can monitor for a GraphQL API — especially one you don't control — is schema changes.
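GraphQL's introspection API was designed for exactly this. An introspection query like the one below returns the types, fields, and arguments you need for diffing (the full introspection query generated by most GraphQL tooling also covers enum values, input fields, union members, and directives):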
query IntrospectionQuery {
__schema {
types {
name
kind
fields(includeDeprecated: true) {
name
type {
name
kind
ofType {
name
kind
}
}
isDeprecated
deprecationReason
args {
name
type {
name
kind
ofType { name kind }
}
defaultValue
}
}
}
queryType { name }
mutationType { name }
subscriptionType { name }
}
}
Run this query periodically and diff the result against a stored baseline. Any added, removed, or changed field, type, or argument shows up within one polling interval, ideally before a client breaks and before an on-call engineer gets paged.
The diff you want to produce looks something like:
BREAKING CHANGES:
- Field removed: User.email (was String!)
- Argument now required: Query.users.filter (was optional)
- Type changed: Order.status (was String, now OrderStatus enum)
NON-BREAKING CHANGES:
- Field added: User.phoneNumber (String)
- Field deprecated: User.username (use User.handle instead)
- Type added: PhoneVerification
This is the kind of structured diff that lets you act before anything breaks.
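As a rough illustration of how such a diff could be produced, the Python sketch below compares two `__schema` objects shaped like the introspection result above. The function names (`render_type`, `index_fields`, `diff_schemas`) are hypothetical. It deliberately treats every type change as breaking, which is conservative, and it only looks at fields and newly added required arguments. Enum values, input types, and union members would need the same treatment.

```python
def render_type(type_ref):
    """Render an introspection type reference such as String! or [Int]."""
    if type_ref is None:
        return "?"  # nesting deeper than the introspection query fetched
    kind = type_ref.get("kind")
    if kind == "NON_NULL":
        return render_type(type_ref.get("ofType")) + "!"
    if kind == "LIST":
        return "[" + render_type(type_ref.get("ofType")) + "]"
    return type_ref.get("name") or "?"


def index_fields(schema):
    """Map 'Type.field' to its field dict, skipping built-in __ types."""
    fields = {}
    for t in schema.get("types", []):
        if t["name"].startswith("__"):
            continue
        for f in t.get("fields") or []:
            fields[f"{t['name']}.{f['name']}"] = f
    return fields


def diff_schemas(old, new):
    """Return (breaking, non_breaking) lists of human-readable changes."""
    breaking, non_breaking = [], []
    old_fields, new_fields = index_fields(old), index_fields(new)

    for path, old_f in old_fields.items():
        new_f = new_fields.get(path)
        old_type = render_type(old_f["type"])
        if new_f is None:
            breaking.append(f"Field removed: {path} (was {old_type})")
            continue
        new_type = render_type(new_f["type"])
        if new_type != old_type:
            breaking.append(f"Type changed: {path} (was {old_type}, now {new_type})")
        if new_f.get("isDeprecated") and not old_f.get("isDeprecated"):
            non_breaking.append(f"Field deprecated: {path}")
        old_args = {a["name"] for a in old_f.get("args") or []}
        for arg in new_f.get("args") or []:
            # Required means non-null with no default value.
            required = arg["type"]["kind"] == "NON_NULL" and arg.get("defaultValue") is None
            if arg["name"] not in old_args and required:
                breaking.append(f"Required argument added: {path}.{arg['name']}")

    for path, new_f in new_fields.items():
        if path not in old_fields:
            non_breaking.append(f"Field added: {path} ({render_type(new_f['type'])})")
    return breaking, non_breaking
```

Calling `diff_schemas(baseline["data"]["__schema"], latest["data"]["__schema"])` yields two lists that can be rendered in the format shown above.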
2. Query-Level Functional Monitoring
Schema diffing tells you what changed. Query monitoring tells you whether real queries still work.
Pick a set of representative GraphQL queries that cover your critical paths — the queries your application actually runs. Execute them on a schedule against the real API and validate the responses.
For a social platform consuming a third-party user API:
# Critical path: fetch user profile
query GetUserProfile($userId: ID!) {
user(id: $userId) {
id
displayName
avatarUrl
bio
followersCount
isVerified
}
}
Checks to run on the response:
- data.user is not null
- data.user.id matches the requested ID
- data.user.displayName is a non-empty string
- errors array is absent or empty
- Response time is under your SLA threshold
If any of these fail, something broke — whether it's a field removal, a resolver error, a permissions change, or a backend outage.
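The checks listed above could be expressed as a small validation function. This is a minimal Python sketch: `check_user_profile` and the default `sla_ms` value are assumptions for illustration, and the payload is the already-parsed JSON body of the response.

```python
def check_user_profile(payload, requested_id, elapsed_ms, sla_ms=1000):
    """Return a list of failed checks; an empty list means healthy."""
    failures = []
    if payload.get("errors"):
        failures.append("errors array is non-empty")

    user = (payload.get("data") or {}).get("user")
    if user is None:
        failures.append("data.user is null")
    else:
        if user.get("id") != requested_id:
            failures.append("data.user.id does not match the requested ID")
        name = user.get("displayName")
        if not isinstance(name, str) or not name.strip():
            failures.append("data.user.displayName is not a non-empty string")

    if elapsed_ms > sla_ms:
        failures.append(f"response time {elapsed_ms}ms exceeds {sla_ms}ms")
    return failures
```

Returning every failed check, rather than stopping at the first, gives the person reading the alert a fuller picture of what broke.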
3. Error Rate Monitoring
Even when queries return partial data, GraphQL surfaces errors at the field level. Tracking error rates across operations gives you a leading indicator of degradation.
What to track:
- Resolver errors per operation: errors on mutation.createOrder jumping from 0% to 12%
- Partial data responses: responses where data is present but errors is non-empty
- Specific error codes: UNAUTHENTICATED, FORBIDDEN, NOT_FOUND spiking unexpectedly
- Field-level null rates: a previously reliable field suddenly returning null on 40% of requests
Error rate monitoring is most powerful when you own the GraphQL server. For third-party GraphQL APIs, you're limited to what you can observe from synthetic queries.
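As a sketch of the first two metrics, the Python function below aggregates per-operation error and partial-data rates from a batch of observed responses. The name `error_rates` and the input shape (pairs of operation name and parsed response body) are assumptions, not a standard format.

```python
from collections import defaultdict


def error_rates(observations):
    """Aggregate per-operation rates from (operation_name, payload) pairs.

    Returns {operation: {"total", "error_rate", "partial_rate"}}.
    """
    counts = defaultdict(lambda: {"total": 0, "errored": 0, "partial": 0})
    for operation, payload in observations:
        c = counts[operation]
        c["total"] += 1
        if payload.get("errors"):
            c["errored"] += 1
            if payload.get("data") is not None:
                c["partial"] += 1  # data present but errors non-empty
    return {
        op: {
            "total": c["total"],
            "error_rate": c["errored"] / c["total"],
            "partial_rate": c["partial"] / c["total"],
        }
        for op, c in counts.items()
    }
```

Comparing these rates between time windows is what turns them into a leading indicator. A single snapshot says little on its own.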
4. Performance Monitoring
GraphQL queries can have wildly different complexity. A query that traverses three levels of nested relationships and returns 500 nodes is expensive. Performance degrades as data volumes grow, query complexity increases, or the API introduces new resolver overhead.
Monitor:
- P50, P95, P99 latency per operation (not just average — tail latency matters)
- Request timeout rate — queries that exceed your client timeout
- Query complexity scores if the API exposes them in extensions
A third-party GraphQL API that takes 200ms for your critical query on Monday and 1,800ms on Friday has degraded significantly, even though it's still returning HTTP 200 with valid data.
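To make the percentile metrics concrete, here is a small Python sketch using the nearest-rank method (one of several common percentile definitions). The helper names `percentile` and `latency_summary` are illustrative.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarise one operation's latency samples (in milliseconds)."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

Keeping one list of samples per operation name, rather than one per endpoint, matters here because every GraphQL operation shares the same URL.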
Monitoring Third-Party GraphQL APIs
When you're consuming a GraphQL API you don't control (a data provider, a platform API, a vendor's service), your monitoring options are more limited, but schema change detection becomes even more critical.
Check whether introspection is enabled. Many GraphQL APIs enable introspection in development and disable it in production for security reasons. If introspection is off, you can't get the schema programmatically.
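One way to check is to send a minimal introspection query and see whether a `__schema` object comes back. The Python sketch below uses only the standard library. The function names are hypothetical, and treating any HTTP error as "disabled" is a simplifying assumption, since a 4xx can also mean bad credentials.

```python
import json
import urllib.error
import urllib.request

PROBE_QUERY = "{ __schema { queryType { name } } }"


def build_probe_request(url, token=None):
    """Build a POST request carrying a minimal introspection query."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": PROBE_QUERY}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def introspection_enabled(payload):
    """True if the parsed response body contains a usable __schema."""
    schema = (payload.get("data") or {}).get("__schema")
    return bool(schema and schema.get("queryType"))


def probe(url, token=None, timeout=10):
    """Send the probe and report whether introspection appears enabled."""
    try:
        with urllib.request.urlopen(build_probe_request(url, token), timeout=timeout) as resp:
            return introspection_enabled(json.load(resp))
    except urllib.error.HTTPError:
        # Assumption: some servers reject introspection with a 4xx status.
        return False
```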
For APIs with introspection disabled:
- Monitor at the query level — run your actual queries and validate responses
- Watch for field-level nulls that weren't null before
- Track the API's schema changelog or developer blog for announced changes
- Use Rumbliq to monitor representative query responses for structural drift (and switch to introspection monitoring if the vendor later enables it)
For APIs with introspection enabled, you get the most powerful option: automated schema diffing. Rumbliq can monitor a GraphQL introspection endpoint and alert you the moment a field is removed, a type changes, or a deprecation is added — the same way it monitors REST API response schemas.
Setting Up GraphQL Schema Monitoring with Rumbliq
Rumbliq monitors any HTTP endpoint that returns JSON. GraphQL introspection fits naturally: POST the introspection query, get a JSON response, store the schema as a baseline, diff every subsequent response against it.
Here's how to set it up:
Step 1: Create a monitor for the introspection endpoint
Method: POST
URL: https://api.example.com/graphql
Headers:
Content-Type: application/json
Authorization: Bearer YOUR_TOKEN
Body:
{"query": "{ __schema { types { name kind fields(includeDeprecated: true) { name type { name kind ofType { name kind } } isDeprecated deprecationReason } } } }"}
Step 2: Set the monitoring interval
For a third-party API you depend on heavily, check every 5 minutes. For lower-priority integrations, hourly is fine. Rumbliq's schema diffing runs on every check — you'll catch changes within one polling interval.
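If you are building your own poller rather than using a hosted tool, a cheap first step on each check is to fingerprint the schema and only run a full diff when the fingerprint changes. The Python sketch below is one way to do that. `schema_fingerprint` is a hypothetical helper, and sorting every list is an assumption that ordering carries no meaning in the introspection result.

```python
import hashlib
import json


def schema_fingerprint(schema):
    """Return a SHA-256 hex digest that ignores key and list ordering."""

    def normalize(node):
        if isinstance(node, dict):
            return {k: normalize(v) for k, v in sorted(node.items())}
        if isinstance(node, list):
            items = [normalize(i) for i in node]
            # Sort by serialized form so reordered types or fields hash the same.
            return sorted(items, key=lambda i: json.dumps(i, sort_keys=True))
        return node

    canonical = json.dumps(normalize(schema), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing the fingerprint next to the baseline keeps a five-minute polling interval inexpensive, because most checks end at a string comparison.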
Step 3: Configure alert routing
Schema changes on a critical GraphQL dependency warrant immediate attention. Route to your primary on-call channel (Slack, PagerDuty webhook) and make sure the diff is included in the alert so the on-call engineer can immediately assess severity.
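For a self-built pipeline, including the diff in the alert might look like the Python sketch below. `build_schema_alert` and its severity labels are assumptions. The `{"text": ...}` payload matches the simplest Slack incoming-webhook message shape, and other channels will need their own format.

```python
def build_schema_alert(api_name, breaking, non_breaking):
    """Return (severity, payload) for a schema-change alert.

    Any breaking change escalates the alert to 'critical'.
    """
    severity = "critical" if breaking else "info"
    lines = [f"[{severity.upper()}] Schema change detected on {api_name}"]
    if breaking:
        lines.append("BREAKING CHANGES:")
        lines += [f"- {change}" for change in breaking]
    if non_breaking:
        lines.append("NON-BREAKING CHANGES:")
        lines += [f"- {change}" for change in non_breaking]
    return severity, {"text": "\n".join(lines)}
```

Returning the severity separately lets the caller page on-call for critical alerts and post everything else to a lower-priority channel.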
Step 4: Monitor your critical queries separately
In addition to introspection, add monitors for your two or three most critical operations. These catch runtime errors that schema diffing can't detect — resolver bugs, authorization failures, backend data issues.
Method: POST
URL: https://api.example.com/graphql
Body: {"query": "query { user(id: \"test-user-id\") { id displayName } }"}
Expected: data.user.id == "test-user-id" AND errors is empty
GraphQL-Specific Breaking Change Patterns to Watch
Based on common GraphQL API evolution patterns, here are the changes most likely to break clients:
1. Deprecation-then-removal without adequate notice
The pattern is: deprecate a field, announce migration in developer docs, wait 90 days, remove the field. The problem is clients don't always respond to deprecations. Schema monitoring that alerts when a deprecated field is removed (not just when it's deprecated) gives you the final warning.
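To get ahead of the removal, you can cross-reference deprecated fields in the schema with the fields your own queries use. This is a minimal Python sketch under two assumptions: the schema was fetched with deprecated fields included, and you maintain `used_paths` (a set of "Type.field" strings) yourself. The function name is hypothetical.

```python
def deprecated_fields_in_use(schema, used_paths):
    """Return {'Type.field': reason} for deprecated fields the client still queries."""
    at_risk = {}
    for t in schema.get("types", []):
        for f in t.get("fields") or []:
            path = f"{t['name']}.{f['name']}"
            if f.get("isDeprecated") and path in used_paths:
                at_risk[path] = f.get("deprecationReason") or "no reason given"
    return at_risk
```

A non-empty result is a to-do list: each entry is a field that can disappear at the end of the vendor's deprecation window.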
2. Non-null constraint additions
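An argument or input field that was String becomes String!. Any client that omits it or sends null (perhaps when a user hasn't filled in a profile field) now gets a validation error. This is a breaking change even though the underlying data type didn't change. On output fields the risk is different: if a resolver returns null for a field now declared String!, the error propagates and nulls out the nearest nullable parent, so clients lose more data than the one field.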
3. Input type changes on mutations
Adding a required argument to a mutation is immediately breaking. Your createOrder mutation that worked yesterday fails today because shippingAddress is now required. Schema monitoring catches this before your checkout flow breaks.
4. Union and interface member changes
If a SearchResult union type previously contained [User, Post, Product] and a new member Event is added, exhaustive pattern matching in clients breaks. If Post is removed from the union, queries that request ... on Post { ... } fail validation, because the fragment can no longer match any member of the union.
5. Enum value additions
Adding a new value to an enum is technically non-breaking at the API level. But if your client code has an exhaustive switch on an enum and doesn't handle unknown values, a new enum value causes a runtime error. Worth alerting on so you can validate your client handles it.
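On the client side, the usual defence is to map unrecognised values to an explicit fallback instead of raising. A Python sketch follows: the `OrderStatus` members and the `describe` helper are invented for illustration, and `UNKNOWN` is a client-only value that the server never sends.

```python
from enum import Enum


class OrderStatus(Enum):
    PENDING = "PENDING"
    SHIPPED = "SHIPPED"
    UNKNOWN = "UNKNOWN"  # client-side fallback, not a server value

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when the value matches no member.
        return cls.UNKNOWN


def describe(status_from_api):
    """Turn a raw enum string from the API into display text."""
    status = OrderStatus(status_from_api)
    if status is OrderStatus.PENDING:
        return "Waiting to ship"
    if status is OrderStatus.SHIPPED:
        return "On its way"
    return "Status unavailable"
```

With this in place, a newly added server value degrades to a generic label instead of a runtime error, and the schema alert tells you when to add proper handling.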
Building a Runbook for GraphQL Schema Alerts
When Rumbliq fires a schema change alert on your GraphQL dependency, your team needs a clear playbook:
Immediate (0-5 minutes)
- Identify the nature of the change: breaking or additive?
- Check the vendor's changelog, status page, and developer Twitter/Discord for announcements
- Determine which of your application queries are affected
Short-term (5-30 minutes)
- Run affected queries in a staging environment against the updated API
- Check error logs for client-side failures that may have already started
- If breaking: decide whether to roll back a recent deployment or hot-patch the client query
Resolution
- Update client queries to use new field names, handle new types, or remove references to deleted fields
- Update baseline in Rumbliq once the change is intentional and handled
- Add a note to your internal changelog: "Third-party API changed field X on [date], handled by PR #N"
Why REST Monitoring Tools Miss GraphQL Problems
Standard uptime monitors check: did I get HTTP 200? Is the response non-empty?
This is necessary but not sufficient for GraphQL. The comparison is stark:
| Problem | Uptime Monitor | GraphQL Schema Monitor |
|---|---|---|
| API server is down | Catches it | Catches it |
| Field removed from response | Misses it (still 200) | Catches it |
| Field type changed | Misses it | Catches it |
| Resolver returning null unexpectedly | Misses it | Catches it (via query monitoring) |
| New required argument added | Misses it | Catches it |
| Deprecated field removed | Misses it | Catches it |
| Performance degradation | May catch with latency checks | Catches with latency tracking |
If you're monitoring a GraphQL API with an uptime checker, you have significant blind spots. Schema-aware monitoring — whether via Rumbliq or a purpose-built solution — closes those gaps.
Summary
GraphQL's flexibility is real, but so is its complexity from a monitoring perspective. The HTTP 200 problem means standard uptime tools give you a false sense of security. Real GraphQL monitoring requires:
- Schema change detection via introspection diffing — catches structural API changes before they reach clients
- Query-level functional monitoring — validates that real operations work end-to-end
- Error rate tracking — catches resolver failures and partial data problems
- Latency monitoring — surfaces performance degradation before it becomes user-visible
For third-party GraphQL APIs, Rumbliq's schema drift detection handles introspection monitoring and query-level checks with minimal configuration. Add a monitor for your most critical GraphQL dependency today — the next schema change is coming, and it's better to find out at 9am on a Tuesday than at 2am on a Saturday.