Your product probably depends on 5–15 external APIs. One Friday, a vendor renames a field and flips a default. On Monday, your checkout, auth, or AI feature throws 500s—and your team goes spelunking through hand‑rolled client code. You didn’t sign up for this kind of excitement.
The market just told you what the professionals do. Anthropic acquired Stainless, the codegen platform used by OpenAI, Google, and Cloudflare to ship typed SDKs at speed. That’s not a vanity purchase; it’s a bet that contract‑first pipelines and generated clients are the only sane way to keep up with API churn, especially for LLMs, where schemas, headers, and streaming semantics mutate monthly.
If you still hand‑roll clients, you’re signing up for recurring integration incidents and unbounded effort. This post gives you a decision framework and a concrete implementation plan to get to contract‑first, typed, regenerable SDKs that survive 2026‑level velocity.
What “contract‑first” actually means (and why it wins)
Contract‑first means the API’s formal schema is the source of truth. You pin the schema, generate SDKs, test against the contract, and ship changes only when the schema diff has been reviewed and the contract tests are green. It flips the usual posture: instead of discovering breakage in production, you detect it at codegen time or in CI.
In practice, that means you standardize on one of:
- OpenAPI/JSON Schema for REST. Mature ecosystem. Great for TypeScript, Go, Java, Python. Rich tooling like openapi‑generator, Speakeasy, Stainless, Kiota, and Fern.
- gRPC/protobuf for internal or high‑performance services. Strong typing, streaming built‑in, first‑class codegen.
- GraphQL with introspection and codegen (e.g., GraphQL Code Generator). Contracts are your SDL schemas.
For LLMs and evented APIs, your “contract” includes streaming shapes (SSE chunks, WebSocket messages), rate‑limit headers, and error taxonomies. If your schema ignores these, you’ll still end up writing bespoke glue and chasing heisenbugs.
When to go contract‑first: a decision framework
Not every integration needs full platformization. Use this quick triage:
- High‑risk domain? Payments, auth, PII, healthcare, and AI inference: go contract‑first.
- More than 3 external APIs? The combinatorics of error handling alone justify codegen.
- Churn expected? Beta/rapidly evolving APIs (LLMs, search, data enrichment) demand pin‑and‑regen.
- Polyglot repos? If you support 2+ languages, hand‑rolling is a tax you’ll pay forever.
- SLAs matter? If an upstream change can break revenue or SLAs, treat schemas as dependencies you can lock and diff.
Rules of thumb we’ve seen hold:
- Codegen plus a contract pipeline cuts runtime integration errors by ~40–60% in the first quarter because you move checks to compile/test time.
- Incident MTTR for vendor changes drops from 1–3 days to 2–6 hours when you can regenerate and ship with confidence.
- Across a 10‑engineer team calling 6 external APIs, you save 2–4 engineer‑weeks/quarter in glue code and firefighting.
The minimum viable architecture for contract‑first
1) Treat schemas as versioned, pinned dependencies
- Check in the schema (OpenAPI YAML/JSON, protobuf, GraphQL SDL) to your repo under /third_party/contracts/vendor@version, with a checksum.
- Automate diff detection: a nightly job fetches the vendor’s latest schema and opens a PR with a diff. “Breaking change” labels should block merges until adapters/tests are updated.
- Capture non‑spec behavior in an annotation file: rate‑limit headers, undocumented fields, retry hints. Keep reality in the repo.
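A minimal sketch of the pin-and-diff step, in Python with only the standard library. The function names (`schema_checksum`, `breaking_changes`) and the sample `Charge` schema are illustrative assumptions, not part of any vendor's tooling. The diff only looks for removed paths, operations, and fields; a real pipeline would likely lean on a dedicated OpenAPI diff tool.

```python
import hashlib
import json


def schema_checksum(schema: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def breaking_changes(pinned: dict, latest: dict) -> list[str]:
    """List removals between two OpenAPI-shaped dicts (a deliberately narrow check)."""
    problems = []
    new_paths = latest.get("paths", {})
    for path, operations in pinned.get("paths", {}).items():
        if path not in new_paths:
            problems.append(f"removed path {path}")
            continue
        # Simplification: every key under a path is treated as an HTTP method.
        for method in operations:
            if method not in new_paths[path]:
                problems.append(f"removed operation {method.upper()} {path}")
    new_schemas = latest.get("components", {}).get("schemas", {})
    for name, schema in pinned.get("components", {}).get("schemas", {}).items():
        new_props = new_schemas.get(name, {}).get("properties", {})
        for prop in schema.get("properties", {}):
            if prop not in new_props:
                problems.append(f"removed field {name}.{prop}")
    return problems


# Hypothetical pinned schema and the vendor's "Friday rename".
PINNED = {
    "paths": {"/charges": {"get": {}, "post": {}}},
    "components": {"schemas": {"Charge": {"properties": {"amount": {}, "currency": {}}}}},
}
LATEST = {
    "paths": {"/charges": {"get": {}}},
    "components": {"schemas": {"Charge": {"properties": {"amount": {}, "currency_code": {}}}}},
}
```

In a nightly job, a non-empty result from `breaking_changes(PINNED, LATEST)` would be the signal to label the diff PR as breaking, and a checksum mismatch against the stored value would flag a schema that changed without review.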
2) Generate SDKs, don’t write them
- Pick a generator per language. For TypeScript, we like Stainless or Speakeasy for their pagination, retries, and streaming support. For Go/Java/Python, openapi‑generator is a solid default; Kiota is compelling if you want a single tool across languages.
- Centralize the HTTP runtime: timeouts, TLS settings, backoff, circuit breakers. Generators that let you inject a custom transport are gold.
- Make errors typed: map upstream error codes to enums; include raw payloads for forensics. No orphaned stringly‑typed errors.
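To make "typed errors" concrete, here is one possible shape in Python. The enum members, the `UpstreamError` class, and the status-code mapping are our assumptions for illustration; a real generator would derive them from the vendor's documented error codes rather than from HTTP status alone.

```python
from enum import Enum


class UpstreamErrorKind(Enum):
    RATE_LIMITED = "rate_limited"
    AUTH = "auth"
    INVALID_REQUEST = "invalid_request"
    UPSTREAM_FAILURE = "upstream_failure"
    UNKNOWN = "unknown"


class UpstreamError(Exception):
    """One exception type for every vendor, carrying the raw payload for forensics."""

    def __init__(self, vendor: str, kind: UpstreamErrorKind, status: int, raw_body: str):
        super().__init__(f"{vendor} returned {status} ({kind.value})")
        self.vendor = vendor
        self.kind = kind
        self.status = status
        self.raw_body = raw_body

    @property
    def retryable(self) -> bool:
        # Assumption: only throttling and server-side failures are worth retrying.
        return self.kind in {UpstreamErrorKind.RATE_LIMITED, UpstreamErrorKind.UPSTREAM_FAILURE}


def classify(vendor: str, status: int, raw_body: str) -> UpstreamError:
    """Map an HTTP status to an enum value; the caller decides whether to raise."""
    if status == 429:
        kind = UpstreamErrorKind.RATE_LIMITED
    elif status in (401, 403):
        kind = UpstreamErrorKind.AUTH
    elif 400 <= status < 500:
        kind = UpstreamErrorKind.INVALID_REQUEST
    elif 500 <= status < 600:
        kind = UpstreamErrorKind.UPSTREAM_FAILURE
    else:
        kind = UpstreamErrorKind.UNKNOWN
    return UpstreamError(vendor, kind, status, raw_body)
```

Callers can then branch on `error.kind` or `error.retryable` instead of matching on message strings.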
3) Build a capability‑first facade
Wrap generated clients with a small, hand‑written adapter that expresses capabilities in your domain language: “createCheckout”, “embedModel.infer”, “vectorStore.query”. Adapters translate to vendor specifics.
- Why: you can swap or multi‑home providers without touching callers.
- How: define a narrow, stable interface per capability; keep vendor‑specific flags behind feature flags and presets.
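A small sketch of what such a facade could look like in Python. Everything vendor-shaped here is hypothetical: `FakeVendorAClient` and `FakeVendorBClient` stand in for generated SDKs, and their method names and payload shapes are invented for the example.

```python
from typing import Protocol


class EmbeddingProvider(Protocol):
    """The capability callers depend on; no vendor names leak through it."""

    def embed(self, text: str) -> list[float]: ...


class FakeVendorAClient:
    """Stand-in for a generated SDK (hypothetical method and payload shape)."""

    def create_embedding(self, *, model: str, input: str) -> dict:
        return {"data": [{"embedding": [float(len(input)), 1.0]}]}


class FakeVendorBClient:
    """Second stand-in; it always fails so the failover path is visible."""

    def embeddings(self, texts: list[str]) -> list[list[float]]:
        raise ConnectionError("vendor B is unreachable")


class VendorAEmbeddings:
    def __init__(self, client: FakeVendorAClient):
        self._client = client

    def embed(self, text: str) -> list[float]:
        response = self._client.create_embedding(model="a-small", input=text)
        return response["data"][0]["embedding"]


class VendorBEmbeddings:
    def __init__(self, client: FakeVendorBClient):
        self._client = client

    def embed(self, text: str) -> list[float]:
        return self._client.embeddings([text])[0]


class EmbeddingService:
    """Tries providers in order; callers never learn which vendor answered."""

    def __init__(self, providers: list[EmbeddingProvider]):
        self._providers = providers

    def embed(self, text: str) -> list[float]:
        last_error = None
        for provider in self._providers:
            try:
                return provider.embed(text)
            except ConnectionError as error:
                last_error = error
        raise RuntimeError("all embedding providers failed") from last_error
```

Swapping or multi-homing providers becomes a change to the list passed to `EmbeddingService`, not a change to its callers.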
4) Stream like a first‑class citizen
LLM responses are usually consumed as streams. Your contract must treat SSE/WebSocket chunks as typed events, not raw strings. Your generated client or transport should surface:
- Typed deltas with event types and sequence IDs.
- Budget signals: token counts, usage headers, and partial costs.
- Cancellation hooks: surface a cancel() that closes sockets and tears down resources deterministically.
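As a rough illustration of typed deltas and deterministic cancellation, the sketch below parses Server-Sent Events lines into a small dataclass. The event names (`delta`, `usage`), the JSON payloads, and the client-side sequence counter are assumptions for the example; a vendor may define its own event types and IDs. The parser also covers only the `event:` and `data:` fields of SSE.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class StreamEvent:
    type: str   # SSE event name, "message" when the server sends none
    seq: int    # client-side sequence number (assumption: vendor sends no ID)
    data: dict  # decoded JSON payload


def typed_events(
    lines: Iterable[str],
    on_close: Callable[[], None] = lambda: None,
) -> Iterator[StreamEvent]:
    """Turn raw SSE lines into typed events; on_close runs when the stream ends or is closed."""
    event_type, data_lines, seq = "message", [], 0
    try:
        for raw in lines:
            line = raw.rstrip("\r\n")
            if line.startswith(":"):
                continue  # SSE comment, often used as a keep-alive
            if line == "":
                # A blank line dispatches the event collected so far.
                if data_lines:
                    yield StreamEvent(event_type, seq, json.loads("\n".join(data_lines)))
                    seq += 1
                event_type, data_lines = "message", []
            elif line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].lstrip(" "))
    finally:
        on_close()  # e.g. close the socket; also runs on generator.close()


SAMPLE = [
    "event: delta\n", 'data: {"text": "Hel"}\n', "\n",
    ": keep-alive\n",
    "event: delta\n", 'data: {"text": "lo"}\n', "\n",
    "event: usage\n", 'data: {"output_tokens": 2}\n', "\n",
]
```

Calling `.close()` on the generator after the first event is the cancellation hook here: the `finally` block runs immediately, which is where a real transport would release the connection.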
5) Add a safety proxy where money flows
For costed APIs (LLMs, maps, telephony), put a lightweight proxy between your app and upstream that enforces budget caps, rate limits, and observability. Projects like LLMCap show the pattern: hard‑stop calls when you hit a dollar cap, not at month‑end finance reviews.
- Per‑tenant keys with allowlists; you can rotate without redeploying your app.
- Request journaling of redacted prompts/params for 7–14 days for break/fix.
- Backpressure: when upstream flaps, shed non‑critical load, not everything.
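The hard-stop idea can be sketched in a few lines of Python. This is not LLMCap's API; `BudgetGuard`, `BudgetExceeded`, and the per-tenant estimate are hypothetical names, and the class is in-memory and not thread-safe, so treat it as a shape rather than a proxy you could deploy.

```python
from decimal import Decimal
from typing import Callable, TypeVar

T = TypeVar("T")


class BudgetExceeded(Exception):
    """Raised before the upstream call is made, so no money is spent."""


class BudgetGuard:
    def __init__(self, cap_usd: Decimal):
        self.cap_usd = cap_usd
        self.spent: dict[str, Decimal] = {}  # tenant -> dollars spent so far

    def forward(self, tenant: str, estimated_cost_usd: Decimal, call: Callable[[], T]) -> T:
        """Run `call` only if the tenant stays under the cap, then record the spend."""
        spent = self.spent.get(tenant, Decimal("0"))
        if spent + estimated_cost_usd > self.cap_usd:
            raise BudgetExceeded(f"{tenant} would exceed the ${self.cap_usd} cap")
        result = call()
        # Simplification: we book the estimate; a real proxy would reconcile
        # against the usage the vendor reports in the response.
        self.spent[tenant] = spent + estimated_cost_usd
        return result
```

Using `Decimal` rather than floats keeps the running total exact, which matters once the cap is enforced to the cent.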
6) Contract tests > end‑to‑end only
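- Schema conformance tests: run your client against a spec‑driven mock server such as Prism, which validates requests against the OpenAPI document, so you know you’re calling the contract correctly before you hit the live API. Use Dredd in the other direction, to check that a vendor sandbox still matches its published description.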
- Record/replay for integration: VCR‑style fixtures (e.g., Polly.js, Betamax) let you run most CI without vendor traffic.
- Canaries: a tiny prod‑only job that pings critical endpoints every 5–10 minutes with synthetic payloads and alerts on schema or auth drift.
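For readers who have not used record/replay fixtures, the mechanism is roughly the following. This is a standard-library sketch of the idea, not the Polly.js or Betamax API; the `Cassette` class and the `(method, url, body)` transport signature are assumptions made for the example.

```python
import hashlib
import json
from typing import Callable, Optional

# Hypothetical transport signature: (method, url, body) -> response dict.
Transport = Callable[[str, str, dict], dict]


def request_key(method: str, url: str, body: dict) -> str:
    """Stable fingerprint of a request, independent of key order in the body."""
    payload = json.dumps([method.upper(), url, body], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Cassette:
    """Replays recorded responses; records new ones only when a live transport is given."""

    def __init__(self, live: Optional[Transport] = None):
        self.live = live
        self.recordings: dict[str, dict] = {}

    def send(self, method: str, url: str, body: dict) -> dict:
        key = request_key(method, url, body)
        if key in self.recordings:
            return self.recordings[key]
        if self.live is None:
            # In CI there is no vendor traffic: an unrecorded call is a test failure.
            raise LookupError(f"no recording for {method} {url}")
        response = self.live(method, url, body)
        self.recordings[key] = response
        return response
```

In practice `recordings` would be serialized to a fixture file checked in next to the tests, with secrets scrubbed before it is written.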
7) Observability with correlation IDs and budgets
- Propagate x‑request‑id or equivalent across your facade and generated clients. Log every upstream call with the correlation ID, latency, and response code.
- Per‑provider SLOs and budgets in your APM. Treat “time spent waiting on Vendor X” as a first‑class metric.
- Structured errors in logs with vendor, endpoint, error‑enum, and whether a retry was attempted.
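One way to get all three properties from a single wrapper is sketched below. The function name `observed_call`, the `send(headers) -> (status, body)` signature, and the record fields are our assumptions; the `x-request-id` header comes from the bullet above, and your vendors may expect a different name.

```python
import json
import time
import uuid
from typing import Callable, Optional, Tuple


def observed_call(
    vendor: str,
    endpoint: str,
    send: Callable[[dict], Tuple[int, str]],
    sink: Callable[[str], None],
    request_id: Optional[str] = None,
    retried: bool = False,
) -> Tuple[int, str]:
    """Call upstream with a correlation ID and emit one structured log line."""
    request_id = request_id or str(uuid.uuid4())
    started = time.perf_counter()
    status, body = send({"x-request-id": request_id})
    record = {
        "vendor": vendor,
        "endpoint": endpoint,
        "request_id": request_id,
        "status": status,
        "latency_ms": round((time.perf_counter() - started) * 1000, 2),
        "retried": retried,
    }
    sink(json.dumps(record, sort_keys=True))  # sink could be a logger method
    return status, body
```

Passing the sink in (for example `logging.getLogger("upstream").info`) keeps the wrapper testable and lets the APM integration live in one place.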
Tooling choices that won’t age like milk
Here’s how we evaluate generators and runtimes for longevity:
- Language coverage: TypeScript and Python are non‑negotiable for most startups; Go/Java/.NET if you’re polyglot. Kiota’s multi‑language promise is attractive if you invest once.
- Streaming support: Can it express SSE/WebSockets with typed events, backpressure, and cancellation? Many tools still punt here; verify before you commit.
- Pagination and retries: Built‑in, configurable, and visible in telemetry. You shouldn’t write “nextPageToken” loops by hand in 2026.
- Error typing: Does the generator produce structured exceptions with discriminated unions? If not, you’ll leak vendor weirdness into your domain.
- Custom transport injection: You want your own HTTP client to standardize timeouts, TLS, proxies, and circuit breakers.
- License and lock‑in: OSS vs commercial. Stainless and Speakeasy are excellent but paid; openapi‑generator is OSS but requires more curation.
Handling messy vendors: GraphQL, gRPC, and “OpenAPI‑ish” REST
Real life is heterogeneous. Your playbook needs escape hatches:
- GraphQL: Use introspection to generate typed clients. Enforce persisted queries where possible to lock contracts. For streaming (live queries, subscriptions), treat events as contracts and test them.
- gRPC: First‑class codegen; just pin protos. For public internet, wrap with a small REST facade if your stack isn’t gRPC‑native.
- Underdocumented REST: If a vendor ships Postman collections or “OpenAPI‑ish” specs, run them through a linter and fix locally. Keep a diff layer so you can rebase when they improve their docs.
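The "diff layer" for underdocumented REST can be as simple as an overlay applied to the vendor's spec before codegen. The sketch below uses JSON Merge Patch semantics (RFC 7386): objects merge recursively, `None` deletes a key, and anything else replaces. The `Error` schema and its fields are invented for the example.

```python
import copy


def apply_overlay(spec: dict, overlay: dict) -> dict:
    """Return a patched copy of `spec`; the vendor's original is left untouched."""
    result = copy.deepcopy(spec)
    for key, value in overlay.items():
        if value is None:
            result.pop(key, None)  # None means "remove this key"
        elif isinstance(value, dict):
            base = result.get(key)
            result[key] = apply_overlay(base if isinstance(base, dict) else {}, value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical vendor spec: wrong type for `code`, missing `retry_after`.
VENDOR_SPEC = {"components": {"schemas": {"Error": {
    "type": "object",
    "properties": {"code": {"type": "integer"}, "legacy_hint": {"type": "string"}},
}}}}

# Our local corrections, kept in the repo next to the pinned spec.
OVERLAY = {"components": {"schemas": {"Error": {
    "properties": {
        "code": {"type": "string"},
        "retry_after": {"type": "integer"},
        "legacy_hint": None,
    },
}}}}

PATCHED = apply_overlay(VENDOR_SPEC, OVERLAY)
```

Because the corrections live in a separate file, rebasing onto a better vendor spec means deleting overlay entries rather than re-editing a forked document.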
Govern change like a dependency, not a rumor
Breaking change management needs muscle memory. Adopt these rules:
- Schema diffs are blocking in CI for critical integrations. “Breaking” labels require a risk owner and a rollout plan.
- Stability calendar: ask vendors for deprecation schedules. Pin a preferred version and test the next one weekly.
- 48‑hour SLO to ship compatibility fixes for critical APIs. With codegen, that’s realistic; without it, it’s fantasy.
- Shadow traffic to new versions where supported; compare error rates and latencies before you cut over.
Security and compliance, briefly (the pragmatic bits)
You’ve read the zero‑trust manifestos. Here’s what matters at the client layer:
- Per‑env credentials with tight scopes and rotation. Keep keys out of build logs and codegen outputs.
- mTLS or signed requests where offered, especially for PII/PHI flows.
- Redaction at the proxy for logs and request journals. Retain 7–14 days by default; longer requires an explicit legal/compliance ticket.
The point: contract‑first doesn’t absolve you of secrets hygiene, but it does make it possible to centralize and enforce it once.
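Redaction at the proxy might look something like this. The key list and the email pattern are illustrative assumptions and deliberately simple; a production redactor would need a reviewed list of sensitive fields and broader PII detection.

```python
import re

# Assumption: keys whose values must never reach logs or request journals.
SENSITIVE_KEYS = {"authorization", "api_key", "password", "prompt"}
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
MASK = "[REDACTED]"


def redact(value):
    """Return a redacted copy of a JSON-like structure (string keys assumed)."""
    if isinstance(value, dict):
        return {
            key: MASK if key.lower() in SENSITIVE_KEYS else redact(inner)
            for key, inner in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL.sub(MASK, value)  # mask email addresses inside free text
    return value
```

Running every journal entry through one function like this is what makes the policy enforceable once, at the proxy, instead of in each service.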
Costs and ROI: make the business case
Budgeting guidance we use with CTOs:
- Initial setup: 3–5 engineer‑days to wire up codegen, transport, and CI for the first API and language; 1–2 days per additional language.
- Ongoing: 1–2 hours/week to triage schema diffs and regen clients across all vendors.
- Incident avoidance: One avoided vendor‑change incident per quarter, at 2–3 engineer‑days, offsets a good share of the setup cost on its own; add the glue‑code savings above and the system pays for itself inside a quarter.
- Speed: New endpoint adoption goes from 1–2 days of hand‑rolled glue to hours; typed SDKs and examples mean fewer integration tests need to be written from scratch.
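To sanity-check these figures against your own numbers, a back-of-the-envelope calculator is enough. The constants (8-hour days, 13-week quarters, 5-day weeks) and the function name are our assumptions; the inputs below combine the setup, triage, and incident ranges from this section with the 2–4 engineer-weeks of glue-code savings quoted earlier in the post.

```python
HOURS_PER_DAY = 8        # assumption: one engineer-day is 8 hours
WEEKS_PER_QUARTER = 13   # assumption
DAYS_PER_WEEK = 5        # assumption


def first_quarter_net_days(
    setup_days: float,
    triage_hours_per_week: float,
    incident_days_avoided: float,
    glue_weeks_saved: float,
) -> float:
    """Engineer-days saved minus engineer-days invested over the first quarter."""
    cost = setup_days + triage_hours_per_week * WEEKS_PER_QUARTER / HOURS_PER_DAY
    savings = incident_days_avoided + glue_weeks_saved * DAYS_PER_WEEK
    return savings - cost


# Pessimistic end of every range: highest cost, lowest savings.
pessimistic = first_quarter_net_days(5, 2, 2, 2)   # (2 + 10) - (5 + 3.25) = 3.75
# Optimistic end of every range: lowest cost, highest savings.
optimistic = first_quarter_net_days(3, 1, 3, 4)    # (3 + 20) - (3 + 1.625) = 18.375
```

A positive result means the investment has paid back within the quarter under the inputs you chose; plug in your own incident history before taking the ranges on faith.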
Rollout plan: 30–60–90 days
Days 1–30: Prove it on one high‑risk API
- Pick your riskiest vendor (LLM, payments, auth). Pin the schema in‑repo and set up nightly diff PRs.
- Integrate a generator and central HTTP transport. Replace your hand‑rolled client in one service.
- Add a canary and basic contract tests with a mock server.
- Measure: compile‑time errors caught, CI time added, latency impact (should be negligible), and developer sentiment.
Days 31–60: Expand and standardize
- Extend to 2–3 more vendors. Build the capability‑first facade so product teams call “capabilities,” not vendors.
- Introduce the safety proxy for any costed API with budget caps and request journaling.
- Make schema diffs blocking for critical paths. Add a weekly “next‑version” test job.
Days 61–90: Industrialize
- Publish internal SDKs for your capabilities in all supported languages.
- Document error enums and retry policies. Wire structured logs with correlation IDs into your APM.
- Set the 48‑hour SLO for vendor change response. Run a game day where you break a contract and rehearse the regen/rollout.
Trade‑offs and failure modes (read this before you buy tools)
- Schema quality varies wildly. Some vendors publish perfect OpenAPI; others lie by omission. You’ll need to curate and patch in‑repo. Budget for it.
- Codegen churn can annoy engineers. Keep generated code isolated; never hand‑edit; enforce formatting to reduce noisy diffs. If PRs are unreadable, people will bypass the system.
- Not everything belongs in the facade. Exotic, vendor‑specific features can stay behind feature flags. Don’t force a least‑common‑denominator design that blocks innovation.
- Generators are not magic. You still own resilience: timeouts, retries, idempotency keys, backpressure. Verify that your transport does the right thing under load and failure.
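The resilience you still own can be sketched as a retry wrapper with capped exponential backoff, full jitter, and a stable idempotency key. `TransientError`, the `send(headers)` signature, and the `Idempotency-Key` header name are assumptions for the example; check what your vendor actually supports before relying on server-side deduplication.

```python
import random
import time
import uuid
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry (timeouts, 429, 5xx)."""


def call_with_retries(
    send: Callable[[dict], T],
    *,
    attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Retry `send` on TransientError, reusing one idempotency key across attempts."""
    # The same key on every attempt lets the vendor recognize a repeated request.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(attempts):
        try:
            return send(headers)
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(ceiling * rng())  # full jitter: wait a random slice of the ceiling
    raise ValueError("attempts must be at least 1")
```

Injecting `sleep` and `rng` keeps the wrapper deterministic in tests, which is also the easiest way to verify its behavior under failure before you trust it under load.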
Nearshore execution: why we keep getting asked to do this
This work is unglamorous and high‑leverage—the exact kind you want a disciplined nearshore partner to own. A senior Brazilian platform team can stand up the pipeline, write the facades, and keep your app teams unblocked, with 6–8 hours of US time‑zone overlap and typically 20–30% lower cost than hiring the same roles in‑market. More important than cost: you get a team that treats vendor change as a first‑class risk with an SLO, not as intermittent chaos.
The bottom line
Vendors will keep moving fast, and the AI ecosystem will keep shape‑shifting. You can keep hand‑rolling clients and absorbing the blast radius every time an upstream changes, or you can move the pain into a predictable pipeline and ship faster with fewer outages. Contract‑first is not a trend piece—it’s operational reality. The biggest LLM companies just told you so with their wallets.
Key Takeaways
- Adopt contract‑first now for high‑risk or fast‑moving APIs; pin schemas, generate SDKs, and block on breaking diffs.
- Wrap vendors behind capability‑first facades so you can swap or multi‑home providers without rewrites.
- Treat streaming as a contract: typed events, cancellation, and budget signals are table stakes for LLMs.
- Add a safety proxy for costed APIs to enforce dollar caps, rate limits, and short‑term request journaling.
- Expect 40–60% fewer runtime integration errors and 2–6 hour MTTR on vendor changes once the pipeline is in place.
- Start with one API in 30 days, expand in 60, industrialize by 90—with a 48‑hour SLO for compatibility fixes.