2026-06-06 · 12 min read

Should Your Workflows Live in Postgres? A CTO Decision Framework After pg_durable

By Diogo Hudson Dias


You do not need another distributed system if your database can do the job. But you also do not need your database doing everything. Microsoft’s decision to open source an in‑database durable execution engine for Postgres (pg_durable) puts that tension on your desk today. The pitch is seductive: fewer moving parts, transactional consistency, and SQL‑native observability. The risk is just as real: widened failure domains, write‑amplified hot tables, and DBA work masquerading as simplicity.

This post gives you a pragmatic decision framework. When should you move workflows into Postgres? When should you keep an external orchestrator like Temporal, Cadence, Conductor, Argo, or AWS Step Functions? And if you go in‑DB, how do you keep it from becoming your next incident narrative?

The problem pg_durable wants to solve

Durable workflows need three things:

State that survives process and machine restarts.
Timers and retries with backoff that won’t vanish during deploys.
Exactly‑once or effect‑idempotent execution tied to business writes.

Most teams hack this together with a queue, a job table, and some “best effort” idempotency keys. It works until it doesn’t. External orchestrators fix that but add a new distributed system to run, learn, and pay for. In‑database durable execution says: let Postgres hold the workflow state machine and timers, and let your workers pull steps with transactional semantics. You cut network hops and commit business writes and workflow transitions together, in one transaction.

First principles: the four axes that decide this

1) Workload shape

Profile what actually runs through your workflows:

Throughput: average and p95 workflows started per second; steps executed per second.
Step duration: CPU‑bound in your code (10–500 ms), I/O waits to third‑party APIs (100 ms–10 s), or human‑in‑the‑loop (minutes–days).
Fan‑out: one workflow → N parallel steps? How big can N get?
Payload size: step input/output persisted per transition (bytes vs KB vs MB).

Why it matters: Postgres is stunningly good at thousands of small transactions per second with predictable row sizes. It is less happy with high‑churn hot partitions, unbounded append‑only logs in the main cluster, or megabyte‑scale payloads in workflow history. As a sanity check, well‑tuned queue‑on‑Postgres systems (e.g., pg_boss, pgmq, SKIP LOCKED patterns) commonly sustain 5–20k jobs/s on a single decent box with SSDs. You can get more with partitioning and aggressive VACUUM, but at that point you are a queue engineer. External orchestrators scale past that without putting the load on your primary database.

2) Data proximity and transactional coupling

Do your workflow steps change rows in the same Postgres database that holds your domain truth? If yes, in‑DB orchestration is compelling. You can make the business write and the workflow transition happen in the same transaction. That removes outbox/relay complexity and closes a class of race windows. If most steps call external APIs or other data stores, the coupling value drops, and you’re loading Postgres with state that isn’t co‑located with the real work.

3) Failure domains and blast radius

Moving orchestration into Postgres widens the blast radius of a database incident. A write‑ahead log (WAL) surge from a mis‑configured retry storm can degrade the same cluster your product queries rely on. If your RTO/RPO posture already treats the primary as a crown jewel, do you want workflow storms living there? External orchestrators cost you a separate control plane that can fail independently—which is good when your database is on fire.

4) Platform constraints and team capacity

Reality check items:

Managed Postgres limitations: some providers gate C extensions. RDS and Aurora Postgres allow a subset. Check if pg_durable runs where you run Postgres.
Multi‑region: if you need active‑active across regions, orchestrators with their own replication model can be easier than multi‑writing workflow state across regions in Postgres.
Ops bench strength: running Temporal or Step Functions well is not free. Neither is becoming an expert in autovacuum, HOT updates, and partitioned job tables.

Where in‑database durable execution shines

1) Low‑latency, row‑local workflows

Think anti‑fraud checks on a checkout row, entitlement recomputation when a user changes plan, or materialized view refreshes. The execution path is “touch a few rows, emit an event, schedule a retry.” You want each step to be idempotent and commit with the same transaction as the row change. In‑DB wins on simplicity, consistency, and latency (one fewer network hop).

2) Tight cost control and smallish teams

One fewer horizontally scalable system to operate means lower cognitive and dollar costs. If your peak is under 5k step executions/s and payloads are small, Postgres is hard to beat on TCO. You pay for extra IOPS and storage headroom instead of an orchestrator cluster or per‑state‑transition charges from a cloud service.

3) Compliance and auditability by SQL

Step history and decisions live in tables you can join, snapshot, and export under your existing compliance regime. No separate audit lake to reconcile. SOC 2 and ISO 27001 reviews get easier when you can prove control with SQL queries.

Where external orchestrators still win

1) Wild fan‑out, long tails, human steps

Hundreds of parallel branches, steps that wait hours for callbacks, and human approvals push you into orchestration land. Temporal’s durable timers and event‑sourced workflow histories are engineered for this. Pushing that shape into Postgres without starving the rest of your workload is an art you should not have to learn.

2) Team boundaries and service isolation

If you’re a multi‑team scale‑up, an external orchestrator becomes a platform with well‑defined APIs, quotas, and multi‑tenant fairness. Putting everybody’s workflow state into one database schema can create noisy‑neighbor incidents and political fights over VACUUM settings.

3) Multi‑region active‑active and cloud permission sprawl

Global SLAs and cross‑region workflows are easier to reason about when the orchestrator abstracts replication. And when your steps need cloud credentials for a dozen services, keeping secrets and IAM scoped to the orchestrator often beats shoving more responsibility into the DB boundary.

The hidden costs of in‑DB workflows

Write amplification and VACUUM budget

Every step transition is at least one write and often two (claim + complete), plus a timer write. Counting just one 300‑byte row write per step, 2k steps/s already generates on the order of 600 KB/s of table churn before WAL overhead. Depending on your indexes, the WAL multiplier can easily be 2–4x that: 1.2–2.4 MB/s of WAL, or roughly 100–200 GB/day. Plan IOPS and storage accordingly. Then plan VACUUM: if autovacuum falls behind, dead tuples bloat your tables, HOT updates degrade, and replicas start to lag.
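To rerun that arithmetic with your own numbers, a small helper is enough. The `writes_per_step` and `wal_factor` inputs are assumptions to argue about, not measured constants; the 2–4x range is the rough multiplier quoted above.

```python
SECONDS_PER_DAY = 86_400


def wal_gb_per_day(steps_per_s: float, row_bytes: int, writes_per_step: int, wal_factor: float) -> float:
    """Rough WAL volume in decimal GB/day for a workflow table at a steady rate."""
    table_bytes_per_s = steps_per_s * row_bytes * writes_per_step
    return table_bytes_per_s * wal_factor * SECONDS_PER_DAY / 1e9


if __name__ == "__main__":
    # One write per step, WAL factor 2x and 4x: the 100-200 GB/day range.
    for factor in (2, 4):
        print(f"{wal_gb_per_day(2_000, 300, 1, factor):.0f} GB/day")  # 104, then 207
    # Claim + complete + timer (3 writes per step) triples the budget.
    print(f"{wal_gb_per_day(2_000, 300, 3, 4):.0f} GB/day")  # 622
```

The last line is the one to show your DBA: counting every write per transition can move you from "fine" to "new storage tier."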

Hot partitions and index design

Timers create a hotspot around “next_due_at.” You need partial indexes, bucketing (e.g., by minute), and partitioning for large deployments. That is engineering you would otherwise pay an orchestrator vendor to do for you.
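A minimal sketch of minute bucketing, assuming a hypothetical `wf_timers` table with `next_due_at`, `due_bucket`, and `fired_at` columns. The DDL string is ordinary Postgres partial‑index syntax but is not executed here; the Python helpers show how a poller could compute buckets and catch up after a stall.

```python
from datetime import datetime, timedelta, timezone

BUCKET_SECONDS = 60  # minute buckets

# Hypothetical Postgres DDL: a partial index covering only timers that have not
# fired yet, so the index stays small while fired history grows.
PENDING_TIMERS_INDEX = """
CREATE INDEX wf_timers_pending_idx
    ON wf_timers (due_bucket, next_due_at)
    WHERE fired_at IS NULL;
"""


def due_bucket(next_due_at: datetime) -> int:
    """Bucket number (minutes since the Unix epoch) stored alongside each timer."""
    return int(next_due_at.timestamp()) // BUCKET_SECONDS


def buckets_to_scan(last_scanned: int, now: datetime) -> range:
    """Buckets a poller still owes: everything after the last one it finished,
    up to and including the current one, so a stalled poller catches up."""
    return range(last_scanned + 1, due_bucket(now) + 1)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    current = due_bucket(now)
    print(list(buckets_to_scan(current - 3, now)))           # three buckets to catch up on
    print(due_bucket(now + timedelta(minutes=5)) - current)  # 5
```

Polling by bucket keeps each scan bounded to a narrow index range instead of a growing "everything due before now" sweep.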

Logical replication and CDC side effects

High‑churn workflow tables can dominate your logical replication stream and drown your downstream consumers. Use publication filters to exclude workflow schemas from CDC unless required. If you cannot exclude them, consider a separate Postgres cluster for orchestration state.

Observability you must build

SQL makes ad hoc debugging delightful. But you still need service‑level metrics: step start/stop counts, retries, DLQ size, timer wheel lag, executor saturation. If the extension does not export these, you will build them. That work remains whether you are nearshore or in‑house.

A concrete decision framework

Score each statement 0–2 and sum the scores (maximum 12).

80%+ of workflow steps read/write rows in the same Postgres cluster as your core product.
p95 step runtime is under 500 ms; p99 under 5 s; payloads under 10 KB.
Peak sustained transitions under 5k/s; fan‑out typically under 32 branches.
Single‑region or active‑passive is acceptable; RTO of minutes, not seconds.
Your Postgres provider allows required extensions; your team can tune autovacuum and partitioning.
Compliance benefits from SQL‑native audit of workflow history.

10–12: In‑DB durable execution is likely the right default. Keep it disciplined.
6–9: Mixed. Start in‑DB for row‑local flows; carve out long‑tail or fan‑out heavy flows to an external orchestrator.
0–5: Use an external orchestrator. Your workload or org shape will punish the database.
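If you revisit this rubric every quarter, encoding it keeps the thresholds from drifting. A sketch only: the field names and return labels are ours, and the cut‑offs are the ones listed above.

```python
from dataclasses import astuple, dataclass


@dataclass
class Scorecard:
    # Your 0-2 score for each statement (field names are illustrative).
    row_local: int          # 80%+ of steps touch rows in the core cluster
    short_small_steps: int  # p95 < 500 ms, p99 < 5 s, payloads < 10 KB
    moderate_rate: int      # < 5k transitions/s, fan-out usually < 32
    single_region_ok: int   # single-region or active-passive is fine
    platform_ready: int     # extensions allowed, team can tune autovacuum
    sql_audit_helps: int    # compliance benefits from SQL-native audit


def recommend(card: Scorecard) -> str:
    scores = astuple(card)
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("score each statement 0, 1, or 2")
    total = sum(scores)
    if total >= 10:
        return "in-db by default"
    if total >= 6:
        return "hybrid: in-db for row-local flows, external for fan-out and long tails"
    return "external orchestrator"


if __name__ == "__main__":
    print(recommend(Scorecard(2, 2, 1, 2, 1, 1)))  # total 9 -> hybrid
```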

If you choose in‑DB: guardrails that prevent 3 a.m. drama

1) Separate schema, maybe a separate cluster

Keep workflow tables in their own schema and leave that schema out of your logical replication publications by default. If your product’s read replicas or CDC consumers start lagging because of workflow churn, move workflow state to a second Postgres cluster before you move to a different technology. Two Postgres clusters are often still simpler than learning a whole new orchestrator.

2) Partition by time and tenant

Partition job and history tables by due_at bucket and optionally by tenant. Drop old partitions instead of DELETE. Keep the hot working set small enough that VACUUM completes well under your retry backoff intervals.
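A sketch of daily time partitions, assuming a hypothetical `wf_history` parent table declared with `PARTITION BY RANGE` on its timestamp column. It generates the Postgres statements rather than running them, and it covers the time dimension only; tenant sub‑partitioning is left out. Dropping a whole partition removes its files at once, whereas DELETE leaves dead tuples for VACUUM to clean up.

```python
from datetime import date, datetime, timedelta
from typing import List

PARENT = "wf_history"  # hypothetical parent table, range-partitioned by day


def partition_name(day: date) -> str:
    return f"{PARENT}_{day:%Y%m%d}"


def create_partition_sql(day: date) -> str:
    # Postgres declarative partitioning: the upper bound is exclusive.
    nxt = day + timedelta(days=1)
    return (
        f"CREATE TABLE IF NOT EXISTS {partition_name(day)} PARTITION OF {PARENT} "
        f"FOR VALUES FROM ('{day.isoformat()}') TO ('{nxt.isoformat()}');"
    )


def partitions_to_drop(existing: List[str], today: date, retention_days: int) -> List[str]:
    """Daily partitions older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    prefix = f"{PARENT}_"
    expired = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not one of our partitions
        try:
            day = datetime.strptime(name[len(prefix):], "%Y%m%d").date()
        except ValueError:
            continue  # e.g. a DEFAULT partition named wf_history_default
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    today = date(2026, 6, 6)
    print(create_partition_sql(today + timedelta(days=1)))  # pre-create tomorrow
    old = ["wf_history_20260501", "wf_history_20260530", "wf_history_20260606"]
    for name in partitions_to_drop(old, today, 7):
        print(f"DROP TABLE {name};")  # DROP TABLE wf_history_20260501;
```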

3) Design for idempotency and effect logs

Require a request_id for every external side effect. Store an effect log table keyed by (request_id, target). Make every worker check this table before performing a side effect. This is orchestrator‑agnostic hygiene that turns at‑least‑once delivery into effectively exactly‑once effects.
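A minimal effect‑log sketch, again with sqlite3 standing in for Postgres (where you would write `INSERT ... ON CONFLICT DO NOTHING` instead of `INSERT OR IGNORE`). The `effect_log` table and `perform_once` helper are hypothetical. One caveat the sketch makes visible: if the process dies after the side effect succeeds but before the commit, a retry runs it again, so also pass `request_id` to the downstream API as its idempotency key when it supports one. Holding a transaction open across an external call is a trade‑off too; keep such calls short.

```python
import sqlite3
from typing import Callable


def init_effect_log(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS effect_log ("
        " request_id TEXT NOT NULL,"
        " target     TEXT NOT NULL,"
        " PRIMARY KEY (request_id, target))"
    )


def perform_once(conn: sqlite3.Connection, request_id: str, target: str,
                 effect: Callable[[], None]) -> bool:
    """Run `effect` at most once per (request_id, target); True if it ran now."""
    with conn:  # commit on success, roll back if `effect` raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO effect_log (request_id, target) VALUES (?, ?)",
            (request_id, target),
        )
        if cur.rowcount == 0:
            return False  # already recorded: a retry or a duplicate delivery
        effect()  # if this raises, the log row rolls back and a later retry may run it
        return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_effect_log(conn)
    sent = []
    for _ in range(3):  # simulate three deliveries of the same message
        perform_once(conn, "req-123", "email:welcome", lambda: sent.append("welcome"))
    print(sent)  # ['welcome']
```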

4) Put timers on a budget

Do not let every team schedule arbitrary wakeups. Offer fixed backoff policies and capped retry counts per workflow type. Expose a “timer debt” metric: total overdue timers. Alert on growth. That metric will tell you about executor starvation faster than user reports.
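A sketch of a fixed policy catalogue with capped retries, plus the timer‑debt metric. The policy names and numbers are placeholders; the point is that teams choose from a short list instead of scheduling arbitrary wakeups, and that retries stop at a budget.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class RetryPolicy:
    base: timedelta    # delay after the first failure
    factor: float      # exponential growth per further failure
    cap: timedelta     # never wait longer than this
    max_attempts: int  # after this many failures, stop scheduling timers


# Hypothetical catalogue: teams pick a policy by name instead of arbitrary wakeups.
POLICIES = {
    "fast": RetryPolicy(timedelta(seconds=1), 2.0, timedelta(seconds=30), 5),
    "standard": RetryPolicy(timedelta(seconds=10), 2.0, timedelta(minutes=10), 8),
}


def next_wakeup(policy: str, failures: int, now: datetime) -> Optional[datetime]:
    """Due time for the next retry after `failures` failures, or None once spent."""
    if failures < 1:
        raise ValueError("failures counts from 1")
    p = POLICIES[policy]  # unknown policy names raise KeyError on purpose
    if failures >= p.max_attempts:
        return None  # route to a dead-letter queue instead of another timer
    delay = min(p.base * (p.factor ** (failures - 1)), p.cap)
    return now + delay


def timer_debt(pending_due_times: Iterable[datetime], now: datetime) -> int:
    """Timer debt: how many pending timers are already past due."""
    return sum(1 for due in pending_due_times if due < now)


if __name__ == "__main__":
    now = datetime(2026, 6, 6, 12, 0)
    print([(next_wakeup("fast", n, now) - now).total_seconds() for n in range(1, 5)])  # [1.0, 2.0, 4.0, 8.0]
    print(next_wakeup("fast", 5, now))  # None
```

Export `timer_debt` as a gauge; a steadily rising value usually means executors cannot keep up.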

5) Backpressure and fairness

Use SKIP LOCKED or advisory locks with per‑queue limits. Ensure high‑value queues cannot be starved by bulk low‑priority jobs. Set per‑tenant concurrency caps. Enforce them in SQL, not just in worker code.
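The fairness rule is easy to get subtly wrong, so it helps to pin it down as a pure‑Python model first. This is only a model of the selection logic (priority first, a per‑tenant concurrency cap); as the paragraph says, in production it belongs in the SQL claim query, for example alongside `FOR UPDATE SKIP LOCKED`. The `Job` shape and parameter names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Job:
    id: int
    tenant: str
    priority: int  # higher runs first


def pick_batch(ready: List[Job], running: Mapping[str, int], tenant_cap: int, batch: int) -> List[Job]:
    """Pick up to `batch` jobs, highest priority first (lower id breaks ties),
    never letting a tenant exceed `tenant_cap` concurrently running jobs."""
    in_flight = Counter(running)  # copy, so the caller's counts are not mutated
    chosen: List[Job] = []
    for job in sorted(ready, key=lambda j: (-j.priority, j.id)):
        if len(chosen) == batch:
            break
        if in_flight[job.tenant] >= tenant_cap:
            continue  # tenant is at its cap; leave the job for a later poll
        in_flight[job.tenant] += 1
        chosen.append(job)
    return chosen


if __name__ == "__main__":
    ready = [Job(1, "a", 1), Job(2, "a", 1), Job(3, "a", 1), Job(4, "b", 0), Job(5, "c", 5)]
    print([j.id for j in pick_batch(ready, {"a": 1}, tenant_cap=2, batch=3)])  # [5, 1, 4]
```

Note how tenant `a`’s bulk jobs cannot crowd out tenant `b`, even though `b`’s job has the lowest priority.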

6) WAL, IOPS, and VACUUM SLOs

Set SLOs you can measure: e.g., p95 VACUUM delay under 5 minutes for hot partitions; WAL generation under 3 MB/s sustained during peak; replica apply lag under 5 seconds for non‑workflow schemas. If you cannot measure these, you are flying blind.

7) Failure testing

Kill a worker during a step. Reboot the primary during a retry storm. Pause autovacuum. Measure the time to recovery and the volume of duplicate side‑effect attempts. If those drills are terrifying, your defaults are not safe yet.

If you stay with an external orchestrator: keep the wins

Co‑locate data writes: Even with Temporal or Step Functions, keep the outbox pattern near your domain tables. Make step completion contingent on the outbox write.
Use the orchestrator for long‑tail only: Consider a hybrid: in‑DB for short, row‑local workflows; external for human steps and big fan‑outs. Draw the line by latency and retention, not by team.
Cap bill shock: If you are on a per‑transition pricing model, move chatty retries into in‑process backoff before surfacing to the orchestrator.
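A sketch of in‑process backoff with jitter, so a brief blip is absorbed locally and only a persistent failure reaches the orchestrator as a billable retry. The helper name and defaults are illustrative; `sleep` is injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_local_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying in-process with jittered exponential backoff.
    The last error is re-raised so only persistent failures escape."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of local retries: let the orchestrator see it
            # base_delay * 1, 2, 4, ... scaled by a random factor in [0.5, 1.0]
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable: the loop always returns or raises")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_api() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient blip")
        return "ok"

    print(call_with_local_retries(flaky_api), calls["n"])  # ok 3
```

In practice you would catch only errors you know are transient rather than every `Exception`.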

What about agents and AI workflows?

Agentic systems make the worst possible workflow customers: long chains, speculative branches, retries on flaky tools, and megabyte prompts in step payloads. Resist the urge to put that state into your primary Postgres. If you must keep a thin control track in Postgres (e.g., for billing or audit), store only references and minimal metadata. The token streams, tool logs, and scratchpads belong in blob storage indexed by a dedicated orchestrator or a stream processor, not in your OLTP database.
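A sketch of that thin control track: the heavy payload goes to blob storage and Postgres keeps only a content hash and a size. The in‑memory `BlobStore` is a stand‑in for object storage, and the row shape is hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


class BlobStore:
    """In-memory stand-in for object storage, keyed by SHA-256 of the content."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def control_row(store: BlobStore, workflow_id: str, step: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Offload the payload and return the thin row you would keep in Postgres."""
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return {
        "workflow_id": workflow_id,
        "step": step,
        "payload_ref": store.put(raw),  # 64-char hex digest, not the payload itself
        "payload_bytes": len(raw),      # handy for billing and audit queries
    }


if __name__ == "__main__":
    store = BlobStore()
    row = control_row(store, "wf-7", "plan", {"prompt": "x" * 2_000_000, "tool_log": []})
    print(row["payload_bytes"], len(row["payload_ref"]))
```

Content addressing also deduplicates identical prompts for free, since the same bytes always map to the same key.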

Cost modeling: simple math, big implications

Here is a quick back‑of‑the‑envelope to compare options for a mid‑scale startup:

In‑DB: Peak 2k transitions/s, 1 KB of state/log per transition, WAL factor ~3x → roughly 500 GB/day of WAL if that peak is sustained, plus replicas. Assume a beefy Postgres instance at $2–4k/month and 2–3 replicas. Engineering: 0.2–0.4 FTE of a senior who knows Postgres internals.
External orchestrator (self‑hosted Temporal): 3–5 node cluster, $1–2k/month infra, 0.3–0.6 FTE platform engineer. Better isolation, more software to run.
External orchestrator (managed/PaaS): Often per‑transition pricing. At $0.25 per thousand state transitions (illustrative; check your vendor), 5B transitions/year is $1.25M. Many teams underestimate this line item until the bill arrives.
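The arithmetic behind these lines is worth rerunning with your own inputs. A quick sketch, assuming 1 KB = 1,000 bytes and sustained rates; the function names are ours. Note how much the pricing unit matters: a $1.25M bill for 5B transitions corresponds to $0.25 per thousand transitions, while $0.25 per million would come to about $1,250.

```python
SECONDS_PER_DAY = 86_400


def in_db_wal_gb_per_day(transitions_per_s: float, kb_per_transition: float, wal_factor: float) -> float:
    """WAL in decimal GB/day if the rate were sustained all day (1 KB = 1,000 bytes)."""
    bytes_per_s = transitions_per_s * kb_per_transition * 1_000
    return bytes_per_s * wal_factor * SECONDS_PER_DAY / 1e9


def managed_bill_usd(transitions_per_year: float, usd_per_unit: float, unit: int) -> float:
    """Per-transition pricing: `usd_per_unit` dollars for every `unit` transitions."""
    return transitions_per_year / unit * usd_per_unit


if __name__ == "__main__":
    print(f"{in_db_wal_gb_per_day(2_000, 1, 3):.0f} GB/day at a sustained peak")  # 518
    print(f"{5e9 / (365 * SECONDS_PER_DAY):.0f} transitions/s average for 5B/year")  # 159
    print(f"${managed_bill_usd(5e9, 0.25, 1_000):,.0f}")      # $1,250,000
    print(f"${managed_bill_usd(5e9, 0.25, 1_000_000):,.0f}")  # $1,250
```

The second line is a useful reality check: 5B transitions a year averages far below a 2k/s peak, so size storage on measured averages and IOPS on peaks.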

None of these numbers are “right” for you. But they frame the decision: are you paying in compute, expertise, or vendor margin?

Migrations without downtime

From DIY queues to in‑DB durable execution

Introduce the extension and new tables in a separate schema.
Dual‑write step claims to both old and new job tables for a canary subset.
Switch worker consumers per workflow type behind flags.
Harden VACUUM, timer lag, and idempotency checks; then flip the default.
Backfill or archive the old job history into cheap storage and drop the hot partitions.
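For the dual‑write canary, membership should be stable: the same workflow should land on the same side every time it is evaluated. A sketch, assuming hypothetical flag values and table names; it uses hashlib rather than Python’s built‑in `hash()`, which is randomized per process for strings.

```python
import hashlib
from typing import Dict, List

# Hypothetical per-type flags: percent of workflows whose step claims are dual-written.
CANARY_PERCENT: Dict[str, int] = {"entitlements": 10, "fraud_check": 0}


def in_canary(workflow_id: str, workflow_type: str, percent: int) -> bool:
    """Stable canary membership: the same workflow always maps to the same bucket."""
    digest = hashlib.sha256(f"{workflow_type}:{workflow_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # 0..99
    return bucket < percent


def targets_for(workflow_id: str, workflow_type: str) -> List[str]:
    """Which job tables a step claim should be written to during the migration."""
    pct = CANARY_PERCENT.get(workflow_type, 0)
    if in_canary(workflow_id, workflow_type, pct):
        return ["legacy_jobs", "durable_steps"]
    return ["legacy_jobs"]


if __name__ == "__main__":
    print(targets_for("wf-42", "entitlements"))
```

Raising a type’s percentage only ever adds workflows to the canary; none flip back out, which keeps dual‑write comparisons clean.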

From in‑DB to an external orchestrator (the escape hatch)

Add an adapter layer that translates your workflow state row into an orchestrator workflow input.
For new work, send externally but keep a Postgres pointer for audit.
For running work, export timers as orchestrator timers at safe handoff boundaries (after a step completes).
Keep in‑DB timers disabled but recoverable for 1–2 release cycles in case you need to roll back.
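A sketch of the adapter layer and the handoff guard, assuming a hypothetical row shape. The output dict is a generic start payload, not any orchestrator’s real API; you would map it onto whatever your client SDK expects when starting a workflow. Reusing the workflow ID keeps the Postgres audit pointer joinable.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class WorkflowRow:
    # Hypothetical shape of an in-DB workflow state row.
    workflow_id: str
    workflow_type: str
    current_step: str
    step_status: str                   # 'running', 'completed', or 'waiting_timer'
    state: Dict[str, Any] = field(default_factory=dict)
    next_due_at: Optional[str] = None  # ISO-8601 timestamp of a pending timer


SAFE_STATUSES = ("completed", "waiting_timer")


def safe_to_hand_off(row: WorkflowRow) -> bool:
    """Hand off only at a step boundary, never while a step is mid-flight."""
    return row.step_status in SAFE_STATUSES


def to_orchestrator_input(row: WorkflowRow) -> Dict[str, Any]:
    """Translate a state row into a generic start payload for the external system."""
    if not safe_to_hand_off(row):
        raise ValueError(f"{row.workflow_id} is mid-step; retry after the step completes")
    return {
        "workflow_id": row.workflow_id,  # same ID, so the Postgres audit pointer still joins
        "workflow_type": row.workflow_type,
        "resume_after_step": row.current_step,
        "state": dict(row.state),
        "timer_due_at": row.next_due_at,  # re-created as an orchestrator timer
    }


if __name__ == "__main__":
    row = WorkflowRow("wf-9", "billing", "charge", "waiting_timer",
                      {"amount_cents": 1200}, "2026-06-07T00:00:00Z")
    print(to_orchestrator_input(row)["resume_after_step"])  # charge
```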

A note on nearshore execution

If you are a US CTO working with a Brazilian nearshore team, in‑DB durable execution can be a velocity multiplier when the team already owns Postgres. The overlap window (6–8 hours) is enough to iterate on schema and execution semantics quickly, and you avoid waiting on a new platform contract or SSO integration. The trade‑off is you must write down operational SLOs and runbooks early—otherwise the “it’s just SQL” mindset will leave you with invisible toil that surfaces at the worst time.

Bottom line

pg_durable puts in‑database durable execution within reach for teams that would never have stood up Temporal. That is good for shipping. It is dangerous for teams that confuse fewer containers with less complexity. Use your workload shape and failure model—not fashion—to decide.

Key Takeaways

In‑DB workflows shine for low‑latency, row‑local steps with tight transactional coupling. They remove outbox/relay glue and one moving part.
External orchestrators win for wild fan‑out, long waits, multi‑region active‑active setups, and multi‑team isolation.
Model cost: Postgres pays in WAL/IOPS and DBA time; orchestrators pay in clusters or per‑transition dollars.
If you go in‑DB, budget for VACUUM, partitioning, timer fairness, and WAL SLOs. Put timers and retries on a budget.
Keep a clean escape hatch: separate schema, dual‑write canaries, and clear handoff boundaries back to an external system if you outgrow Postgres.