Durable Execution & Workflows
Programming model for long-running, stateful application logic: Vercel's Workflow DevKit ("use workflow" / "use step"), Temporal, AWS Step Functions (Amazon States Language), Cloudflare Workflows, and the event-sourcing pattern that makes them all replay-safe.
Must Know
The Workflow DevKit (WDK) is the durable-execution model arriving inside the JavaScript ecosystem proper: instead of writing a state machine, you write `async` code with two extra string directives and you get crash-safe, resumable, retry-aware, observable workflows. If you're building AI agents, multi-step background jobs, human-in-the-loop flows, or anything that previously needed BullMQ + a state machine + careful idempotency, WDK collapses that into a single mental model. Even if you don't pick WDK specifically, the directives + replay model are the durable-execution pattern Temporal pioneered and Cloudflare/Restate/DBOS now ship; knowing one teaches you the rest.
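A minimal sketch of what "`async` code with two extra string directives" can look like. The function names `sendWelcomeEmail` and `onboardUser` are hypothetical. The sketch assumes the WDK compiler is what gives the directives meaning; without it the strings are inert expression statements, so the snippet runs as ordinary TypeScript.

```typescript
// Hypothetical step: in a real project this is where I/O would happen.
async function sendWelcomeEmail(userId: string): Promise<string> {
  "use step";
  return `welcome-sent:${userId}`;
}

// Hypothetical workflow: plain linear async code that calls steps.
async function onboardUser(userId: string): Promise<string> {
  "use workflow";
  const receipt = await sendWelcomeEmail(userId);
  return receipt;
}
```

The point of the shape is that the orchestration logic stays ordinary control flow, and only the directive marks which functions are durable units.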
If you remember one thing about durable execution, remember this: workflow code runs many times and the event log decides what's real. That single fact explains every WDK rule: why steps must be idempotent, why you can't read `Date.now()` directly in a workflow, why parameters are passed by value, why mutating an object inside a step doesn't propagate, and why bundlers occasionally bite you. Internalize the replay model and the rest of WDK (and Temporal, and Cloudflare Workflows) becomes obvious.
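The replay model described above can be sketched with a toy, synchronous engine. This is an illustration of the idea only, not how WDK or Temporal are implemented; `ReplayContext`, `orderWorkflow`, and the step ids are hypothetical names, and real runtimes are asynchronous and persist the log durably.

```typescript
// One recorded step result in the event log.
type LogEntry = { stepId: string; result: unknown };

class ReplayContext {
  private cursor = 0;
  private readonly log: LogEntry[];

  constructor(log: LogEntry[]) {
    this.log = log;
  }

  // First execution: run fn and record its result.
  // Replay: skip fn and return the recorded result instead.
  step<T>(stepId: string, fn: () => T): T {
    const recorded: LogEntry | undefined = this.log[this.cursor];
    if (recorded !== undefined) {
      if (recorded.stepId !== stepId) {
        throw new Error(`Non-deterministic replay: expected ${recorded.stepId}, got ${stepId}`);
      }
      this.cursor++;
      return recorded.result as T;
    }
    const result = fn();
    this.log.push({ stepId, result });
    this.cursor++;
    return result;
  }
}

let sideEffects = 0;

function orderWorkflow(ctx: ReplayContext): string {
  // Reading the clock inside a step makes the value part of the log.
  const startedAt = ctx.step("now", () => Date.now());
  const chargeId = ctx.step("charge", () => {
    sideEffects++; // stands in for a real external call
    return "charge-1";
  });
  return `${chargeId}@${startedAt}`;
}

const eventLog: LogEntry[] = [];
const firstRun = orderWorkflow(new ReplayContext(eventLog));
const replayed = orderWorkflow(new ReplayContext(eventLog)); // same log, fresh context
```

In this sketch the second invocation re-runs the workflow function but executes no step bodies, so `sideEffects` stays at 1 and both runs return the same string. Calling `Date.now()` directly in `orderWorkflow` instead would make the two runs disagree.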
Should Know
These are the primitives that make durable workflows worth their cost. Anything you'd previously implement with a cron job, a state column, a polling loop, or a queue + scheduled job collapses into ordinary linear code. Human-in-the-loop, long-running approval flows, multi-day onboarding sequences, and "call a tool then wait for the user" agent patterns all fall out of `sleep` + `createWebhook` for free.
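One way to see why waiting can be written as linear code is a toy model in which a wait suspends the run and a later re-invocation picks up where it left off. Everything here is an assumption made for illustration: `WaitContext`, `Suspend`, `approvalFlow`, and `invoke` are hypothetical, and this is not the signature of WDK's `sleep` or `createWebhook`.

```typescript
// A control-flow signal (not a failure) thrown when the run has to wait.
class Suspend {
  readonly reason: string;
  constructor(reason: string) {
    this.reason = reason;
  }
}

interface RunState {
  timers: Map<string, number>;    // timer id -> wake-up time in ms
  webhooks: Map<string, unknown>; // webhook id -> delivered payload
}

class WaitContext {
  private readonly state: RunState;
  private readonly now: number;

  constructor(state: RunState, now: number) {
    this.state = state;
    this.now = now;
  }

  // Records the wake-up time once; later invocations reuse it.
  sleep(id: string, ms: number): void {
    let wakeAt = this.state.timers.get(id);
    if (wakeAt === undefined) {
      wakeAt = this.now + ms;
      this.state.timers.set(id, wakeAt);
    }
    if (this.now < wakeAt) throw new Suspend(`timer:${id}`);
  }

  // Returns the payload if it has arrived, otherwise suspends.
  waitForWebhook<T>(id: string): T {
    if (!this.state.webhooks.has(id)) throw new Suspend(`webhook:${id}`);
    return this.state.webhooks.get(id) as T;
  }
}

// Linear code: no cron job, state column, or polling loop.
function approvalFlow(ctx: WaitContext): string {
  ctx.sleep("cool-off", 1000);
  const decision = ctx.waitForWebhook<{ approved: boolean }>("manager-approval");
  return decision.approved ? "approved" : "rejected";
}

function invoke(state: RunState, now: number): string {
  try {
    return approvalFlow(new WaitContext(state, now));
  } catch (err) {
    if (err instanceof Suspend) return `suspended on ${err.reason}`;
    throw err;
  }
}

const run: RunState = { timers: new Map(), webhooks: new Map() };
const atStart = invoke(run, 0);
const afterTimer = invoke(run, 1000);
run.webhooks.set("manager-approval", { approved: true });
const afterWebhook = invoke(run, 5000);
```

The scheduler in this sketch is just three manual calls to `invoke`; a real runtime would decide when to re-invoke based on the recorded timer and the incoming webhook request.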
AI chat UIs and agent dashboards need two things at once: streaming responses for perceived speed, and durable state so a reload doesn't lose the conversation. Workflow DevKit's resumable streams give you both: the same writer that powers `streamText`-style UX is also the workflow's persisted output. If you've ever lost an in-progress LLM response to a refresh, this is the fix.
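The "stream that is also persisted output" idea can be modeled as an append-only chunk list that readers consume from an offset. This is a conceptual sketch with a hypothetical `PersistedStream` class held in memory; it does not reflect the Workflow DevKit streaming API.

```typescript
interface ReadResult {
  chunks: string[];
  nextOffset: number; // where a reconnecting reader should resume
  done: boolean;      // true once the writer closed and the reader caught up
}

class PersistedStream {
  private readonly chunks: string[] = [];
  private closed = false;

  write(chunk: string): void {
    if (this.closed) throw new Error("stream is closed");
    this.chunks.push(chunk);
  }

  close(): void {
    this.closed = true;
  }

  // Readers never consume chunks destructively, so any offset can be replayed.
  readFrom(offset: number): ReadResult {
    const chunks = this.chunks.slice(offset);
    const nextOffset = offset + chunks.length;
    return { chunks, nextOffset, done: this.closed && nextOffset === this.chunks.length };
  }
}

const stream = new PersistedStream();
stream.write("Hel");
stream.write("lo, ");
const beforeReload = stream.readFrom(0); // client renders two chunks, then reloads

stream.write("world");
stream.close();
const afterReload = stream.readFrom(beforeReload.nextOffset); // resume, nothing lost

const fullText = beforeReload.chunks.concat(afterReload.chunks).join("");
```

Because the chunks are stored rather than pushed once, a client that reconnects with its last offset sees exactly the part it missed.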
Durable execution removes the "did the request actually go through?" anxiety, but only if your steps are idempotent. The single most common WDK / Temporal / Step Functions bug is a non-idempotent step that sends two emails on retry. Treat steps like HTTP PUTs: design them to be safely repeatable, key your side-effecting external calls with a deterministic id, and the rest of the model takes care of itself.
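A sketch of the "deterministic id" advice, using a hypothetical in-memory provider that deduplicates on an idempotency key. `FakeEmailProvider`, `welcomeEmailKey`, and `withRetry` are illustrative names, not part of any real SDK; many real providers accept an idempotency key, but the exact mechanism varies.

```typescript
// Hypothetical provider: a repeated key returns the original message id.
class FakeEmailProvider {
  private readonly seen = new Map<string, string>();
  readonly delivered: string[] = [];

  send(idempotencyKey: string, to: string): string {
    const existing = this.seen.get(idempotencyKey);
    if (existing !== undefined) return existing; // retry: no second email
    const messageId = `msg-${this.seen.size + 1}`;
    this.seen.set(idempotencyKey, messageId);
    this.delivered.push(to);
    return messageId;
  }
}

// The key is built only from stable inputs, so every retry produces the same key.
function welcomeEmailKey(runId: string, userId: string): string {
  return `welcome-email:${runId}:${userId}`;
}

// Minimal retry loop standing in for the runtime's step retries.
function withRetry<T>(maxAttempts: number, fn: (attempt: number) => T): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return fn(attempt);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

const provider = new FakeEmailProvider();
const messageId = withRetry(3, (attempt) => {
  const id = provider.send(welcomeEmailKey("run-42", "user-7"), "user-7@example.com");
  // Simulate a crash after the send but before the step result is recorded.
  if (attempt === 1) throw new Error("crashed after send");
  return id;
});
```

The first attempt sends the email and then fails, which is the case that produces duplicates. The retry reuses the same key, so the provider returns the original message id and `delivered` still holds a single entry.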
Temporal is the prior art every durable-execution framework is measured against. If you're picking a workflow runtime, evaluating WDK, or reading a paper on durable execution, the Temporal model (workflows + activities + signals + timers + event history) is the vocabulary. Even if you choose WDK or Step Functions for a specific stack, understanding Temporal makes their tradeoffs legible (sandbox restrictions vs full Node, hosted runtime vs self-host, JSON DSL vs code).
Step Functions is among the most widely deployed durable-execution runtimes, and ASL is the JSON contract under it. If you ever need a state machine to coordinate Lambdas, run a Saga, or fan out parallel work in AWS, you'll write ASL. Unlike WDK or Temporal it's a declarative DSL, not code, which is both its strength (visualizable, statically inspectable) and its weakness (loops and dynamic control flow are awkward). Worth knowing as the canonical "workflow as JSON" comparison point.
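To make "workflow as JSON" and "statically inspectable" concrete, here is a small ASL-shaped definition written as a TypeScript object literal, plus a walker that derives the execution order without running anything. The field names (`StartAt`, `States`, `Type`, `Resource`, `Next`, `End`, `Retry`) follow Amazon States Language; the state names, ARNs, and the `linearPath` helper are hypothetical, and the types cover only the subset used here.

```typescript
// Just enough of ASL for this sketch: Task and Succeed states.
type AslState =
  | {
      Type: "Task";
      Resource: string;
      Next?: string;
      End?: boolean;
      Retry?: { ErrorEquals: string[]; IntervalSeconds?: number; MaxAttempts?: number; BackoffRate?: number }[];
    }
  | { Type: "Succeed" };

interface StateMachine {
  Comment?: string;
  StartAt: string;
  States: Record<string, AslState>;
}

// Placeholder ARNs; a real definition would point at deployed Lambdas.
const orderFlow: StateMachine = {
  Comment: "Hypothetical order flow",
  StartAt: "ChargeCard",
  States: {
    ChargeCard: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:charge-card",
      Retry: [{ ErrorEquals: ["States.TaskFailed"], IntervalSeconds: 2, MaxAttempts: 3, BackoffRate: 2.0 }],
      Next: "SendReceipt",
    },
    SendReceipt: {
      Type: "Task",
      Resource: "arn:aws:lambda:us-east-1:123456789012:function:send-receipt",
      Next: "Done",
    },
    Done: { Type: "Succeed" },
  },
};

// Follows Next pointers from StartAt; works only for branch-free machines.
function linearPath(machine: StateMachine): string[] {
  const path: string[] = [];
  let current: string | undefined = machine.StartAt;
  while (current !== undefined) {
    if (path.indexOf(current) !== -1) throw new Error(`Cycle at state: ${current}`);
    const state: AslState | undefined = machine.States[current];
    if (state === undefined) throw new Error(`Unknown state: ${current}`);
    path.push(current);
    if (state.Type === "Task" && !state.End) {
      current = state.Next;
    } else {
      current = undefined;
    }
  }
  return path;
}

const order = linearPath(orderFlow);
```

Serializing `orderFlow` with `JSON.stringify` yields the JSON form of the same definition. The walker is the kind of analysis a declarative format makes easy and arbitrary code makes hard.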
Niche / Specialized
Most "my workflow exploded after a deploy" reports come down to something non-serializable hiding in a step argument or return value. Treat the boundary like an API: pass plain data, return plain data, do all I/O and mutation inside steps, and never assume an object reference survives. Same discipline applies to Temporal activities and Step Functions tasks: it's a property of event-sourced replay, not a WDK quirk.
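The boundary discipline above can be illustrated with a stand-in that copies arguments and results through a serializer. JSON is used here purely as an assumption; a real runtime may use a different serializer that supports more types. `callStep`, `Cart`, `addShipping`, and `Session` are hypothetical names.

```typescript
// Stand-in for a step boundary: both directions cross as serialized data.
function callStep<A, R>(step: (arg: A) => R, arg: A): R {
  const argCopy = JSON.parse(JSON.stringify(arg)) as A;
  const result = step(argCopy);
  return JSON.parse(JSON.stringify(result)) as R;
}

interface Cart {
  items: string[];
  total: number;
}

function addShipping(cart: Cart): Cart {
  cart.items.push("shipping"); // mutates only the step's private copy
  return { ...cart, total: cart.total + 5 };
}

const cart: Cart = { items: ["book"], total: 20 };
const updated = callStep(addShipping, cart);

// Class instances lose their prototype, and therefore their methods.
class Session {
  userId: string;
  constructor(userId: string) {
    this.userId = userId;
  }
  isAdmin(): boolean {
    return this.userId === "root";
  }
}

const revived = JSON.parse(JSON.stringify(new Session("root"))) as Session;
const methodSurvived = typeof revived.isAdmin === "function";
```

In this sketch the caller's `cart` is unchanged after the step, and the only way to observe the step's work is through its return value, which is why returning plain data is the safe habit.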
If your stack is on Cloudflare Workers, Cloudflare Workflows is the native durable-execution option: no extra infra, billed alongside Workers, and able to call other Workers/D1/R2/Queues directly. Useful comparison point for WDK: same model, different runtime constraints. Worth knowing both to pick the right tool and to recognize the durable-execution pattern as it spreads across edge platforms.
Event sourcing is the deeper pattern under every durable-execution runtime. If you understand event sourcing, you understand why a WDK workflow function runs over and over and gets the same answer, why workflow code must be deterministic, and why activities/steps can be retried freely. It is also useful in its own right for ledgers, audit logs, CQRS read models, and any system where "how did we get to this state?" is a real question.
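A minimal sketch of event sourcing outside any workflow runtime: state is never stored directly, only derived by folding an append-only event list. The `LedgerEvent` shapes and the `applyEvent` / `replayLedger` helpers are hypothetical and chosen to match the ledger example mentioned above.

```typescript
type LedgerEvent =
  | { type: "Deposited"; amount: number }
  | { type: "Withdrawn"; amount: number };

interface Account {
  balance: number;
  version: number; // number of events applied so far
}

const emptyAccount: Account = { balance: 0, version: 0 };

// Pure function: same state + same event always gives the same next state.
function applyEvent(state: Account, event: LedgerEvent): Account {
  switch (event.type) {
    case "Deposited":
      return { balance: state.balance + event.amount, version: state.version + 1 };
    case "Withdrawn":
      return { balance: state.balance - event.amount, version: state.version + 1 };
  }
}

// Rebuild state from the log, optionally stopping early to inspect history.
function replayLedger(events: readonly LedgerEvent[], upTo: number = events.length): Account {
  return events.slice(0, upTo).reduce(applyEvent, emptyAccount);
}

const ledgerEvents: LedgerEvent[] = [
  { type: "Deposited", amount: 100 },
  { type: "Withdrawn", amount: 30 },
  { type: "Deposited", amount: 5 },
];

const currentState = replayLedger(ledgerEvents);   // balance 75 after 3 events
const stateAfterTwo = replayLedger(ledgerEvents, 2); // balance 70 after 2 events
```

Replaying a prefix of the log answers "how did we get to this state?" directly. Because `applyEvent` is pure, replaying the same events any number of times gives the same result, which is the property workflow replay relies on.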