Dynamic Workflows in Claude Code: When the AI Writes Its Own Orchestration

Most of us still build our agent workflows by hand: send a prompt, copy the output, paste it into the next prompt, correct, repeat. With Dynamic Workflows, Anthropic inverts this pattern — Claude writes the orchestration itself. An assessment of what changes technically and when the effort is worthwhile.

The Real Turning Point Is Hidden in an Example

Jarred Sumner ported the JavaScript runtime Bun from Zig to Rust. Roughly 750,000 lines of Rust, 99.8 per cent of the existing test suite passing, eleven days from the first commit to the merge. Not by hand, and not in a long pair-programming session with Claude, but using Dynamic Workflows — the orchestration feature that Anthropic released on 28 May 2026 alongside Claude Opus 4.8 as a Research Preview.

A migration of that scale in eleven days is not a PR gimmick. It demonstrates an architectural shift that is relevant to anyone working seriously with AI agents: the plan moves out of the conversation and into the code.

What a Dynamic Workflow Actually Is

The standard Claude Code harness lets Claude plan and execute within the same context window. For the bulk of coding work, this is excellent. There is, however, a class of tasks at which a single context window breaks down: work that is long-running, massively parallel, or highly structured, and work whose output must be checked independently of the agent that produced it.

Until now, Anthropic built bespoke harnesses for such cases — for Research or Code Review, for instance. A Dynamic Workflow automates precisely that step: Claude writes the harness on the fly, as a JavaScript script, tailored to the specific task at hand. This script decides which subagents start in which order, with what looping or branching logic — and holds intermediate state in variables that live outside the conversation.

Three capabilities follow from this that the standard harness cannot provide:

Isolation per agent. Each subagent receives its own context window with a single, focused objective. No cross-contamination between subtasks.
Model choice per agent. The workflow decides which model each subagent uses: Opus for demanding reasoning steps, Haiku for inexpensive exploration, Sonnet for the middle ground.
Persistence. Progress is saved continuously. If a run is interrupted, it resumes where it left off rather than starting again from scratch.
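
To make per-agent model choice and resumable progress concrete, here is a minimal sketch. It is not the Claude Code API: stubAgent, runWithCheckpoints, the task ids and the Map-based checkpoint are all hypothetical stand-ins, and the model names are plain labels.

```javascript
// Records which subagents actually ran, so the resume behaviour is visible.
const calls = [];

// Hypothetical stand-in for a subagent call. It receives only its own goal,
// which mirrors the idea of one isolated context per agent.
async function stubAgent({ model, goal }) {
  calls.push({ model, goal });
  return `[${model}] finished: ${goal}`;
}

// Runs tasks in order and skips any task that already has a saved result.
// The Map stands in for whatever persistence the real runtime provides.
async function runWithCheckpoints(tasks, checkpoint = new Map()) {
  for (const task of tasks) {
    if (checkpoint.has(task.id)) continue; // finished in an earlier run
    const output = await stubAgent({ model: task.model, goal: task.goal });
    checkpoint.set(task.id, output); // state lives in a variable, not in a prompt
  }
  return checkpoint;
}

// Assumed example tasks: a cheap model explores, a stronger one plans.
const tasks = [
  { id: "explore", model: "haiku", goal: "list candidate files" },
  { id: "design", model: "opus", goal: "plan the refactor" },
  { id: "apply", model: "sonnet", goal: "apply the edits" },
];
```

Calling runWithCheckpoints(tasks, saved) with a saved Map that already contains "explore" runs only the remaining two tasks, which is the resume behaviour described above in miniature.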

A workflow can be started in three ways. The simplest is plain language, such as “create a workflow that…” or “use a workflow for this”; Claude treats such a direct request as an opt-in. For an explicit trigger, include the keyword ultracode in the prompt: this runs a single task as a workflow without changing the effort level of the session. Finally, /effort ultracode switches on session-wide auto-orchestration: Claude raises the reasoning effort to the maximum and decides for itself when a task is large enough to warrant a workflow.

One small but important change since launch: originally, the word “workflow” alone in a prompt was sufficient as a trigger. Following user feedback, Anthropic changed the explicit trigger to ultracode — so Claude no longer starts a Dynamic Workflow when “workflow” is mentioned only in passing. Phrasings such as “use a workflow for this” continue to work.

The Three Failure Modes That Workflows Structurally Resolve

To understand when a workflow is worthwhile, you need to understand what it fixes. The longer an agent works on a complex task within a single context window, the more susceptible it becomes to three patterns:

Premature stopping. The agent declares a multi-part task complete after partial progress — processing 20 of 50 items in a security review and marking the rest as “addressed”.
Self-favouritism. When asked to evaluate its own output against a rubric, Claude tends to favour its own work. A reviewer with a personal stake is not an impartial reviewer.
Goal drift. Across many turns — especially after each context compression — fidelity to the original objective erodes gradually. “Do not do X” constraints eventually disappear unnoticed.

A workflow addresses all three at a structural level: separate Claude instances, each with its own context, a focused objective, and isolated state. If your own task suffers from any of these patterns, that is the signal to reach for a workflow.

The Core Primitives: agent, parallel, pipeline

You do not need to be a compiler engineer to read workflows. Three functions do the bulk of the work:

agent() launches a single subagent with a goal, a model, and optionally an output schema.
parallel() is a barrier: it fans the work out and waits until everything is complete before returning.
pipeline() is streaming: each element flows independently through all stages.

The decisive question when choosing between them: do I need all results before I can continue? Yes → parallel(). No → pipeline() (cheaper and faster overall). Confusing the two is one of the most common mistakes — the barrier is not a detail, it is the entire difference.
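
The difference between a barrier and streaming can be shown with plain promises. The sketch below does not use the real parallel() or pipeline() primitives, whose signatures are not given here; barrier, stream, makeStages and the two sample items are hypothetical names chosen for illustration.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Barrier: start everything, continue only once every result is in.
async function barrier(items, fn) {
  return Promise.all(items.map((item) => fn(item)));
}

// Streaming: each item walks through all stages on its own schedule.
async function stream(items, stages) {
  return Promise.all(
    items.map(async (item) => {
      let value = item;
      for (const stage of stages) value = await stage(value);
      return value;
    })
  );
}

// Two toy stages that append to a log so the ordering can be inspected.
function makeStages(log) {
  const draft = async (item) => {
    await sleep(item.ms);
    log.push(`${item.name}:draft`);
    return item;
  };
  const verify = async (item) => {
    await sleep(1);
    log.push(`${item.name}:verify`);
    return item;
  };
  return [draft, verify];
}

const items = [{ name: "slow", ms: 40 }, { name: "fast", ms: 1 }];

async function runBarrier() {
  const log = [];
  const [draft, verify] = makeStages(log);
  const drafted = await barrier(items, draft); // wait for all drafts
  await barrier(drafted, verify);
  return log;
}

async function runStream() {
  const log = [];
  await stream(items, makeStages(log));
  return log;
}
```

In runBarrier, no item is verified until the slow draft has finished. In runStream, the fast item is drafted and verified while the slow one is still drafting, which is where the time saving comes from.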

Around these primitives, a handful of patterns crystallise in practice; these rarely appear in isolation — a real workflow typically combines two to four of them:

Fan-out and synthesis. Process a clearly enumerable list of work items (50 files, 200 endpoints) in parallel, then have one agent consolidate everything into a single prioritised response. The standard remedy for goal drift.
Independent verification. For every producing agent, a separate agent that checks the result against a rubric — without knowing who produced it. The structural fix for self-favouritism. Author and reviewer must never be the same instance.
Generate and filter. Produce many ideas, then filter ruthlessly by quality. The opposite of asking Claude for “the best answer” — the agent commits late, after every option has been challenged.
Tournament. Rather than ranking 1,000 items in a single prompt (quality degrades, context breaks), fresh agents compare just two items at a time. Comparative judgement is more reliable than absolute scoring — especially for taste-dependent work.
Loop until done. For unknown scope, spawn agents until a stopping condition is met (no new findings, no remaining errors in the log), rather than running a fixed number of iterations.
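
Fan-out with synthesis and independent verification often appear together. The following is a rough sketch under stated assumptions: reviewFile and verifyBlind are hypothetical stand-ins for subagents, the ".sql" heuristic and the rubric are invented, and Promise.all takes the place of the real parallel primitive.

```javascript
// Hypothetical worker: one call per enumerated item. The heuristic is invented.
async function reviewFile(file, workerId) {
  const risky = file.endsWith(".sql");
  return {
    file,
    severity: risky ? 3 : 1,
    note: risky ? "raw query" : "ok",
    producedBy: workerId,
  };
}

// Hypothetical reviewer: it sees the finding but not who produced it.
// Illustrative rubric: a note must exist and severity must lie between 1 and 3.
async function verifyBlind(finding) {
  const ok =
    typeof finding.note === "string" &&
    finding.severity >= 1 &&
    finding.severity <= 3;
  return { ...finding, verified: ok };
}

async function fanOutAndSynthesise(files) {
  // Fan-out: one isolated worker per item.
  const findings = await Promise.all(
    files.map((file, i) => reviewFile(file, `worker-${i}`))
  );

  // Guard against premature stopping: every input must have a result.
  const reviewed = new Set(findings.map((f) => f.file));
  const missing = files.filter((file) => !reviewed.has(file));
  if (missing.length > 0) {
    throw new Error(`unreviewed items: ${missing.join(", ")}`);
  }

  // Independent verification: strip the author field before review.
  const checked = await Promise.all(
    findings.map(({ producedBy, ...blind }) => verifyBlind(blind))
  );

  // Synthesis: one consolidated list, highest severity first.
  return checked
    .filter((f) => f.verified)
    .sort((a, b) => b.severity - a.severity);
}
```

The completeness check is the part that is easy to forget: comparing the results against the original list is what turns "20 of 50 items" into a hard error rather than a silent gap.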

The pragmatic way to remember this: first identify the failure mode of your own task, then choose the pattern that prevents it. Drift → fan-out. Self-favouritism → independent verification. Open-ended scope → loop until done. Hard to evaluate → tournament.
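
For the tournament case, a single-elimination bracket is one simple way to organise the pairwise comparisons. This is a sketch, not the article's or Anthropic's implementation: judgePair is a hypothetical stand-in for a fresh subagent, and "longer string wins" is an invented rule that keeps the example deterministic.

```javascript
// Hypothetical judge: in a real workflow this would be a fresh subagent that
// sees exactly two candidates. Here the longer string wins, for illustration.
async function judgePair(a, b) {
  return a.length >= b.length ? a : b;
}

// Single-elimination bracket: compare pairs in rounds until one item remains.
async function tournament(items) {
  if (items.length === 0) throw new Error("need at least one item");
  let round = [...items];
  while (round.length > 1) {
    const matches = [];
    for (let i = 0; i < round.length; i += 2) {
      // An odd item out advances without a match (a bye).
      matches.push(
        i + 1 < round.length ? judgePair(round[i], round[i + 1]) : round[i]
      );
    }
    // All matches of one round are independent and can run concurrently.
    round = await Promise.all(matches);
  }
  return round[0];
}
```

A bracket like this yields a winner with n - 1 comparisons but not a full ranking; if the order of the runners-up matters, a different scheme (for example a pairwise sort) would be needed.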

When a Workflow Is Worthwhile — and When It Is Not

Here is the most important sentence, and it comes from Anthropic itself: Dynamic Workflows consume significantly more tokens than a typical Claude Code session. This is not a tool for every task.

The most common mistake is starting a workflow where a normal session would have sufficed. The honest rule of thumb: if a regular Claude Code session would complete the task in five minutes, you do not need a workflow or a committee of five reviewers.

Three control mechanisms turn “impressive but expensive” into a tool you can run unattended:

/goal sets a hard completion condition. It combines well with the loop pattern, for example “do not stop until a theory holds”. Without /goal, a workflow stops at the first soft stopping point.
/loop runs the entire workflow on a recurring schedule — useful for continuous triage or weekly research updates.
Explicit token budget. A simple “use 10k tokens” in the prompt caps the run. Without a cap, an ambitious workflow can grow to five to ten times the expected cost.
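
Inside a script, the same two ideas (a hard stopping condition and a spending cap) can be expressed as an ordinary loop. The sketch below is an in-script analogue only and says nothing about how Claude Code enforces /goal or a prompt-level budget; makeInvestigator, loopUntilDone and the flat 1,000 tokens per pass are assumptions made for the example.

```javascript
// Hypothetical stand-in for one investigation pass. It returns a prepared
// batch of findings and a made-up token count.
function makeInvestigator(batches) {
  let call = 0;
  return async function investigate() {
    const found = batches[call] ?? [];
    call += 1;
    return { found, tokensUsed: 1000 };
  };
}

// Keep spawning passes until a pass yields nothing new or the budget is spent.
async function loopUntilDone(investigate, tokenBudget) {
  const findings = new Set();
  let spent = 0;
  while (spent < tokenBudget) {
    const { found, tokensUsed } = await investigate();
    spent += tokensUsed;
    const fresh = found.filter((f) => !findings.has(f));
    if (fresh.length === 0) {
      return { findings: [...findings], spent, reason: "converged" };
    }
    fresh.forEach((f) => findings.add(f));
  }
  return { findings: [...findings], spent, reason: "budget" };
}
```

Returning the reason alongside the findings matters in unattended runs: a result that stopped on "budget" is incomplete by construction and should be reported as such.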

On the first trigger, Claude Code displays what will run and asks for confirmation. Org admins can also disable workflows entirely via Managed Settings.

A Security Pattern That Is Not Optional

The moment a workflow reads content not written by you — support tickets, bug reports, user feedback, scraped web pages, output from third-party APIs — you must assume that content may contain prompt injection.

The solution is quarantine: the agents that read the untrusted content must not be permitted to carry out privileged actions. Separate agents that never see the raw content handle any acting. A 30-line reader agent in read-only mode costs almost nothing and eliminates an entire class of attack vectors. Anyone processing content that does not originate from themselves or a trusted team should not treat this as optional.
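
The shape of the quarantine can be sketched in a few lines. Everything here is hypothetical: quarantinedReader, sanitise, privilegedActor, the category list and the action names are invented, and the regular expressions merely stand in for a model call with no tool access.

```javascript
// Hypothetical reader: sees untrusted text, has no tools, and may only return
// a fixed-shape summary. The keyword matching stands in for a model call.
async function quarantinedReader(rawTicket) {
  const category = /refund/i.test(rawTicket) ? "refund" : "other";
  const urgent = /urgent|asap/i.test(rawTicket);
  return { category, urgent };
}

const ALLOWED_CATEGORIES = new Set(["refund", "other"]);

// Only whitelisted values cross the boundary; free text never does.
function sanitise(summary) {
  if (!ALLOWED_CATEGORIES.has(summary.category)) {
    throw new Error("unexpected category from reader");
  }
  return { category: summary.category, urgent: summary.urgent === true };
}

// Privileged actor: never receives the raw ticket, only the sanitised fields.
async function privilegedActor({ category, urgent }) {
  return category === "refund" && urgent
    ? "escalate_to_billing"
    : "queue_for_triage";
}

async function handleTicket(rawTicket) {
  const summary = sanitise(await quarantinedReader(rawTicket));
  return privilegedActor(summary);
}
```

The design choice is that the boundary carries an enumerated category and a boolean, nothing else. An injected instruction in the ticket text has no channel through which to reach the agent that can act.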

If a Workflow Works, Save It

A proven workflow can be saved and thereby made reusable — locally across multiple projects, or packaged as a skill that others can install and run identically.

A practical nuance here: when packaging a workflow as a skill, Claude should be instructed to treat it as a template rather than a script to be executed verbatim. This leaves room to adapt the form to the specific task without losing the overall structure — particularly valuable for flexible patterns such as “Deep Verification” or “Triage”.

Assessment

Dynamic Workflows are neither a new model nor a plugin. They are an architectural shift: the orchestration plan leaves the context window and becomes executable code. In doing so, Anthropic tackles a problem familiar to anyone who has tried to keep an agent on track over several hours: reliability across long horizons comes not from more prompt craft but from structure.

For teams in the DACH region taking AI-assisted development seriously, the relevant question is not “should I try this?” but “which of my tasks are genuinely long-running, parallel, or critical enough to justify the token cost?” Anyone who answers that question honestly is already halfway up the learning curve.

Dynamic Workflows have been available since 28 May 2026 as a Research Preview in the Claude Code CLI, the desktop client, and the VS Code extension. They are offered on Max, Team, and Enterprise plans, as well as via the Claude API, Amazon Bedrock, Vertex AI, and Microsoft Foundry. Source: Introducing dynamic workflows in Claude Code · Documentation.