DELM: decentralized agents need shared verified state, not a bigger boss.
Yuzhen Mao and Azalia Mirhoseini propose Decentralized Language Models: a multi-agent framework where parallel agents coordinate through a task queue and a compact, verified, unfoldable shared context instead of a central orchestrator.
Key overview
1. The thesis
Existing multi-agent systems parallelize workers but centralize coordination. DELM decentralizes coordination by making verified progress persistent and readable by all agents.
2. The one-line model
DELM = parallel agents + task queue + verified shared context + compact gists + selective unfolding.
The shared context is intentionally not a raw chat transcript. It is a curated working set with backing evidence.
On SWE-bench Verified with Gemini 3 Flash, DELM beats the baselines while reducing cost.
The largest reported gain over the strongest baseline is on LongBench-v2 Multi-Doc QA.
Gemini SWE-bench cost fell to $0.12/task versus roughly $0.24–$0.26 for strong baselines.
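As a quick check on what these cost figures imply, the relative saving can be computed directly. This is a back-of-the-envelope sketch: the dollar amounts are the ones quoted above (the baseline range is itself approximate), and the helper name `relative_saving` is only illustrative.

```python
def relative_saving(cost: float, baseline: float) -> float:
    """Fraction of the baseline cost that is saved (0.5 means half price)."""
    return 1.0 - cost / baseline


DELM_COST = 0.12               # dollars per task, as quoted above
BASELINE_RANGE = (0.24, 0.26)  # approximate dollars per task for strong baselines

savings = [relative_saving(DELM_COST, b) for b in BASELINE_RANGE]
print(", ".join(f"{s:.0%}" for s in savings))  # prints: 50%, 54%
```

Under those quoted numbers, the per-task cost is roughly 50 to 54 percent lower than the strong baselines.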
Concept points
Centralized MAS bottleneck
A main agent decomposes, delegates, waits, merges, and launches another round. This creates a scatter-gather bottleneck and makes every worker depend on the controller’s integration quality.
Shared context as blackboard, but stricter
DELM resembles a blackboard architecture, but with stronger rules: entries are compact, verified, evidence-backed, and selectively unfoldable.
Gists as resident working set
Agents read short gists by default. This keeps the global state cheap enough to include in many calls while preserving pointers to detailed evidence.
Selective unfolding as demand paging
When a gist is insufficient, an agent can unfold to a grounded summary, then to raw evidence. Detail is pulled only when the subtask requires it.
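The gist, summary, and raw-evidence tiers can be pictured as a small store with an on-demand `unfold` call. The sketch below is one possible shape, not the paper's implementation: the class `SharedContext`, the `Level` enum, and all method names are assumptions made for illustration, and the stored strings are made-up placeholders.

```python
from enum import IntEnum


class Level(IntEnum):
    GIST = 0      # always resident in the shared context
    SUMMARY = 1   # grounded summary, fetched on demand
    RAW = 2       # raw evidence, fetched only if the summary is not enough


class SharedContext:
    """Gists stay resident; summaries and raw evidence live in backing stores."""

    def __init__(self):
        self._gists = {}
        self._summaries = {}
        self._raw = {}

    def add(self, entry_id, gist, summary, raw):
        self._gists[entry_id] = gist
        self._summaries[entry_id] = summary
        self._raw[entry_id] = raw

    def resident_view(self):
        """What every agent reads by default: one short line per entry."""
        return [f"[{entry_id}] {gist}" for entry_id, gist in self._gists.items()]

    def unfold(self, entry_id, level=Level.SUMMARY):
        """Pull more detail for a single entry at the requested level."""
        stores = {
            Level.GIST: self._gists,
            Level.SUMMARY: self._summaries,
            Level.RAW: self._raw,
        }
        return stores[Level(level)][entry_id]


ctx = SharedContext()
ctx.add(
    "doc-1",
    gist="Placeholder gist.",
    summary="Placeholder grounded summary with more detail.",
    raw="...placeholder raw source text...",
)
print(ctx.resident_view())             # only the gist is globally visible
print(ctx.unfold("doc-1"))             # summary, pulled on demand
print(ctx.unfold("doc-1", Level.RAW))  # raw evidence, pulled last
```

The point of the design is that the cost of the default view grows with the number of gists, not with the size of the underlying evidence.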
Admission-time verification
Outputs do not enter shared state automatically. They are checked against supporting evidence first. Unsupported claims are rejected, regenerated, or returned to the queue.
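A minimal sketch of such an admission gate is shown below. It assumes a verifier and a regenerator are supplied as callables; in the framework described here these would presumably be model calls, whereas `toy_verify` and `toy_regenerate` are deliberately naive stand-ins (a substring check and "take the first sentence"). The function name `admit`, its parameters, and the retry budget are all hypothetical.

```python
from collections import deque


def admit(claim, evidence, subtask, shared_context, queue,
          verify, regenerate, max_attempts=2):
    """Admit `claim` into shared state only if `verify` accepts it.

    On failure the claim is regenerated up to the attempt budget; if it still
    fails, the subtask goes back on the queue. Returns "admitted" or "requeued".
    """
    for attempt in range(max_attempts):
        if verify(claim, evidence):
            shared_context.append(claim)
            return "admitted"
        if attempt + 1 < max_attempts:
            claim = regenerate(claim, evidence)
    queue.append(subtask)
    return "requeued"


# Toy stand-ins (assumptions for this sketch, not real verifiers).
def toy_verify(claim, evidence):
    return claim.lower() in evidence.lower()


def toy_regenerate(claim, evidence):
    return evidence.split(".")[0]


evidence = "The parser crashes on empty input. A null check is missing."
shared, pending = [], deque()
print(admit("parser crashes on empty input", evidence, "t1",
            shared, pending, toy_verify, toy_regenerate))    # admitted
print(admit("the bug is in the lexer", evidence, "t2",
            shared, pending, toy_verify, lambda c, e: c))    # requeued
```

Whatever the verifier is, the key property is that nothing reaches `shared_context` without passing it.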
Failures become assets
Failed hypotheses, constraints, and patch summaries become reusable. This prevents other agents from rediscovering the same dead ends.
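As a toy illustration of how a recorded failure saves work, the sketch below filters candidate hypotheses against failure entries in shared state. The entry fields (`kind`, `hypothesis`, `note`) and the exact-string matching are simplifying assumptions; an agent reading failure gists in natural language would match far more loosely.

```python
def untried(candidates, shared_context):
    """Drop candidate hypotheses that a recorded failure entry already rules out."""
    ruled_out = {
        entry["hypothesis"]
        for entry in shared_context
        if entry["kind"] == "failure"
    }
    return [c for c in candidates if c not in ruled_out]


# Hypothetical shared state: one failure and one finding.
shared_context = [
    {"kind": "failure", "hypothesis": "bump timeout", "note": "tests still flaky"},
    {"kind": "finding", "hypothesis": None, "note": "race in cache init"},
]

print(untried(["bump timeout", "lock cache init"], shared_context))
# prints: ['lock cache init']
```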
Approach
1. Generate initial subtasks from the task and optional source context.
2. Parallel agents asynchronously claim ready subtasks from the queue.
3. Each agent reads the current verified shared context and works locally.
4. Completed outputs are compressed, verified, and appended as compact gists.
5. If more work is needed, generate new subtasks; otherwise answer from verified state.
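The steps above can be sketched as a small asynchronous worker loop. This is an illustration under stated assumptions rather than the authors' code: `agent`, `run`, and `toy_solve` are hypothetical names, `toy_solve` stands in for a model call, and the compression and verification step is reduced to a plain `append`.

```python
import asyncio


async def agent(name, queue, shared_context, solve):
    """Claim subtasks until cancelled; append each resulting gist to shared state."""
    while True:
        subtask = await queue.get()
        try:
            # The agent works from a snapshot of the current shared context.
            gist, follow_ups = await solve(name, subtask, tuple(shared_context))
            shared_context.append(gist)    # compression + verification omitted here
            for new_subtask in follow_ups:
                # Enqueue before task_done() so queue.join() keeps waiting.
                queue.put_nowait(new_subtask)
        finally:
            queue.task_done()


async def run(initial_subtasks, solve, n_agents=3):
    queue = asyncio.Queue()
    for subtask in initial_subtasks:
        queue.put_nowait(subtask)
    shared_context = []
    workers = [
        asyncio.create_task(agent(f"agent-{i}", queue, shared_context, solve))
        for i in range(n_agents)
    ]
    await queue.join()                     # all work, including follow-ups, is done
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return shared_context


async def toy_solve(name, subtask, context):
    await asyncio.sleep(0)                 # stand-in for a model call
    follow_ups = ["a", "b"] if subtask == "root" else []
    return f"{subtask}: done", follow_ups


if __name__ == "__main__":
    print(sorted(asyncio.run(run(["root"], toy_solve))))
```

Note that no agent plays the role of controller here: workers only interact through the queue and the shared list, which is the property the approach relies on.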
For reasoning trajectories
Compress the useful result directly into a gist: finding, failure, feedback, constraint, or patch summary. Verify that the gist faithfully captures the underlying trajectory before admitting it.
For long source units
Use a hierarchy: raw source → reference-grounded summary → compact gist. Store the gist in shared context; keep the summary/raw source in backing stores for unfolding.
Key findings
SWE-bench Verified: real GitHub issue fixing.
LongBench-v2 Multi-Doc QA: financial, government, news, legal, and academic documents.
Innovation points
1. Coordination substrate
The shared context is the medium of collaboration. Agents do not need all communication routed through a main agent.
2. Verified before reusable
The admission gate prevents plausible but unsupported claims from becoming shared truth.
3. Compact + recoverable
Gists are small enough to be globally visible, while summaries/raw evidence remain recoverable.
4. Failure sharing
Negative results are promoted into constraints, reducing repeated dead-end exploration.
5. Dependency-aware queueing
Subtasks can be made eligible only when dependencies complete; blocked queues can generate missing prerequisite tasks.
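One simple way to express this eligibility rule is a dependency map plus two helper queries, sketched below. The function names, the dictionary layout, and the example task names are assumptions for illustration; the source does not specify how the queue is represented.

```python
def ready_subtasks(deps, completed, claimed=frozenset()):
    """Subtasks whose dependencies are all complete and that nobody has taken yet."""
    return sorted(
        task
        for task, needs in deps.items()
        if task not in completed
        and task not in claimed
        and set(needs) <= set(completed)
    )


def missing_prerequisites(deps, completed):
    """Dependencies that no known subtask provides; candidates for new subtasks."""
    known = set(deps)
    return sorted(
        {need for needs in deps.values() for need in needs
         if need not in known and need not in completed}
    )


# Hypothetical task graph: run_tests depends on a task that was never created.
deps = {
    "locate_bug": [],
    "write_patch": ["locate_bug"],
    "run_tests": ["write_patch", "setup_env"],
}

print(ready_subtasks(deps, completed=set()))                  # ['locate_bug']
done = {"locate_bug", "write_patch"}
print(ready_subtasks(deps, completed=done))                   # [] (queue is blocked)
print(missing_prerequisites(deps, completed=done))            # ['setup_env']
```

When the ready list is empty but work remains, the missing-prerequisite query is the signal to generate the absent task rather than wait forever.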
6. Hybridization with tools
DELM does not replace programmatic agents. It adds decentralized verified state around them.
Practical takeaway points
For agent engineering
- Build a shared state layer, not just supervisor/subagent calls.
- Store distilled findings, failures, constraints, and evidence pointers.
- Make updates pass an admission check before other agents can rely on them.
- Keep shared state compact; keep raw evidence recoverable.
For research agents
- Use shared context to avoid rereading the same papers or rerunning failed analyses.
- Require evidence-backed claims before synthesis.
- Separate global navigation from local evidence inspection.
- Pair natural-language state with code/REPL for exact aggregation.
Limitation points
Verification overhead
Admission-time checking costs extra calls and latency. The paper argues the reliability gain is worth it, but lighter verifiers are future work.
Decomposition quality
DELM inherits the quality of the generated task topology. If subtasks are too coarse, agents are under-specified; if they are too fine, the system pays for unnecessary agents and coordination overhead.
Natural language is weak for exact aggregation
On OOLONG, vanilla DELM underperforms RLM because exact counting/filtering/tie-handling benefits from executable code.
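To make the contrast concrete, here is the kind of exact aggregation that a few lines of code handle deterministically. The records, field names, and the helper `most_common_labels` are invented for illustration and are not taken from OOLONG; the sketch only shows counting, filtering, and explicit tie handling.

```python
from collections import Counter


def most_common_labels(records, predicate=lambda record: True):
    """Return (labels tied for the highest count, that count) after filtering."""
    counts = Counter(r["label"] for r in records if predicate(r))
    if not counts:
        return [], 0
    top = max(counts.values())
    # Ties are reported explicitly instead of being broken arbitrarily.
    return sorted(label for label, n in counts.items() if n == top), top


# Made-up records for the example.
records = [
    {"label": "spam", "length": 10},
    {"label": "ham", "length": 50},
    {"label": "spam", "length": 70},
    {"label": "ham", "length": 5},
    {"label": "spam", "length": 30},
]

print(most_common_labels(records))                               # (['spam'], 3)
print(most_common_labels(records, lambda r: r["length"] < 20))   # (['ham', 'spam'], 1)
```

A prose-only shared state has to get each of these counts right by reading, while the code path gets them right by construction.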
Prompt/model sensitivity
The authors note there is no universally optimal prompt across model families. DELM may need prompt adaptation per model.
Hype points vs grounded read
What is genuinely exciting
- It points at a real scaling law for agent systems: not “more agents”, but better shared state.
- The verification-before-admission frame is exactly the missing safety rail in many agent swarms.
- The memory hierarchy feels OS-like: resident gist, backing summary, raw evidence, demand paging.
- The RLM hybrid result suggests this can wrap tool-using agents, not just chatty agents.
What not to overclaim
- It is not proof that decentralized agents always beat orchestrators.
- It is not a complete autonomous research system by itself.
- It still relies on LLM decomposition, summarization, and verification quality.
- Some domains need code, databases, or formal checks rather than prose state.