Harness updating is not harness benefit
Key takeaways from "Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents" (arXiv:2605.30621). The paper separates two things that are often conflated in self-improving agents: writing better harness artifacts, and actually using those artifacts during task execution.
Evolver spread
Across benchmarks, the gap between the best and the worst harness-updating (evolver) model is narrow.
Weak skill-load rate
Qwen3-32B often fails to bring relevant skill artifacts into context.
Adherence drift
Weaker models follow the harness less faithfully as trajectories get longer.
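The last two observations can be checked on your own agent logs. The following is a minimal sketch, not the paper's measurement procedure: the log schema (`Step`, `Trajectory`), the metric definitions, and the idea that relevance and adherence labels come from an annotator or judge are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Step:
    """One logged agent step (hypothetical schema)."""
    loaded_skills: Set[str]   # skill artifacts present in context at this step
    followed_harness: bool    # judged label: did this step follow the harness?


@dataclass
class Trajectory:
    relevant_skills: Set[str]  # skills labeled as relevant to the task
    steps: List[Step]


def skill_load_rate(trajectories: List[Trajectory]) -> float:
    """Fraction of relevant skills that reached the context at least once."""
    loaded = total = 0
    for traj in trajectories:
        seen = set().union(*(s.loaded_skills for s in traj.steps))
        loaded += len(traj.relevant_skills & seen)
        total += len(traj.relevant_skills)
    return loaded / total if total else 0.0


def adherence_by_segment(traj: Trajectory, n_segments: int = 3) -> List[float]:
    """Adherence rate in each consecutive segment of one trajectory."""
    n = len(traj.steps)
    rates = []
    for i in range(n_segments):
        lo, hi = i * n // n_segments, (i + 1) * n // n_segments
        chunk = traj.steps[lo:hi]
        if chunk:
            rates.append(sum(s.followed_harness for s in chunk) / len(chunk))
        else:
            rates.append(float("nan"))  # trajectory shorter than n_segments
    return rates
```

A falling sequence from `adherence_by_segment` (high in the first segment, low in the last) is one simple way to see drift; the choice of three segments is arbitrary.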
The core distinction
Harness-updating
The ability of an evolver model to read execution evidence and write useful persistent artifacts: skills, prompts, memories, tool rules.
Harness-benefit
The ability of the task-solving model to retrieve/load those artifacts and follow them faithfully while solving future tasks.
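To make the split concrete, here is a toy sketch that keeps the two roles as separate functions sharing one artifact store. Everything in it is hypothetical: `HarnessStore`, `update_harness`, and `load_skills` are illustrative names, the evolver is a stand-in for an LLM call, and word overlap is a deliberately naive substitute for whatever retrieval a real agent uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HarnessStore:
    """Persistent artifacts shared by the evolver and the solver."""
    skills: Dict[str, str] = field(default_factory=dict)


def update_harness(store: HarnessStore, evidence: List[str]) -> None:
    """Harness-updating stand-in: save each lesson as a named skill.

    A real evolver would be a model call that reads execution evidence.
    """
    for i, note in enumerate(evidence):
        store.skills[f"skill_{i}"] = note


def load_skills(store: HarnessStore, task: str, k: int = 1) -> List[str]:
    """Harness-benefit stand-in: pick skills by word overlap with the task."""
    task_words = set(task.lower().split())
    scored = [
        (len(task_words & set(body.lower().split())), name)
        for name, body in store.skills.items()
    ]
    top = sorted(scored, reverse=True)[:k]
    return [name for score, name in top if score > 0]


store = HarnessStore()
update_harness(store, [
    "retry the api call after a timeout",
    "quote csv fields containing commas",
])
print(load_skills(store, "export csv fields to disk"))  # a skill is loaded
print(load_skills(store, "rename photos"))              # nothing is loaded
```

The second call shows the point of the distinction: the artifacts exist in the store, yet they do nothing for a task whose solver never brings them into context.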
Main takeaways
Selected quantitative results
Why it matters for agent systems
If you are building self-improving agents in the style of Hermes or GBrain, the paper argues for a practical architecture: use a cheap, good-enough model to propose or summarize durable harness updates, and spend your expensive capability budget on the agent that must execute with those updates. Do not assume that saved skills and memories help automatically: measure whether the agent loads them and follows them across long-horizon work.
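One way to decide where the budget should go is to score every evolver and solver pairing on your own tasks and compare the two spreads. The sketch below assumes a hypothetical `scores` mapping from `(evolver, solver)` to a success rate; the model names and numbers are placeholders for illustration, not results from the paper.

```python
from typing import Dict, Tuple

Scores = Dict[Tuple[str, str], float]  # (evolver, solver) -> success rate


def spread_over_evolvers(scores: Scores) -> Dict[str, float]:
    """For each solver, best-minus-worst score across evolvers."""
    by_solver: Dict[str, list] = {}
    for (_evolver, solver), value in scores.items():
        by_solver.setdefault(solver, []).append(value)
    return {name: max(v) - min(v) for name, v in by_solver.items()}


def spread_over_solvers(scores: Scores) -> Dict[str, float]:
    """For each evolver, best-minus-worst score across solvers."""
    by_evolver: Dict[str, list] = {}
    for (evolver, _solver), value in scores.items():
        by_evolver.setdefault(evolver, []).append(value)
    return {name: max(v) - min(v) for name, v in by_evolver.items()}


# Placeholder values only; replace with your own measurements.
scores: Scores = {
    ("evolver_small", "solver_weak"): 0.40,
    ("evolver_large", "solver_weak"): 0.42,
    ("evolver_small", "solver_strong"): 0.70,
    ("evolver_large", "solver_strong"): 0.73,
}
print(spread_over_evolvers(scores))
print(spread_over_solvers(scores))
```

If the spread across evolvers is small while the spread across solvers is large on your own grid, that supports spending on the solver, which is the pattern the summary above describes.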