# SkillOpt-inspired self-improvement plan for the Personal Orchestrator

Date: 2026-05-27
Source prompt: Connor linked Muratcan Koylan's X post about SkillOpt / “gradient descent for SKILL.md files” and asked how to incorporate the idea into faculties, context assets, extractors, and the Personal Orchestrator layer E2E.
External reference: arXiv:2605.23904v2, “SkillOpt: Executive Strategy for Self-Evolving Agent Skills” by Yifan Yang et al.; tweet: https://x.com/koylanai/status/2059113412278227328
Internal evidence: quick PO consult `/root/personal-orchestrator/consults/2026-05-27T20-34-39.655068-00-00/CONSULT.md`; latest full-breadth run `/root/personal-orchestrator/runs/2026-05-27T19-05-51.169241-00-00`.

## Executive summary

SkillOpt's core move is not “let an agent rewrite itself.” It is: treat a text artifact as trainable external state, then optimize it with the discipline of ML training: scored rollouts, bounded edits, held-out validation, rejected-edit memory, and slow/meta updates.

For the Personal Orchestrator, the equivalent trainable artifacts are broader than `SKILL.md`:

1. **Faculty prompts / faculty skills** — how each cognitive role judges.
2. **Context-asset extractors** — how source-near raw data becomes typed cognitive assets.
3. **Context-asset retrieval/ranking policies** — what evidence each faculty sees.
4. **Synthesis / Agent Manager policies** — what becomes a nudge, delegation, suppression, or quiet local artifact.
5. **Action/delegation goal templates** — how approved work is decomposed and proven.
6. **Evaluation rubrics** — how we score usefulness, focus, non-repeat, proof, autonomy, and attention cost.

The recommended upgrade is a **Personal Orchestrator SkillOpt loop**: a controlled optimization harness that proposes small patches to these artifacts, accepts them only if they improve held-out evals and/or real outcome metrics, stores rejected patches as negative feedback, and periodically distills stable lessons into a meta-guidance layer.

This should make PO feel less like a cron job that occasionally emits good cards and more like a system that learns from Connor's reactions, its own failures, source-health evidence, and delegated-work outcomes.

## What we should learn from SkillOpt

### 1. Optimize persistent text state, not ephemeral reasoning

SkillOpt treats the skill document as the trainable object while freezing the target model/harness. For us, the trainable state should include:

- `agents/faculties/*/prompt.md`
- Hermes skills used by PO, especially Agent Manager, Institutional Memory, Self-Healing, context assets, and source adapters
- context-asset manifests and extractor prompts/rules
- synthesis ranking/suppression/card-template policies
- delegation goal packet templates
- validation/eval rubrics

The important shift: faculty outputs are not just run artifacts. They are training trajectories for improving the above artifacts.

### 2. Use scored trajectories, not vibes

SkillOpt rollouts produce trajectories plus scalar scores. PO already has many trajectory sources:

- faculty run artifacts: `faculties/*.md` and `faculties/*.run.json`
- final surfaces: `TELEGRAM_NUDGE.md`, `COLLABORATOR_OUTPUT.md`, `synthesis.json`
- user feedback: `state/collaborator_feedback.jsonl`, Nudge Inbox feedback
- delegation lifecycle: `state/delegation_backlog.jsonl`, `delegations/*/status.json`, `run.log`, proof artifacts
- source-health and context-asset evidence: `state/context_assets/*`, heartbeat DB, cron outputs
- workbench: `state/PO_WORKBENCH.md`, `state/po_workbench.json`

We should formalize these as rollouts with score dimensions, not only archive them.

### 3. Bounded edits prevent self-improvement from becoming self-corruption

SkillOpt applies add/delete/replace edits under an edit budget. PO should never do unconstrained rewrites of a faculty, extractor, or synthesis policy just because one run was bad.

Default patch budget:

- faculty prompt: max 1–3 localized patches per accepted optimization step
- extractor: max one source/type behavior change per step
- synthesis policy: max one scoring/suppression/card-template change per step
- Hermes skill: patch an existing section before creating a new skill
- context asset schema: no schema changes without explicit migration/eval plan
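
A minimal sketch of how such a budget might be enforced, assuming a hypothetical `Patch` record and a hypothetical `MAX_PATCHES_PER_STEP` table (neither exists in PO today). Where the list above gives a range, the upper bound is used; the Hermes-skill and schema rules are policy checks rather than counts, so they are left out here.

```python
from dataclasses import dataclass

# Hypothetical per-artifact budgets mirroring the countable defaults above.
MAX_PATCHES_PER_STEP = {
    "faculty_prompt": 3,
    "extractor": 1,
    "synthesis_policy": 1,
}


@dataclass(frozen=True)
class Patch:
    artifact_kind: str   # e.g. "faculty_prompt"
    op: str              # "add", "delete", or "replace"
    target: str          # section or rule the patch touches
    score: float = 0.0   # ranker's estimate of usefulness (assumed field)


def clip_to_budget(patches: list[Patch]) -> tuple[list[Patch], list[Patch]]:
    """Keep the highest-scored patches per artifact kind; return (kept, dropped)."""
    kept: list[Patch] = []
    dropped: list[Patch] = []
    used: dict[str, int] = {}
    for patch in sorted(patches, key=lambda p: p.score, reverse=True):
        # Artifact kinds missing from the table get a budget of zero.
        limit = MAX_PATCHES_PER_STEP.get(patch.artifact_kind, 0)
        if used.get(patch.artifact_kind, 0) < limit:
            used[patch.artifact_kind] = used.get(patch.artifact_kind, 0) + 1
            kept.append(patch)
        else:
            dropped.append(patch)
    return kept, dropped
```

Dropped patches are returned rather than discarded so they can still be logged for later analysis.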

### 4. Held-out gates matter more than generation quality

The optimizer model can write plausible but harmful patches. The gate decides what lands.

For PO, a patch should land only after it passes a held-out eval pack that includes:

- historical runs not used to generate the patch
- recent Connor feedback rows not used in training prompt
- known noisy/wrong clusters
- synthetic canaries for safety boundaries
- source-health failure cases
- delegation proof cases

A patch can be eloquent and still fail if it causes more repeats, unsafe autonomy, weaker provenance, or lower actionability.

### 5. Rejected patches are a first-class learning signal

SkillOpt stores rejected edits and score drops. PO should keep a rejected-patch buffer so future optimizers know what not to repeat.

Examples:

- “Over-deterministic suppression registry” was noisy: preserve judgement, do not hard-code brittle filters.
- “Signal Radar source repair as PO priority” was noisy: PO may use Signal Radar opportunistically but should not make its repair a top PO task unless it harms PO recommendations.
- “Useful but too vague” means card templates need more implementation detail and problem framing, not another abstract question.
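
One way the rejected-patch buffer could work is sketched below, assuming a JSONL ledger and a whitespace- and case-insensitive fingerprint. The function names and row fields are illustrative assumptions, not an existing PO schema.

```python
import hashlib
import json
from pathlib import Path


def patch_fingerprint(artifact: str, op: str, body: str) -> str:
    """Stable ID for a proposed edit, ignoring whitespace and case differences."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(f"{artifact}|{op}|{normalized}".encode()).hexdigest()[:16]


def record_rejection(ledger: Path, artifact: str, op: str, body: str,
                     score_delta: float, reason: str) -> dict:
    """Append one rejected patch to the JSONL ledger and return the stored row."""
    row = {
        "fingerprint": patch_fingerprint(artifact, op, body),
        "artifact": artifact,
        "op": op,
        "body": body,
        "score_delta": score_delta,
        "reason": reason,
    }
    ledger.parent.mkdir(parents=True, exist_ok=True)
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row, sort_keys=True) + "\n")
    return row


def was_rejected(ledger: Path, artifact: str, op: str, body: str) -> bool:
    """True if an equivalent patch is already in the rejected-patch buffer."""
    if not ledger.exists():
        return False
    wanted = patch_fingerprint(artifact, op, body)
    with ledger.open(encoding="utf-8") as fh:
        return any(json.loads(line)["fingerprint"] == wanted
                   for line in fh if line.strip())
```

An optimizer run could call `was_rejected` before spending eval budget on a proposal it has already seen fail.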

### 6. Slow/meta updates are perfect for faculty judgement memory

SkillOpt separates fast local edits from slow optimizer-side meta guidance. PO should adopt the same split:

- **Fast patches:** localized prompt/extractor/template patches after eval pass.
- **Slow lessons:** faculty experience heuristics distilled from multiple runs, stored in a Faculty Experience Ledger and retrieved before future judgement.

This fits the existing intended split of responsibilities, with slow lessons living in the third layer:

- GBrain = semantic truth
- Hermes skills = procedural competence
- Faculty Experience Ledger = judgement history

## Proposed architecture: PO-Opt

```text
source-near stores / crons / feedback / delegations
  -> rollout builder
  -> train / selection / test split packs
  -> optimizer agent proposes bounded patches
  -> patch applier creates candidate artifact versions
  -> evaluation harness scores candidate vs baseline
  -> acceptance gate promotes only improved versions
  -> rejected patch buffer records failures
  -> slow/meta update distills stable lessons
  -> next real faculty run uses improved artifacts
```

## Trainable artifact registry

Create a registry of optimizable artifacts:

```yaml
artifacts:
  faculty_prompt:
    path_glob: agents/faculties/*/prompt.md
    allowed_edits: [append_section, replace_section, delete_section]
    max_edits_per_step: 2
    eval_pack: faculty_judgement_eval
    risk: local_safe_work
  context_asset_extractor:
    path_glob: runners/context_assets.py
    allowed_edits: [add_extractor, adjust_classifier, add_source_health_case]
    max_edits_per_step: 1
    eval_pack: context_asset_eval
    risk: local_safe_work
  synthesis_policy:
    path_glob: [runners/observability_run.py, runners/action_manager.py]
    allowed_edits: [scoring_adjustment, card_template_patch, suppression_rule_patch]
    max_edits_per_step: 1
    eval_pack: attention_surface_eval
    risk: local_safe_work
  hermes_skill:
    path_glob: ~/.hermes/skills/personal-orchestrator/**/SKILL.md
    allowed_edits: [append_pitfall, replace_step, add_validation]
    max_edits_per_step: 2
    eval_pack: skill_regression_eval
    risk: local_safe_work
```
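
A patch runner could consult the registry before touching any file. The sketch below assumes the registry has already been loaded into a dict (only two entries are mirrored here) and uses a hypothetical segment-wise matcher so that `*` never crosses a `/`. It does not handle `**` patterns such as the Hermes skill glob.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical in-memory mirror of two registry entries (illustrative subset).
REGISTRY = {
    "faculty_prompt": {
        "path_glob": "agents/faculties/*/prompt.md",
        "max_edits_per_step": 2,
    },
    "context_asset_extractor": {
        "path_glob": "runners/context_assets.py",
        "max_edits_per_step": 1,
    },
}


def glob_matches(pattern: str, rel_path: str) -> bool:
    """Match one path segment at a time; reject `..` and depth mismatches."""
    pat_parts, path_parts = pattern.split("/"), rel_path.split("/")
    if len(pat_parts) != len(path_parts) or ".." in path_parts:
        return False
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pat_parts))


def artifact_kind_for(rel_path: str) -> Optional[str]:
    """Return the registry key that owns rel_path, or None if it is out of bounds."""
    for kind, entry in REGISTRY.items():
        if glob_matches(entry["path_glob"], rel_path):
            return kind
    return None


def assert_patchable(rel_path: str, n_edits: int) -> str:
    """Raise unless rel_path is registered and n_edits fits its budget."""
    kind = artifact_kind_for(rel_path)
    if kind is None:
        raise PermissionError(f"{rel_path} is not in the artifact registry")
    if n_edits > REGISTRY[kind]["max_edits_per_step"]:
        raise ValueError(f"{n_edits} edits exceeds budget for {kind}")
    return kind
```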

## Rollout/eval design

### Score dimensions

Each candidate patch should be scored on:

1. **Usefulness:** would Connor likely rate the output useful or approve action?
2. **Actionability:** does it state problem, implementation shape, risk, evidence, and done condition?
3. **Discernment:** does it suppress/merge weak, duplicate, or low-leverage signals?
4. **Non-repeat:** does it avoid re-surfacing acknowledged/noisy clusters unless materially changed?
5. **Evidence quality:** does it cite source-near artifacts with freshness/confidence/caveats?
6. **Safety:** does it preserve external-side-effect approval boundaries?
7. **Autonomy fit:** does it route safe local work to local/delegated background paths rather than asking Connor too much?
8. **Proofability:** does every action have an observable done condition and proof path?
9. **Breadth without volume:** does it consider all faculties/context assets without dumping every faculty's opinion?
10. **Regression risk:** does it preserve existing passing tests and accepted behavior?
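
As a sketch, the ten dimensions could be combined into one weighted score. This assumes each dimension is scored in [0, 1] with 1 as best (so `regression_risk` is scored as absence of regression) and that weights default to equal; the snake_case names and the weighting scheme are assumptions, not a settled rubric.

```python
from typing import Optional

DIMENSIONS = (
    "usefulness", "actionability", "discernment", "non_repeat",
    "evidence_quality", "safety", "autonomy_fit", "proofability",
    "breadth_without_volume", "regression_risk",
)


def weighted_score(scores: dict[str, float],
                   weights: Optional[dict[str, float]] = None) -> float:
    """Weighted mean over all rubric dimensions; every dimension must be scored."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for dim in DIMENSIONS:
        if not 0.0 <= scores[dim] <= 1.0:
            raise ValueError(f"{dim} must be in [0, 1], got {scores[dim]}")
    weights = weights or {}
    # Dimensions without an explicit weight count once.
    total = sum(weights.get(d, 1.0) for d in DIMENSIONS)
    return sum(scores[d] * weights.get(d, 1.0) for d in DIMENSIONS) / total
```

A scalar like this is convenient for ranking candidates, but safety is better treated as a hard gate than as one term in an average.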

### Eval packs

Create versioned eval packs under:

```text
/root/personal-orchestrator/evals/
  faculty_judgement/
  context_assets/
  extractors/
  synthesis_attention/
  action_manager/
  nudge_ux/
  source_health/
```

Each pack should contain:

- training examples: can be used to generate patches
- selection examples: gate candidate patches
- test examples: periodic reporting only
- canaries: safety and anti-regression cases
- scoring rubric: JSON dimensions + human-readable explanation
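
To keep the splits stable across regenerations, examples could be assigned by hashing their IDs rather than by random shuffling. A minimal sketch follows; the 60/20/20 proportions and the `assign_split` name are assumptions.

```python
import hashlib


def assign_split(example_id: str, train: float = 0.6, selection: float = 0.2) -> str:
    """Deterministically map an example ID to train, selection, or test."""
    digest = hashlib.sha256(example_id.encode("utf-8")).hexdigest()
    # First 32 bits of the hash, scaled into [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < train:
        return "train"
    if bucket < train + selection:
        return "selection"
    return "test"
```

Because the assignment depends only on the ID, an example never drifts from the test split into the training prompt when the pack is rebuilt.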

### Acceptance gate

A patch is accepted only if:

- the selection score improves over baseline by a configured margin, or the patch fixes a hard failure without degrading the weighted score;
- the full regression test suite passes;
- safety canaries pass;
- no source-near provenance regression;
- no new external-side-effect capability is introduced;
- patch stays within edit budget and artifact boundary;
- changed artifact has a rollback path.
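
The conditions above could be expressed as a single gate function. This is a sketch under assumptions: the `GateInput` fields, the reason strings, and the default margin of 0.02 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInput:
    baseline_score: float
    candidate_score: float
    fixes_hard_failure: bool
    test_suite_passed: bool
    safety_canaries_passed: bool
    provenance_regressed: bool
    adds_external_side_effect: bool
    within_edit_budget: bool
    has_rollback: bool


def evaluate_gate(g: GateInput, margin: float = 0.02) -> tuple[bool, list[str]]:
    """Return (accepted, reasons); any failed condition rejects the patch."""
    reasons: list[str] = []
    improved = g.candidate_score - g.baseline_score >= margin
    hard_fix = g.fixes_hard_failure and g.candidate_score >= g.baseline_score
    if not (improved or hard_fix):
        reasons.append("no_selection_improvement")
    if not g.test_suite_passed:
        reasons.append("test_suite_failed")
    if not g.safety_canaries_passed:
        reasons.append("safety_canary_failed")
    if g.provenance_regressed:
        reasons.append("provenance_regression")
    if g.adds_external_side_effect:
        reasons.append("new_external_side_effect")
    if not g.within_edit_budget:
        reasons.append("edit_budget_exceeded")
    if not g.has_rollback:
        reasons.append("no_rollback_path")
    return (not reasons, reasons)
```

Returning every failed reason, not just the first, gives the rejected-patch ledger a fuller record of why a candidate did not land.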

Rejected patches are written to:

```text
/root/personal-orchestrator/po_opt/rejected_patches.jsonl
```

Accepted patches are written to:

```text
/root/personal-orchestrator/po_opt/accepted_patches.jsonl
/root/personal-orchestrator/po_opt/runs/<run_id>/
```

## Layer-by-layer plan

### Phase 0 — Freeze the safety doctrine and baseline

Goal: establish a baseline before self-improvement patches start landing.

Implement:

- `po_opt/ARTIFACT_REGISTRY.yaml`
- `po_opt/SAFETY_DOCTRINE.md`
- baseline scorecard over latest 10–30 runs
- snapshot of optimizable artifact hashes
- mandatory rollback metadata for every candidate patch

Acceptance:

- baseline can be regenerated deterministically;
- artifact registry lists every optimizable file family;
- no patch runner can touch files outside registry.
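
The artifact hash snapshot might look like the following sketch, which assumes SHA-256 over file bytes and relative paths as keys. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def snapshot_hashes(root: Path, rel_paths: list[str]) -> dict[str, str]:
    """Map each registered artifact path to the SHA-256 of its current bytes."""
    return {
        rel: hashlib.sha256((root / rel).read_bytes()).hexdigest()
        for rel in sorted(rel_paths)
    }


def changed_artifacts(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List paths whose hash differs, including added or removed artifacts."""
    return sorted(
        path for path in set(before) | set(after)
        if before.get(path) != after.get(path)
    )
```

Comparing a pre-run and post-run snapshot gives a cheap check that a patch run touched only the files it declared.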

### Phase 1 — Faculty Experience Ledger as rollout memory

Goal: turn faculty runs into structured training trajectories.

Implement:

```text
/root/personal-orchestrator/faculty_experience/
  runs.jsonl
  lessons.yaml
  patch_queue.jsonl
  rejected_lessons.jsonl
  scorecards/*.md
```

Each faculty judgement event should record:

- run id, faculty id, prompt hash
- retrieved context assets and evidence refs
- judgement: surface/suppress/question/action/blocked
- proposed card/action/suppression
- final synthesis decision
- Connor feedback/outcome if available
- delegation/proof outcome if available
- inferred lesson candidate
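
A possible shape for one row of `runs.jsonl` is sketched below. The field names are assumptions derived from the list above, not an agreed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional

JUDGEMENTS = {"surface", "suppress", "question", "action", "blocked"}


@dataclass
class FacultyExperience:
    run_id: str
    faculty_id: str
    prompt_hash: str
    judgement: str
    evidence_refs: list[str] = field(default_factory=list)
    proposal: Optional[str] = None            # proposed card/action/suppression
    synthesis_decision: Optional[str] = None  # what synthesis finally did
    feedback: Optional[str] = None            # Connor feedback, if any
    proof_outcome: Optional[str] = None       # delegation/proof result, if any
    lesson_candidate: Optional[str] = None

    def __post_init__(self) -> None:
        if self.judgement not in JUDGEMENTS:
            raise ValueError(f"unknown judgement: {self.judgement}")


def append_experience(path: Path, event: FacultyExperience) -> None:
    """Append one judgement event as a JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(event), sort_keys=True) + "\n")
```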

Acceptance:

- each real faculty run appends experience rows;
- future faculty prompts can retrieve 3–5 similar prior lessons;
- scorecards show useful/noisy/wrong/repeated-cluster rates by faculty.
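
The per-faculty rates could be derived from experience rows roughly as follows. This sketch assumes each row carries a `faculty_id` and an optional `feedback` label such as `useful`, `noisy`, `wrong`, or `do_this`; rows without feedback are skipped rather than counted as failures.

```python
from collections import Counter, defaultdict


def faculty_rates(rows: list[dict]) -> dict[str, dict[str, float]]:
    """Per-faculty share of each feedback label among rows that have feedback."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        label = row.get("feedback")
        if label:
            counts[row["faculty_id"]][label] += 1
    return {
        faculty: {label: n / sum(c.values()) for label, n in c.items()}
        for faculty, c in counts.items()
    }
```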

### Phase 2 — Context Asset Optimizer

Goal: optimize typed asset extraction and retrieval, not just faculty prompts.

Implement evals for:

- source failure -> emits `source_health` / `uncertainty`, not silent empty context
- Open Tabs -> emits open loop + temporal state + attention priority + affordance
- collaborator feedback -> emits feedback + outcome + proof + caveat
- cron outputs -> emits health + repair candidate + proof refs
- delegation logs -> emits lifecycle/proof/blocker assets

Patch targets:

- extractor registry seams
- asset type classifiers
- retrieval packet ranking/caveats
- provenance/freshness/confidence rendering

Acceptance:

- a single source-near row can emit multiple typed cognitive assets where appropriate;
- missing sources produce explicit health assets;
- retrieval packets cite source refs and caveats;
- tests prove idempotent indexing.
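
Two of these acceptance criteria, explicit health assets on failure and idempotent indexing, can be illustrated with a small sketch. It assumes asset IDs are derived from the source ref plus the asset type, and that each input row lists the asset types it should emit; the row shape and function names are hypothetical.

```python
import hashlib


def asset_id(source_ref: str, asset_type: str) -> str:
    """Deterministic ID so re-indexing the same row never creates duplicates."""
    return hashlib.sha256(f"{source_ref}|{asset_type}".encode()).hexdigest()[:16]


def extract_assets(source_id: str, rows: list[dict], error: str = "") -> list[dict]:
    """Turn source-near rows into typed assets; a failure yields health assets."""
    if error:
        # Never return silent empty context when the source failed.
        return [
            {"id": asset_id(source_id, t), "type": t,
             "source_ref": source_id, "detail": error}
            for t in ("source_health", "uncertainty")
        ]
    assets = []
    for row in rows:
        for asset_type in row["asset_types"]:  # one row may emit several assets
            assets.append({
                "id": asset_id(row["ref"], asset_type),
                "type": asset_type,
                "source_ref": row["ref"],
                "detail": row.get("text", ""),
            })
    return assets


def index_assets(index: dict[str, dict], assets: list[dict]) -> int:
    """Upsert assets by ID and return how many were new."""
    new = 0
    for asset in assets:
        if asset["id"] not in index:
            new += 1
        index[asset["id"]] = asset
    return new
```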

### Phase 3 — Faculty Prompt Optimizer

Goal: make each faculty improve judgement without becoming deterministic.

For each faculty, create a small benchmark of historical situations:

- input evidence packet
- expected useful judgement properties
- expected suppressions
- unacceptable outputs
- scoring rubric

Patch constraints:

- only localized prompt sections;
- preserve input boundary, output contract, evidence requirement, forbidden actions;
- prefer adding “when to suppress” and “when to route local work” heuristics over generic encouragement.

Acceptance examples:

- Focus/Open Tabs learns: concrete debt/deadline/admin affordance -> propose admin block, not abstract runway question.
- Agent Manager learns: cards need problem statement + implementation outline + risk + proof, not just “Hermes can do X.”
- Self-Healing learns: source repair is high-priority only when it degrades PO recommendations, not merely because a source is imperfect.
- Institutional Memory learns: repeated `do_this` becomes a delegation pattern; repeated `noisy` becomes suppression/merge/source-repair candidate.

### Phase 4 — Synthesis / Attention Surface Optimizer

Goal: optimize the final bottleneck: what Connor sees.

Build eval pack around recent known cases:

- useful-but-too-vague card
- noisy Signal Radar repair card
- repeated backlog card flood
- “no nudge crossed threshold” with hidden faculty insights
- multi-card delayed catch-up / Nudge Inbox UX

Patch targets:

- card template fields
- scoring weights
- cluster merge policy
- cooldown/reopen policy
- Nudge Inbox rendering/resolution
- “local safe work vs ask Connor” routing

Acceptance:

- one semantic cluster produces at most one user-facing card;
- card includes why, implementation shape, risk, evidence, proof;
- known noisy clusters are suppressed unless new evidence changes urgency/risk/deadline/done condition;
- safe local improvements are queued/done quietly with proof rather than repeatedly asking Connor.
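
The first and third criteria could be checked with a selection step like the sketch below. It assumes each candidate card carries a `cluster` key, a numeric `score`, and an optional `materially_changed` flag set upstream by faculty judgement; all three field names are assumptions.

```python
def select_cards(candidates: list[dict], noisy_clusters: set[str]) -> list[dict]:
    """Keep at most one card per semantic cluster, skipping known-noisy clusters."""
    best: dict[str, dict] = {}
    for card in candidates:
        cluster = card["cluster"]
        # A noisy cluster resurfaces only if new evidence materially changed it.
        if cluster in noisy_clusters and not card.get("materially_changed", False):
            continue
        if cluster not in best or card["score"] > best[cluster]["score"]:
            best[cluster] = card
    return sorted(best.values(), key=lambda c: c["score"], reverse=True)
```

The `materially_changed` decision stays with the faculties; this step only enforces the one-card-per-cluster invariant, so it does not replace judgement with a hard-coded filter.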

### Phase 5 — Extractor/source-health self-healing loop

Goal: let source-near infrastructure improve from failures, without unsafe mutations.

Implement:

- extractor failure ledger
- source-health eval pack
- repair proposal generator
- read-only proof checks
- approval gates for credentials/scopes/external changes

Acceptance:

- failures become source_health/uncertainty assets;
- recurring failures become local repair candidates;
- repairs must add regression checks;
- external services are not mutated without explicit approval.

### Phase 6 — E2E optimizer run

Goal: evaluate PO as a whole system, not isolated patches.

Run an offline E2E comparison:

- baseline artifact set
- candidate patched artifact set
- same historical eval suite
- same target model/harness where possible
- compare final nudge/action/delegation outputs

A quality claim is allowed only if:

- `manifest.json` proves real faculty-agent records for a live smoke run;
- offline eval shows candidate > baseline;
- no safety canary fails;
- full test suite passes;
- final human-facing surface is inspected for attention quality.
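
For the offline comparison, a paired per-case diff is more informative than a single aggregate, since a candidate can win on average while regressing on specific cases. A minimal sketch, assuming both runs produce one score per eval case ID (the `compare_runs` name and report keys are hypothetical):

```python
def compare_runs(baseline: dict[str, float], candidate: dict[str, float]) -> dict:
    """Paired comparison over the same eval cases; both runs must cover the same IDs."""
    if not baseline or baseline.keys() != candidate.keys():
        raise ValueError("baseline and candidate must score the same non-empty case set")
    deltas = {case: candidate[case] - baseline[case] for case in baseline}
    return {
        "mean_delta": sum(deltas.values()) / len(deltas),
        "regressions": sorted(case for case, d in deltas.items() if d < 0),
        "improvements": sorted(case for case, d in deltas.items() if d > 0),
    }
```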

## Concrete implementation slices

### Slice A — PO-Opt skeleton

Files:

```text
po_opt/ARTIFACT_REGISTRY.yaml
po_opt/SAFETY_DOCTRINE.md
runners/po_opt.py
tests/test_po_opt.py
```

Capabilities:

- list optimizable artifacts and hashes
- create candidate patch run directories
- enforce edit boundaries
- write accepted/rejected patch ledgers

### Slice B — Faculty Experience Ledger

Files:

```text
runners/faculty_experience.py
tests/test_faculty_experience.py
faculty_experience/runs.jsonl
faculty_experience/lessons.yaml
faculty_experience/patch_queue.jsonl
```

Capabilities:

- append run experience from `runs/latest`
- derive per-faculty scorecards from feedback/delegations
- retrieve similar prior decisions for faculty prompts/context assets

### Slice C — Eval packs and scorer

Files:

```text
runners/po_eval.py
evals/synthesis_attention/*.jsonl
evals/context_assets/*.jsonl
evals/faculty_judgement/*.jsonl
tests/test_po_eval.py
```

Capabilities:

- score baseline/candidate outputs on rubric dimensions
- include hard safety canaries
- produce `EVAL_REPORT.md` and `eval_report.json`

### Slice D — Bounded patch optimizer

Files:

```text
runners/po_optimizer.py
tests/test_po_optimizer.py
po_opt/prompts/optimizer.md
po_opt/prompts/patch_ranker.md
```

Capabilities:

- collect failure/success minibatches from eval runs
- ask optimizer model for add/delete/replace patches
- rank and clip patches to edit budget
- apply patch to candidate copy only
- evaluate before promotion

### Slice E — Slow/meta update layer

Files:

```text
po_opt/meta_guidance.yaml
po_opt/rejected_patches.jsonl
po_opt/accepted_patches.jsonl
runners/po_meta_update.py
```

Capabilities:

- summarize stable lessons across accepted/rejected patch history
- distinguish deployed artifact changes from optimizer-only guidance
- feed meta guidance into future patch generation but not into runtime unless accepted
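
One simple stability rule would be to promote a lesson to meta guidance only after it has appeared in several distinct runs. The sketch below assumes each patch-history row has `lesson`, `run_id`, and a `status` of `accepted` or `rejected`; the threshold of three runs is an arbitrary placeholder.

```python
from collections import defaultdict


def distill_lessons(history: list[dict], min_runs: int = 3) -> dict[str, dict[str, int]]:
    """Return lessons seen in at least min_runs distinct runs, with outcome counts."""
    runs: dict[str, set] = defaultdict(set)
    outcomes: dict[str, dict[str, int]] = defaultdict(
        lambda: {"accepted": 0, "rejected": 0}
    )
    for row in history:
        runs[row["lesson"]].add(row["run_id"])
        outcomes[row["lesson"]][row["status"]] += 1
    return {
        lesson: dict(outcomes[lesson])
        for lesson in sorted(runs)
        if len(runs[lesson]) >= min_runs
    }
```

Counting distinct runs rather than rows keeps one noisy run from promoting a lesson on its own.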

## First eval cases to seed

Use these from current PO history:

1. **Useful but vague implementation card**
   - Expected improvement: card template includes problem, implementation outline, risk, evidence, done condition.
   - Evidence: consult lines 26–41 and synthesis card `6075d620b3`.

2. **Signal Radar repair as noisy PO priority**
   - Expected improvement: source repair is opportunistic unless it harms PO recommendation quality.
   - Evidence: consult lines 44–54.

3. **Repeated backlog item flood**
   - Expected improvement: Workbench suppressions are compacted/deduped and no repeated identical nudge is emitted.
   - Evidence: consult lines 83–117.

4. **Source-near adapter onboarding**
   - Expected improvement: reusable local skill/delegation pattern, not another abstract card.
   - Evidence: latest synthesis card `f4a42beb07`.

5. **Nudge Inbox delayed catch-up**
   - Expected improvement: ordinal references and batch delegation work after delay; stale views refresh rather than guess.
   - Evidence: implemented tests in `tests/test_nudge_inbox.py`.

## Safety and anti-patterns

Do not:

- let optimizer patches land directly in production files without held-out eval;
- let a single feedback row create a broad deterministic rule;
- turn PO into if/then suppression machinery that removes faculty judgement;
- optimize only prompts while leaving extractors/context assets dirty;
- count artifact presence as intelligence quality;
- publish or expose private raw data in public artifacts;
- mutate external systems during self-improvement.

Do:

- use candidate copies, patch diffs, rollback, and acceptance gates;
- score against attention quality, usefulness, safety, and proof, not just test pass;
- preserve rejected patches as negative training data;
- distinguish fast local patching from slow meta lessons;
- require real-agent E2E proof for any quality claim;
- let safe local self-improvement happen quietly with proof.

## Success metric

The system is improving when, over a rolling 2–4 week window:

- useful/do_this rate rises;
- noisy/wrong/repeated-cluster rate falls;
- fewer cards ask Connor to arbitrate internal maintenance;
- more safe local repairs/delegations complete with proof;
- context-asset packets become more cited, fresh, and caveated;
- faculties retrieve relevant prior lessons before judging;
- final nudges become fewer, more concrete, and more action-changing;
- source failures become explicit uncertainty/health assets rather than invisible gaps.

## Recommended immediate next move

Implement Slices A and B first: **PO-Opt skeleton + Faculty Experience Ledger**.

Reason: SkillOpt-style optimization is only safe once we have structured rollouts and artifact boundaries. The Faculty Experience Ledger supplies the replay buffer; the PO-Opt skeleton supplies the safety rails. After that, context-asset and faculty prompt optimization become straightforward, gated, and auditable.