Self-Improving Agents
Agents that rewrite their own code, data, or skills
From self-play and test-time adaptation to executable subagents and autoresearch ratchets — how agents compound capability over time.
Reflexion
Verbal reinforcement learning: the agent writes language 'lessons' after each failure and reads them before retrying. HumanEval pass@1 80% → 91% with no weight updates.
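A minimal sketch of the Reflexion loop, with stand-in functions in place of real LLM calls (`attempt`, `reflect`, and the lesson text are all hypothetical):

```python
# Minimal sketch of a Reflexion-style loop. attempt() and reflect() are
# stand-ins for LLM calls; the lesson text is invented for illustration.
def attempt(task, lessons):
    # pretend actor: succeeds only once the right lesson is in memory
    return "handle the edge case" in lessons

def reflect(task):
    # pretend self-reflection: turn the failure into a verbal lesson
    return "handle the edge case"

def reflexion(task, max_trials=3):
    lessons = []  # episodic memory of language 'lessons'; no weight updates
    for trial in range(max_trials):
        if attempt(task, lessons):
            return trial + 1, lessons
        lessons.append(reflect(task))  # write a lesson after the failure
    return None, lessons

trials, memory = reflexion("toy coding task")
```

The key property the sketch preserves: all "learning" lives in the `lessons` list that is re-read on each retry, never in model weights.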
SPIN
Self-play fine-tuning: the current model is trained against generations from its own previous iteration, so no extra human annotations are needed.
Cherry LLM
Self-guided data selection via the Instruction-Following Difficulty (IFD) metric; training on ~5% of the data outperforms training on the full dataset.
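A toy sketch of IFD-based selection. The loss values and `select_cherries` helper are invented; a real run computes the model's cross-entropy on the answer with and without the instruction in context:

```python
# Toy sketch of IFD-based selection (Cherry LLM). Loss values are made up.
def ifd(loss_answer_given_instruction, loss_answer_alone):
    # higher IFD = the instruction helps less, i.e. a harder,
    # more informative 'cherry' sample
    return loss_answer_given_instruction / loss_answer_alone

samples = [
    {"id": "a", "l_cond": 1.8, "l_uncond": 2.0},  # hard but well-posed
    {"id": "b", "l_cond": 1.0, "l_uncond": 2.0},  # easy: instruction helps a lot
    {"id": "c", "l_cond": 2.2, "l_uncond": 2.0},  # IFD > 1: likely noisy, drop
]

def select_cherries(samples, frac=0.5):
    scored = [(ifd(s["l_cond"], s["l_uncond"]), s["id"]) for s in samples]
    # keep only IFD < 1 (the instruction actually helps), take the hardest
    valid = sorted((x for x in scored if x[0] < 1.0), reverse=True)
    k = max(1, int(len(valid) * frac))  # the paper keeps ~5%; 50% here for the toy
    return [sid for _, sid in valid[:k]]
```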
RISE
Recursive introspection, multi-turn self-correction, +23.9% on GSM8K.
EvolveR
Experience-driven self-evolution, distill trajectories into abstract strategic principles.
Self-Improving at Test-Time
Detect weak spots → auto-generate data → LoRA at test time, +5.48% with 68× fewer samples.
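A rough sketch of the detect-and-generate half of that pipeline. The probe results and `synthesize` stub are made up; the actual LoRA update is out of scope and left as a comment:

```python
from collections import defaultdict

# Sketch of the detect-and-generate half of a test-time pipeline.
# A real system would fit a LoRA adapter on the generated samples afterwards.
def weak_spots(eval_results, threshold=0.6):
    # eval_results: (category, correct?) pairs from a quick probe set
    by_cat = defaultdict(list)
    for cat, ok in eval_results:
        by_cat[cat].append(ok)
    # a category is weak when its probe accuracy falls below the threshold
    return sorted(c for c, oks in by_cat.items() if sum(oks) / len(oks) < threshold)

def synthesize(category, n=4):
    # stand-in for auto-generating targeted training data for a weak category
    return [f"{category} practice problem {i}" for i in range(n)]

probe = [("algebra", True), ("algebra", True),
         ("geometry", False), ("geometry", False), ("geometry", True)]
targets = weak_spots(probe)
new_data = [ex for cat in targets for ex in synthesize(cat)]
```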
Metacognitive Learning
Framework: agents need self-assessment, learning planning, and evaluation to truly self-improve.
AgentFactory
Preserves successful solutions as executable Python subagents rather than as text. Install → Self-Evolve → Deploy lifecycle; ~57% orchestration-cost reduction.
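One way the "solutions as executable Python subagents" idea might look. The file layout, `run()` interface, and `install_subagent` helper are assumptions for illustration, not AgentFactory's real API:

```python
import importlib.util, os, tempfile, textwrap

# Sketch: freeze a successful solution as an importable Python module
# instead of a text snippet. Names and interface are hypothetical.
SUBAGENT_SRC = textwrap.dedent("""
    def run(task: dict) -> dict:
        # a previously successful solution, preserved as code rather than text
        return {"answer": sum(task["numbers"])}
""")

def install_subagent(src, name="sum_agent"):
    # write the solution to disk and import it as a callable module
    path = os.path.join(tempfile.mkdtemp(), name + ".py")
    with open(path, "w") as f:
        f.write(src)
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

agent = install_subagent(SUBAGENT_SRC)
result = agent.run({"numbers": [1, 2, 3]})
```

Storing the solution as code means later orchestration can call it directly instead of re-deriving it from a text description each time.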
autoresearch
Autonomous overnight ML research: the agent edits train.py, runs 5-minute experiments, and keeps or reverts each change based on validation bits-per-byte (val_bpb); program.md serves as a lightweight skill file.
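The keep-or-revert loop can be sketched as a one-parameter toy, where `val_bpb` is a made-up bowl-shaped stand-in for the real validation metric and `tweak` stands in for an agent edit:

```python
import random

# Toy sketch of the keep-or-revert ratchet. 'state' is one hyperparameter;
# val_bpb is an invented stand-in for validation bits-per-byte.
def ratchet(evaluate, propose, state, steps=20, seed=0):
    best = evaluate(state)
    rng = random.Random(seed)
    for _ in range(steps):
        candidate = propose(state, rng)
        score = evaluate(candidate)
        if score < best:          # experiment helped: keep the edit
            state, best = candidate, score
        # otherwise: revert, i.e. simply discard the candidate
    return state, best

val_bpb = lambda s: (s - 3.0) ** 2 + 0.8   # minimum 0.8 at s = 3.0
tweak = lambda s, rng: s + rng.uniform(-1.0, 1.0)
final_state, final_bpb = ratchet(val_bpb, tweak, state=0.0)
```

Because a change is only kept when the metric improves, the loop can never end worse than it started, which is what makes overnight unattended runs safe.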
Darwin Skill · 达尔文
Autoresearch ratchet applied to SKILL.md optimization — 8-dim rubric (structure + effectiveness), independent sub-agent scoring, git-revert on regression.
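A sketch of the scoring-and-gating step. The source names structure and effectiveness as rubric themes; the other six dimensions, the 1-5 scale, and the helper names here are invented, and the actual git mechanics are omitted:

```python
# Sketch of Darwin-style rubric gating. Only 'structure' and 'effectiveness'
# come from the source; the remaining dimensions are invented placeholders.
RUBRIC = ["structure", "clarity", "triggers", "examples",
          "effectiveness", "coverage", "brevity", "safety"]

def mean_score(scores_by_judge):
    # scores_by_judge: one {dim: score} dict per independent sub-agent judge
    vals = [s[d] for s in scores_by_judge for d in RUBRIC]
    return sum(vals) / len(vals)

def accept(prev_scores, new_scores):
    # 'git revert on regression': keep the SKILL.md edit only if the
    # mean rubric score does not drop
    return mean_score(new_scores) >= mean_score(prev_scores)

prev = [{d: 3 for d in RUBRIC}, {d: 3 for d in RUBRIC}]
new = [{d: 4 for d in RUBRIC}, {d: 4 for d in RUBRIC}]
```

Using independent judges and averaging across all eight dimensions makes a single noisy score less likely to accept a regressed SKILL.md.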