Open Source · Claude Code Skill · Domain-Agnostic
Autonomous Iteration for Any Task
Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat. Let Claude iterate autonomously with mechanical verification and automatic rollback.
- ✓ One atomic change per iteration — if it breaks, you know exactly why
- ✓ Automatic git rollback on failures — no debates, no manual cleanup
- ✓ Works on any domain — code, ML, content, performance, refactoring
$ /autoresearch
Goal: Increase test coverage to 95%
Metric: npm test -- --coverage | grep "All files"
Scope: src/**/*.ts
Direction: higher_is_better
# Claude loops autonomously — modify, verify, keep/discard, repeat
→ How It Works
Set the goal. Start the loop. Walk away.
Define Goal & Metric
Tell Claude what "better" means. Pick a mechanical metric — test coverage, build time, Lighthouse score, val_bpb — anything measurable.
Autonomous Loop
Claude makes one atomic change, commits, verifies the metric, and keeps or reverts. No human input needed between iterations.
Review Results
Every iteration is logged in a TSV file. Kept changes stay as git commits. You get a clean history of what worked and what didn't.
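The keep-or-revert decision at the heart of each iteration can be sketched in shell. The `decide` helper below is illustrative only, not the skill's actual code; in the real loop, Claude commits the atomic change first, then runs the verify command to get a fresh metric, and a REVERT result triggers a git rollback:

```shell
#!/bin/sh
# Illustrative sketch of the keep-or-revert step, not the skill's real code.
# In the full loop each iteration first commits the atomic change, then runs
# the verify command, then makes this comparison; a REVERT result would
# trigger: git reset --hard HEAD~1

decide() {
  new=$1; baseline=$2; direction=$3
  if [ "$direction" = "higher_is_better" ]; then
    awk -v n="$new" -v b="$baseline" 'BEGIN { exit !(n > b) }' && echo KEEP || echo REVERT
  else
    awk -v n="$new" -v b="$baseline" 'BEGIN { exit !(n < b) }' && echo KEEP || echo REVERT
  fi
}

decide 92.5 90.0 higher_is_better   # coverage rose above baseline: keep
decide 215 200 lower_is_better      # bundle grew past baseline: revert
```

Because the comparison is purely numeric, the same decision logic serves every domain; only the verify command changes.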
→ Features
Karpathy's autoresearch principles, generalized for any work — now with a planning wizard.
Plan Wizard (v1.0.2)
Describe your goal in plain language. The wizard suggests metrics, validates your verify command with a dry-run, and outputs a ready-to-launch config.
Constraint-Driven Loop
One change per iteration. Commit before verify. Auto-revert on failure. No ambiguity in what caused what.
Mechanical Verification
No subjective "looks good." Every iteration runs a real metric — tests, benchmarks, scores, build output.
Automatic Rollback
Failed changes revert instantly via git reset. No manual cleanup, no debugging compound failures.
Git as Memory
Every kept change is a commit. The agent reads its own git history to learn what works and avoid past mistakes.
Results Logging
TSV log tracks every iteration — metric, delta, status, description. Pattern recognition across experiments.
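The exact column layout is an implementation detail; a log in the spirit described above (hypothetical layout, illustrative values) might look like:

```
iter	status	metric	delta	description
1	KEPT	87.2	+1.4	extract shared retry helper, add tests
2	REVERTED	85.9	-1.3	inline cache layer
3	KEPT	88.0	+0.8	cover error branches in parser
```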
Domain-Agnostic
Works on backend code, ML training, frontend UI, content, performance — any task with a measurable outcome.
→ Use Cases
Same loop, different domains. The principles are universal — the metrics are domain-specific.
Backend Code
- Metric: Tests pass + coverage %
- Scope: src/**/*.ts
- Verify: npm test
Frontend UI
- Metric: Lighthouse score
- Scope: src/components/**
- Verify: npx lighthouse
ML Training
- Metric: val_bpb / loss
- Scope: train.py
- Verify: uv run train.py
Performance
- Metric: Benchmark time (ms)
- Scope: Target files
- Verify: npm run bench
Refactoring
- Metric: Tests pass + LOC reduced
- Scope: Target module
- Verify: npm test && wc -l
Blog / Content
- Metric: Word count + readability
- Scope: content/*.md
- Verify: Custom script
→ Quick Start
Two commands to install. One command to run.
1. Install the Skill
git clone https://github.com/uditgoenka/autoresearch.git /tmp/autoresearch
cp -r /tmp/autoresearch/skills/autoresearch ~/.claude/skills/autoresearch
2. Plan Your Run (New in v1.0.2)
# Interactive wizard — builds Scope, Metric & Verify from your Goal:
/autoresearch:plan Goal: Make the API respond faster
# The wizard scans your codebase, suggests metrics,
# dry-runs the verify command, and outputs a ready-to-paste config.
3. Run Unlimited
# Inside any project directory:
/autoresearch
Goal: Increase test coverage to 95%
Metric: npm test -- --coverage | grep "All files"
Scope: src/**/*.ts
Direction: higher_is_better
4. Run Bounded (Optional)
# Run exactly 25 iterations then stop:
/loop 25 /autoresearch
Goal: Reduce bundle size below 200KB
Metric: npm run build | grep "Total size"
Direction: lower_is_better
→ FAQ
Common questions about Autoresearch.