Open Source · Claude Code Skill · Domain-Agnostic

Autonomous Iteration for Any Task

Inspired by Andrej Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat. Let Claude iterate autonomously with mechanical verification and automatic rollback.

$ /autoresearch
Goal: Increase test coverage to 95%
Metric: npm test -- --coverage | grep "All files"
Scope: src/**/*.ts
Direction: higher_is_better

# Claude loops autonomously — modify, verify, keep/discard, repeat

→ How It Works

Set the goal. Start the loop. Walk away.

01

Define Goal & Metric

Tell Claude what "better" means. Pick a mechanical metric — test coverage, build time, Lighthouse score, val_bpb — anything measurable.

02

Autonomous Loop

Claude makes one atomic change, commits, verifies the metric, and keeps or reverts. No human input needed between iterations.
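The loop above can be sketched in plain shell. This is a minimal illustration in a throwaway repo, not the skill's actual implementation — the file, commit messages, and metric are all stand-ins:

```shell
# One iteration: change, commit, verify, keep or auto-revert.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email demo@example.com && git config user.name demo

echo 5 > metric.txt                      # stand-in for the codebase
git add -A && git commit -qm "baseline"
baseline=$(cat metric.txt)               # stand-in for the verify command

echo 3 > metric.txt                      # one atomic change (this one regresses)
git add -A && git commit -qm "experiment"
new=$(cat metric.txt)

# direction: higher_is_better — keep only on improvement, else revert
if [ "$new" -gt "$baseline" ]; then
  echo "kept: $baseline -> $new"
else
  git reset -q --hard HEAD~1             # discard the failed experiment
  echo "reverted: metric stayed at $(cat metric.txt)"
fi
```

Because the experiment is committed before verification, `git reset --hard HEAD~1` is all the cleanup a failed change ever needs.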

03

Review Results

Every iteration is logged in a TSV file. Kept changes stay as git commits. You get a clean history of what worked and what didn't.

→ Features

Karpathy's autoresearch principles, generalized for any work — now with a planning wizard.

Plan Wizard (v1.0.2)

Describe your goal in plain language. The wizard suggests metrics, validates your verify command with a dry-run, and outputs a ready-to-launch config.

Constraint-Driven Loop

One change per iteration. Commit before verify. Auto-revert on failure. No ambiguity in what caused what.

Mechanical Verification

No subjective "looks good." Every iteration runs a real metric — tests, benchmarks, scores, build output.

Automatic Rollback

Failed changes revert instantly via git reset. No manual cleanup, no debugging compound failures.

Git as Memory

Every kept change is a commit. The agent reads its own git history to learn what works and avoid past mistakes.
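The "memory" is ordinary git tooling. A sketch of reading past experiments back out of history (the commit-message convention below is hypothetical, not the skill's actual format):

```shell
# Two kept experiments in a throwaway repo, then replay the history.
memo=$(mktemp -d) && cd "$memo" && git init -q
git config user.email demo@example.com && git config user.name demo
echo a > f.txt && git add -A && git commit -qm "experiment: extract fixtures (+1.3)"
echo b > f.txt && git add -A && git commit -qm "experiment: parallelize tests (+0.4)"

# Before the next iteration, list what has already been tried:
git log --format="%s"
```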

Results Logging

TSV log tracks every iteration — metric, delta, status, description. Pattern recognition across experiments.
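A hypothetical excerpt — the columns follow the description above, but the exact layout and values here are invented:

```
iter	metric	delta	status	description
1	91.2	+1.3	kept	extract shared test fixtures
2	90.9	-0.3	reverted	inline assertion helper
3	93.0	+2.1	kept	cover error branches in parser
```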

Domain-Agnostic

Works on backend code, ML training, frontend UI, content, performance — any task with a measurable outcome.

→ Use Cases

Same loop, different domains. The principles are universal — the metrics are domain-specific.

Backend Code

Metric:
Tests pass + coverage %
Scope:
src/**/*.ts
Verify:
npm test

Frontend UI

Metric:
Lighthouse score
Scope:
src/components/**
Verify:
npx lighthouse

ML Training

Metric:
val_bpb / loss
Scope:
train.py
Verify:
uv run train.py

Performance

Metric:
Benchmark time (ms)
Scope:
Target files
Verify:
npm run bench

Refactoring

Metric:
Tests pass + LOC reduced
Scope:
Target module
Verify:
npm test && wc -l

Blog / Content

Metric:
Word count + readability
Scope:
content/*.md
Verify:
Custom script
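A custom verify script just needs to print a number the loop can compare. A minimal sketch for the content case — the function name is illustrative, and a real script might fold in a readability tool rather than word count alone:

```shell
# Prints the total word count of the markdown files given as arguments.
count_words() {
  total=0
  for f in "$@"; do
    total=$((total + $(wc -w < "$f")))
  done
  echo "$total"
}
```

Used as, e.g., `count_words content/*.md` in the Verify field.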

→ Quick Start

Two commands to install. One command to run.

1. Install the Skill

git clone https://github.com/uditgoenka/autoresearch.git /tmp/autoresearch
cp -r /tmp/autoresearch/skills/autoresearch ~/.claude/skills/autoresearch

2. Plan Your Run (New in v1.0.2)

# Interactive wizard — builds Scope, Metric & Verify from your Goal:
/autoresearch:plan
Goal: Make the API respond faster

# The wizard scans your codebase, suggests metrics,
# dry-runs the verify command, and outputs a ready-to-paste config.

3. Run Unlimited

# Inside any project directory:
/autoresearch
Goal: Increase test coverage to 95%
Metric: npm test -- --coverage | grep "All files"
Scope: src/**/*.ts
Direction: higher_is_better

4. Run Bounded (Optional)

# Run exactly 25 iterations then stop:
/loop 25 /autoresearch
Goal: Reduce bundle size below 200KB
Metric: npm run build | grep "Total size"
Direction: lower_is_better


Start Iterating Autonomously

Free. Open source. Works on any task.
