Open Source · Claude Code Skill · Domain-Agnostic · v1.0.3

Autonomous Iteration for Any Task

Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat. Let Claude iterate autonomously with mechanical verification and automatic rollback.

$ /autoresearch
Goal: Increase test coverage to 95%
Metric: npm test -- --coverage | grep "All files"
Scope: src/**/*.ts
Direction: higher_is_better

# Claude loops autonomously — modify, verify, keep/discard, repeat

→ Commands Reference

All available commands at a glance.

Command                             Description                                                Since
/autoresearch                       Run the autonomous iteration loop (unlimited)              v1.0.0
/loop N /autoresearch               Run exactly N iterations, then stop                        v1.0.1
/autoresearch:plan                  Interactive wizard: Goal → Scope, Metric, Verify config    v1.0.2
/autoresearch:security              STRIDE + OWASP + red-team security audit                   v1.0.3
/autoresearch:security --diff       Delta mode: only audit changed files                       v1.0.3
/autoresearch:security --fix        Auto-fix confirmed Critical/High findings                  v1.0.3
/autoresearch:security --fail-on    CI/CD severity gate (critical | high | medium)             v1.0.3
/loop N /autoresearch:security      Bounded security audit (N iterations)                      v1.0.3

→ How It Works

Set the goal. Start the loop. Walk away.

1. Define Goal & Metric

Tell Claude what "better" means by picking a mechanical metric: test coverage, build time, Lighthouse score, val_bpb, or anything else measurable.

2. Autonomous Loop

Claude makes one atomic change, commits, verifies the metric, and keeps or reverts. No human input needed between iterations.
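The keep/discard decision at the heart of the loop can be sketched in a few lines. This is a minimal sketch, not the skill's code. It assumes the change, verify and revert steps are passed in as plain functions; `make_change`, `measure` and `revert` are hypothetical names. It also honours the `Direction` setting from the command example at the top of the page.

```python
from typing import Callable


def is_improvement(candidate: float, best: float, direction: str) -> bool:
    """Return True if candidate strictly beats best in the given direction."""
    if direction == "higher_is_better":
        return candidate > best
    if direction == "lower_is_better":
        return candidate < best
    raise ValueError(f"unknown direction: {direction!r}")


def run_loop(
    baseline: float,
    direction: str,
    iterations: int,
    make_change: Callable[[int], None],  # hypothetical: apply and commit one atomic change
    measure: Callable[[], float],        # hypothetical: run the verify command, return the metric
    revert: Callable[[], None],          # hypothetical: drop the last commit
) -> tuple[float, list[str]]:
    """Run a fixed number of modify -> verify -> keep/discard iterations."""
    best = baseline
    statuses: list[str] = []
    for i in range(iterations):
        make_change(i)
        try:
            value = measure()
        except Exception:
            # A crashing verify step is treated like a failed change and rolled back.
            revert()
            statuses.append("crash")
            continue
        if is_improvement(value, best, direction):
            best = value  # keep: the commit stays and becomes the new baseline
            statuses.append("keep")
        else:
            revert()      # discard: return to the last kept state
            statuses.append("discard")
    return best, statuses
```

Only strict improvements are kept. A change that merely ties the current best is discarded, which keeps the history free of no-op commits.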

3. Review Results

Every iteration is logged in a TSV file. Kept changes stay as git commits. You get a clean history of what worked and what didn't.

→ Features

Karpathy's autoresearch principles, generalized for any kind of work, plus a planning wizard and a security audit.

Security Audit (v1.0.3)

STRIDE threat model + OWASP Top 10 + 4 red-team personas. Generates structured reports with code evidence and prioritized mitigations.

Plan Wizard (v1.0.2)

Describe your goal in plain language. The wizard suggests metrics, validates your verify command with a dry-run, and outputs a ready-to-launch config.
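A dry-run check of a verify command might look roughly like the sketch below. It runs the command once, requires a zero exit status, and requires that a number can be read from the output. The `dry_run` name and the "first number in stdout" rule are assumptions for illustration, not the wizard's actual logic.

```python
import re
import subprocess

# Matches integers and decimals such as 87, -3 or 87.5.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def dry_run(command: str, timeout: float = 300.0) -> float:
    """Run a verify command once and return the first number it prints.

    Raises ValueError if the command exits non-zero or prints no number.
    """
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    if result.returncode != 0:
        raise ValueError(f"verify command exited with {result.returncode}")
    match = NUMBER.search(result.stdout)
    if match is None:
        raise ValueError("verify command printed no numeric metric")
    return float(match.group())
```

`shell=True` is used here because verify commands on this page contain pipes (for example `npm test -- --coverage | grep "All files"`). In that case the exit status checked is the status of the last command in the pipe.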

Constraint-Driven Loop

One change per iteration. Commit before verify. Auto-revert on failure. No ambiguity in what caused what.

Mechanical Verification

No subjective "looks good." Every iteration runs a real metric — tests, benchmarks, scores, build output.
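In the coverage example at the top of this page, `grep "All files"` leaves a single summary row. The sketch below assumes Jest's default Istanbul text reporter, where that row has the columns `% Stmts | % Branch | % Funcs | % Lines`. It turns the row into one number the loop can compare; which column to optimize is an assumption you would choose yourself.

```python
# Column order of the Istanbul text summary after the file name.
COLUMNS = ("stmts", "branch", "funcs", "lines")


def parse_coverage_row(row: str, column: str = "lines") -> float:
    """Extract one percentage from an Istanbul 'All files' summary row."""
    cells = [cell.strip() for cell in row.split("|")]
    if cells[0] != "All files":
        raise ValueError("not an 'All files' summary row")
    # cells[1:5] hold % Stmts, % Branch, % Funcs and % Lines, in that order.
    return float(cells[1 + COLUMNS.index(column)])
```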

Automatic Rollback

Failed changes revert instantly via git reset. No manual cleanup, no debugging compound failures.
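A sketch of the commit-then-verify-then-reset cycle, using plain git commands through Python's `subprocess`. `commit_all` and `revert_last_commit` are hypothetical helper names. The skill itself may issue different git commands; `git reset --hard HEAD~1` is simply the standard way to drop the most recent commit together with its working-tree changes.

```python
import subprocess


def git(repo: str, *args: str) -> str:
    """Run a git command inside repo and return its stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def commit_all(repo: str, message: str) -> None:
    # Commit before verifying, so each experiment is exactly one commit.
    git(repo, "add", "-A")
    git(repo, "commit", "-m", message)


def revert_last_commit(repo: str) -> None:
    # Discard the failed experiment and restore the last kept state.
    git(repo, "reset", "--hard", "HEAD~1")
```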

Git as Memory

Every kept change is a commit. The agent reads its own git history to learn what works and avoid past mistakes.

Results Logging

A TSV log records every iteration (metric, delta, status, description), so patterns can be spotted across experiments.
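Here is one possible shape for such a log: the four fields named above plus an iteration number, with a helper that tallies statuses. The column names and the status values (`keep`, `discard`, `crash`) are assumptions for illustration, not the skill's exact format.

```python
import csv
from collections import Counter
from typing import Iterable, TextIO

FIELDS = ["iteration", "metric", "delta", "status", "description"]


def write_header(out: TextIO) -> None:
    csv.writer(out, delimiter="\t", lineterminator="\n").writerow(FIELDS)


def append_row(out: TextIO, iteration: int, metric: float, delta: float,
               status: str, description: str) -> None:
    # One tab-separated line per iteration; csv quotes any embedded tabs.
    csv.writer(out, delimiter="\t", lineterminator="\n").writerow(
        [iteration, metric, delta, status, description]
    )


def count_statuses(lines: Iterable[str]) -> Counter:
    """Tally iterations by status to see how often changes were kept."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["status"] for row in reader)
```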

Domain-Agnostic

Works on backend code, ML training, frontend UI, content, performance — any task with a measurable outcome.

→ Security Audit v1.0.3

Autonomous STRIDE + OWASP + red-team security audit. Generates a full threat model, maps attack surfaces, then iteratively tests each vulnerability vector with code evidence.

$ /loop 10 /autoresearch:security
Scope: src/api/**/*.ts, src/middleware/**/*.ts
Focus: authentication and authorization flows

# Setup: scan codebase → assets → trust boundaries → STRIDE model → attack surface
# Loop: test vectors → validate with code evidence → log findings → repeat
# Output: security/260315-0945-stride-owasp-full-audit/overview.md

STRIDE Threat Model

Full per-asset, per-trust-boundary analysis of Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service (DoS), and Elevation of Privilege.
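Each STRIDE category maps to the security property it violates, so a per-asset, per-boundary model can be held as a simple matrix. The asset and boundary names in the usage below are hypothetical. This layout is an illustration, not the skill's report format.

```python
from itertools import product

# Each STRIDE category and the security property it threatens.
STRIDE = {
    "Spoofing": "Authentication",
    "Tampering": "Integrity",
    "Repudiation": "Non-repudiation",
    "Information Disclosure": "Confidentiality",
    "Denial of Service": "Availability",
    "Elevation of Privilege": "Authorization",
}


def threat_matrix(assets: list[str], boundaries: list[str]) -> list[tuple[str, str, str, str]]:
    """Enumerate every (asset, boundary, threat, property) cell to analyse."""
    return [
        (asset, boundary, threat, prop)
        for asset, boundary in product(assets, boundaries)
        for threat, prop in STRIDE.items()
    ]


cells = threat_matrix(["session token"], ["browser -> API", "API -> database"])
```

One asset crossing two boundaries yields 2 × 6 = 12 cells to examine.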

OWASP Top 10 (70+ Checks)

Systematic coverage across all 10 OWASP categories. Coverage matrix tracks tested vs untested. Aims for 100% coverage.
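A coverage matrix can be as small as a list of categories plus the set already tested. The sketch below assumes the 2021 edition of the OWASP Top 10, since the page does not say which edition the skill uses. It treats a category as covered once it appears in the tested set.

```python
OWASP_TOP_10_2021 = [
    "A01 Broken Access Control",
    "A02 Cryptographic Failures",
    "A03 Injection",
    "A04 Insecure Design",
    "A05 Security Misconfiguration",
    "A06 Vulnerable and Outdated Components",
    "A07 Identification and Authentication Failures",
    "A08 Software and Data Integrity Failures",
    "A09 Security Logging and Monitoring Failures",
    "A10 Server-Side Request Forgery",
]


def coverage(tested: set[str]) -> tuple[float, list[str]]:
    """Return the percentage of categories tested and the untested ones, in order."""
    untested = [c for c in OWASP_TOP_10_2021 if c not in tested]
    total = len(OWASP_TOP_10_2021)
    return 100.0 * (total - len(untested)) / total, untested
```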

4 Red-Team Personas

Security Adversary, Supply Chain Attacker, Insider Threat, and Infrastructure Attacker. Each drives which vectors get tested.

Structured Report Folder

Each audit creates a timestamped folder with 7 files: overview, threat model, attack surface, findings, OWASP coverage, dependency audit, and recommendations.
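The example output path `security/260315-0945-stride-owasp-full-audit/overview.md` suggests folder names follow a `YYMMDD-HHMM-<slug>` pattern. The sketch below builds that path. Only `overview.md` is confirmed by the example; the other six file names are hypothetical placeholders for the sections listed above.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Only overview.md appears in the documented output; the rest are hypothetical names.
REPORT_FILES = [
    "overview.md",
    "threat-model.md",
    "attack-surface.md",
    "findings.md",
    "owasp-coverage.md",
    "dependency-audit.md",
    "recommendations.md",
]


def report_dir(started: datetime, slug: str, root: str = "security") -> PurePosixPath:
    """Build a timestamped folder path such as security/260315-0945-<slug>."""
    return PurePosixPath(root) / f"{started:%y%m%d-%H%M}-{slug}"
```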

Flags

--diff: Only audit files changed since the last audit
--fix: Auto-fix confirmed Critical/High findings
--fail-on: CI/CD gate; exit non-zero when a finding reaches the severity threshold
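A severity gate can be sketched as below. It assumes "reaching the threshold" means any finding at or above the chosen level fails the run, so `--fail-on high` would also fail on critical findings. The `low` rank and the `gate_exit_code` name are assumptions, not the skill's implementation.

```python
# Higher rank = more severe. "low" is included so findings below every gate can be ranked.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def gate_exit_code(findings: list[str], fail_on: str) -> int:
    """Return 1 if any finding is at or above the fail_on severity, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    return 1 if any(SEVERITY_RANK[s] >= threshold for s in findings) else 0
```

In a CI job the result would be passed to `sys.exit`, so a non-zero value blocks the pipeline.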

# Full combo: delta audit + auto-fix + CI gate
/loop 15 /autoresearch:security --diff --fix --fail-on critical

→ Use Cases

Same loop, different domains. The principles are universal — the metrics are domain-specific.

Backend Code

Metric: Tests pass + coverage %
Scope: src/**/*.ts
Verify: npm test

Frontend UI

Metric: Lighthouse score
Scope: src/components/**
Verify: npx lighthouse <url>

ML Training

Metric: val_bpb / loss
Scope: train.py
Verify: uv run train.py
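For reference, val_bpb (validation bits per byte) divides the summed cross-entropy by the number of raw text bytes rather than tokens. This keeps it comparable across tokenizers. A sketch of the arithmetic follows, assuming the loss is reported in nats, as PyTorch's cross-entropy is. Lower is better, so this metric pairs with `Direction: lower_is_better`.

```python
import math


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (nats) over a text into bits per UTF-8 byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_loss_nats / (math.log(2) * total_bytes)


# Mean loss of 1.2 nats/token over 1,000 tokens that decode to 4,200 bytes:
bpb = bits_per_byte(1.2 * 1000, 4200)
```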

Performance

Metric: Benchmark time (ms)
Scope: Target files
Verify: npm run bench

Refactoring

Metric: Tests pass + LOC reduced
Scope: Target module
Verify: npm test && wc -l <target files>

Blog / Content

Metric: Word count + readability
Scope: content/*.md
Verify: Custom script
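The custom script can be anything that prints a number. As one hypothetical example, the script below reports word count and a Flesch Reading Ease score (206.835 − 1.015 × words/sentences − 84.6 × syllables/words) for Markdown files. Its vowel-group syllable counter is a rough heuristic, so treat the score as a relative signal between iterations rather than an exact grade.

```python
import re
import sys
from pathlib import Path

WORD = re.compile(r"[A-Za-z']+")
SENTENCE_END = re.compile(r"[.!?]+")


def syllables(word: str) -> int:
    # Heuristic: count vowel groups, drop a silent trailing 'e', minimum one.
    count = len(re.findall(r"[aeiouy]+", word.lower()))
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[int, float]:
    """Return (word count, Flesch Reading Ease) for plain text."""
    words = WORD.findall(text)
    if not words:
        return 0, 0.0
    sentences = max(len(SENTENCE_END.findall(text)), 1)
    syl = sum(syllables(w) for w in words)
    score = 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syl / len(words))
    return len(words), score


if __name__ == "__main__":
    # Usage: python readability.py content/*.md
    text = "\n".join(Path(p).read_text(encoding="utf-8") for p in sys.argv[1:])
    count, score = readability(text)
    print(f"words={count}\tflesch={score:.1f}")
```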

Security Audit

Metric: OWASP + STRIDE coverage
Scope: src/api/**, src/middleware/**
Verify: /autoresearch:security

→ Quick Start

Two commands to install. One command to run.

1. Install the Skill

git clone https://github.com/uditgoenka/autoresearch.git /tmp/autoresearch
mkdir -p ~/.claude/skills && cp -r /tmp/autoresearch/skills/autoresearch ~/.claude/skills/autoresearch

2. Plan Your Run (New in v1.0.2)

# Interactive wizard — builds Scope, Metric & Verify from your Goal:
/autoresearch:plan
Goal: Make the API respond faster

# The wizard scans your codebase, suggests metrics,
# dry-runs the verify command, and outputs a ready-to-paste config.

3. Run Unlimited

# Inside any project directory:
/autoresearch
Goal: Increase test coverage to 95%
Metric: npm test -- --coverage | grep "All files"
Scope: src/**/*.ts
Direction: higher_is_better

4. Run Bounded (Optional)

# Run exactly 25 iterations then stop:
/loop 25 /autoresearch
Goal: Reduce bundle size below 200KB
Metric: npm run build | grep "Total size"
Direction: lower_is_better

5. Security Audit (New in v1.0.3)

# STRIDE + OWASP + red-team security audit:
/loop 10 /autoresearch:security

# With flags: delta mode + auto-fix + CI gate:
/loop 15 /autoresearch:security --diff --fix --fail-on critical

→ Changelog

Release history and what shipped in each version.

v1.0.3 · Mar 15, 2026 · #3, #4, #5, #6

Autonomous Security Audit

  • +/autoresearch:security — STRIDE threat model + OWASP Top 10 + red-team (4 adversarial personas)
  • +--diff flag: delta mode, only audit files changed since last audit
  • +--fix flag: auto-remediate confirmed Critical/High findings
  • +--fail-on flag: CI/CD severity gate for pipeline blocking
  • +Structured report folder with 7 dedicated markdown files per audit
  • +CI/CD GitHub Action template auto-generation
  • +Historical comparison across audit runs (new/fixed/recurring)
  • +Commands Reference table added to README

v1.0.2 · Mar 15, 2026 · #2

Plan Your Run Wizard

  • +/autoresearch:plan — interactive wizard converts Goal → Scope, Metric, Verify config
  • +Mandatory dry-run validation of verify command before accepting
  • +Metric suggestion database by domain (code, performance, content, refactoring)
  • +Launch options: unlimited, bounded (/loop N), or copy-only

v1.0.1 · Mar 14, 2026 · #1

Controlled Iterations with /loop

  • +/loop N /autoresearch — run exactly N iterations then stop with summary
  • +Early completion when goal is achieved before N iterations
  • +Smart exploitation when <3 iterations remain
  • +Final summary: baseline → current best, keeps/discards/crashes

v1.0.0 · Mar 13, 2026

Initial Release

  • +Core autoresearch loop: modify → verify → keep/discard → repeat
  • +7 core principles from Karpathy’s autoresearch, generalized
  • +Mechanical verification, automatic rollback, git as memory
  • +TSV results logging with pattern recognition
  • +Domain-agnostic: code, ML, content, performance, refactoring

Start Iterating Autonomously

Free. Open source. Works on any task.
