Stanford IRIS Lab officially released the reference code for Meta-Harness, their groundbreaking framework for autonomously optimizing the code scaffolding around a fixed large language model. The announcement quickly gained traction across social media, with builders praising the clean ONBOARDING.md workflow and the promise of applying the technique to entirely new domains.
Superagentic AI has been preparing for this day since we open-sourced our own implementation on April 2. Today we are excited to release metaharness v0.2.0, the full official-alignment update that turns the research reference into a polished, installable engine you can start using immediately.
This release is not a port: we kept everything that already made metaharness friendly, including the CLI, the filesystem run store with snapshots, write-scope enforcement, experiment matrix support, and strong Codex-first integration, while systematically adopting the strongest architectural ideas from the official Stanford release.
What Meta-Harness Means and Why It Matters
At its core, Meta-Harness flips the usual optimization target. Instead of fine-tuning or prompting the model itself, you treat the entire harness (the surrounding code that handles memory, retrieval, validation, tool routing, setup scripts, and evaluation logic) as the thing that evolves. A proposer agent iteratively rewrites harness files, evaluates changes on search splits, promotes strong candidates to held-out test sets, and builds a frontier of high-performing variants. This delivers dramatically better task-specific performance without touching the base model.
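To make the outer loop concrete, here is a minimal toy sketch of the propose-evaluate-promote cycle. Everything in it is illustrative: the real proposer is an agent rewriting harness source files, whereas here a "harness" is just a config dict, `propose` and `evaluate` are stand-in functions, and the scores are synthetic. None of these names come from the metaharness API.

```python
import random

random.seed(0)

# Toy stand-ins. A real harness is a directory of code; here it is a
# config dict, and "fitness" is just distance to a hidden target value.
def evaluate(harness, split):
    return -abs(harness["temperature"] - split["target"])

def propose(harness):
    # Proposer step: emit a perturbed candidate (an LLM edit, in reality).
    candidate = dict(harness)
    candidate["temperature"] += random.uniform(-0.2, 0.2)
    return candidate

search_split = {"target": 0.7}   # visible to the loop every iteration
test_split = {"target": 0.68}    # held out: only promoted candidates see it

best = {"temperature": 0.0}
best_score = evaluate(best, search_split)
test_scores = []

for step in range(200):
    candidate = propose(best)
    score = evaluate(candidate, search_split)
    if score > best_score:                           # hill-climb on search split
        best, best_score = candidate, score
        test_scores.append(evaluate(best, test_split))  # confirm on held-out split

print(round(best["temperature"], 2))
```

The key structural point survives the toy framing: the search split drives acceptance, and the held-out split is touched only on promotion, which is what prevents leakage.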
The official Stanford repo provides an excellent research reference and two worked examples, along with a conversational onboarding flow that makes it easy to bootstrap new domains. Our metaharness v0.2.0 complements that by delivering a fully packaged, provider-neutral runtime that teams can install and run today.
What’s New in v0.2.0
We completed every phase of the alignment plan we outlined internally. The result is a much more powerful yet still familiar library.

We added a clean DomainSpec system and a new metaharness onboard command that mirrors the official ONBOARDING.md experience. You can now define your evaluation unit, search and test splits, metrics, budget, and leakage protections through a guided conversation with your coding agent.
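A domain spec along those lines might look like the following sketch. The field names (`eval_unit`, `search_split`, `leakage_guard`, and so on) are guesses at the shape of such a record, not the actual metaharness DomainSpec schema:

```python
from dataclasses import dataclass, field

# Hypothetical shape only: field names are illustrative, not the
# published metaharness DomainSpec API.
@dataclass
class DomainSpec:
    name: str
    eval_unit: str                 # what one evaluation covers
    search_split: list[str]        # task IDs visible to the proposer loop
    test_split: list[str]          # held-out task IDs for promotion only
    metrics: list[str] = field(default_factory=lambda: ["accuracy"])
    budget: int = 50               # max proposer iterations
    leakage_guard: bool = True     # refuse any overlap between splits

    def validate(self) -> None:
        overlap = set(self.search_split) & set(self.test_split)
        if self.leakage_guard and overlap:
            raise ValueError(f"search/test leakage: {sorted(overlap)}")

spec = DomainSpec(
    name="text-classification",
    eval_unit="one labeled document",
    search_split=["doc-001", "doc-002"],
    test_split=["doc-101"],
)
spec.validate()  # passes: the splits are disjoint
```

Encoding the leakage check in the spec itself means a misconfigured domain fails at definition time rather than mid-run.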
The architecture now revolves around a generalized DomainAdapter protocol. This makes it straightforward to plug in custom validation, search-stage evaluation, held-out test evaluation, and secondary metrics. Our existing coding-tool harnesses have been cleanly converted into the first adapter, so all previous workflows remain fully backward-compatible.
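Conceptually, an adapter protocol of this kind can be sketched with `typing.Protocol`. The method names below are assumptions about what such an interface covers (validation, search-stage evaluation, held-out evaluation, secondary metrics), not the real DomainAdapter signature:

```python
from typing import Protocol

# Illustrative protocol; method names are assumptions, not the
# published metaharness DomainAdapter interface.
class DomainAdapter(Protocol):
    def validate(self, harness_dir: str) -> bool: ...
    def evaluate_search(self, harness_dir: str) -> float: ...
    def evaluate_test(self, harness_dir: str) -> float: ...
    def secondary_metrics(self, harness_dir: str) -> dict[str, float]: ...

class EchoAdapter:
    """Trivial adapter, only to show the protocol is satisfiable."""
    def validate(self, harness_dir: str) -> bool:
        return True
    def evaluate_search(self, harness_dir: str) -> float:
        return 0.5
    def evaluate_test(self, harness_dir: str) -> float:
        return 0.5
    def secondary_metrics(self, harness_dir: str) -> dict[str, float]:
        return {"cost_usd": 0.0}

adapter: DomainAdapter = EchoAdapter()
print(adapter.evaluate_search("/tmp/harness"))  # → 0.5
```

Structural typing keeps the engine decoupled from any one domain: anything that implements these four hooks can be plugged into the outer loop.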
Evaluation now properly separates search and test stages to prevent leakage, exactly as the official examples recommend. We also introduced frontier-based selection policies that support single-objective maximization, lexicographic ordering, and Pareto optimization across accuracy, cost, context length, and latency. Batch proposal support lets the engine explore multiple candidates per iteration when you want it, while the simple single-candidate hill-climbing mode you already know is still there by default.
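The Pareto case is the least obvious of the three policies, so here is a self-contained sketch of frontier extraction over two objectives (accuracy up, cost down). The candidate data is invented for illustration:

```python
# Pareto frontier over two objectives: maximize accuracy, minimize cost.
def dominates(a, b):
    # a dominates b if it is at least as good on both objectives
    # and strictly better on at least one.
    return (a["accuracy"] >= b["accuracy"] and a["cost"] <= b["cost"]
            and (a["accuracy"] > b["accuracy"] or a["cost"] < b["cost"]))

def pareto_frontier(candidates):
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

candidates = [
    {"name": "v1", "accuracy": 0.80, "cost": 1.00},
    {"name": "v2", "accuracy": 0.85, "cost": 1.50},  # dominated by v4
    {"name": "v3", "accuracy": 0.78, "cost": 1.20},  # dominated by v1
    {"name": "v4", "accuracy": 0.85, "cost": 1.40},
]
frontier = pareto_frontier(candidates)
print([c["name"] for c in frontier])  # → ['v1', 'v4']
```

Single-objective maximization is the degenerate case of this (one objective, frontier of size one), and lexicographic ordering simply sorts by a priority-ordered tuple of objectives instead of computing dominance.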
Telemetry received a major upgrade too. ProposalResult now captures detailed token usage, cost tracking, file read/write summaries, tool-call traces, and richer session metadata inspired by the structured logging in the official Claude wrappers but kept fully provider-neutral.
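As a rough mental model, a telemetry record of this kind might resemble the dataclass below. The field names are illustrative guesses at what such a record carries, not the actual ProposalResult schema:

```python
from dataclasses import dataclass, field

# Field names are illustrative guesses, not the real ProposalResult schema.
@dataclass
class ProposalResult:
    accepted: bool
    score: float
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cost_usd: float = 0.0
    files_read: list[str] = field(default_factory=list)
    files_written: list[str] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

r = ProposalResult(accepted=True, score=0.83,
                   prompt_tokens=1200, completion_tokens=300,
                   files_written=["harness/retrieval.py"])
print(r.total_tokens)  # → 1500
```

Keeping token, cost, and file-touch data on the per-proposal record (rather than only in aggregate logs) is what makes later cost-aware frontier selection possible.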
Finally, we included a lightweight reference domain modeled after the official text-classification example, so you can see the full pattern in action without heavy external dependencies. You can read the complete release notes here:
How the Two Repos Work Together
The official Stanford repository shines as a research foundation and domain bootstrapping toolkit. It excels at helping you define new problems and replicate the paper experiments. Our metaharness library complements it perfectly by providing the production runtime: a single installable package with a stable CLI, battle-tested Codex and local Ollama support, inspect and ledger commands, and filesystem persistence that survives long optimization runs.
Together they form a complete picture: research-grade concepts paired with shipping-grade tooling.
Get Started in Under a Minute
# Install via uv (recommended)
uv tool install superagentic-metaharness
# Try the built-in example
metaharness run examples/python_fixture_benchmark --backend fake --budget 5
# Or start a brand-new domain
metaharness onboard
What Comes Next
The age of hand-tuning harnesses is ending. The age of self-optimizing, inspectable, frontier-driven harnesses is here, and it is fully open source. Drop your harness into metaharness, let the outer loop run, and watch it improve. We can’t wait to hear what results you get.
Full documentation is live at https://superagenticai.github.io/metaharness/. Repository: https://github.com/SuperagenticAI/metaharness
