What the OpenClaw vs. Anthropic Drama Taught Us: The Urgent Need for Self-Optimizing Harness Engineering

Recently, OpenClaw took off like one of the biggest breakthroughs in AI. People rushed to set up OpenClaw to automate their tasks. Everything looked great until recently, when Anthropic cut off access to Claude models for OpenClaw. Anthropic changed the rules for Claude Code subscribers: third-party tools such as OpenClaw can no longer use included subscription tokens. Users must switch to pay-as-you-go API pricing or move to Anthropic’s own first-party tools. A one-month credit was offered to existing users, but the cutoff took effect on April 4, 2026. The reaction across social platforms was immediate and intense. Power users who had built complex autonomous workflows suddenly faced costs that made their setups unsustainable. Many tried swapping in GPT models, Gemini, or local open-source alternatives. The results exposed a deeper problem.

Why Didn’t OpenClaw Work on Other Models?

OpenClaw’s entire architecture had been tuned specifically for Claude. Its prompting strategies, context management, tool-calling loops, memory compaction, and orchestration logic all assumed Claude’s particular strengths. When those assumptions broke, performance collapsed. Tasks that once ran smoothly now failed or hallucinated heavily. The fragility was not a bug. It was the direct result of hyper-optimization for a single closed-source model. There are several indications of this:

  • OpenClaw is built in TypeScript, and Claude models are exceptionally strong at working with TypeScript.
  • Harness-optimization packages and libraries largely don’t exist in TypeScript, so once the harness was built it stayed static.
  • OpenClaw documents no prompt or context optimization strategies that could be tuned automatically for other models.
  • Responsibility appears to be handed over to the model’s reasoning rather than the harness itself; the model takes control over the harness.

Hence, OpenClaw works remarkably well with Claude models but struggles with other models, including the GPT models from OpenAI, where OpenClaw’s author now works.

The TypeScript versus Python Reality Check: OpenClaw vs Hermes Agents

OpenClaw was written in TypeScript and lived deep inside the Anthropic ecosystem. That choice delivered excellent results while Claude remained the top performer for coding and reasoning. The same harness felt clumsy and “off” when users tried GPT or Codex models. OpenAI’s models come from a Python-native lineage with different tool-use patterns and code-generation behaviors. Switching required far more than a simple API key change. It demanded fundamental adjustments to the harness itself.

At the same time, Hermes Agent and similar Python-based frameworks began gaining traction. Their architecture aligns more naturally with GPT workflows, persistent memory loops, and self-improving evaluation layers. Many users started calling these alternatives “it just works” options because model switching required far less rework. The contrast highlighted a structural tension inside OpenAI itself. The company had acquired OpenClaw’s TypeScript codebase through the creator’s hiring while also owning deep Python tooling via other moves. Reconciling those two worlds became an internal priority. OpenAI’s acquisition of Astral, the Python tooling startup, shows it is focusing on the Python ecosystem, not TypeScript.

The Open-Source versus Closed-Source Debate Ignites

The pricing change triggered a loud and immediate debate on social media about open source versus closed source. Many users argued that the moment proved why open source must win. They urged OpenClaw to focus on local models and fully self-hosted setups. One post captured the sentiment directly: “If we ‘never bet against open source’, then it means OpenClaw should simply be used with local models.” Others declared that Anthropic’s move had accidentally accelerated hybrid architectures where cheap local models handle execution while a stronger cloud model orchestrates. Several users announced they would now invest time in local-model hosting rather than pay Anthropic’s new rates.

The debate carries an obvious contradiction. OpenClaw’s creator, Peter Steinberger, recently joined OpenAI. Multiple posts note that he had publicly criticized Anthropic and accused the company of copying features from OpenClaw before restricting access. One widely shared thread described the timing as “Anthropic just cut off Claude Code subscribers from third-party tools like OpenClaw. No migration path. Pay-as-you-go or nothing. OpenClaw’s creator joined OpenAI. Anthropic moved on pricing the same week. Timing writes its own story.” Another highlighted Steinberger’s earlier statements that Anthropic had copied features into its closed tool before locking out the open-source alternative.

Anthropic, OpenAI, OpenClaw

For the open-source community that had embraced OpenClaw, the situation feels equally contradictory. The framework is positioned as open and customizable, yet its creator now works at a leading closed-source lab. Users who favor local models still face a harsh reality: today’s best open-source models lag significantly in reliable tool calling, long-horizon reasoning, and consistent multi-step orchestration. Hybrid experiments are growing, but pure local performance remains a work in progress. Meanwhile, making GPT models perform well inside OpenClaw requires substantial engineering effort. Some observers suggest the entire harness might eventually need a port to Python to align with OpenAI’s strengths. The debate on X has therefore split into two camps: one demanding faster open-source integration, the other acknowledging that closed-source models still deliver the highest immediate capability.

Anthropic’s defenders point out that the company is simply enforcing its terms and protecting its infrastructure from subsidized usage patterns that grew faster than expected. They are not blocking API access entirely. The change forces third-party tools to pay realistic rates. Critics counter that the timing and the simultaneous rollout of native Claude Code features (recurring prompts, scheduled tasks, persistent memory, remote control) look like a deliberate shift toward first-party lock-in. Either way, the episode has made one fact undeniable: relying on a single provider’s goodwill is risky when your entire product depends on that provider’s model.

Has Anthropic Done Anything Wrong? Or Is Peter Playing a Double Game?

Has Anthropic done anything wrong in this situation? I do not think so. They are simply trying to stop the abuse of their most powerful models, which they produce for the entire world. They did not block OpenClaw directly. People can still use OpenClaw with Claude models by switching to the official API and paying the standard usage rates. Previously, many users were heavily abusing the fixed-price Claude subscription to power intensive agent workflows in OpenClaw far beyond what the subscription was designed for. This became especially relevant as OpenClaw’s author, Peter Steinberger, joined OpenAI, a direct competitor, and began actively promoting GPT and Codex models while making critical comments about Anthropic and Claude Code. From Anthropic’s perspective, this was a reasonable competitive response to protect their business model. Steinberger first used Claude Code and, by his own account, switched to Codex immediately after Anthropic blocked Open Code and OpenClaw’s predecessor, ClawdBot. He was also running Claude Code meetups, which shows he was an active Claude Code user, so his switch to Codex is understandable after Anthropic blocked access to ClawdBot.
This situation feels like a double game: keeping OpenClaw positioned as open source while aligning closely with a closed-source lab. If the goal was true independence, staying neutral rather than joining OpenAI would have been a clearer path. In the end, Anthropic simply played their game. Claims that they “blocked” access seem designed more to generate sympathy and attention than to reflect the reality that API access remains fully available, at market rates. I hope this move encourages healthier, more sustainable practices across the agent ecosystem.

OpenClaw’s Fork in the Road: Port OpenClaw to Python

OpenClaw now faces two clear but difficult paths. The first is deep integration with GPT and other OpenAI models. Steinberger’s new role inside OpenAI may accelerate that work, but it still requires rewriting large sections of the TypeScript harness. The second path is full support for local and open-source models. That route demands solving the current gaps in tool-calling reliability and reasoning depth. Neither option is trivial, and the window for execution is narrow. Competing agent frameworks continue to ship new capabilities every week.

One clear option is to port OpenClaw to Python and make it compatible with the OpenAI ecosystem. A Python port could benefit from tools like DSPy, GEPA, ACE, meta-harnesses, and other recent breakthroughs from the AI/ML world. This would also align with OpenAI’s Python ecosystem and its vision of continuing to ship strong Python- and Rust-based tooling.
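To make the payoff concrete, here is a minimal sketch of the kind of automatic prompt optimization a Python port could adopt, in the spirit of tools like DSPy or GEPA. All names (`optimize_prompt`, `run_model`, `score`) are illustrative, not any real library's API: the harness tries candidate prompts against a small eval set and keeps whichever scores best, instead of freezing one hand-tuned prompt.

```python
import random

# Illustrative sketch of prompt optimization for a harness: score candidate
# prompts on a sampled eval set and keep the best one. Function names are
# hypothetical, not DSPy's or OpenClaw's real API.

def optimize_prompt(candidates, eval_set, run_model, score, seed=0):
    """Return the candidate prompt that scores best on sampled eval tasks."""
    rng = random.Random(seed)
    best_prompt, best_score = None, float("-inf")
    for prompt in candidates:
        # Sample a few eval tasks so each trial stays cheap.
        sample = rng.sample(eval_set, k=min(3, len(eval_set)))
        total = sum(score(run_model(prompt, task), task) for task in sample)
        avg = total / len(sample)
        if avg > best_score:
            best_prompt, best_score = prompt, avg
    return best_prompt, best_score
```

Rerunning this loop whenever the underlying model changes is exactly what a static TypeScript harness cannot do today without manual re-tuning.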

The other option is to bet on open-source models, whose reasoning and tool-calling capabilities may be closing the gap, but a static harness still won’t work across every model family and provider. A self-optimizing harness is a need, not an option, in either case.

How Is OpenClaw Getting Fixed? Recent Reactions

In response to the widespread user frustration, Peter Steinberger has been actively working on GPT integration. On April 5 2026 he posted that he “made GPT really good today and switched.” He noted that Claude Opus is still funnier but that GPT is now more reliable. In the same conversation he confirmed he had improved the harness, prompts, message tracking, and parsing logic. He encouraged users to test the changes immediately by running openclaw update --channel dev and asked for feedback directly in the thread. When users reported issues (such as execution failures or case-sensitivity problems with file systems), Steinberger responded quickly with further tweaks and debugging suggestions like enabling /verbose mode. He also addressed personality complaints by stating he had already fixed them.

These are meaningful short-term improvements. Users who update to the dev channel are already seeing better GPT performance, and the community appreciates the rapid iteration.

However, these fixes are not enough. Steinberger is currently optimizing the harness and prompts specifically for the current GPT model. When the next GPT release or a new open-source model is used inside OpenClaw, the entire harness will likely break again. Simply patching the prompt and tweaking the orchestration layer for one model at a time is not a sustainable solution. What OpenClaw needs is a truly self-optimizing harness that automatically detects model capabilities and adapts its prompting, context handling, tool routing, and evaluation logic without manual intervention. The current work is a patch, not a fix. Users should be aware that this can break anytime a new model releases. Harness optimization remains the real key.
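What "automatically detects model capabilities and adapts" could mean in practice: probe a model once at startup, then route prompting and tool-call parsing through a matching strategy instead of hard-coding one model's behavior. The probe prompt, strategy names, and thresholds below are hypothetical, a sketch rather than OpenClaw's actual design.

```python
from dataclasses import dataclass

# Hypothetical capability probe: check once whether a model can emit a
# structured tool call, then pick prompting/parsing strategies accordingly.

@dataclass
class Capabilities:
    native_tool_calls: bool
    context_tokens: int

def probe(run_model, context_tokens):
    """One-time probe: can the model emit a structured JSON tool call?"""
    reply = run_model(
        'Call the tool "echo" with argument "ping". '
        'Respond with JSON only: {"tool": "echo", "arg": "ping"}'
    )
    return Capabilities(native_tool_calls='"tool"' in reply,
                        context_tokens=context_tokens)

def pick_strategy(caps):
    """Route to strategies based on detected capabilities."""
    tool_loop = "json-tool-loop" if caps.native_tool_calls else "react-text-loop"
    context = "full-history" if caps.context_tokens >= 200_000 else "compact-memory"
    return tool_loop, context
```

With a layer like this, swapping in the next GPT release or a local model changes which strategies fire, not the harness code itself.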

Harness Engineering Is the Real Moat

The clearest lesson from this entire episode is that harness engineering matters more than any single model. A harness is not just glue code. It is the orchestration layer that turns raw model intelligence into reliable, repeatable work: context management, evaluation loops, memory compaction, tool routing, error recovery, and lifecycle handling. When that layer is tuned exclusively for one model’s quirks, the entire system becomes brittle the moment that model’s economics or availability change.

The winning architectures of the next twelve months will treat models as swappable engines rather than fixed chassis. They will include pluggable adapters that auto-detect model capabilities and adjust prompting, caching, and evaluation automatically. They will separate generator and evaluator roles. They will manage context intelligently instead of blindly compressing it. Research into meta-harnesses and natural-language agent frameworks is already moving in exactly this direction. Users on X repeatedly emphasize the same point: the harness itself is the product.
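The generator/evaluator separation mentioned above can be sketched in a few lines: one model drafts, a second model (or a cheap rule) reviews, and the loop retries with feedback until the evaluator accepts. All names here are illustrative assumptions, not any framework's API.

```python
# Minimal sketch of a draft-review loop separating generator and evaluator
# roles. `generate` and `evaluate` are placeholders for model calls or rules.

def generate_with_review(generate, evaluate, task, max_rounds=3):
    """Retry generation with evaluator feedback until accepted or rounds end."""
    feedback = ""
    draft = None
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        ok, feedback = evaluate(task, draft)
        if ok:
            return draft
    return draft  # best effort after max_rounds
```

Because the evaluator is a separate component, it can stay fixed while the generator model is swapped, which is precisely the engine-vs-chassis split the paragraph above argues for.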

That insight extends beyond any one framework. Builders should not tie their workflows permanently to OpenClaw, Hermes, LangChain, or any other single tool. Instead they should learn harness engineering fundamentals so they can assemble and optimize their own pipelines. When a new model arrives with better reasoning, faster tool calling, or longer context, the harness should adopt those capabilities within days, not months. The ability to switch without rewriting core logic is the new competitive advantage.

OpenClaw did not fail because it was poorly built. It succeeded so well with Claude that it became dangerously dependent on one model family. That dependency just became expensive. The next generation of agent tools will avoid the same trap. They will be ready for whatever model drops next month, whether Claude, GPT, Gemini, Llama, or the next surprise. Because in the agent era the harness is not infrastructure. The harness is the product. And the best harnesses will never bet on any single king. They will be engineered to welcome the next one.

If you are interested in Harness Engineering, sign up for “Harness Engineering: State of the Art in Agent Harnesses,” an event in San Francisco dedicated to the topic, and stay up to date. Speakers will be announced soon. Stay tuned.
Harness Engineering is here to stay. Keep Harness Engineering!