Code Mode is gaining real momentum across the MCP ecosystem. Cloudflare has introduced a Code Mode MCP
path, and in parallel we can already run Code Mode-style workflows in RLM Code for controlled
experiments and benchmark-driven comparisons.
If you are new to the concept, Code Mode is a tooling pattern where an agent plans and executes coding
tasks through a structured tool contract instead of relying only on long conversational context. In
practice, this means reproducible runs, clearer tool traces, and better benchmarking discipline across
different MCP backends.
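To make "clearer tool traces" concrete, here is a minimal Python sketch of a structured trace record. The `ToolCall` and `RunTrace` classes and their fields are hypothetical and chosen for illustration; they are not RLM Code's actual trace format. The idea is that a trace serialized with sorted keys is byte-stable, so two runs of the same task can be diffed directly.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class ToolCall:
    """One step in a hypothetical structured tool trace."""
    server: str                # MCP server name, e.g. "utcp-codemode"
    tool: str                  # tool name as exposed by that server
    arguments: dict[str, Any]  # JSON-serializable call arguments
    ok: bool                   # whether the call succeeded


@dataclass
class RunTrace:
    """Hypothetical run record: task prompt, strategy, and ordered tool calls."""
    prompt: str
    strategy: str
    steps: int
    calls: list[ToolCall] = field(default_factory=list)

    def record(self, call: ToolCall) -> None:
        self.calls.append(call)

    def to_json(self) -> str:
        # asdict() recurses into the list of ToolCall records; sort_keys makes
        # key order deterministic, so identical runs serialize identically.
        return json.dumps(asdict(self), sort_keys=True)
```

List order is preserved, so the call sequence stays part of the comparison.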
This post is a research-first setup guide covering two backends: one that matches the current Code Mode
contract directly, and one remote backend that you can still evaluate safely with a compatible strategy.
Execution Contract and Backend Strategy
After publishing the RLM Code release and recording a live demo, the most common question we received was
simple: can we run one Code Mode workflow across different MCP backends and still benchmark it rigorously?
The answer is yes, with one important detail about tool contracts in the current release:
- UTCP local MCP (@utcp/code-mode-mcp) with strategy=codemode
- Cloudflare remote MCP (Cloudflare announcement) with strategy=tool_call
Recommended flow (TL;DR)
- Connect both MCP servers and verify tool visibility with /mcp-tools.
- Run UTCP jobs with strategy=codemode for native bridge compatibility.
- Run Cloudflare jobs with strategy=tool_call when tool names differ from the Code Mode contract.
- Use rlm bench compare and rlm bench report for side-by-side artifact review.
Quick takeaways: Use strategy=codemode with UTCP, use
strategy=tool_call with Cloudflare in this release, and benchmark both with identical prompts
and steps so your compare output stays defensible.
Demo video
YouTube: Code Mode in RLM Code
Quick strategy matrix
| Backend | Strategy in this release | Why |
|---|---|---|
| UTCP Code Mode MCP | codemode | Matches current bridge contract expected by harness |
| Cloudflare remote MCP | tool_call | Tool surface can differ from current Code Mode bridge names |
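If you script your runs, the matrix can live in one place as data. This is a hypothetical Python helper, not an RLM Code API. The server keys are the names used in the rlm_config.yaml examples in this post, and the option string mirrors the /harness run flags from the demo commands.

```python
# Strategy matrix encoded as data (hypothetical helper, not part of RLM Code).
STRATEGY_BY_SERVER = {
    "utcp-codemode": "codemode",
    "cloudflare-codemode": "tool_call",
}


def strategy_for(server: str) -> str:
    """Return the recommended strategy, failing loudly for unknown servers."""
    try:
        return STRATEGY_BY_SERVER[server]
    except KeyError:
        raise ValueError(f"no strategy recorded for MCP server {server!r}") from None


def harness_args(server: str, steps: int = 3) -> str:
    """Build the option suffix for a /harness run command."""
    return f"steps={steps} mcp=on strategy={strategy_for(server)} mcp_server={server}"
```

Failing on an unknown server name is deliberate: silently defaulting to one strategy would make a misconfigured run look like a valid benchmark data point.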
Configuration
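Add both MCP servers to your project's rlm_config.yaml. Both entries belong under a single top-level
mcp_servers key, even though they are shown separately below. You can cross-check config and command
conventions in the RLM Code docs.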
UTCP local bridge
mcp_servers:
  utcp-codemode:
    name: utcp-codemode
    description: "Local Code Mode MCP bridge"
    enabled: true
    auto_connect: false
    timeout_seconds: 30
    retry_attempts: 3
    transport:
      type: stdio
      command: npx
      args:
        - "@utcp/code-mode-mcp"
Cloudflare remote bridge
mcp_servers:
  cloudflare-codemode:
    name: cloudflare-codemode
    description: "Cloudflare MCP via remote bridge"
    enabled: true
    auto_connect: false
    timeout_seconds: 30
    retry_attempts: 3
    transport:
      type: stdio
      command: npx
      args:
        - "mcp-remote"
        - "https://mcp.cloudflare.com/mcp"
Cloudflare note: on first connect, mcp-remote can request interactive authentication if you
are not already logged in. Complete auth once, then reconnect. Package link: mcp-remote.
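Before connecting, it can help to sanity-check the config shape. Below is a minimal Python sketch, assuming the YAML has already been loaded into a dict (for example with PyYAML's yaml.safe_load). The REQUIRED_KEYS set and the name-matches-key rule are assumptions drawn from the examples above, not RLM Code's actual schema. The description fields are omitted for brevity.

```python
# Hypothetical sanity check for the mcp_servers entries (not RLM Code's schema).
REQUIRED_KEYS = {"name", "enabled", "timeout_seconds", "transport"}

CONFIG = {
    "mcp_servers": {
        "utcp-codemode": {
            "name": "utcp-codemode",
            "enabled": True,
            "auto_connect": False,
            "timeout_seconds": 30,
            "retry_attempts": 3,
            "transport": {"type": "stdio", "command": "npx",
                          "args": ["@utcp/code-mode-mcp"]},
        },
        "cloudflare-codemode": {
            "name": "cloudflare-codemode",
            "enabled": True,
            "auto_connect": False,
            "timeout_seconds": 30,
            "retry_attempts": 3,
            "transport": {"type": "stdio", "command": "npx",
                          "args": ["mcp-remote", "https://mcp.cloudflare.com/mcp"]},
        },
    }
}


def config_problems(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks sane."""
    problems = []
    for key, server in config.get("mcp_servers", {}).items():
        missing = REQUIRED_KEYS - server.keys()
        if missing:
            problems.append(f"{key}: missing {sorted(missing)}")
            continue
        if server["name"] != key:
            problems.append(f"{key}: name {server['name']!r} differs from its key")
        if server["timeout_seconds"] <= 0:
            problems.append(f"{key}: timeout_seconds must be positive")
        transport = server["transport"]
        if transport.get("type") == "stdio" and not transport.get("command"):
            problems.append(f"{key}: stdio transport needs a command")
    return problems
```

With a real file, pass the result of yaml.safe_load to config_problems before running /mcp-connect.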
Demo commands (steps=3)
UTCP with Code Mode
/mcp-connect utcp-codemode
/mcp-tools utcp-codemode
/harness run "analyze this repo, find TODO/FIXME, and create report.json" steps=3 mcp=on strategy=codemode mcp_server=utcp-codemode
Cloudflare with tool_call
/mcp-connect cloudflare-codemode
/mcp-tools cloudflare-codemode
/harness run "list available tools and run one safe read-only action, then summarize in 3 bullets" steps=3 mcp=on strategy=tool_call mcp_server=cloudflare-codemode
Research compare workflow
/rlm bench preset=generic_smoke mode=harness strategy=codemode mcp=on mcp_server=utcp-codemode limit=1 steps=3
/rlm bench preset=generic_smoke mode=harness strategy=tool_call mcp=on mcp_server=cloudflare-codemode limit=1 steps=3
/rlm bench compare candidate=latest baseline=previous
/rlm bench report candidate=latest baseline=previous format=markdown
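The two bench runs in this workflow should differ only in strategy and MCP server. A small hypothetical Python helper (not part of RLM Code) can enforce that by generating both commands from one shared settings dict. Its output matches the two /rlm bench lines of the workflow. Because compare is invoked with candidate=latest baseline=previous, the run order presumably determines which backend is treated as the candidate, so keep that order fixed between sessions.

```python
# Hypothetical helper: render both /rlm bench runs from one shared settings
# dict so preset, mode, limit, and steps cannot drift between backends.
SHARED = {"preset": "generic_smoke", "mode": "harness", "limit": 1, "steps": 3}

RUNS = [
    {"strategy": "codemode", "mcp_server": "utcp-codemode"},
    {"strategy": "tool_call", "mcp_server": "cloudflare-codemode"},
]


def bench_command(run: dict, shared: dict = SHARED) -> str:
    """Build one /rlm bench command; only strategy and server vary per run."""
    return (
        f"/rlm bench preset={shared['preset']} mode={shared['mode']} "
        f"strategy={run['strategy']} mcp=on mcp_server={run['mcp_server']} "
        f"limit={shared['limit']} steps={shared['steps']}"
    )


if __name__ == "__main__":
    for run in RUNS:
        print(bench_command(run))
```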
How Code Mode works (technical architecture)
At a high level, Code Mode in RLM Code is a harness strategy on top of MCP. The architecture has three
layers:
- Harness layer: task orchestration, prompting, guardrails, telemetry.
- MCP bridge contract: tools exposed to the harness.
- Provider implementation: UTCP, Cloudflare, or custom server runtime.
In this release, strategy=codemode expects the bridge tools search_tools and
call_tool_chain. UTCP exposes this contract directly, so it runs natively. Cloudflare can
expose a different tool naming surface, so we use tool_call there today. Repo: SuperagenticAI/rlm-code.
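The three layers can be sketched in Python with a typing.Protocol for the provider boundary. Everything here is hypothetical: the provider classes are stand-ins, and the tool names on OtherSurfaceProvider are invented placeholders, not Cloudflare's real surface. The only detail taken from the text is the pair of names that strategy=codemode requires. The sketch also assumes that tool_call needs no fixed names.

```python
from typing import Protocol

# Bridge tools that strategy=codemode expects in this release.
CODEMODE_TOOLS = frozenset({"search_tools", "call_tool_chain"})


class McpProvider(Protocol):
    """Provider layer: UTCP, Cloudflare, or a custom server runtime."""

    def list_tools(self) -> list[str]: ...


class UtcpLikeProvider:
    """Stand-in for a server that exposes the Code Mode bridge names."""

    def list_tools(self) -> list[str]:
        return ["search_tools", "call_tool_chain"]


class OtherSurfaceProvider:
    """Stand-in for a server with a different tool surface (names invented)."""

    def list_tools(self) -> list[str]:
        return ["example_read", "example_write"]


class Harness:
    """Harness layer: owns the strategy and checks the bridge contract."""

    def __init__(self, provider: McpProvider, strategy: str) -> None:
        self.provider = provider
        self.strategy = strategy

    def contract_satisfied(self) -> bool:
        if self.strategy == "codemode":
            # Subset test: every required bridge tool must be listed.
            return CODEMODE_TOOLS <= set(self.provider.list_tools())
        # Assumed: tool_call invokes listed tools directly, so no fixed names.
        return True
```

Keeping the contract check in the harness layer means a new provider only has to implement list_tools to be evaluated; it never needs to know which strategy will drive it.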
Guardrails and sandbox responsibilities
RLM Code is responsible for planner guardrails and harness-level controls. MCP providers are responsible
for their own runtime execution boundaries. For research quality and safer iterations, keep a strict sandbox
posture on the RLM side and run deterministic benchmark presets.
Why Cloudflare may show “could not resolve call_tool_chain/search_tools”
That error means the selected server does not expose the exact tool names required by the current Code
Mode strategy. It does not mean Cloudflare MCP is broken. It means there is a bridge-name mismatch for this
release contract.
Practical fix: keep Cloudflare runs on strategy=tool_call and keep UTCP runs on
strategy=codemode until a dedicated Cloudflare Code Mode strategy is added.
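The mismatch is a plain name check, so you can run it yourself before launching a harness job. Below is a minimal Python sketch, assuming you pass in the tool names that /mcp-tools prints for the server. The function and its message format are hypothetical and only echo the error text quoted in this section.

```python
from typing import Optional

# Bridge tools required by strategy=codemode in this release.
CODEMODE_TOOLS = ("call_tool_chain", "search_tools")


def diagnose(server: str, strategy: str, tools: list[str]) -> Optional[str]:
    """Return a hypothetical preflight message, or None if the run can proceed.

    `tools` is the list of names shown by /mcp-tools for `server`.
    """
    if strategy != "codemode":
        return None
    missing = [t for t in CODEMODE_TOOLS if t not in tools]
    if not missing:
        return None
    return (
        f"{server}: could not resolve {'/'.join(missing)}; "
        "this is a bridge-name mismatch, rerun with strategy=tool_call"
    )
```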
Troubleshooting checklist
- Confirm the active server with /mcp-tools <server-name> before launching harness runs.
- Re-run Cloudflare auth if mcp-remote prompts or stalls on first connect.
- Keep the same task prompt, preset, and steps across both runs to avoid noisy benchmark deltas.
- Store generated reports in versioned artifacts so baseline/candidate comparisons stay reproducible.
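The third checklist item can be checked mechanically before you trust a compare result. Below is a minimal Python sketch, assuming each run's settings are available as a dict. The metadata shape and the PARITY_FIELDS tuple are hypothetical, not RLM Code's artifact format.

```python
# Fields that must match between baseline and candidate runs. The metadata
# shape is a hypothetical assumption, not RLM Code's artifact format.
PARITY_FIELDS = ("prompt", "preset", "steps")


def parity_problems(baseline: dict, candidate: dict) -> list[str]:
    """List every parity field whose value differs between the two runs."""
    return [
        f"{name}: baseline={baseline.get(name)!r} candidate={candidate.get(name)!r}"
        for name in PARITY_FIELDS
        if baseline.get(name) != candidate.get(name)
    ]
```

Fields such as mcp_server are expected to differ and are therefore left out of PARITY_FIELDS.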
Research and benchmark possibilities
- Run the same preset across multiple MCP backends to isolate tool-surface effects.
- Compare strategy cost and completion behavior under fixed steps and fixed prompts.
- Track regression gates over time with bench compare and bench validate.
- Keep benchmark artifacts for reproducibility and paper-style reporting.
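A regression gate boils down to comparing candidate metrics against a baseline with a tolerance. The Python sketch below illustrates that idea only. It is not how bench compare or bench validate are implemented, and the metric name and 0.05 threshold are hypothetical. It assumes every metric is higher-is-better.

```python
def regression_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_drop: float = 0.05,
) -> list[str]:
    """Flag metrics that fell by more than max_drop (absolute) from baseline.

    Assumes every metric is higher-is-better, e.g. a completion rate in [0, 1].
    """
    failures = []
    for metric, base in sorted(baseline.items()):
        if metric not in candidate:
            failures.append(f"{metric}: missing from candidate")
        elif base - candidate[metric] > max_drop:
            failures.append(f"{metric}: {base:.3f} -> {candidate[metric]:.3f}")
    return failures
```

Treating a missing metric as a failure keeps a silently broken run from passing the gate.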
Final takeaway
You can demonstrate both approaches in one workflow today. UTCP gives you native Code Mode in the current
RLM release. Cloudflare gives you a strong remote MCP path with tool_call. Together they form a
practical benchmark matrix for real research and release decisions.
