Codex CLI: Running GPT-OSS and Local Coding Models with Ollama, LM Studio, and MLX

Agentic coding is evolving rapidly, reshaping how developers interact with AI to generate code. Instead of being locked inside full-blown IDEs, many developers are moving back toward lightweight, flexible command-line interfaces. Since the arrival of Claude Code, we’ve seen a wave of new coding CLIs, such as Gemini CLI, Qwen Code, and others, but each has come with a major limitation: it is tied to a single model provider.

Codex CLI breaks that pattern. It’s the first CLI designed to be truly universal, capable of running any model, cloud-based or open-source, local or remote, through a single, unified interface. No more juggling separate CLIs or switching mental contexts depending on the model you want to use. A few small open-source projects have attempted something similar, but this is an official CLI from a major model provider that lets developers do it. With Codex CLI, you configure providers once and seamlessly switch between them using providers, profiles, or MCP servers. It’s still early days, but it opens up a lot of possibilities for agentic coding in the near future.

Codex CLI

Codex CLI is OpenAI’s bold response to the wave of coding assistants like Claude Code and Gemini CLI. OpenAI describes it as “one agent for everywhere you code,” and that vision shows. With a single installation, you get a lightweight yet powerful CLI that brings AI coding directly into your terminal.

Installation is straightforward:

  • If you have Node.js installed, run:
npm i -g @openai/codex
  • On macOS, you can also use Homebrew:
brew install codex
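
You can quickly confirm the install worked. The flags below follow the usual CLI conventions; check codex --help for the exact options in your version:

# Print the installed version (assumes the standard --version flag)
codex --version

# List available commands and flags
codex --help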

Once installed, you’re ready to go. Simply navigate to any project directory and launch

codex

From there, Codex CLI integrates seamlessly into your workflow, providing an AI assistant without needing an IDE or browser-based environment.

Cloud Models vs. Open Source Models

OpenAI has recently released two open-source models, GPT-OSS-20B and GPT-OSS-120B, alongside GPT-5. By default, Codex CLI connects to cloud models like GPT-5. These are great for rapid prototyping, but they also come with tradeoffs: API costs, usage limits, and the need for a constant internet connection.

The real breakthrough is that Codex also supports open-source, self-hosted models. With the --oss flag or a configured profile, you can run inference locally through providers like Ollama, LM Studio, or MLX.

You can launch Codex CLI with the --oss flag, which by default uses Ollama and checks whether the gpt-oss-20b model is installed:

codex --oss

You can follow the prompts there to start using the gpt-oss-20b model, or alternatively pass gpt-oss-120b with the model flag if you can run it locally:

codex --oss -m gpt-oss:120b
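
If the model isn’t downloaded yet, you can pull it with Ollama first (the tags below match the names used above; check the Ollama library for the exact tags available):

# Download the 20B model
ollama pull gpt-oss:20b

# Or the 120B model, if your hardware can handle it
ollama pull gpt-oss:120b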

Now you’re running models from your local machine. You can also use any local model with the providers and profiles that we will cover below. This brings significant advantages:

  • Run powerful LLMs locally without sending data to external servers

  • Avoid vendor lock-in by swapping providers or models at will

  • Optimize for privacy, speed, and cost while keeping workflows flexible

In short, Codex gives developers the freedom to choose between cutting-edge cloud models and locally hosted OSS models, all from the same CLI.

Configuring Codex with config.toml

When you install Codex CLI, you will find a `~/.codex/` directory on your system. The directory contains various files and subdirectories, but it may or may not include a `~/.codex/config.toml` file. If you don’t have this file, you need to create it in order to configure Codex CLI with different providers and profiles. The config file lets you configure any provider and create profiles for your models. There are various options that probably aren’t documented yet, but you can find them in the Codex source code. You can also configure MCP servers in this file.
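
For example, an MCP server entry in config.toml generally looks like the sketch below. The server name, command, and environment variable here are placeholders, and the exact keys may vary between Codex versions, so verify against the Codex documentation or source:

# Hypothetical MCP server entry; "my-server" and its command are placeholders
[mcp_servers.my-server]
command = "npx"
args = ["-y", "some-mcp-server"]
env = { "API_KEY" = "your-key" }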

Ollama Configuration

Assuming you already have the model downloaded and Ollama is running, here’s how to configure Codex CLI with the Ollama provider. Add these sections to your ~/.codex/config.toml file:

[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"

[profiles.gpt-oss-120b-ollama]
model_provider = "ollama"
model = "gpt-oss:120b"

You can change the model and profile names to your model of choice, which you can get from the ollama list command. Once you have saved the file with these changes, you can launch Codex with the profile:

codex --oss --profile gpt-oss-120b-ollama 

Remember, `gpt-oss-120b-ollama` is the name of the profile from the config file.
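
If Codex can’t reach the model, it’s worth checking that Ollama’s OpenAI-compatible endpoint is actually serving at the base_url configured above:

# Should return a JSON list of locally available models
curl http://localhost:11434/v1/models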

LM Studio Configuration

In LM Studio, you need to load the model and start the server. The default port used by LM Studio is 1234, but you can customise it if needed. You can load the model and start the server using the LM Studio app or the lms CLI. The command lms ls will list all the models downloaded with LM Studio. You can load a model and start the server using the following commands:

# Load the model, e.g. qwen/qwen3-coder-30b
lms load qwen/qwen3-coder-30b

# Start the server
lms server start

You can also set the context length while loading the model, based on the project size (see the Context Length section below). Once you have loaded the model and started the server, you can update the Codex configuration as below:

  • For gpt-oss-120b
[model_providers.lms]
name = "LM Studio"
base_url = "http://localhost:1234/v1"

[profiles.gpt-oss-120b-lms]
model_provider = "lms"
model = "gpt-oss:120b"
  • For qwen3-coder-30b
[model_providers.lm_studio]
name = "LM Studio"
base_url = "http://localhost:1234/v1"

[profiles.qwen3-coder-30b-lms]
model_provider = "lm_studio"
model = "qwen/qwen3-coder-30b"

After that, you can launch Codex with the relevant profile name: codex --profile gpt-oss-120b-lms or codex --profile qwen3-coder-30b-lms.
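
If Codex can’t connect, you can check that the LM Studio server is up and serving its OpenAI-compatible API on the configured port:

# Should return a JSON list of the models currently loaded in LM Studio
curl http://localhost:1234/v1/models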

MLX Configuration

On Apple Silicon machines, you can use MLX for faster inference. You can download models and start a local server using the mlx-lm package. Install the package with:

pip install mlx-lm

After a successful installation, you can start the local server using the following command. Let’s start it on port 8888 and use the GPT-OSS model by Superagentic AI:

mlx_lm.server --model SuperagenticAI/gpt-oss-20b-8bit-mlx --port 8888

Now you can update the Codex Config

[model_providers.mlx]
name = "MLX LM"
base_url = "http://localhost:8888/v1"

[profiles.gpt-oss-20b-8bit-mlx]
model_provider = "mlx"
model = "SuperagenticAI/gpt-oss-20b-8bit-mlx"

After that, you can launch the Codex CLI with the MLX profile: codex --profile gpt-oss-20b-8bit-mlx

This will route your Codex CLI requests to the MLX provider and run the specified local model.
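
To sanity-check the MLX server independently of Codex, you can hit its OpenAI-compatible chat completions endpoint directly (the prompt here is just an illustration):

# Send a test chat completion request to the local MLX server
curl http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "SuperagenticAI/gpt-oss-20b-8bit-mlx", "messages": [{"role": "user", "content": "Say hello"}]}'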

Context Length

One of the issues with using local coding models is context length; you may need to manually increase it when the repo is bigger. You can find settings for changing the context size for Ollama, LM Studio, and MLX independently. Ollama has the /set parameter num_ctx command to change the context of a running model, and in LM Studio you can pass --context-length to the lms load command, as sketched below.
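
As a quick sketch (the context values below are arbitrary examples; pick one that fits your project and your machine’s memory):

# Ollama: inside an interactive `ollama run gpt-oss:20b` session
/set parameter num_ctx 16384

# LM Studio: set the context length when loading the model
lms load qwen/qwen3-coder-30b --context-length 16384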

Why Run Local Models?

While cloud APIs are convenient, local models bring unique benefits:

  • Privacy: Your code never leaves your machine.

  • Cost control: No API bills for long-running tasks.

  • Flexibility: Swap models in/out without waiting for API support.

  • Resilience: Works offline or in restricted environments.

By combining Codex CLI with local providers like Ollama, LM Studio, and MLX, you get the best of both worlds: a unified developer experience with full freedom to choose between cloud and local inference.

Final Thoughts

Codex CLI marks a shift in how developers interact with AI coding models. For the first time, you can use one CLI to manage all your models, from OpenAI’s cloud APIs to cutting-edge OSS models running locally on your machine. If you’re serious about building with AI while keeping flexibility, privacy, and cost in check, it’s worth setting up Codex CLI with local providers today.