Agentic coding is evolving rapidly, reshaping how developers interact with AI to generate code. Instead of being locked inside full-blown IDEs, many are moving back toward lightweight, flexible command-line interfaces. Since the arrival of Claude Code, we’ve seen a wave of new coding CLIs such as Gemini CLI and Qwen Code, but each has come with a major limitation: they are tied to a single model provider.
Codex CLI breaks that pattern. It’s the first CLI designed to be truly universal, capable of running any model, cloud-based or open-source, local or remote, through a single, unified interface. No more juggling separate CLIs or switching mental contexts depending on the model you want to use. There may be smaller open-source projects attempting something similar, but this is the first official CLI from a major model provider that lets developers do it. With Codex CLI, you configure providers once and switch between them seamlessly using simple provider settings, profiles, or MCP servers. It is still early stage, but it opens up a lot of possibilities for agentic coding in the near future.
Codex CLI
Codex CLI is OpenAI’s bold response to the wave of coding assistants like Claude Code and Gemini CLI. OpenAI describes it as “one agent for everywhere you code,” and that vision shows. With a single installation, you get a lightweight yet powerful CLI that brings AI coding directly into your terminal.
Installation is straightforward:
- If you have Node.js installed, run:
npm i -g @openai/codex
- On macOS, you can also use Homebrew:
brew install codex
Once installed, you’re ready to go. Simply navigate to any project directory and launch
codex
From there, Codex CLI integrates seamlessly into your workflow, providing an AI assistant without needing an IDE or browser-based environment.
Cloud Models vs. Open Source Models
OpenAI recently released two open-source models, GPT-OSS-20B and GPT-OSS-120B, alongside GPT-5. By default, Codex CLI connects to cloud models like GPT-5. These are great for rapid prototyping, but they also come with tradeoffs: API costs, usage limits, and the need for a constant internet connection.
The real breakthrough is that Codex also supports open-source, self-hosted models. With the --oss
flag or a configured profile, you can run inference locally through providers like Ollama, LM Studio, or MLX.
You can launch Codex CLI with the --oss flag, which by default uses Ollama and checks whether the gpt-oss-20b model is installed:
codex --oss
You can follow the steps from there to start using the gpt-oss-20b model, or alternatively pass gpt-oss-120b with the model flag if you can run it locally:
codex --oss -m gpt-oss:120b
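If the models are not already downloaded, you can pull them with Ollama first. This is a quick sketch assuming Ollama is installed and running locally; the 120B variant needs correspondingly capable hardware:

# Download the open-weight models before launching Codex with --oss
ollama pull gpt-oss:20b
ollama pull gpt-oss:120b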
At this point, you are already running models on your local machine. You can also use any local model with the providers and profiles we cover below. This brings significant advantages:
- Run powerful LLMs locally without sending data to external servers
- Avoid vendor lock-in by swapping providers or models at will
- Optimize for privacy, speed, and cost while keeping workflows flexible
In short, Codex gives developers the freedom to choose between cutting-edge cloud models and locally hosted OSS models, all from the same CLI.
Configuring Codex with config.toml
When you install Codex CLI, you will find the `~/.codex/` directory on your system. It contains various files and subdirectories, but it may or may not include a `~/.codex/config.toml` file. If the file doesn’t exist, you need to create it in order to configure Codex CLI with different providers and profiles. The config file lets you set up any provider and create profiles for your models. Some options are probably not documented yet, but you can grab them directly from the Codex source code. You can also configure MCP servers in this file.
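As an illustration, an MCP server can be registered in the same file. The snippet below is only a minimal sketch with a placeholder server name, command, and arguments; check the Codex documentation or source for the exact keys your version supports:

# Hypothetical MCP server entry (placeholder name, command, and args)
[mcp_servers.docs]
command = "npx"
args = ["-y", "@your-org/your-mcp-server"]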
Ollama Configuration
Assuming you already have the model downloaded and Ollama running, here’s how to configure Codex CLI with the Ollama provider. Add these sections to your ~/.codex/config.toml file:
[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"

[profiles.gpt-oss-120b-ollama]
model_provider = "ollama"
model = "gpt-oss:120b"
You can swap in the model and profile names of your choice; the available models are listed by the ollama list command. Once you have saved the file with these changes, you can launch Codex with the profile:
codex --oss --profile gpt-oss-120b-ollama
Remember, `gpt-oss-120b-ollama` is the name of the profile from the config file.
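If the profile fails to connect, it is worth checking that the model is present and that Ollama’s OpenAI-compatible endpoint is reachable. A quick sanity check, assuming the default port:

# List locally available models
ollama list

# Confirm the OpenAI-compatible endpoint responds
curl http://localhost:11434/v1/models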
LM Studio Configuration
In LM Studio, you need to load the model and start the server. The default port used by LM Studio is 1234, but you can customise it if needed. You can load a model and start the server either from the LM Studio app or with the lms CLI. The command lms ls lists all the models downloaded with LM Studio. You can load the model and start the server using the following commands:
# Load the model, e.g. qwen/qwen3-coder-30b
lms load qwen/qwen3-coder-30b

# Start the server
lms server start
You can also set the context length while loading the model, based on the project size. Once you have loaded the model and started the server, you can update the Codex configuration as below.
- For gpt-oss-120b
[model_providers.lms]
name = "LM Studio"
base_url = "http://localhost:1234/v1"

[profiles.gpt-oss-120b-lms]
model_provider = "lms"
model = "gpt-oss:120b"
- For qwen3-coder-30b
[model_providers.lm_studio]
name = "LM Studio"
base_url = "http://localhost:1234/v1"

[profiles.qwen3-coder-30b-lms]
model_provider = "lm_studio"
model = "qwen/qwen3-coder-30b"
After that, you can launch Codex with the relevant profile name:
codex --profile gpt-oss-120b-lms
or
codex --profile qwen3-coder-30b-lms
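If Codex cannot reach LM Studio, you can first confirm that its OpenAI-compatible server is up. This is just a sanity check, assuming the default port of 1234:

# List the models the local LM Studio server currently exposes
curl http://localhost:1234/v1/models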
MLX Configuration
On Apple Silicon machines, you can use MLX for faster inference. You can download models and start a server using the mlx-lm package, which you install with:
pip install mlx-lm
After successful installation, you can start the local server with the following command. Let’s use port 8888 and the GPT-OSS model by Superagentic AI:
mlx_lm.server --model SuperagenticAI/gpt-oss-20b-8bit-mlx --port 8888
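Before wiring it into Codex, you can optionally verify that the server responds. A minimal sketch, assuming the OpenAI-compatible chat endpoint exposed by mlx_lm.server on port 8888:

# Send a small test request to the local MLX server
curl http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "SuperagenticAI/gpt-oss-20b-8bit-mlx", "messages": [{"role": "user", "content": "Say hello"}]}'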
Now you can update the Codex Config
[model_providers.mlx]
name = "MLX LM"
base_url = "http://localhost:8888/v1"

[profiles.gpt-oss-20b-8bit-mlx]
model_provider = "mlx"
model = "SuperagenticAI/gpt-oss-20b-8bit-mlx"
After that, you can launch Codex CLI with the MLX profile:
codex --profile gpt-oss-20b-8bit-mlx
This will route your Codex CLI requests to the MLX provider and run the specified local model.
Context Length
One of the issues with using local coding models is context length: you need to manually tweak the context length if the repo is bigger. You can find settings for changing the context size for Ollama, LM Studio, and MLX independently. Ollama has /set parameter num_ctx to change the context of a running model, and in LM Studio you can pass a context-length option to the lms load command, as sketched below.
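As a rough sketch (the exact value depends on your hardware, the model, and the repo), the context length could be raised like this; 32768 is just an illustrative assumption:

# Ollama: inside an interactive `ollama run` session
/set parameter num_ctx 32768

# LM Studio: set the context length when loading the model
lms load qwen/qwen3-coder-30b --context-length 32768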
Why Run Local Models?
While cloud APIs are convenient, local models bring unique benefits:
- Privacy: Your code never leaves your machine.
- Cost control: No API bills for long-running tasks.
- Flexibility: Swap models in/out without waiting for API support.
- Resilience: Works offline or in restricted environments.
By combining Codex CLI with local providers like Ollama, LM Studio, and MLX, you get the best of both worlds: a unified developer experience with full freedom to choose between cloud and local inference.
Final Thoughts
Codex CLI marks a shift in how developers interact with AI coding models. For the first time, you can use one CLI to manage all your models, from OpenAI’s cloud APIs to cutting-edge OSS models running locally on your machine. If you’re serious about building with AI while keeping flexibility, privacy, and cost in check, it’s worth setting up Codex CLI with local providers today.