Google Research released TurboQuant, a technique that compresses context for LLMs. The next bottleneck in agent systems is not just model quality. It is retrieval cost, memory pressure, and how quickly context infrastructure starts to hurt once your system grows beyond toy demos. That is why we built TurboAgents.
TurboAgents is a Python package for compressed retrieval and KV-style optimization for agent and RAG systems. It is designed to work with the tools you already use instead of forcing you into a new framework, a new database, or a new architecture. If you already have a vector store, a retrieval layer, or an agent runtime, TurboAgents is meant to sit in that stack and make it more efficient.
TurboAgents is now public. You can find the docs here, the source on GitHub, and the package on PyPI.
Why TurboAgents Exists
Teams building RAG systems and agent workflows usually hit the same pattern:
- Retrieval quality starts to matter more as the corpus grows
- Memory cost starts to rise faster than expected
- Latency gets worse once reranking and broader search are added
- Replacing the entire stack is rarely realistic
Most people do not want a new framework. They want something that helps the current one. TurboAgents was built for that exact gap. The goal is to make compressed retrieval practical for real systems while keeping the integration surface small and predictable.
What TurboAgents Is
TurboAgents is a standalone Python package for compressed retrieval, vector reranking, KV-style optimization paths, local benchmarking and evaluation, and adapter-based integration with common vector backends.
The core capabilities:
- Compressed Retrieval – Apply compressed scoring and reranking between your embeddings and final retrieved results, improving quality without replacing your stack.
- Vector Reranking – Add a reranking layer on top of retrieved candidates from any supported vector backend for better precision.
- KV-Style Optimization – Reduce memory pressure and retrieval cost with optimization paths designed for real-world agent and RAG workloads.
- Local Benchmarking – Evaluate retrieval quality and latency tradeoffs with built-in benchmark harnesses and adapter coverage.
It is framework-agnostic infrastructure. That matters. TurboAgents is not a replacement for your orchestration layer. It is not a new agent framework. It is a performance and retrieval layer that can plug into the system you already have.
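To make the memory-pressure point concrete, here is a back-of-the-envelope estimate of raw embedding storage at different bit widths. The corpus size, dimension, and bit widths are illustrative assumptions, not TurboAgents measurements, and the helper name is hypothetical. The estimate ignores per-vector metadata and index overhead.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bits_per_value: int) -> int:
    """Bytes needed to store the vector values alone (no index, no metadata)."""
    total_bits = num_vectors * dim * bits_per_value
    return (total_bits + 7) // 8  # round up to whole bytes


# Illustrative corpus: one million 768-dimensional embeddings (assumed values).
NUM_VECTORS = 1_000_000
DIM = 768

for bits in (32, 8, 4):
    size_mb = raw_vector_bytes(NUM_VECTORS, DIM, bits) / 1_000_000
    print(f"{bits:>2}-bit values: {size_mb:,.0f} MB")
```

Under these assumptions the raw values shrink from about 3.07 GB at 32 bits to 384 MB at 4 bits. Whether a given bit width preserves enough retrieval quality is a separate question that has to be measured.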
Who TurboAgents Is For
- Agent Framework Builders – Add a retrieval and compression layer under your existing abstractions. Keep your framework surface, gain a more efficient retrieval path.
- RAG Application Teams – Better memory efficiency and compressed reranking without rewriting your entire system. Drop it into the stack you already have.
- Researchers & Infra Engineers – A concrete package and benchmark harness for experimenting with retrieval quality, long-context behavior, and vector compression tradeoffs.
- Local-First AI Builders – Designed around constrained hardware and practical local setups. Keep retrieval costs manageable without sacrificing quality.
Supported Vector Backends
TurboAgents currently supports validated retrieval paths across several vector backends. Keep your current storage choice and still use TurboAgents as a compressed retrieval and reranking layer.
- Chroma
- FAISS
- LanceDB
- pgvector
- SurrealDB
How TurboAgents Works
At a high level, TurboAgents sits between your embeddings and your final retrieved results. A simple mental model:
- Your vector database returns candidate matches
- TurboAgents applies compressed scoring and reranking
- Your agent or RAG pipeline receives the final results
That sounds simple because it should be simple. The point is not to replace the entire retrieval system. The point is to improve the part that becomes expensive as your system scales.
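The three-step mental model can be sketched in plain Python. This is a minimal illustration, not the TurboAgents API and not the TurboQuant algorithm: it swaps in simple uniform scalar quantization as the compression step, and every name here (quantize, approx_dot, rerank) is hypothetical.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[str, List[float]]  # (doc_id, embedding) from the vector store


def quantize(vec: Sequence[float], bits: int = 4) -> Tuple[List[int], float]:
    """Uniform symmetric scalar quantization: integer codes plus one scale."""
    levels = 2 ** (bits - 1) - 1                 # e.g. 7 for signed 4-bit codes
    max_abs = max(abs(x) for x in vec) or 1.0    # avoid a zero scale
    scale = max_abs / levels
    return [round(x / scale) for x in vec], scale


def approx_dot(query: Sequence[float], codes: Sequence[int], scale: float) -> float:
    """Approximate the dot product using only the compressed codes."""
    return scale * sum(q * c for q, c in zip(query, codes))


def rerank(query: Sequence[float], candidates: List[Candidate],
           top_k: int = 2, bits: int = 4) -> List[Tuple[str, float]]:
    """Score each candidate from its compressed form and keep the best top_k."""
    scored = []
    for doc_id, vec in candidates:
        codes, scale = quantize(vec, bits)       # a real system would store these
        scored.append((doc_id, approx_dot(query, codes, scale)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Step 1: pretend the vector database returned these candidates.
candidates = [("a", [0.9, 0.1]), ("b", [0.1, 0.9]), ("c", [0.5, 0.5])]
# Steps 2 and 3: compressed scoring, then hand the final results to the pipeline.
results = rerank([1.0, 0.0], candidates)
print([doc_id for doc_id, _ in results])
```

In this toy the codes are computed on the fly for brevity. The point of compression in practice is that the codes are computed once and stored, so scoring touches far fewer bytes per candidate.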
In practice, TurboAgents can be used as a backend adapter for a supported vector store, as a reranking layer on top of retrieved candidates, as part of an end-to-end framework integration, or as a benchmarkable retrieval surface for evaluating quality and latency tradeoffs.
What Makes TurboAgents Different
A lot of infrastructure projects try to win by asking you to adopt a new stack. TurboAgents takes the opposite approach.
The design principle: keep your framework, keep your vector database, add TurboAgents where retrieval cost and memory pressure start to hurt. That makes adoption much more realistic. It also means TurboAgents can be used incrementally. You do not need a giant migration plan to try it.
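One way to picture this incremental design is a thin wrapper around whatever backend you already run. The sketch below is an assumption about the general shape of such a layer, not TurboAgents' actual interface: VectorBackend, InMemoryBackend, RerankedRetriever, and exact_rerank are all hypothetical names.

```python
from typing import Callable, Dict, List, Protocol, Sequence, Tuple

Candidate = Tuple[str, List[float]]  # (doc_id, embedding)
Reranker = Callable[[Sequence[float], List[Candidate]], List[str]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class VectorBackend(Protocol):
    """Hypothetical minimal interface that an existing store would satisfy."""

    def search(self, query: Sequence[float], k: int) -> List[Candidate]: ...


class InMemoryBackend:
    """Toy stand-in for the vector store you already use."""

    def __init__(self, docs: Dict[str, List[float]]) -> None:
        self.docs = docs

    def search(self, query: Sequence[float], k: int) -> List[Candidate]:
        ranked = sorted(self.docs.items(),
                        key=lambda item: dot(query, item[1]), reverse=True)
        return ranked[:k]


class RerankedRetriever:
    """Wraps a backend: over-fetch candidates, rerank them, return the top k ids."""

    def __init__(self, backend: VectorBackend, rerank: Reranker,
                 oversample: int = 3) -> None:
        self.backend = backend
        self.rerank = rerank
        self.oversample = oversample

    def retrieve(self, query: Sequence[float], k: int) -> List[str]:
        candidates = self.backend.search(query, k * self.oversample)
        return self.rerank(query, candidates)[:k]


def exact_rerank(query: Sequence[float], candidates: List[Candidate]) -> List[str]:
    """Placeholder reranker; a compressed scorer would be plugged in here."""
    ranked = sorted(candidates, key=lambda c: dot(query, c[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked]


backend = InMemoryBackend({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]})
retriever = RerankedRetriever(backend, exact_rerank)
print(retriever.retrieve([1.0, 0.0], k=2))
```

The backend and the calling code stay unchanged. Only the wrapper is new, which is what makes trying this kind of layer a small, reversible step.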
Benchmarks and Validation
We did not want TurboAgents to be just a packaging exercise. It needed real validation. The current benchmark coverage includes adapter benchmarks, MLX benchmark runs, pgvector validation, Chroma benchmark coverage, a minimal long-context Needle-style evaluation path, and checked-in benchmark harnesses and summaries.
Current results:
- Chroma and FAISS both performed strongly on the validated adapter sweep
- pgvector showed a credible higher-bit path
- LanceDB, SurrealDB, and other adapters now have real integration and validation coverage
- The long-context path is documented as a minimal evaluation, with its limits stated rather than overclaimed
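For readers who want to reason about quality and latency tradeoffs on their own data, the measurement itself is small. The sketch below is a generic recall-at-k and latency loop in plain Python, not the TurboAgents benchmark harness. The function names and the labeled-query format are assumptions made for illustration.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Set, Tuple


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def evaluate(retrieve: Callable[[str], List[str]],
             labeled_queries: List[Tuple[str, Set[str]]],
             k: int = 5) -> Dict[str, float]:
    """Run every query once; report mean recall@k and mean wall-clock latency.

    labeled_queries must be non-empty.
    """
    recalls, latencies = [], []
    for query, relevant in labeled_queries:
        start = time.perf_counter()
        retrieved = retrieve(query)
        latencies.append(time.perf_counter() - start)
        recalls.append(recall_at_k(retrieved, relevant, k))
    return {"recall_at_k": mean(recalls), "mean_latency_s": mean(latencies)}


def toy_retrieve(query: str) -> List[str]:
    """Stand-in retriever that always returns the same ranking."""
    return ["a", "b", "c"]


report = evaluate(toy_retrieve, [("q1", {"a"}), ("q2", {"z"})], k=2)
print(report)
```

Running the same loop against a baseline retriever and a compressed one, on the same labeled queries, gives a like-for-like comparison.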
Getting Started
TurboAgents is built around a simple install path. Check the getting started guide for the full walkthrough.
- Core install: uv add turboagents
- Retrieval extras: uv add "turboagents[rag]"
- MLX extras: uv add "turboagents[mlx]"
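After installing, a quick way to confirm the package landed in the active environment is to look up its distribution metadata with the standard library. This sketch only assumes the distribution name turboagents from the install commands. It does not import the package or rely on any TurboAgents API, and installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("turboagents")
print(found or "turboagents is not installed in this environment")
```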
Watch the Demo
See TurboAgents in action – installation, CLI benchmarks, adapter integration, and compressed retrieval walkthrough. You can follow along with the turboagents-demo repo.
TurboAgents and SuperOptiX
TurboAgents is a standalone library, but SuperOptiX is the first full reference integration. That is an important distinction. TurboAgents is not tied to SuperOptiX, but SuperOptiX is where we have already integrated it end to end in a real agent framework environment. That makes SuperOptiX the clearest proof that TurboAgents is useful beyond isolated examples.
SuperOptiX now supports TurboAgents-backed retrieval paths including turboagents-chroma, turboagents-lancedb, and turboagents-surrealdb. This means TurboAgents is already wired into a broader agent optimization and orchestration environment, not just exposed as raw backend code.
Why this matters: TurboAgents can operate as standalone infrastructure. TurboAgents can also power retrieval paths inside a real agent system. Adoption does not need to start with framework-by-framework upstream contributions. One strong reference integration is more valuable than a dozen shallow adapters.
Explore the SuperOptiX TurboAgents integration page, the SuperOptiX GitHub repo, or the SuperOptiX PyPI package.
What You Can Do with TurboAgents Today
- Install TurboAgents and run your first benchmark in minutes
- Try compressed retrieval against your existing Chroma, FAISS, LanceDB, pgvector, or SurrealDB setup
- Evaluate retrieval quality and latency tradeoffs with the built-in CLI harness
- Integrate with SuperOptiX for end-to-end compressed retrieval inside an agent framework
- Explore MLX-based local inference with KV-cache compression
