Milvus is not a database you spin up for a weekend hackathon; it is a distributed infrastructure designed to store billions of vectors without choking. While tools like Chroma or Pinecone prioritize developer velocity, Milvus prioritizes architectural scalability, separating storage, compute, and indexing into distinct microservices that can scale independently on Kubernetes. If you are building the next eBay (a known user) or a massive enterprise RAG system, this is your tool.
For a realistic workload of 10 million 768-dimension vectors, the managed Zilliz Cloud (serverless) charges roughly $30 for the initial ingestion (based on vCU usage) and about $0.025/GB for storage. However, the real cost is in retrieval. A high-throughput application running 5 million queries a month could easily rack up $300+ in "Virtual Compute Units" (vCUs), comparable to Pinecone’s Standard tier but offering significantly more control over indexing parameters. If you opt to self-host Milvus Distributed to save on those fees, be prepared to pay the "DevOps tax": managing dependencies like etcd, Pulsar, and MinIO requires a serious engineering effort.
The release of Milvus 2.5 has addressed its biggest historical gap: hybrid search. Previously, you had to hack together BM25 or use third-party tools; now, it includes native sparse-BM25 and Tantivy-based full-text search. This brings it to parity with Weaviate on features, though Weaviate still feels more "application-ready" out of the box. Milvus’s API is strict and typed—you define schemas explicitly, which prevents the "schema drift" chaos common in looser vector stores but slows down initial development.
Skip Milvus if you are a solo developer or a small team. The overhead of the full cluster architecture is overkill, and even the "Lite" python-only version lacks key features like partitions. Use Milvus if you have a platform engineering team and a dataset that threatens to burst single-node databases at the seams.
Pricing
Zilliz Cloud's "free tier" is generous, offering ~2.5 million vCUs/month (enough for ~2-3M vector reads or modest ingestion) and 5GB storage, which stays free indefinitely, not just a trial. The "pay-as-you-go" model is $4 per million vCUs.
The hidden complexity is calculating vCUs: a search isn't just one unit; it scales with data scanned. Scanning a 10M vector collection costs significantly more vCUs per query than a 1M collection unless you use partitions effectively. Unlike Pinecone's straightforward pod hourly rates, Zilliz's consumption-based billing can spike unexpectedly during heavy read/write bursts.
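Back-of-envelope math for the figures above. The per-query vCU count below is a hypothetical chosen to match the article's "$300+" scenario, and the storage rate is assumed to be per GB-month; real consumption depends on how much data each query scans:

```python
# Rates quoted in the text
VCU_PRICE = 4 / 1_000_000       # $4 per million vCUs
STORAGE_PRICE = 0.025           # assumed $/GB-month

# Workload from the text: 10M vectors at 768 dims, float32
storage_gb = 10_000_000 * 768 * 4 / 1e9    # ~30.7 GB of raw vector data
storage_cost = storage_gb * STORAGE_PRICE  # well under $1/month

# 5M queries/month; assume ~15 vCUs/query on an unpartitioned 10M collection
query_cost = 5_000_000 * 15 * VCU_PRICE

print(f"storage ~${storage_cost:.2f}/mo, queries ~${query_cost:.0f}/mo")
```

The asymmetry is the point: storage is nearly free, while compute dominates the bill, so partitioning (which shrinks the data scanned per query) is the main cost lever.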
Technical Verdict
Milvus is an engineering powerhouse wrapped in a slightly verbose SDK. Reliability is top-tier for distributed setups, but latency is heavily dependent on your index type (HNSW vs. IVF). Documentation is extensive but often dense. The Python SDK (pymilvus) has improved with a higher-level client, but you'll still write more boilerplate code than with Chroma. Integration with LangChain is stable. Expect a steep learning curve for self-hosted tuning.
Quick Start
# pip install pymilvus
from pymilvus import MilvusClient

client = MilvusClient("./milvus_demo.db")  # Runs locally (Milvus Lite)
client.create_collection(collection_name="demo", dimension=768)

# Insert one vector so the search below has something to return
client.insert(collection_name="demo", data=[{"id": 0, "vector": [0.1] * 768}])

res = client.search(
    collection_name="demo",
    data=[[0.1] * 768],
    limit=1,
)
print(res)

Watch Out
- Milvus Lite does not support partitions or RBAC, so local dev code may break when moving to a production cluster.
- Self-hosting the cluster mode requires managing external dependencies like etcd, Pulsar, and MinIO—it is not a single binary.
- vCU consumption for hybrid search queries is significantly higher than simple vector retrieval, impacting cloud costs.
- Default index parameters often yield high recall but slow performance; tuning parameters like ef and nlist is mandatory for production.
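As a rough illustration of those knobs, here is the shape the index and search parameter dictionaries take in Milvus's API; the specific values are starting points to tune against your own recall/latency targets, not recommendations:

```python
# HNSW: graph-based index, tuned at build time and query time
hnsw_index = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {
        "M": 16,               # graph degree: higher = better recall, more memory
        "efConstruction": 200, # build-time beam width: slower build, better graph
    },
}
hnsw_search = {"params": {"ef": 128}}  # must be >= limit; raise for recall, pay latency

# IVF: cluster-based index, where nlist/nprobe play the analogous roles
ivf_index = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 1024},          # number of clusters built at index time
}
ivf_search = {"params": {"nprobe": 32}}  # clusters scanned per query
```

The recurring trade is the same in both families: the build-time parameter buys index quality once, while the query-time parameter (ef or nprobe) is a per-request dial between recall and latency.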
