Nomic Embed Text v1.5 is an open-weight embedding model with an 8,192-token context window and a price tag that effectively starts at zero. While OpenAI and Cohere guard their training data like state secrets, Nomic releases everything—weights, code, and the actual dataset—making it the only fully auditable option for enterprises with strict compliance needs.
For most engineering teams, the math is compelling. Running OpenAI's text-embedding-3-small for a knowledge base with 50 million tokens costs about $1.00. Running Nomic v1.5 on a provider like Fireworks AI costs roughly $0.50 ($0.01/1M tokens). If you self-host on existing GPU infrastructure, that marginal cost drops to zero. The real value, however, isn't just the savings—it's the Matryoshka representation learning. This feature allows you to truncate vectors from 768 dimensions down to 256 or even 64 without significant performance loss, potentially cutting your vector database storage costs by 3x to 10x.
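Matryoshka truncation needs no special tooling: you slice the vector to the first k dimensions and re-normalize before computing cosine similarity. A minimal sketch with NumPy (the random 768-dim vector stands in for a real Nomic embedding; the helper name is my own):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int = 256) -> np.ndarray:
    """Keep the first `dims` Matryoshka dimensions and re-normalize
    so cosine similarity still behaves as expected."""
    truncated = vec[:dims]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a real 768-dim Nomic embedding.
full = np.random.default_rng(0).standard_normal(768)
small = truncate_embedding(full, dims=256)
print(small.shape)             # (256,)
print(np.linalg.norm(small))   # ~1.0 after re-normalization
```

Dropping from 768 to 256 dimensions cuts per-vector storage by 3x; whether retrieval quality holds up at 64 dimensions should be validated on your own corpus.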
Technically, v1.5 is the workhorse. It handles long documents (legal contracts, research papers) without chunking headaches, thanks to that 8k window. It trades blows with OpenAI's small model on the MTEB leaderboard (~62.3 score) and beats older models decisively. However, the newer v2 MoE model is a confusing pivot. While it introduces a Mixture-of-Experts architecture for better multilingual support, it slashes the context window to a suffocating 512 tokens. Do not upgrade blindly if you rely on long-context RAG.
The ecosystem is split between the raw model (fantastic) and the Nomic Atlas platform (expensive). Atlas is a visualization and data management SaaS that charges per seat and per token, which can get pricey quickly. Most developers should treat Nomic as a model to be run elsewhere (Ollama, Fireworks, AWS) rather than buying into the Atlas platform unless they specifically need deep dataset visualization.
Use Nomic if you need a reproducible, long-context embedder that you can run in your own VPC. It's the standard for open-source RAG. If you need the absolute highest retrieval accuracy regardless of cost or closed-source risks, newer models like Qwen or Gemini have edged ahead on the leaderboards, but Nomic remains the transparency king.
Pricing
The 'free tier' is the model itself—you can download the weights and run them on your own hardware via Ollama or Hugging Face for $0. The hosted API cost is effectively commodity pricing: third-party providers like Fireworks AI charge ~$0.01 per 1 million tokens, undercutting OpenAI's $0.02/1M. The confusion comes from Nomic's 'Atlas' platform, which bundles storage and visualization for $10+/month or $125/seat/month. Avoid Atlas if you just want embeddings; use a raw inference provider or self-host to keep costs near zero.
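The comparison is simple per-token arithmetic; a quick sanity check using the rates quoted above (illustrative only, real provider pricing changes):

```python
def embedding_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost to embed `tokens` tokens at a given $/1M-token rate."""
    return tokens / 1_000_000 * price_per_million

corpus = 50_000_000  # the 50M-token knowledge base from the example above
print(embedding_cost(corpus, 0.02))  # OpenAI text-embedding-3-small: ~$1.00
print(embedding_cost(corpus, 0.01))  # Nomic v1.5 via Fireworks AI:   ~$0.50
```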
Technical Verdict
A solid, developer-friendly model. The Python SDK is a thin wrapper, but the real power is in the model's compatibility with standard tools like LangChain, LlamaIndex, and SentenceTransformers. Latency is higher than OpenAI's small models due to the architecture, but manageable (20-40ms). Matryoshka support works flawlessly for reducing DB size. The 'fully reproducible' claim holds up—the training data is actually available on Hugging Face.
Quick Start
# pip install nomic
from nomic import embed
output = embed.text(
    texts=["Nomic embeds long documents locally."],
    model='nomic-embed-text-v1.5',
    task_type='search_document'
)
print(output['embeddings'][0][:5])  # Print first 5 dims
Watch Out
- The v2 MoE model hard-caps context at 512 tokens, making it useless for standard chunk-based RAG.
- Nomic's 'Atlas' pricing is for the visualization platform, not just API inference—don't confuse the two.
- Self-hosting v1.5 requires decent GPU VRAM; it's slower than MiniLM on CPU.
- You must specify task_type (e.g., 'search_query' vs 'search_document') for optimal performance.
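One practical note on that last point: the nomic SDK takes task_type as an argument, but when you run the model elsewhere (SentenceTransformers, raw Hugging Face, Ollama) the task is expressed as a literal text prefix on each input. A small helper makes it hard to forget (the helper is my own sketch, not part of any SDK):

```python
VALID_TASKS = {"search_document", "search_query", "clustering", "classification"}

def with_task_prefix(texts: list[str], task_type: str = "search_document") -> list[str]:
    """Prepend the task prefix Nomic models expect when used outside the SDK."""
    if task_type not in VALID_TASKS:
        raise ValueError(f"unknown task_type: {task_type!r}")
    return [f"{task_type}: {t}" for t in texts]

docs = with_task_prefix(["Nomic embeds long documents locally."])
print(docs[0])  # search_document: Nomic embeds long documents locally.
```

Index your corpus with the search_document prefix and embed user queries with search_query; mixing the two silently degrades retrieval quality.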
