OpenAI’s text-embedding-3-small costs $0.02 per one million tokens. To put that in perspective, you can embed the entire King James Bible (roughly a million tokens) for about two cents. At this price point, embedding text is effectively free; the real cost has shifted entirely to your vector database storage and retrieval operations.
For 95% of developers, this API is the default choice, and rightfully so. It is the "tap water" of the industry: cheap, readily available, and safe to drink, even if it’s not the finest vintage. The service handles massive throughput with high reliability, and the introduction of "Matryoshka" representation learning is a significant infrastructure win. This feature lets you slice the standard 1,536-dimensional vectors down to 512 or even 256 dimensions while retaining most of the semantic performance. For a RAG application storing 10 million vectors, a 3x dimension cut (1,536 to 512) shrinks your Pinecone or Qdrant storage bill roughly in proportion, often saving more money than the embedding API cost itself.
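You don’t have to slice vectors client-side to get this: the embeddings endpoint accepts a dimensions parameter on the text-embedding-3 models and returns shortened, re-normalized vectors. A minimal sketch (the input string is illustrative):
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="quarterly revenue recognition policy",
    dimensions=512,                 # ask for shortened, re-normalized vectors
)
print(len(resp.data[0].embedding))  # 512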
However, it is not the best model on the market. Benchmarks like MTEB show that specialized providers like Voyage AI and open-weights models like BGE-M3 often outperform OpenAI on retrieval accuracy, particularly in niche domains like law, finance, or multilingual retrieval. OpenAI’s embeddings are "general purpose" in the strictest sense—competent at everything, master of nothing. Additionally, the lack of a native fine-tuning API means you cannot teach the model your specific internal jargon without building complex adapter layers yourself.
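If you do need that domain adaptation, the usual workaround is a small trainable projection over the frozen API vectors. A minimal sketch with PyTorch and an in-batch-negatives contrastive loss; the paired (query, relevant document) embeddings are your own labeled data, and nothing here is an OpenAI feature:
import torch
import torch.nn.functional as F

dim = 1536                                  # text-embedding-3-small output size
adapter = torch.nn.Linear(dim, dim, bias=False)
opt = torch.optim.Adam(adapter.parameters(), lr=1e-4)

def train_step(query_embs, doc_embs, temperature=0.05):
    # query_embs, doc_embs: (batch, dim) tensors; row i of each is a matched pair
    q = F.normalize(adapter(query_embs), dim=-1)
    d = F.normalize(doc_embs, dim=-1)
    logits = (q @ d.T) / temperature        # off-diagonal entries act as negatives
    labels = torch.arange(len(q))           # the diagonal holds the true pairs
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
At query time you run the frozen OpenAI embedding through the trained adapter before hitting the index; the document side can stay untouched.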
If you are building a standard RAG pipeline, start here. The integration is universal, the reliability is high, and the cost is negligible. But if your product differentiates on search quality—for example, a legal discovery tool or a patent search engine—you should pay the premium for Voyage AI or self-host a fine-tuned model. For everyone else, text-embedding-3-small is the set-it-and-forget-it infrastructure block.
Pricing
The pricing is a race to the bottom. At $0.02/1M tokens for text-embedding-3-small, a startup processing 10,000 documents (1k tokens each) daily pays roughly $0.20/day, or about $6/month. Even the "expensive" text-embedding-3-large ($0.13/1M) costs only slightly more than the legacy ada-002 ($0.10/1M), while 3-small undercuts ada-002 five-fold.
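A back-of-envelope calculator for those figures (prices hardcoded from the rates quoted above; check the current price list before relying on them):
def daily_embedding_cost(docs_per_day, tokens_per_doc, usd_per_million=0.02):
    return docs_per_day * tokens_per_doc / 1_000_000 * usd_per_million

print(daily_embedding_cost(10_000, 1_000))       # 0.2  -> $0.20/day
print(daily_embedding_cost(10_000, 1_000) * 30)  # 6.0  -> ~$6/month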
The hidden efficiency is the Batch API, which offers a 50% discount ($0.01/1M for small) for jobs that can wait up to 24 hours—perfect for initial backfills. The real cost cliff isn't the API; it's the vector database storage. Storing high-dimensional vectors (3072 dims for 3-large) will bloat your vector DB bill 2-4x compared to smaller alternatives. Use Matryoshka truncation to manage this.
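A sketch of a Batch API backfill, following OpenAI’s documented JSONL request format; the filenames and document IDs here are illustrative:
import json
from openai import OpenAI

client = OpenAI()
docs = {"doc-1": "First document text...", "doc-2": "Second document text..."}

# One request per line, keyed by a custom_id you can join results on later
with open("backfill.jsonl", "w") as f:
    for doc_id, text in docs.items():
        f.write(json.dumps({
            "custom_id": doc_id,
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": "text-embedding-3-small", "input": text},
        }) + "\n")

batch_file = client.files.create(file=open("backfill.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/embeddings",
    completion_window="24h",
)
print(batch.id, batch.status)  # poll until "completed", then fetch the output file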
Technical Verdict
The API is boringly reliable, which is exactly what you want. The Python SDK is the industry reference implementation. Latency is consistently low (sub-100ms for reasonable chunks), though strict p99 guarantees are better with dedicated providers like Cohere. The lack of visibility into the training data is the main technical trade-off; you have no idea why a retrieval failed, and you can't fine-tune the model to fix it. It's a pure black box.
Quick Start
# pip install openai
from openai import OpenAI
client = OpenAI(api_key="sk-...")
resp = client.embeddings.create(
    input="The quick brown fox jumps over the lazy dog",
    model="text-embedding-3-small"
)
print(resp.data[0].embedding[:5])
Watch Out
- Matryoshka Truncation: If you shorten vectors yourself (e.g., slicing to 256 dims), the result is no longer unit-length; re-normalize before insertion if your DB assumes normalized vectors for cosine or dot-product scoring (see the sketch after this list).
- Legacy Lock-in: text-embedding-ada-002 is now 5x more expensive than 3-small; migrate unless you can't afford a re-index.
- Zero Privacy: Your data is sent to OpenAI's servers. Not suitable for strictly air-gapped or sensitive PII/compliance workloads without enterprise agreements.
- Rate Limits: While high, hitting the TPM (tokens per minute) limit is easy during initial backfills. Use the Batch API to avoid 429 errors.
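The Matryoshka fix from the first bullet, as a sketch. Note the dimensions parameter re-normalizes for you, so this is only needed when you slice full vectors yourself (numpy assumed; the random vector stands in for an API response):
import numpy as np

def truncate_and_normalize(embedding, dims=256):
    # Keep the leading dims, then restore unit length for dot-product scoring
    v = np.asarray(embedding[:dims], dtype=np.float32)
    return v / np.linalg.norm(v)

full = np.random.rand(1536).tolist()        # stand-in for a full API embedding
vec = truncate_and_normalize(full)
print(len(vec), round(float(np.linalg.norm(vec)), 4))  # 256 1.0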
