Choosing a Provider

This guide helps you select the right embedding provider for your use case.

Quick Decision Tree

Need image embeddings?
  ├─ Yes → CLIP
  └─ No ↓

Need sparse vectors for hybrid search?
  ├─ Yes → SPLADE
  └─ No ↓

Need token-level matching?
  ├─ Yes → ColBERT
  └─ No ↓

Can use cloud API?
  ├─ No → Local (Ollama, Local, or FastEmbed)
  └─ Yes ↓

Need EU data residency?
  ├─ Yes → Mistral or Azure (EU region)
  └─ No ↓

Need domain-specific models?
  ├─ Yes → Voyage (code, law, finance) or Bedrock (Cohere)
  └─ No ↓

Need enterprise compliance?
  ├─ Yes → Azure, Bedrock, or Vertex
  └─ No ↓

Need best retrieval quality?
  ├─ Yes → Voyage or Cohere
  └─ No → OpenAI (general purpose)

Local only?
  ├─ Have Ollama → Ollama (recommended)
  ├─ Need lightweight → FastEmbed (~100MB)
  └─ Need any HF model → Local (~2GB)
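
Whichever branch you land on, the result is an embedder spec that you hand to barrel_embed:init/1 (the call used later in this guide). For example, using the cloud and local specs shown in the sections below (exact option maps may vary by version):

```erlang
%% Cloud default: OpenAI general-purpose embeddings.
{ok, CloudState} = barrel_embed:init(
    #{embedder => {openai, #{model => <<"text-embedding-3-small">>}}}).

%% Local default: Ollama serving nomic-embed-text.
{ok, LocalState} = barrel_embed:init(
    #{embedder => {ollama, #{url => <<"http://localhost:11434">>,
                             model => <<"nomic-embed-text">>}}}).
```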

Provider Comparison

Cloud Providers

Provider  Quality  Best For                         Dimensions  EU Residency
OpenAI    ★★★★★    General purpose                  256-3072    No
Cohere    ★★★★★    Input type optimization          384-1024    No
Voyage    ★★★★★    Retrieval, domain-specific       512-1536    No
Jina      ★★★★☆    Long context (8K), multilingual  768-1024    No
Mistral   ★★★★☆    EU data residency                1024        Yes
Azure     ★★★★★    Enterprise compliance            1536-3072   Yes (regional)
Bedrock   ★★★★★    AWS ecosystem                    1024-1536   Yes (regional)
Vertex    ★★★★☆    GCP ecosystem                    768         Yes (regional)

Local Providers

Provider   Quality  Speed   Install Size  Dependencies     Offline
Ollama     ★★★★☆    Fast    ~2GB (model)  Ollama server    Yes
Local      ★★★★☆    Medium  ~2GB          Python, PyTorch  Yes
FastEmbed  ★★★★☆    Fast    ~100MB        Python, ONNX     Yes

Specialized Providers

Provider  Type          Output               Best For
SPLADE    Sparse        {indices, values}    Hybrid search, keyword expansion
ColBERT   Multi-vector  [[float]] per token  Fine-grained semantic matching
CLIP      Cross-modal   [float]              Image-text search

When to Use Each Provider

OpenAI

Best for: Production systems needing high-quality, general-purpose embeddings

{openai, #{model => <<"text-embedding-3-small">>}}

Use when:

  • Quality is the top priority
  • You have budget for API costs
  • You have low latency to OpenAI's servers
  • Don't need offline capability

Avoid when:

  • Data privacy is critical (data sent to API)
  • Need offline/air-gapped operation
  • High volume with tight budget

Cohere

Best for: Production with input type optimization

{cohere, #{
    model => <<"embed-english-v3.0">>,
    input_type => <<"search_document">>  % or search_query
}}

Use when:

  • Need separate document vs query embeddings
  • Want classification/clustering optimization
  • Building production search systems
  • Need multilingual support

Avoid when:

  • Don't need input type distinction
  • Tight budget (pricing is comparable to OpenAI's)

Voyage AI

Best for: Best-in-class retrieval, domain-specific models

{voyage, #{model => <<"voyage-3">>}}
% Or domain-specific:
{voyage, #{model => <<"voyage-code-3">>}}    % code search
{voyage, #{model => <<"voyage-law-2">>}}     % legal
{voyage, #{model => <<"voyage-finance-2">>}} % financial

Use when:

  • Building RAG systems (top MTEB scores)
  • Need domain-specific models (code, law, finance)
  • Retrieval quality is critical

Avoid when:

  • Budget constrained
  • Don't need specialized retrieval

Jina AI

Best for: Long context, multilingual

{jina, #{model => <<"jina-embeddings-v3">>}}

Use when:

  • Processing long documents (8K context)
  • Need multilingual with free tier
  • Budget conscious (free 1M tokens/month)

Avoid when:

  • Need highest retrieval quality
  • Processing only short texts

Mistral

Best for: EU data residency

{mistral, #{model => <<"mistral-embed">>}}

Use when:

  • EU data residency required
  • GDPR compliance important
  • Already using Mistral for LLMs

Avoid when:

  • Don't need EU residency
  • Need domain-specific models

Azure OpenAI

Best for: Enterprise with Azure ecosystem

{azure, #{
    endpoint => <<"https://your-resource.cognitiveservices.azure.com">>,
    deployment => <<"text-embedding-3-small">>
}}

Use when:

  • Need enterprise compliance (SOC 2, HIPAA)
  • Already in Azure ecosystem
  • Need VNet integration
  • Regional data residency required

Avoid when:

  • Don't need enterprise features
  • Simpler setup preferred

AWS Bedrock

Best for: AWS ecosystem integration

{bedrock, #{
    model => <<"amazon.titan-embed-text-v2:0">>,
    region => <<"us-east-1">>
}}

Use when:

  • Already in AWS ecosystem
  • Need IAM/VPC integration
  • Want choice of models (Titan, Cohere)
  • Need enterprise compliance

Avoid when:

  • Don't use AWS
  • Need batch API (not supported)

Google Vertex AI

Best for: GCP ecosystem integration

{vertex, #{
    project => <<"my-project">>,
    model => <<"text-embedding-004">>
}}

Use when:

  • Already in GCP ecosystem
  • Need BigQuery integration
  • Need VPC-SC, CMEK

Avoid when:

  • Don't use GCP
  • Access token refresh is problematic

Ollama

Best for: Local deployment without Python complexity

{ollama, #{url => <<"http://localhost:11434">>, model => <<"nomic-embed-text">>}}

Use when:

  • Want local inference without Python
  • Already using Ollama for LLMs
  • Need good quality with simple setup
  • Want to avoid API costs

Avoid when:

  • Can't install Ollama
  • Need embedded solution (no server)
  • Memory constrained (models loaded in RAM)

Local (sentence-transformers)

Best for: Full control, access to any HuggingFace model

{local, #{model => "BAAI/bge-base-en-v1.5"}}

Use when:

  • Need specific HuggingFace models
  • Already have PyTorch environment
  • Want maximum model flexibility
  • Need fine-tuning capability

Avoid when:

  • Disk space is limited (~2GB install)
  • PyTorch dependency is problematic
  • Need fastest possible inference

FastEmbed

Best for: Lightweight local inference

{fastembed, #{model => "BAAI/bge-small-en-v1.5"}}

Use when:

  • Need local inference with small footprint
  • Don't want PyTorch dependency
  • Deploying to resource-constrained environments
  • Quality similar to sentence-transformers is acceptable

Avoid when:

  • Need models not supported by FastEmbed
  • Need absolute maximum quality

SPLADE

Best for: Hybrid lexical-semantic search

{splade, #{model => "prithivida/Splade_PP_en_v1"}}

Use when:

  • Building hybrid search (BM25 + semantic)
  • Need keyword expansion (synonyms, related terms)
  • Want efficient inverted index storage
  • Combining with dense embeddings

Avoid when:

  • Only need dense vector search
  • Memory constrained (sparse→dense is expensive)
  • Don't have hybrid search infrastructure

Example use case: E-commerce search where users type product names (lexical) but also want semantic matches.
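
The inverted-index storage mentioned above falls directly out of the sparse output. Assuming each SPLADE embedding arrives as a map with indices and values (the {indices, values} shape from the comparison table; the exact representation may differ), a fold turns documents into token → postings lists:

```erlang
%% Fold one document's sparse embedding into an inverted index:
%% token index -> [{DocId, Weight}] postings.
add_to_index(DocId, #{indices := Indices, values := Values}, Index) ->
    lists:foldl(
        fun({TokenIdx, Weight}, Acc) ->
            Posting = {DocId, Weight},
            maps:update_with(TokenIdx,
                             fun(Postings) -> [Posting | Postings] end,
                             [Posting],
                             Acc)
        end,
        Index,
        lists:zip(Indices, Values)).
```

At query time, only the postings lists for the query's nonzero indices need to be scanned, which is what makes sparse vectors cheap to serve from an inverted index.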


ColBERT

Best for: Fine-grained passage retrieval

{colbert, #{model => "colbert-ir/colbertv2.0"}}

Use when:

  • Single-vector similarity isn't precise enough
  • Building QA or passage retrieval systems
  • Need token-level relevance signals
  • Documents have varying relevant sections

Avoid when:

  • Storage is limited (multiple vectors per doc)
  • Simple semantic similarity is sufficient
  • Real-time latency is critical

Example use case: Legal document search where specific clauses matter more than overall document similarity.
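
Token-level relevance is typically scored with MaxSim: each query token vector takes its best match among the document's token vectors, and those maxima are summed. A minimal sketch over the [[float]] per-token output (plain lists of floats; a real implementation would use normalized vectors and batched math):

```erlang
%% MaxSim: for each query token vector, take the maximum dot product
%% against all document token vectors, then sum over query tokens.
maxsim(QueryVecs, DocVecs) ->
    lists:sum([lists:max([dot(Q, D) || D <- DocVecs]) || Q <- QueryVecs]).

%% Dot product of two equal-length float lists.
dot(A, B) ->
    lists:sum([X * Y || {X, Y} <- lists:zip(A, B)]).
```

Because every query token contributes its own best match, a document whose one relevant clause matches strongly can outscore a document that is only loosely similar overall.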


CLIP

Best for: Image and cross-modal search

{clip, #{model => "openai/clip-vit-base-patch32"}}

Use when:

  • Searching images with text queries
  • Finding visually similar images
  • Building multi-modal applications
  • Need zero-shot image classification

Avoid when:

  • Only working with text
  • Don't need image capabilities

Example use case: Stock photo search, content moderation, visual product search.

Combining Providers

Provider Chain (Fallback)

Use multiple providers for high availability:

#{embedder => [
    {ollama, #{url => <<"http://localhost:11434">>}},
    {openai, #{}},
    {local, #{}}
]}

If Ollama fails → try OpenAI → fall back to local Python.

Hybrid Search (SPLADE + Dense)

Combine sparse and dense for best retrieval:

%% Sparse for lexical matching
{ok, SpladeState} = barrel_embed:init(#{embedder => {splade, #{}}}).

%% Dense for semantic matching
{ok, DenseState} = barrel_embed:init(#{embedder => {ollama, #{...}}}).

%% Query both and combine scores
SparseScore = search_sparse(Query, SpladeState),
DenseScore = search_dense(Query, DenseState),
FinalScore = 0.3 * SparseScore + 0.7 * DenseScore.

Performance Benchmarks

Approximate performance on typical hardware (results vary):

Provider   First Request    Subsequent  Batch (100 texts)
OpenAI     200ms            100ms       500ms
Ollama     2s (model load)  50ms        2s
Local      5s (model load)  100ms       3s
FastEmbed  3s (model load)  50ms        2s

Summary Table

Use Case                            Recommended Provider
Production, general purpose         OpenAI
Best retrieval quality              Voyage AI
Domain-specific (code/law/finance)  Voyage AI
Long context (8K tokens)            Jina AI
EU data residency                   Mistral
Enterprise + Azure                  Azure OpenAI
Enterprise + AWS                    AWS Bedrock
Enterprise + GCP                    Google Vertex AI
Input type optimization             Cohere
Local, simple setup                 Ollama
Local, any HF model                 Local
Local, lightweight                  FastEmbed
Hybrid search                       SPLADE + Dense
Passage retrieval, QA               ColBERT
Image search                        CLIP
High availability                   Provider chain