Embedding providers convert text into vector representations that enable semantic search. These vectors capture the meaning of text, allowing Cognee to find conceptually related content even when the wording is different.
New to configuration? See the Setup Configuration Overview for the complete workflow: install extras → create .env → choose providers → handle pruning.
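The idea behind semantic search can be illustrated with plain cosine similarity: texts with related meanings produce nearby vectors. The toy 3-dimensional vectors below are made up for illustration; real embedding models emit hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: ~1.0 for near-identical directions, ~0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for three texts:
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # high: related concepts
print(cosine_similarity(cat, invoice))  # low: unrelated concepts
```

This is why semantic search finds "kitten" when you query for "cat" even though the words differ.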

Supported Providers

Cognee supports multiple embedding providers:
  • OpenAI — Text embedding models via OpenAI API (default)
  • Azure OpenAI — Text embedding models via Azure OpenAI Service
  • Google Gemini — Embedding models via Google AI
  • Mistral — Embedding models via Mistral AI
  • AWS Bedrock — Embedding models via AWS Bedrock
  • Ollama — Local embedding models via Ollama
  • LM Studio — Local embedding models via LM Studio
  • Fastembed — CPU-friendly local embeddings
  • vLLM — Self-hosted embedding models via vLLM
  • Custom — OpenAI-compatible embedding endpoints
LLM/Embedding Configuration: If you configure only the LLM or only the embedding provider, the other falls back to its OpenAI default. Make sure you have a working OpenAI API key, or configure both explicitly to avoid unexpected defaults.

Configuration

Set these environment variables in your .env file:
  • EMBEDDING_PROVIDER — The provider to use (openai, gemini, mistral, ollama, fastembed, custom)
  • EMBEDDING_MODEL — The specific embedding model to use
  • EMBEDDING_DIMENSIONS — The vector dimension size (must match your vector store)
  • EMBEDDING_API_KEY — Your API key (falls back to LLM_API_KEY if not set)
  • EMBEDDING_ENDPOINT — Custom endpoint URL (for Azure, Ollama, or custom providers)
  • EMBEDDING_API_VERSION — API version (for Azure OpenAI)
  • EMBEDDING_MAX_TOKENS — Maximum tokens per request (optional)
  • HUGGINGFACE_TOKENIZER — HuggingFace Hub model ID used for token counting when EMBEDDING_PROVIDER is ollama
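The EMBEDDING_API_KEY fallback described above can be sketched as a small resolution function. This is an illustration of the documented behavior, not Cognee's internal implementation:

```python
def resolve_embedding_api_key(env: dict):
    """Illustrative sketch of the documented fallback: EMBEDDING_API_KEY
    wins when set; otherwise LLM_API_KEY is used."""
    return env.get("EMBEDDING_API_KEY") or env.get("LLM_API_KEY")

print(resolve_embedding_api_key({"LLM_API_KEY": "sk-llm"}))  # falls back to the LLM key
print(resolve_embedding_api_key({"EMBEDDING_API_KEY": "sk-emb",
                                 "LLM_API_KEY": "sk-llm"}))  # explicit key wins
```

If both keys are unset, the result is None, which is why a missing OpenAI key surfaces as an authentication error when the defaults kick in.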

Provider Setup Guides

OpenAI

OpenAI provides high-quality embeddings with good performance.
EMBEDDING_PROVIDER="openai"
EMBEDDING_MODEL="openai/text-embedding-3-large"
EMBEDDING_DIMENSIONS="3072"
# Optional
# EMBEDDING_API_KEY=sk-...   # falls back to LLM_API_KEY if omitted
# EMBEDDING_ENDPOINT=https://api.openai.com/v1
# EMBEDDING_API_VERSION=
# EMBEDDING_MAX_TOKENS=8191
Azure OpenAI

Use Azure OpenAI Service for embeddings with your own deployment.
EMBEDDING_PROVIDER="openai"
EMBEDDING_MODEL="azure/text-embedding-3-large"
EMBEDDING_ENDPOINT="https://<your-az>.cognitiveservices.azure.com/openai/deployments/text-embedding-3-large"
EMBEDDING_API_KEY="az-..."
EMBEDDING_API_VERSION="2023-05-15"
EMBEDDING_DIMENSIONS="3072"
Google Gemini

Use Google’s embedding models for semantic search.
EMBEDDING_PROVIDER="gemini"
EMBEDDING_MODEL="gemini/gemini-embedding-001"
EMBEDDING_API_KEY="AIza..."
EMBEDDING_DIMENSIONS="768"
Mistral

Use Mistral’s embedding models for high-quality vector representations.
EMBEDDING_PROVIDER="mistral"
EMBEDDING_MODEL="mistral/mistral-embed"
EMBEDDING_API_KEY="sk-mis-..."
EMBEDDING_DIMENSIONS="1024"
Installation: Install the required dependency (quoted so the extras bracket survives shells like zsh):
pip install "mistral-common[sentencepiece]"
AWS Bedrock

Use embedding models provided by the AWS Bedrock service.
EMBEDDING_PROVIDER="bedrock"
EMBEDDING_MODEL="<your_model_name>"
EMBEDDING_DIMENSIONS="<dimensions_of_the_model>"
EMBEDDING_API_KEY="<your_api_key>"
EMBEDDING_MAX_TOKENS="<max_tokens_of_your_model>"
Ollama

Run embedding models locally with Ollama for privacy and cost control.
EMBEDDING_PROVIDER="ollama"
EMBEDDING_MODEL="nomic-embed-text:latest"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embed"
EMBEDDING_DIMENSIONS="768"
HUGGINGFACE_TOKENIZER="nomic-ai/nomic-embed-text-v1.5"
HUGGINGFACE_TOKENIZER is the HuggingFace repo ID of the tokenizer used for token-length counting when sending requests to the Ollama embedding endpoint.
Installation: Install Ollama from ollama.ai and pull your desired embedding model:
ollama pull nomic-embed-text:latest
Zero-API-key setup: To run fully offline with no OpenAI key, you must configure both the LLM provider and the embedding provider to use local backends. See the Local Setup (No API Key) section for a complete combined .env example.
LM Studio

Run embedding models locally with LM Studio for privacy and cost control.
EMBEDDING_PROVIDER="custom"
EMBEDDING_MODEL="lm_studio/text-embedding-nomic-embed-text-1.5"
EMBEDDING_ENDPOINT="http://127.0.0.1:1234/v1"
EMBEDDING_API_KEY="."
EMBEDDING_DIMENSIONS="768"
Installation: Install LM Studio from lmstudio.ai and download your desired model from LM Studio’s interface. Load your model, start the LM Studio server, and Cognee will be able to connect to it.
Fastembed

Use Fastembed for CPU-friendly local embeddings without GPU requirements.
EMBEDDING_PROVIDER="fastembed"
EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
EMBEDDING_DIMENSIONS="384"
Installation: Fastembed is included by default with Cognee.
Known Issues:
  • As of September 2025, Fastembed requires Python < 3.13 and is not compatible with Python 3.13+
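Given that constraint, it can be worth failing fast before Cognee tries to load Fastembed. A hypothetical pre-flight check (the helper name is made up for illustration):

```python
import sys

def fastembed_supported(version=sys.version_info):
    """Hypothetical pre-flight check: Fastembed requires Python < 3.13
    (as of September 2025). Accepts sys.version_info or a plain tuple."""
    return tuple(version[:2]) < (3, 13)

print(fastembed_supported((3, 12, 4)))  # True
print(fastembed_supported((3, 13, 0)))  # False
```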
vLLM

Use vLLM to serve local or self-hosted embedding models with an OpenAI-compatible API. Example with Qwen3-Embedding-4B on port 8001:
EMBEDDING_PROVIDER="custom"
EMBEDDING_MODEL="hosted_vllm/Qwen/Qwen3-Embedding-4B"
EMBEDDING_ENDPOINT="http://localhost:8001/v1"
EMBEDDING_API_KEY="."
EMBEDDING_DIMENSIONS="2560"
hosted_vllm/ prefix required: Include hosted_vllm/ at the start of the model name so LiteLLM routes requests to your vLLM server. The model name after the prefix should match the model ID returned by your vLLM server’s /v1/models endpoint.
Tokenization: Cognee automatically strips the hosted_vllm/ prefix when loading the HuggingFace tokenizer, so no separate HUGGINGFACE_TOKENIZER setting is needed as long as the model name after the prefix is a valid HuggingFace model ID.
To verify the model name your vLLM server exposes, run:
curl http://localhost:8001/v1/models
See the LiteLLM vLLM documentation for more details.
Custom

Use OpenAI-compatible embedding endpoints from other providers.
EMBEDDING_PROVIDER="custom"
EMBEDDING_MODEL="provider/your-embedding-model"
EMBEDDING_ENDPOINT="https://your-endpoint.example.com/v1"
EMBEDDING_API_KEY="provider-..."
EMBEDDING_DIMENSIONS="<match-your-model>"
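An "OpenAI-compatible" endpoint means it accepts a POST to /v1/embeddings and returns one vector per input, in order. The sketch below only builds and parses the JSON shapes involved, with placeholder model and vector values, so you can see what EMBEDDING_DIMENSIONS must match:

```python
# OpenAI-compatible /v1/embeddings request body (model name is a placeholder):
request_body = {
    "model": "provider/your-embedding-model",
    "input": ["first chunk of text", "second chunk of text"],
}

# A typical response carries one embedding per input string, in order
# (vector values here are made up; real vectors have the model's full dimension):
example_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.01, -0.02, 0.03]},
        {"object": "embedding", "index": 1, "embedding": [0.04, 0.05, -0.06]},
    ],
    "model": "provider/your-embedding-model",
}

vectors = [item["embedding"] for item in example_response["data"]]
print(len(vectors))  # 2: one vector per input string
```

The length of each data[i].embedding list is the value EMBEDDING_DIMENSIONS must be set to.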

Advanced Options

EMBEDDING_RATE_LIMIT_ENABLED="true"
EMBEDDING_RATE_LIMIT_REQUESTS="10"
EMBEDDING_RATE_LIMIT_INTERVAL="5"
# Mock embeddings for testing (returns zero vectors)
MOCK_EMBEDDING="true"
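The rate-limit variable names above suggest a cap of EMBEDDING_RATE_LIMIT_REQUESTS requests per EMBEDDING_RATE_LIMIT_INTERVAL seconds (10 per 5 s in this example). MOCK_EMBEDDING replaces real embedding calls with zero vectors, which keeps tests fast and free of API costs. A minimal sketch of what a zero-vector mock looks like, illustrative rather than Cognee's actual internals:

```python
def mock_embed(texts, dimensions):
    """Return one zero vector per input text, mimicking MOCK_EMBEDDING='true'."""
    return [[0.0] * dimensions for _ in texts]

vectors = mock_embed(["hello", "world"], dimensions=4)
print(vectors)  # [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
```

Because every mock vector is identical, similarity scores are meaningless in this mode; it is for exercising pipelines, not for evaluating search quality.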

Important Notes

  • Dimension Consistency: EMBEDDING_DIMENSIONS must match your vector store collection schema
  • API Key Fallback: If EMBEDDING_API_KEY is not set, Cognee uses LLM_API_KEY (except for custom providers)
  • Tokenization: For Ollama and Hugging Face models, set HUGGINGFACE_TOKENIZER for proper token counting
  • Performance: Local providers (Ollama, Fastembed) are slower but offer privacy and cost benefits
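The dimension-consistency rule can be enforced before anything is written to the vector store. A minimal sketch (the helper name is hypothetical, not part of Cognee's API):

```python
def check_dimensions(vector, expected):
    """Fail fast when an embedding's length doesn't match EMBEDDING_DIMENSIONS."""
    if len(vector) != expected:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions but the vector store "
            f"collection expects {expected}; update EMBEDDING_DIMENSIONS "
            f"or recreate the collection."
        )

check_dimensions([0.1] * 768, expected=768)  # passes silently
```

Catching the mismatch at this point gives a clear error instead of a store-specific insert failure later in the pipeline.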
