New to configuration? See the Setup Configuration Overview for the complete workflow: install extras → create .env → choose providers → handle pruning.
Supported Providers
Cognee supports multiple embedding providers:
- OpenAI — Text embedding models via OpenAI API (default)
- Azure OpenAI — Text embedding models via Azure OpenAI Service
- Google Gemini — Embedding models via Google AI
- Mistral — Embedding models via Mistral AI
- AWS Bedrock — Embedding models via AWS Bedrock
- Ollama — Local embedding models via Ollama
- LM Studio — Local embedding models via LM Studio
- Fastembed — CPU-friendly local embeddings
- vLLM — Self-hosted embedding models via vLLM
- Custom — OpenAI-compatible embedding endpoints
Configuration
Environment Variables
Set these environment variables in your .env file:
- EMBEDDING_PROVIDER — The provider to use (openai, gemini, mistral, ollama, fastembed, custom)
- EMBEDDING_MODEL — The specific embedding model to use
- EMBEDDING_DIMENSIONS — The vector dimension size (must match your vector store)
- EMBEDDING_API_KEY — Your API key (falls back to LLM_API_KEY if not set)
- EMBEDDING_ENDPOINT — Custom endpoint URL (for Azure, Ollama, or custom providers)
- EMBEDDING_API_VERSION — API version (for Azure OpenAI)
- EMBEDDING_MAX_TOKENS — Maximum tokens per request (optional)
- HUGGINGFACE_TOKENIZER — HuggingFace Hub model ID used for token counting when EMBEDDING_PROVIDER is ollama
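The way these variables combine, including the API key fallback, can be sketched in Python. This is a minimal illustration of the documented behavior, not Cognee's internal code: the helper name `resolve_embedding_settings` and the shape of the returned dictionary are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_embedding_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the embedding variables from an environment mapping.

    Hypothetical helper for illustration only; it is not part of Cognee.
    """
    env = os.environ if env is None else env
    provider = env.get("EMBEDDING_PROVIDER")
    dimensions = env.get("EMBEDDING_DIMENSIONS")

    # Documented fallback: use LLM_API_KEY when EMBEDDING_API_KEY is unset,
    # except for custom providers, which need their own key.
    api_key = env.get("EMBEDDING_API_KEY")
    if not api_key and provider != "custom":
        api_key = env.get("LLM_API_KEY")

    return {
        "provider": provider,
        "model": env.get("EMBEDDING_MODEL"),
        # Environment values are strings, so convert the dimension to an int.
        "dimensions": int(dimensions) if dimensions else None,
        "api_key": api_key,
        "endpoint": env.get("EMBEDDING_ENDPOINT"),
    }
```

Calling the helper with no argument reads from `os.environ`; passing a plain dictionary makes it easy to check a configuration before writing it to .env.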
Provider Setup Guides
OpenAI (Default)
OpenAI provides high-quality embeddings with good performance.
Azure OpenAI Embeddings
Use Azure OpenAI Service for embeddings with your own deployment.
Google Gemini
Use Google’s embedding models for semantic search.
Mistral
Use Mistral’s embedding models for high-quality vector representations.
Installation: The Mistral provider requires an additional dependency, which must be installed before use.
AWS Bedrock
Use embedding models provided by the AWS Bedrock service.
Ollama (Local)
Run embedding models locally with Ollama for privacy and cost control.
HUGGINGFACE_TOKENIZER is the HuggingFace repo ID of the tokenizer used for token-length counting when sending requests to the Ollama embedding endpoint.
Installation: Install Ollama from ollama.ai and pull your desired embedding model.
Zero-API-key setup: To run fully offline with no OpenAI key, you must configure both the LLM provider and the embedding provider to use local backends. See the Local Setup (No API Key) section for a complete combined .env example.
LM Studio (Local)
Run embedding models locally with LM Studio for privacy and cost control.
Installation: Install LM Studio from lmstudio.ai and download your desired model from LM Studio’s interface.
Load your model, start the LM Studio server, and Cognee will be able to connect to it.
Fastembed (Local)
Use Fastembed for CPU-friendly local embeddings without GPU requirements.
Installation: Fastembed is included by default with Cognee.
Known Issues:
- As of September 2025, Fastembed requires Python < 3.13 (not compatible with Python 3.13+)
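A startup check for this known issue might look like the sketch below. The function name `fastembed_supported` is hypothetical, and the 3.13 cutoff simply restates the limitation listed above as of September 2025; later Fastembed releases may lift it.

```python
import sys
from typing import Tuple


def fastembed_supported(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True when the (major, minor) version is below 3.13.

    Hypothetical helper that encodes the known issue above; not a Cognee API.
    """
    return tuple(version) < (3, 13)


if not fastembed_supported():
    print("Fastembed may not work on this interpreter; consider Python 3.12 or another provider.")
```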
vLLM
Use vLLM to serve local or self-hosted embedding models with an OpenAI-compatible API.
The documented example serves Qwen3-Embedding-4B on port 8001.
Tokenization: Cognee automatically strips the hosted_vllm/ prefix when loading the HuggingFace tokenizer, so no separate HUGGINGFACE_TOKENIZER setting is needed as long as the model name after the prefix is a valid HuggingFace model ID.
To verify the model name your vLLM server exposes, query its OpenAI-compatible /v1/models endpoint.
See the LiteLLM vLLM documentation for more details.
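The prefix handling and the model-name check for vLLM can be sketched as follows. Both helper names are hypothetical and this is not Cognee's implementation. The default base URL assumes a vLLM server on localhost port 8001 with its OpenAI-compatible routes under /v1, and the `Qwen/` organization prefix in the usage note is an assumption about how the model is published on HuggingFace.

```python
import json
import urllib.request
from typing import List

VLLM_PREFIX = "hosted_vllm/"


def tokenizer_id_from_model(embedding_model: str) -> str:
    """Drop the hosted_vllm/ prefix to get the HuggingFace model ID used for tokenization."""
    if embedding_model.startswith(VLLM_PREFIX):
        return embedding_model[len(VLLM_PREFIX):]
    return embedding_model


def list_served_models(base_url: str = "http://localhost:8001/v1") -> List[str]:
    """Ask an OpenAI-compatible server which model names it exposes.

    Requires a running server; the response follows the OpenAI list format,
    where each entry under "data" carries the model name in its "id" field.
    """
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as response:
        payload = json.load(response)
    return [entry["id"] for entry in payload["data"]]
```

For example, `tokenizer_id_from_model("hosted_vllm/Qwen/Qwen3-Embedding-4B")` returns `"Qwen/Qwen3-Embedding-4B"`, which is the part that has to be a valid HuggingFace model ID.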
Custom Providers
Use OpenAI-compatible embedding endpoints from other providers.
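Before pointing Cognee at a custom endpoint, it can help to confirm that the endpoint answers an OpenAI-style embeddings request and to see what vector size it returns. The sketch below is independent of Cognee. It assumes EMBEDDING_ENDPOINT is a base URL (for example one ending in /v1) that serves a POST /embeddings route in the OpenAI format; the function names are hypothetical.

```python
import json
import urllib.request
from typing import List


def build_embedding_request(
    endpoint: str, model: str, api_key: str, texts: List[str]
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style /embeddings route without sending it."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embedding_dimension(request: urllib.request.Request) -> int:
    """Send the request and return the length of the first embedding vector."""
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return len(payload["data"][0]["embedding"])
```

The number returned by `embedding_dimension` is the value to use for EMBEDDING_DIMENSIONS with that model.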
Advanced Options
Rate Limiting
Testing and Development
Important Notes
- Dimension Consistency: EMBEDDING_DIMENSIONS must match your vector store collection schema
- API Key Fallback: If EMBEDDING_API_KEY is not set, Cognee uses LLM_API_KEY (except for custom providers)
- Tokenization: For Ollama and Hugging Face models, set HUGGINGFACE_TOKENIZER for proper token counting
- Performance: Local providers (Ollama, Fastembed) are slower but offer privacy and cost benefits
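The dimension consistency rule is easy to check in code before vectors reach the store. The sketch below uses a hypothetical helper, `check_embedding_dimensions`, and is not part of Cognee; it only compares a vector's length with the configured value.

```python
from typing import Sequence


def check_embedding_dimensions(vector: Sequence[float], expected_dimensions: int) -> None:
    """Raise ValueError when a vector's length differs from the configured dimension."""
    if len(vector) != expected_dimensions:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, "
            f"but EMBEDDING_DIMENSIONS is {expected_dimensions}"
        )
```

Running this against one sample embedding after changing models catches a mismatch early, before writes to an existing collection start failing.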
LLM Providers
Configure LLM providers for text generation
Vector Stores
Set up vector databases for embedding storage
Overview
Return to setup configuration overview