.env file.
This section provides beginner-friendly guides for setting up different backends, with detailed technical information available in expandable sections.
What You Can Configure
Cognee uses a flexible architecture that lets you choose the best tools for your needs. We recommend starting with the defaults to get familiar with Cognee, then customizing each component as needed:
- LLM Providers — Choose from OpenAI, Azure OpenAI, Google Gemini, Anthropic, Ollama, or custom providers (like vLLM) for text generation and reasoning tasks
- Structured Output Backends — Configure LiteLLM + Instructor or BAML for reliable data extraction from LLM responses
- Embedding Providers — Select from OpenAI, Azure OpenAI, Google Gemini, Mistral, Ollama, Fastembed, or custom embedding services to create vector representations for semantic search
- Relational Databases — Use SQLite for local development or Postgres for production to store metadata, documents, and system state
- Vector Stores — Store embeddings in LanceDB, PGVector, Qdrant, Redis, ChromaDB, FalkorDB, or Neptune Analytics for similarity search
- Graph Stores — Build knowledge graphs with Kuzu, Kuzu-remote, Neo4j, Neptune, Neptune Analytics, or Memgraph to manage relationships and reasoning
- Dataset Separation & Access Control — Configure dataset-level permissions and isolation
- Sessions & Caching — Enable conversational memory with Redis or filesystem cache adapters
Want to run Cognee without a cloud API key? See the Local Setup guide for step-by-step instructions using Ollama and Fastembed.
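As a starting point, a minimal .env that keeps all of the defaults above might contain only an API key (the key value is a placeholder):

```shell
# Minimal .env: with nothing else set, Cognee uses the documented
# defaults (OpenAI LLM + embeddings, SQLite, LanceDB, Kuzu).
LLM_API_KEY=your-openai-api-key
```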
Environment Variable Quick Reference
The tables below list the most commonly used configuration variables. For full details on each group, follow the links to the dedicated guides.

Only a small number of internal variables use the COGNEE_ prefix: COGNEE_LOGS_DIR, COGNEE_TRACING_ENABLED, COGNEE_CLOUD_API_URL, and COGNEE_CLOUD_AUTH_TOKEN. All other configuration keys (LLM, embedding, database, etc.) are used without any prefix.

LLM
| Variable | Default | Description |
|---|---|---|
| LLM_PROVIDER | openai | Provider: openai, azure, gemini, anthropic, ollama, mistral, bedrock, custom |
| LLM_MODEL | openai/gpt-4o-mini | Model in provider/model-name format |
| LLM_API_KEY | — | API key for the LLM provider |
| LLM_ENDPOINT | — | Custom endpoint URL (required for Ollama, vLLM, etc.) |
| LLM_API_VERSION | — | API version (required for Azure) |
| LLM_TEMPERATURE | 0.0 | Response temperature (0.0–2.0) |
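For example, pointing the LLM at a local Ollama server could look like the following (the model name and port are illustrative assumptions; adjust them to your setup):

```shell
LLM_PROVIDER=ollama
LLM_MODEL=ollama/llama3.1:8b
LLM_ENDPOINT=http://localhost:11434/v1
# Ollama does not check the key, but a non-empty value may be expected
LLM_API_KEY=ollama
LLM_TEMPERATURE=0.0
```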
Embeddings
| Variable | Default | Description |
|---|---|---|
| EMBEDDING_PROVIDER | openai | Provider: openai, ollama, fastembed, gemini, mistral, bedrock, custom |
| EMBEDDING_MODEL | openai/text-embedding-3-large | Model in provider/model-name format |
| EMBEDDING_DIMENSIONS | 3072 | Vector dimension size (must match your vector store) |
| EMBEDDING_API_KEY | — | API key (falls back to LLM_API_KEY if unset) |
| EMBEDDING_ENDPOINT | — | Custom endpoint URL (required for Ollama, etc.) |
| HUGGINGFACE_TOKENIZER | — | HuggingFace Hub model ID for token counting with Ollama (e.g. nomic-ai/nomic-embed-text-v1.5) |
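A local embedding setup with Ollama might look like this sketch (the model name and dimension are example values; verify the dimension reported by your embedding model):

```shell
EMBEDDING_PROVIDER=ollama
EMBEDDING_MODEL=ollama/nomic-embed-text
EMBEDDING_DIMENSIONS=768
EMBEDDING_ENDPOINT=http://localhost:11434/v1
# Tokenizer used for token counting, as in the table above
HUGGINGFACE_TOKENIZER=nomic-ai/nomic-embed-text-v1.5
```

Note that EMBEDDING_DIMENSIONS must match what your vector store was initialized with; changing it usually requires re-embedding your data.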
Databases
| Variable | Default | Description |
|---|---|---|
| DB_PROVIDER | sqlite | Relational DB: sqlite, postgres |
| DB_HOST / DB_PORT / DB_USERNAME / DB_PASSWORD | — | Postgres connection details |
| VECTOR_DB_PROVIDER | lancedb | Vector store: lancedb, pgvector, qdrant, chromadb, weaviate, milvus |
| VECTOR_DB_URL | — | Vector store connection URL |
| GRAPH_DATABASE_PROVIDER | kuzu | Graph store: kuzu, kuzu-remote, neo4j, neptune |
| GRAPH_DATABASE_URL | — | Graph store connection URL |
| GRAPH_DATABASE_USERNAME / GRAPH_DATABASE_PASSWORD | — | Graph store credentials |
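A production-leaning sketch that swaps SQLite for Postgres and Kuzu for Neo4j could look like this (hosts, ports, and credentials are placeholders for your own infrastructure):

```shell
# Relational metadata store
DB_PROVIDER=postgres
DB_HOST=localhost
DB_PORT=5432
DB_USERNAME=cognee
DB_PASSWORD=change-me

# Knowledge graph store
GRAPH_DATABASE_PROVIDER=neo4j
GRAPH_DATABASE_URL=bolt://localhost:7687
GRAPH_DATABASE_USERNAME=neo4j
GRAPH_DATABASE_PASSWORD=change-me
```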
Storage & Logging
| Variable | Default | Description |
|---|---|---|
| STORAGE_BACKEND | local | Storage backend: local, s3 |
| DATA_ROOT_DIRECTORY | .data_storage | Root directory for data files |
| SYSTEM_ROOT_DIRECTORY | .cognee_system | Root directory for system files |
| COGNEE_LOGS_DIR | {package}/logs | Override the logs directory path |
| LOG_LEVEL | INFO | Logging level: DEBUG, INFO, WARNING, ERROR |
| TELEMETRY_DISABLED | false | Set true to disable anonymous telemetry |
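Putting the storage and logging variables together, a development-oriented fragment using the defaults from the table might be:

```shell
STORAGE_BACKEND=local
DATA_ROOT_DIRECTORY=.data_storage
SYSTEM_ROOT_DIRECTORY=.cognee_system
LOG_LEVEL=DEBUG
```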
Debug Mode
To enable verbose logging in a self-hosted Cognee instance, set LOG_LEVEL=DEBUG in your .env. Verbose logging covers pipeline execution, LLM calls, database queries, and graph operations, which is useful when troubleshooting data processing or provider configuration.

Docker Environment Variables
Use the same variable names as in your .env; pass them with docker run -e or load them from a file with --env-file.
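For instance (the image name is illustrative; substitute the Cognee image you actually run):

```shell
# Pass individual variables on the command line...
docker run -e LLM_API_KEY=your-key -e LLM_PROVIDER=openai cognee/cognee

# ...or load your whole .env file at once
docker run --env-file .env cognee/cognee
```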
Examples
Observability & Telemetry
Cognee includes built-in telemetry to help you monitor and debug your knowledge graph operations. You can control telemetry behavior with environment variables: TELEMETRY_DISABLED (boolean, optional) — set to true to disable all telemetry collection (default: false). Collected telemetry includes:
- Search query performance metrics
- Processing pipeline execution times
- Error rates and debugging information
- System resource usage
Telemetry data helps improve Cognee’s performance and reliability. It’s collected anonymously and doesn’t include your actual data content.
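To opt out entirely, a single line in your .env is enough:

```shell
TELEMETRY_DISABLED=true
```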
Configuration Workflow
- Install Cognee with all optional dependencies:
  - Local setup: uv sync --all-extras
  - Library: pip install "cognee[all]"
- Create a .env file in your project root (if you haven't already) — see Installation for details
- Choose your preferred providers and follow the configuration instructions from the guides below
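As a concrete sketch of the first two steps when using Cognee as a library (the .env contents are placeholders for your own provider settings):

```shell
pip install "cognee[all]"

# Create a starter .env in the project root
cat > .env <<'EOF'
LLM_API_KEY=your-openai-api-key
LLM_MODEL=openai/gpt-4o-mini
EOF
```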
LLM Providers
Configure OpenAI, Azure, Gemini, Anthropic, Ollama, or custom LLM providers (like vLLM)
Structured Output Backends
Configure LiteLLM + Instructor or BAML for reliable data extraction
Embedding Providers
Set up OpenAI, Mistral, Ollama, Fastembed, or custom embedding services
Relational Databases
Choose between SQLite for local development or Postgres for production
Vector Stores
Configure LanceDB, PGVector, Qdrant, Redis, ChromaDB, FalkorDB, or Neptune Analytics
Graph Stores
Set up Kuzu, Neo4j, or Neptune for knowledge graph storage