Configure Cognee to use your preferred LLM, embedding engine, relational database, vector store, and graph store via environment variables in a local .env file. This section provides beginner-friendly guides for setting up different backends, with detailed technical information available in expandable sections.

What You Can Configure

Cognee uses a flexible architecture that lets you choose the best tools for your needs. We recommend starting with the defaults to get familiar with Cognee, then customizing each component as needed:
  • LLM Providers — Choose from OpenAI, Azure OpenAI, Google Gemini, Anthropic, Ollama, or custom providers (like vLLM) for text generation and reasoning tasks
  • Structured Output Backends — Configure LiteLLM + Instructor or BAML for reliable data extraction from LLM responses
  • Embedding Providers — Select from OpenAI, Azure OpenAI, Google Gemini, Mistral, Ollama, Fastembed, or custom embedding services to create vector representations for semantic search
  • Relational Databases — Use SQLite for local development or Postgres for production to store metadata, documents, and system state
  • Vector Stores — Store embeddings in LanceDB, PGVector, Qdrant, Redis, ChromaDB, FalkorDB, or Neptune Analytics for similarity search
  • Graph Stores — Build knowledge graphs with Kuzu, Kuzu-remote, Neo4j, Neptune, Neptune Analytics, or Memgraph to manage relationships and reasoning
  • Dataset Separation & Access Control — Configure dataset-level permissions and isolation
  • Sessions & Caching — Enable conversational memory with Redis or filesystem cache adapters
Want to run Cognee without a cloud API key? See the Local Setup guide for step-by-step instructions using Ollama and Fastembed.
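To see how little is required to get started, here is a minimal .env that keeps every default (SQLite, LanceDB, Kuzu, OpenAI for both LLM and embeddings). The key value is a placeholder:

```
# Minimal setup: all defaults, OpenAI for LLM + embeddings
LLM_API_KEY="your-openai-api-key"
```

Every other variable described below is optional and only needed when you swap out a component.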

Environment Variable Quick Reference

The tables below list the most commonly used configuration variables. For full details on each group, follow the links to the dedicated guides.
Only a small number of internal variables use the COGNEE_ prefix: COGNEE_LOGS_DIR, COGNEE_TRACING_ENABLED, COGNEE_CLOUD_API_URL, and COGNEE_CLOUD_AUTH_TOKEN. All other configuration keys (LLM, embedding, database, etc.) are used without any prefix.

LLM

| Variable | Default | Description |
| --- | --- | --- |
| LLM_PROVIDER | openai | Provider: openai, azure, gemini, anthropic, ollama, mistral, bedrock, custom |
| LLM_MODEL | openai/gpt-4o-mini | Model in provider/model-name format |
| LLM_API_KEY | | API key for the LLM provider |
| LLM_ENDPOINT | | Custom endpoint URL (required for Ollama, vLLM, etc.) |
| LLM_API_VERSION | | API version (required for Azure) |
| LLM_TEMPERATURE | 0.0 | Response temperature (0.0–2.0) |
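As an illustration, an Azure OpenAI setup combines several of these variables. The endpoint, key, and API version below are placeholders; check the LLM Providers guide for the exact values your deployment needs:

```
LLM_PROVIDER="azure"
LLM_MODEL="azure/gpt-4o-mini"
LLM_ENDPOINT="https://your-resource.openai.azure.com"
LLM_API_KEY="your-azure-api-key"
LLM_API_VERSION="2024-02-01"
```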

Embeddings

| Variable | Default | Description |
| --- | --- | --- |
| EMBEDDING_PROVIDER | openai | Provider: openai, ollama, fastembed, gemini, mistral, bedrock, custom |
| EMBEDDING_MODEL | openai/text-embedding-3-large | Model in provider/model-name format |
| EMBEDDING_DIMENSIONS | 3072 | Vector dimension size (must match your vector store) |
| EMBEDDING_API_KEY | | API key (falls back to LLM_API_KEY if unset) |
| EMBEDDING_ENDPOINT | | Custom endpoint URL (required for Ollama, etc.) |
| HUGGINGFACE_TOKENIZER | | HuggingFace Hub model ID for token counting with Ollama (e.g. nomic-ai/nomic-embed-text-v1.5) |
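For example, a fully local embedding setup with Ollama combines these variables as follows. The model, endpoint path, and dimension shown here are illustrative and must match the model you actually pull:

```
EMBEDDING_PROVIDER="ollama"
EMBEDDING_MODEL="nomic-embed-text:latest"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embed"
EMBEDDING_DIMENSIONS=768
HUGGINGFACE_TOKENIZER="nomic-ai/nomic-embed-text-v1.5"
```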

Databases & Stores

| Variable | Default | Description |
| --- | --- | --- |
| DB_PROVIDER | sqlite | Relational DB: sqlite, postgres |
| DB_HOST / DB_PORT / DB_USERNAME / DB_PASSWORD | | Postgres connection details |
| VECTOR_DB_PROVIDER | lancedb | Vector store: lancedb, pgvector, qdrant, chromadb, weaviate, milvus |
| VECTOR_DB_URL | | Vector store connection URL |
| GRAPH_DATABASE_PROVIDER | kuzu | Graph store: kuzu, kuzu-remote, neo4j, neptune |
| GRAPH_DATABASE_URL | | Graph store connection URL |
| GRAPH_DATABASE_USERNAME / GRAPH_DATABASE_PASSWORD | | Graph store credentials |
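A production-leaning stack built from this table — Postgres for metadata, PGVector for embeddings, Neo4j for the graph — could be configured like so. Hosts and credentials are placeholders, and additional variables (such as the database name) may be required; see the Relational Databases, Vector Stores, and Graph Stores guides:

```
DB_PROVIDER="postgres"
DB_HOST="127.0.0.1"
DB_PORT=5432
DB_USERNAME="cognee"
DB_PASSWORD="your-db-password"
VECTOR_DB_PROVIDER="pgvector"
GRAPH_DATABASE_PROVIDER="neo4j"
GRAPH_DATABASE_URL="bolt://localhost:7687"
GRAPH_DATABASE_USERNAME="neo4j"
GRAPH_DATABASE_PASSWORD="your-neo4j-password"
```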

Storage & Logging

| Variable | Default | Description |
| --- | --- | --- |
| STORAGE_BACKEND | local | Storage backend: local, s3 |
| DATA_ROOT_DIRECTORY | .data_storage | Root directory for data files |
| SYSTEM_ROOT_DIRECTORY | .cognee_system | Root directory for system files |
| COGNEE_LOGS_DIR | {package}/logs | Override the logs directory path |
| LOG_LEVEL | INFO | Logging level: DEBUG, INFO, WARNING, ERROR |
| TELEMETRY_DISABLED | false | Set true to disable anonymous telemetry |
To enable verbose logging in a self-hosted Cognee instance, set LOG_LEVEL in your .env:

```
LOG_LEVEL="DEBUG"
```
Verbose logging covers pipeline execution, LLM calls, database queries, and graph operations—useful when troubleshooting data processing or provider configuration.

Docker Environment Variables

Use the same variable names as in your .env; pass them with docker run -e or load them from a file with --env-file.
```
docker run \
  -e LLM_PROVIDER=ollama \
  -e LLM_MODEL=ollama/llama3.2 \
  -e LLM_ENDPOINT=http://host.docker.internal:11434 \
  -e EMBEDDING_PROVIDER=ollama \
  -e EMBEDDING_MODEL=nomic-embed-text:latest \
  -e EMBEDDING_ENDPOINT=http://host.docker.internal:11434/api/embed \
  -e EMBEDDING_DIMENSIONS=768 \
  -e HUGGINGFACE_TOKENIZER=nomic-ai/nomic-embed-text-v1.5 \
  cognee/cognee:main
```
Or using an env file:
```
docker run --env-file .env cognee/cognee:main
```
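If you prefer Docker Compose, the same env file can be wired in with env_file. This is a sketch, reusing the image tag from the docker run example above:

```yaml
services:
  cognee:
    image: cognee/cognee:main
    env_file:
      - .env
```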

Observability & Telemetry

Cognee includes built-in telemetry to help you monitor and debug your knowledge graph operations. You can control telemetry behavior with environment variables:
  • TELEMETRY_DISABLED (boolean, optional): Set to true to disable all telemetry collection (default: false)
When telemetry is enabled, Cognee automatically collects:
  • Search query performance metrics
  • Processing pipeline execution times
  • Error rates and debugging information
  • System resource usage
Telemetry data helps improve Cognee’s performance and reliability. It’s collected anonymously and doesn’t include your actual data content.

Configuration Workflow

  1. Install Cognee with all optional dependencies:
    • Local setup: uv sync --all-extras
    • Library: pip install "cognee[all]"
  2. Create a .env file in your project root (if you haven’t already) — see Installation for details
  3. Choose your preferred providers and follow the configuration instructions from the guides below
Configuration Changes: If you’ve already run Cognee with default settings and are now changing your configuration (e.g., switching from SQLite to Postgres, or changing vector stores), you should call pruning operations before the next cognification to ensure data consistency.
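The pruning step can be sketched with Cognee's Python API. This assumes the prune module's prune_data/prune_system calls as exposed in current Cognee releases; verify the exact signatures against your installed version:

```python
import asyncio

import cognee


async def reset_state():
    # Remove previously ingested data files.
    await cognee.prune.prune_data()
    # Reset system metadata (relational, vector, and graph state)
    # so the newly configured backends start from a clean slate.
    await cognee.prune.prune_system(metadata=True)


asyncio.run(reset_state())
```

Run this once after changing backends, then re-add your data and cognify as usual.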
LLM/Embedding Configuration: If you configure only LLM or only embeddings, the other defaults to OpenAI. Ensure you have a working OpenAI API key, or configure both LLM and embeddings to avoid unexpected defaults.
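One way to guard against that silent fallback is a small pre-flight check on your environment. This is a hypothetical helper for illustration, not part of Cognee itself:

```python
import os


def provider_config_warnings() -> list[str]:
    """Warn when only one of LLM/embedding providers is configured,
    since the unset one silently defaults to OpenAI."""
    llm_set = "LLM_PROVIDER" in os.environ
    embedding_set = "EMBEDDING_PROVIDER" in os.environ
    warnings = []
    if llm_set and not embedding_set:
        warnings.append("EMBEDDING_PROVIDER unset: embeddings will default to OpenAI")
    if embedding_set and not llm_set:
        warnings.append("LLM_PROVIDER unset: the LLM will default to OpenAI")
    # The OpenAI fallback still needs a key to work at all.
    if (not llm_set or not embedding_set) and "LLM_API_KEY" not in os.environ:
        warnings.append("LLM_API_KEY unset: the OpenAI fallback will fail")
    return warnings
```

Calling this before starting a pipeline surfaces misconfiguration early instead of failing mid-run.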

LLM Providers

Configure OpenAI, Azure, Gemini, Anthropic, Ollama, or custom LLM providers (like vLLM)

Structured Output Backends

Configure LiteLLM + Instructor or BAML for reliable data extraction

Embedding Providers

Set up OpenAI, Mistral, Ollama, Fastembed, or custom embedding services

Relational Databases

Choose between SQLite for local development or Postgres for production

Vector Stores

Configure LanceDB, PGVector, Qdrant, Redis, ChromaDB, FalkorDB, or Neptune Analytics

Graph Stores

Set up Kuzu, Neo4j, or Neptune for knowledge graph storage