Introduction

Lucerna is an AST-aware semantic + lexical code indexer designed as middleware for AI coding agents. It combines:

Vector search — remote embedding providers (Voyage, OpenAI, Cohere, Gemini, Mistral, Ollama, etc.)
BM25 full-text search — code-aware tokenization that splits identifiers on camelCase/underscores
Knowledge graph — symbol relationships: calls, implements, extends, uses, imports

All three are stored in a single embedded LanceDB database at <project-root>/.lucerna/. Files are parsed with tree-sitter via @kreuzberg/tree-sitter-language-pack. Lucerna ships custom AST-aware chunkers for popular languages; files in other languages are not indexed. Lucerna always respects .gitignore files at any depth.
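
As a rough illustration of the code-aware tokenization used for BM25, the sketch below splits identifiers on camelCase boundaries and underscores. The function name `tokenizeIdentifier` and the regular expressions are assumptions made for this example; they are not taken from Lucerna's source and the real tokenizer may behave differently.

```typescript
// Hypothetical helper for illustration only; not Lucerna's actual tokenizer.
function tokenizeIdentifier(identifier: string): string[] {
  return identifier
    // Insert a break between a lowercase letter or digit and an uppercase letter:
    // "getUser" -> "get User"
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    // Insert a break between an acronym run and a following capitalized word:
    // "HTTPResponse" -> "HTTP Response"
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2")
    // Split on underscores and any other non-alphanumeric characters
    .split(/[^A-Za-z0-9]+/)
    // Drop empty fragments left by leading or trailing separators
    .filter((part) => part.length > 0)
    // Lowercase so that "User" and "user" match the same term
    .map((part) => part.toLowerCase());
}

console.log(tokenizeIdentifier("parseHTTPResponse_v2"));
// prints [ 'parse', 'http', 'response', 'v2' ]
```

Splitting this way lets a lexical query such as "http response" match an identifier like `parseHTTPResponse` even though the query never contains the full symbol name.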

Features

AST-based chunking — extracts functions, classes, methods, interfaces, type aliases, and heading sections rather than arbitrary line ranges
Hybrid search — combines semantic (vector) and lexical (BM25 full-text) search via Reciprocal Rank Fusion
Optional reranking — second-stage cross-encoder reranking to improve precision after rank fusion
Knowledge graph — AST-extracted call, import, and inheritance edges stored in a persisted graph; traverse callers, callees, and dependencies, or expand search results with graph context
Repo map — aider-style concise listing of all indexed files and their top-level symbols
Recall evaluation — built-in eval command measures recall@k against a JSONL query set
Fully embedded — uses LanceDB; the index is a directory on disk, one per project
Multi-project — multiple CodeIndexer instances in the same process, each fully isolated
File watching — debounced incremental re-indexing via chokidar
Pluggable embeddings — remote API providers (Voyage, OpenAI, Cohere, Jina, Mistral, Gemini, Vertex AI, Cloudflare, Ollama), or no embeddings at all for BM25-only mode
Popular languages — custom AST-aware chunkers for Python, Java, Go, Rust, TypeScript/JavaScript, C/C++, C#, Swift, Kotlin, Ruby, PHP, and more; see language support
Gitignore-aware — .gitignore files at any depth are always respected during indexing and watching
CLI — lucerna index / watch / search / graph / stats / clear / eval
MCP server — built-in Model Context Protocol server for AI coding assistants
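
To make the hybrid search feature more concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked result lists. The function name `reciprocalRankFusion`, the chunk ids, and the constant `k = 60` (a commonly used default for RRF) are assumptions for illustration; Lucerna's own fusion code and parameters may differ.

```typescript
// Hypothetical sketch of Reciprocal Rank Fusion; not Lucerna's actual implementation.
interface FusedResult {
  id: string;
  score: number;
}

// Each ranking is a list of chunk ids ordered best-first.
function reciprocalRankFusion(rankings: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();

  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based, so the top result contributes 1 / (k + 1).
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }

  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: results from a vector search and a BM25 search (made-up ids).
const vectorHits = ["a", "b", "c"];
const bm25Hits = ["b", "d", "a"];

console.log(reciprocalRankFusion([vectorHits, bm25Hits]).map((r) => r.id));
// prints [ 'b', 'a', 'd', 'c' ]
```

RRF uses only rank positions, so the vector similarity scores and BM25 scores never need to be normalized onto a common scale. Chunks that appear near the top of both lists rise above chunks that rank highly in only one.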