Introduction
Lucerna is an AST-aware semantic + lexical code indexer designed as middleware for AI coding agents. It combines:
- Vector search — HuggingFace Transformers embeddings via @huggingface/transformers
- BM25 full-text search — code-aware tokenization that splits identifiers on camelCase/underscores
- Knowledge graph — symbol relationships: calls, implements, extends, uses, imports
All three are stored in a single embedded LanceDB database at <project-root>/.lucerna/. There are no external services, no servers, no configuration files — just the library or CLI.
Parsing is done by tree-sitter via @kreuzberg/tree-sitter-language-pack. Lucerna ships custom AST-aware chunkers for popular languages — files in any other language are not indexed. .gitignore files at any depth are always respected.
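To make the code-aware tokenization used for BM25 concrete, here is a minimal sketch of splitting identifiers on camelCase boundaries and underscores. It is an illustration only, not Lucerna's actual tokenizer: the names splitIdentifier and tokenizeCode are hypothetical, and the real implementation may handle digits, acronyms, or stop words differently.

```typescript
// Hypothetical helper (not part of Lucerna's API): split one identifier
// into lowercase word tokens on camelCase boundaries and underscores.
function splitIdentifier(identifier: string): string[] {
  const spaced = identifier
    // lower-case letter or digit followed by upper-case: "getUser" -> "get User"
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    // acronym followed by a capitalized word: "HTTPResponse" -> "HTTP Response"
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2");

  return spaced
    .split(/[^A-Za-z0-9]+/) // underscores and spaces both act as separators
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());
}

// Hypothetical helper: pull identifier-like runs out of source text and
// expand each one into its word tokens.
function tokenizeCode(source: string): string[] {
  const identifiers = source.match(/[A-Za-z_][A-Za-z0-9_]*/g) ?? [];
  return identifiers.flatMap((id) => splitIdentifier(id));
}

console.log(splitIdentifier("parseHTTPResponse")); // ["parse", "http", "response"]
console.log(splitIdentifier("max_retry_count")); // ["max", "retry", "count"]
```

With tokens split this way, a lexical query for "retry" can match an identifier such as max_retry_count or maxRetryCount, which a whitespace-only tokenizer would treat as a single opaque term.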
Features
- AST-based chunking — extracts functions, classes, methods, interfaces, type aliases, and heading sections rather than arbitrary line ranges
- Hybrid search — combines semantic (vector) and lexical (BM25 full-text) search via Reciprocal Rank Fusion
- Optional reranking — second-stage cross-encoder reranking to improve precision after RRF fusion
- Knowledge graph — AST-extracted call, import, and inheritance edges stored in a persisted graph; traverse callers, callees, and dependencies, or expand search results with graph context
- Repo map — aider-style concise listing of all indexed files and their top-level symbols
- Recall evaluation — built-in eval command measures recall@k against a JSONL query set
- Fully embedded — uses LanceDB; the index is a directory on disk, one per project
- Multi-project — multiple CodeIndexer instances in the same process, each fully isolated
- File watching — debounced incremental re-indexing via chokidar; watcher path uses an in-memory chunk cache (no full DB scan per file change)
- Pluggable embeddings — local (HFEmbeddings, BGESmallEmbeddings, NomicCodeEmbeddings) or remote (CloudflareEmbeddings); swap or disable entirely
- Popular languages — custom AST-aware chunkers for Python, Java, Go, Rust, TypeScript/JavaScript, C/C++, C#, Swift, Kotlin, Ruby, PHP, and more; see language support
- Gitignore-aware — .gitignore files at any depth are always respected during indexing and watching
- CLI — lucerna index / watch / search / graph / stats / clear / eval
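
Two of the features above, hybrid search via Reciprocal Rank Fusion and recall@k evaluation, are small enough to sketch directly. The code below is a generic illustration under stated assumptions, not Lucerna's implementation: the function names are hypothetical, chunk ids are plain strings, and the constant k = 60 is the value commonly used in the RRF literature rather than a documented Lucerna default.

```typescript
// Hypothetical sketch of Reciprocal Rank Fusion (RRF).
// Each input ranking is a list of chunk ids, best match first.
// A chunk's fused score is the sum over rankings of 1 / (k + rank),
// where rank is 1-based. k = 60 is an assumed, conventional constant.
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map(([id]) => id);
}

// Hypothetical sketch of recall@k: the fraction of relevant ids that
// appear in the top k results. Returns 0 when there are no relevant ids.
function recallAtK(results: string[], relevant: string[], k: number): number {
  if (relevant.length === 0) return 0;
  const topK = new Set(results.slice(0, k));
  const hits = relevant.filter((id) => topK.has(id)).length;
  return hits / relevant.length;
}

// Made-up example rankings from a vector search and a BM25 search.
const vectorHits = ["a", "b", "c"];
const bm25Hits = ["b", "d", "a"];

const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
console.log(fused); // ["b", "a", "d", "c"]
console.log(recallAtK(fused, ["a", "c"], 2)); // 0.5
```

RRF only looks at rank positions, so the vector and BM25 scores never need to be normalized against each other. In the example, "b" ends up first because it ranks near the top of both lists, even though it is not the top vector hit.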