
Introduction

Lucerna is an AST-aware semantic + lexical code indexer designed as middleware for AI coding agents. It combines:

  • Vector search — HuggingFace Transformers embeddings via @huggingface/transformers
  • BM25 full-text search — code-aware tokenization that splits identifiers on camelCase/underscores
  • Knowledge graph — symbol relationships: calls, implements, extends, uses, imports

All three are stored in a single embedded LanceDB database at <project-root>/.lucerna/. There are no external services, no servers, no configuration files — just the library or CLI.
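The BM25 tokenization described above can be sketched as follows. This is a minimal illustration, assuming a term is any run of letters, digits, and underscores that is then split on underscores and on camelCase/PascalCase boundaries, with acronyms kept together. `tokenizeCode` is a hypothetical name, not a Lucerna export. Lucerna's actual rules may differ, for example in how it handles digits or whether it also indexes the unsplit identifier.

```typescript
// Hypothetical helper: turns source text into lowercase BM25 terms by
// splitting identifiers on underscores and camelCase boundaries.
function tokenizeCode(text: string): string[] {
  const tokens: string[] = [];
  // Identifier-like runs: letters, digits, underscores.
  for (const word of text.match(/[A-Za-z0-9_]+/g) ?? []) {
    // Underscores separate snake_case / SCREAMING_CASE parts.
    for (const part of word.split("_")) {
      if (part === "") continue;
      // Split camelCase / PascalCase, keeping acronyms together:
      // "parseHTTPRequest" -> "parse", "HTTP", "Request"
      const pieces = part.match(/[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+/g) ?? [];
      for (const piece of pieces) tokens.push(piece.toLowerCase());
    }
  }
  return tokens;
}

console.log(tokenizeCode("const x = parseHTTPRequest(user_id);"));
```

With splitting like this, a query for `user` can match a chunk that only contains `getUserById`, which a whitespace tokenizer would miss.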

Parsing is done by tree-sitter via @kreuzberg/tree-sitter-language-pack. Lucerna ships custom AST-aware chunkers for popular languages; files in any other language are not indexed. .gitignore files at any depth are always respected.

  • AST-based chunking — extracts functions, classes, methods, interfaces, type aliases, and heading sections rather than arbitrary line ranges
  • Hybrid search — combines semantic (vector) and lexical (BM25 full-text) search via Reciprocal Rank Fusion
  • Optional reranking — second-stage cross-encoder reranking to improve precision after RRF fusion
  • Knowledge graph — AST-extracted call, import, and inheritance edges stored in a persisted graph; traverse callers, callees, and dependencies, or expand search results with graph context
  • Repo map — aider-style concise listing of all indexed files and their top-level symbols
  • Recall evaluation — built-in eval command measures recall@k against a JSONL query set
  • Fully embedded — uses LanceDB; the index is a directory on disk, one per project
  • Multi-project — multiple CodeIndexer instances in the same process, each fully isolated
  • File watching — debounced incremental re-indexing via chokidar; watcher path uses an in-memory chunk cache (no full DB scan per file change)
  • Pluggable embeddings — local (HFEmbeddings, BGESmallEmbeddings, NomicCodeEmbeddings) or remote (CloudflareEmbeddings); swap or disable entirely
  • Popular languages — custom AST-aware chunkers for Python, Java, Go, Rust, TypeScript/JavaScript, C/C++, C#, Swift, Kotlin, Ruby, PHP, and more; see language support
  • Gitignore-aware: .gitignore files at any depth are always respected during indexing and watching
  • CLI: lucerna index / watch / search / graph / stats / clear / eval
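Reciprocal Rank Fusion, used by the hybrid search above, scores each result by summing 1/(k + rank) over every ranked list it appears in, so a chunk ranked well by both vector and BM25 search rises to the top without having to normalize their incompatible raw scores. The sketch below assumes each retriever returns chunk IDs best-first and uses k = 60, the constant from the original RRF paper; Lucerna's actual constant and result types are assumptions here, and `reciprocalRankFusion` is a hypothetical name, not part of Lucerna's API.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Hypothetical helper: fuses several best-first rankings of chunk IDs.
// k = 60 is assumed (the value from the original RRF paper).
function reciprocalRankFusion(rankings: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

const vectorHits = ["a", "b", "c"]; // semantic ranking
const bm25Hits = ["c", "a", "d"]; // lexical ranking
console.log(reciprocalRankFusion([vectorHits, bm25Hits]).map((r) => r.id));
```

Here `a` (ranked 1st and 2nd) edges out `c` (3rd and 1st), and both outrank `b` and `d`, which only one retriever found. An optional reranker would then reorder this fused list.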