Search Modes
| Mode | When used |
|---|---|
| Hybrid (default) | Embedding function available — runs vector search and BM25, fuses via Reciprocal Rank Fusion |
| Semantic only | searchSemantic() |
| Lexical only | searchLexical() or embeddingFunction: false |
| Graph-context | searchWithContext() — hybrid search + BFS graph expansion |
Hybrid search
Both vector search (top-K by cosine similarity) and BM25 (top-K by text relevance) run in parallel. Results are merged with Reciprocal Rank Fusion:
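score = Σ 1 / (k + rank)

The sum runs over the ranked lists in which a result appears, and rank is the result's position in that list. The default is k=45, tuned for code retrieval. Results can then optionally be re-ranked by a cross-encoder.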
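The fusion formula can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the function name reciprocalRankFusion, the use of plain string ids, and the 1-based rank convention are all assumptions made for the example.

```typescript
// Reciprocal Rank Fusion over several ranked lists of result ids.
// Assumptions (not from the library): ids are strings, ranks are 1-based.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 45 // default k as documented above
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // position of this id within the current list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical inputs: one list from vector search, one from BM25.
const vectorHits = ["a", "b", "c"];
const bm25Hits = ["b", "d", "a"];
const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
console.log(fused.map(([id]) => id)); // ids that appear in both lists rise to the top
```

A result that appears in both lists accumulates two terms, which is why agreement between the two retrievers is rewarded even when neither ranks the result first.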
Code-aware BM25
Identifiers are expanded before indexing: getUserById becomes "getUserById get user by id", and parse_json becomes "parse_json parse json". BM25 can therefore match sub-word tokens without sacrificing exact-identifier recall.
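One way such an expansion could work is sketched below. The function name expandIdentifier and the exact splitting rules (camelCase boundaries, underscores, hyphens) are assumptions for illustration; the library's actual tokenizer may differ.

```typescript
// Expand an identifier into "original + lower-cased sub-words".
// Hypothetical helper; the splitting rules are an assumption.
function expandIdentifier(identifier: string): string {
  const parts = identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // camelCase boundary: getUser -> get User
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // acronym boundary: HTTPServer -> HTTP Server
    .split(/[\s_\-]+/) // snake_case and kebab-case separators
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());

  // Keep the original token so exact-identifier queries still match.
  return parts.length > 1 ? `${identifier} ${parts.join(" ")}` : identifier;
}

console.log(expandIdentifier("getUserById")); // "getUserById get user by id"
console.log(expandIdentifier("parse_json")); // "parse_json parse json"
```

Keeping the original identifier alongside the sub-words means a query for the full name still hits the exact token, while a query for a single word such as "user" can match the expanded form.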
Graph-context expansion
searchWithContext() runs hybrid search and then expands the result set by following knowledge graph edges (BFS up to graphDepth hops). Direct search hits come first (in rank order); graph-expanded neighbour chunks follow in traversal order. Duplicates are collapsed.
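The ordering rules described here (direct hits first, then neighbours in traversal order, duplicates collapsed) can be illustrated with a small breadth-first expansion. The Graph type, the function name expandWithContext, and the use of string chunk ids are hypothetical; only the graphDepth parameter name comes from the text above.

```typescript
// Hypothetical adjacency list: chunk id -> ids of neighbouring chunks.
type Graph = Map<string, string[]>;

// Expand ranked search hits by BFS over the graph, up to graphDepth hops.
function expandWithContext(hits: string[], graph: Graph, graphDepth: number): string[] {
  const seen = new Set<string>();
  const ordered: string[] = [];

  // Direct hits come first, in rank order, with duplicates collapsed.
  for (const id of hits) {
    if (!seen.has(id)) {
      seen.add(id);
      ordered.push(id);
    }
  }

  // Level-by-level BFS: each iteration follows edges one hop further.
  let frontier = ordered.slice();
  for (let depth = 0; depth < graphDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbour of graph.get(id) ?? []) {
        if (!seen.has(neighbour)) {
          seen.add(neighbour);
          ordered.push(neighbour); // appended in traversal order
          next.push(neighbour);
        }
      }
    }
    frontier = next;
  }
  return ordered;
}

const graph: Graph = new Map([
  ["a", ["b", "c"]],
  ["b", ["d"]],
  ["c", ["a"]],
  ["d", ["e"]],
]);
console.log(expandWithContext(["a", "b"], graph, 1)); // hits, then one-hop neighbours
```

With graphDepth set to 0 the function returns only the direct hits, which matches the idea that expansion is purely additive to the hybrid result set.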
Result ordering
Results are always returned sorted by relevance (best first). The internal ordering score is not exposed on the result type because it is not comparable across paths: semantic cosine similarity, BM25 rank, RRF fused score, and reranker output all live on different scales. Callers should rely on the result order; the matchType field ('semantic' | 'lexical' | 'hybrid') indicates which path produced each result.
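In practice this means caller code works with positions and matchType rather than with scores. The sketch below assumes a hypothetical SearchResult shape with an id field; only the matchType field and its three values are taken from the text above.

```typescript
type MatchType = "semantic" | "lexical" | "hybrid";

// Hypothetical result shape; the real type may carry more fields.
interface SearchResult {
  id: string;
  matchType: MatchType;
}

// Take the n best results by relying on the returned order alone.
function topResults(results: SearchResult[], n: number): SearchResult[] {
  return results.slice(0, n);
}

// Count how many results each retrieval path produced.
function countByMatchType(results: SearchResult[]): Record<MatchType, number> {
  const counts: Record<MatchType, number> = { semantic: 0, lexical: 0, hybrid: 0 };
  for (const result of results) {
    counts[result.matchType] += 1;
  }
  return counts;
}

// Hypothetical results, already sorted best-first by the search call.
const results: SearchResult[] = [
  { id: "chunk-1", matchType: "hybrid" },
  { id: "chunk-2", matchType: "semantic" },
  { id: "chunk-3", matchType: "lexical" },
  { id: "chunk-4", matchType: "hybrid" },
];
console.log(topResults(results, 2).map((r) => r.id));
console.log(countByMatchType(results));
```

Because no score is exposed, a cutoff such as "keep the top 2" is expressed as a slice over the ordered list rather than as a score threshold.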