Reranking Providers

Lucerna uses two stages of ranking during hybrid search:

Stage 1 — RRF fusion (always on, no config needed)

After running vector search and full-text search in parallel, LanceDB’s built-in Reciprocal Rank Fusion algorithm merges the two ranked lists into one. It’s fast, free, and requires no API key. For many codebases this is sufficient.
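To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk IDs. It is an illustration, not Lucerna's or LanceDB's code: the function name `rrfFuse`, the sample IDs, and the smoothing constant `k = 60` (the value commonly used for RRF) are assumptions.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// where rank starts at 1. k = 60 is a commonly used constant (assumed here).
function rrfFuse(
  rankedLists: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical result lists from the two searches.
const vectorHits = ["auth.ts#12", "session.ts#3", "token.ts#7"];
const fullTextHits = ["token.ts#7", "auth.ts#12", "README.md#1"];
const fused = rrfFuse([vectorHits, fullTextHits]);
console.log(fused.map((r) => r.id));
```

Only ranks enter the formula, so the two searches do not need comparable score scales. A chunk that appears near the top of both lists outranks one that appears in only one of them.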

Stage 2 — Cross-encoder reranking (optional)

A trained ML model re-scores each candidate result against the full query text. Unlike embeddings, which compress meaning into a fixed-size vector, a cross-encoder sees the query and the document together and produces a precise relevance score. This improves precision, especially on large codebases where many chunks look similar at the embedding level, at the cost of additional API calls and roughly 200–500 ms of extra latency per search.
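The shape of this second stage can be sketched as follows. This is a hedged illustration rather than Lucerna's implementation: `Candidate`, `PairScorer`, `rerankCandidates`, and `overlapScorer` are hypothetical names, and the word-overlap scorer only stands in for a real provider call so the example runs offline.

```typescript
interface Candidate {
  id: string;
  text: string;
}

// Hypothetical scorer type: stands in for a provider's rerank endpoint.
// It receives the query plus every document and returns one score per document.
type PairScorer = (query: string, documents: string[]) => Promise<number[]>;

async function rerankCandidates(
  query: string,
  candidates: Candidate[],
  scorePairs: PairScorer,
  topK = 10,
): Promise<Array<Candidate & { relevance: number }>> {
  if (candidates.length === 0) return [];
  const scores = await scorePairs(query, candidates.map((c) => c.text));
  if (scores.length !== candidates.length) {
    throw new Error("scorer returned a different number of scores than documents");
  }
  // Attach each score to its candidate, sort by relevance, keep the best topK.
  return candidates
    .map((c, i) => ({ ...c, relevance: scores[i]! }))
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, topK);
}

// Toy scorer for demonstration only: counts query words found in each document.
const overlapScorer: PairScorer = async (query, documents) => {
  const terms = new Set(query.toLowerCase().split(/\s+/));
  return documents.map(
    (doc) => doc.toLowerCase().split(/\s+/).filter((w) => terms.has(w)).length,
  );
};

const sampleCandidates: Candidate[] = [
  { id: "a", text: "render the login form" },
  { id: "b", text: "parse the auth token from the header" },
  { id: "c", text: "token refresh" },
];

rerankCandidates("parse auth token", sampleCandidates, overlapScorer, 2).then(
  (ranked) => console.log(ranked.map((r) => r.id)),
);
```

The whole candidate list goes to the scorer in one call, which is why the added latency is paid once per search rather than once per candidate.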


lucerna.config.ts
import { defineConfig } from "@upstart.gg/lucerna";
export default defineConfig({
  embedding: { provider: "voyage", model: "voyage-code-3", apiKey: process.env.VOYAGE_API_KEY! },
  reranking: { provider: "voyage", model: "rerank-2.5", apiKey: process.env.VOYAGE_API_KEY! },
});

| Model | Price per 1K queries | Notes |
| --- | --- | --- |
| rerank-2.5 | | Latest generation, top pick |
| rerank-2 | $0.05 | Previous generation, 16K-token context per pair |
| rerank-2-lite | $0.02 | Lower latency |

1 search = 1 query against up to 100 documents.

lucerna.config.ts
import { defineConfig } from "@upstart.gg/lucerna";
export default defineConfig({
  reranking: { provider: "cohere", model: "rerank-english-v3.0", apiKey: process.env.COHERE_API_KEY! },
});

| Model | Price per 1K searches | Notes |
| --- | --- | --- |
| rerank-english-v3.0 | $2.00 | Strong English reranker |
| rerank-multilingual-v3.0 | $2.00 | Multilingual |

First 10M tokens free per API key · Token-based billing.

lucerna.config.ts
import { defineConfig } from "@upstart.gg/lucerna";
export default defineConfig({
  reranking: { provider: "jina", model: "jina-reranker-v2-base-multilingual", apiKey: process.env.JINA_API_KEY! },
});

| Model | Price per 1M tokens | Notes |
| --- | --- | --- |
| jina-reranker-v2-base-multilingual | ~$0.02 | Fast, multilingual, good for code |

Proper cross-encoder using Google’s dedicated Ranking API. No data store or app needed — just a GCP project with the Discovery Engine API enabled.

Authentication uses Application Default Credentials (ADC) — same as for the embedding provider. Run gcloud auth application-default login once for local dev, or set GOOGLE_APPLICATION_CREDENTIALS for CI/CD.

lucerna.config.ts
import { defineConfig } from "@upstart.gg/lucerna";
export default defineConfig({
  reranking: {
    provider: "vertex",
    model: "semantic-ranker-default-004",
    project: process.env.GOOGLE_CLOUD_PROJECT!,
    // keyFile: process.env.GOOGLE_APPLICATION_CREDENTIALS, // explicit service account key (optional)
  },
});

| Model | Price per 1K queries | Notes |
| --- | --- | --- |
| semantic-ranker-default-004 | $1.00 | Recommended; 1024-token context per document |
| semantic-ranker-fast-004 | $1.00 | Lower latency |

1 query = up to 100 documents; every additional 100 docs counts as +1 query.
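The billing rule above is easy to misread, so here is a small worked calculation. It assumes the $1.00 per 1K queries rate listed in the table above and treats every started block of 100 documents as one billed query. The function names are illustrative, and actual invoices may differ.

```typescript
// Each started block of 100 documents counts as one billed query.
const DOCS_PER_QUERY = 100;
// Rate taken from the pricing table above (assumed unchanged).
const USD_PER_1K_QUERIES = 1.0;

function billedQueries(documentCount: number): number {
  // Assumption: a search with no candidates sends no request and bills nothing.
  if (documentCount <= 0) return 0;
  return Math.ceil(documentCount / DOCS_PER_QUERY);
}

function estimateCostUsd(searches: number, docsPerSearch: number): number {
  return (searches * billedQueries(docsPerSearch) * USD_PER_1K_QUERIES) / 1000;
}

console.log(billedQueries(100)); // 1
console.log(billedQueries(101)); // 2
console.log(estimateCostUsd(10_000, 150)); // 20
```

Under this rule, keeping the candidate list at 100 documents or fewer halves the cost compared with sending 101 to 200.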


Prompt-based relevance scoring using a Gemini generative model. Not a cross-encoder — uses a Gemini API key, no GCP project required.
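To illustrate what prompt-based scoring means in contrast to a cross-encoder, the sketch below builds a scoring prompt and parses a model reply. The prompt wording, the 0 to 10 scale, and the helper names `buildScoringPrompt` and `parseScores` are all hypothetical. Lucerna's actual prompt and parsing are not documented here, and no Gemini API call is made.

```typescript
// Build one prompt that asks the model to score every document at once.
function buildScoringPrompt(query: string, documents: string[]): string {
  const numbered = documents.map((doc, i) => `[${i}] ${doc}`).join("\n\n");
  return [
    "Rate how relevant each document is to the query on a scale from 0 to 10.",
    "Reply with a JSON array of numbers only, one per document, in order.",
    `Query: ${query}`,
    "Documents:",
    numbered,
  ].join("\n");
}

// Extract the JSON array from a free-text model reply and validate it.
function parseScores(reply: string, expectedCount: number): number[] {
  const match = reply.match(/\[[\s\S]*\]/);
  if (!match) throw new Error("no JSON array found in model reply");
  const parsed: unknown = JSON.parse(match[0]);
  if (
    !Array.isArray(parsed) ||
    parsed.length !== expectedCount ||
    !parsed.every((n) => typeof n === "number")
  ) {
    throw new Error("model reply is not an array of the expected length");
  }
  return parsed as number[];
}

const prompt = buildScoringPrompt("where is the auth token parsed?", [
  "function parseToken(header: string) { /* ... */ }",
  "export const theme = { color: 'blue' };",
]);
console.log(prompt);
console.log(parseScores("Scores: [9, 1]", 2));
```

Because the scores come from generated text rather than a dedicated ranking head, the reply has to be validated before use, which is one practical difference from the cross-encoder providers.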

lucerna.config.ts
import { defineConfig } from "@upstart.gg/lucerna";
export default defineConfig({
  reranking: { provider: "gemini", model: "gemini-2.5-flash-lite", apiKey: process.env.GOOGLE_API_KEY! },
});

| Model | Price per 100 docs | Notes |
| --- | --- | --- |
| gemini-2.5-flash-lite | ~$0.0002 | Default; good balance of quality and cost |
| gemini-2.0-flash-lite | ~$0.0002 | Previous default |

Free tier: 10K neurons/day (shared with embeddings).

lucerna.config.ts
import { defineConfig } from "@upstart.gg/lucerna";
export default defineConfig({
  reranking: {
    provider: "cloudflare",
    model: "@cf/baai/bge-reranker-base",
    accountId: process.env.CLOUDFLARE_ACCOUNT_ID!,
    apiKey: process.env.CLOUDFLARE_API_TOKEN!,
  },
});

| Model | Price per 1M tokens | Notes |
| --- | --- | --- |
| @cf/baai/bge-reranker-base | ~$0.003 (free tier applies) | No extra key if already using Cloudflare |