retriever
TwoStageRetriever — BM25 recall from KnowledgeStore + optional reranking.
Composes a fast BM25 first-stage recall (via KnowledgeStore) with an
optional second-stage Reranker for semantic reordering. The default
ColBERTReranker lazy-loads a ColBERT checkpoint and scores candidates via
MaxSim; it degrades gracefully when colbert-ai is not installed.
Typical usage::

    store = KnowledgeStore()
    retriever = TwoStageRetriever(store, reranker=ColBERTReranker())
    results = retriever.retrieve("neural networks", top_k=10)
Classes
Reranker
Bases: ABC
Abstract base class for second-stage semantic rerankers.
Functions
rerank
abstractmethod
rerank(query: str, candidates: List[RetrievalResult], *, top_k: int = 10) -> List[RetrievalResult]
Rerank candidates for query and return the top top_k results.
| PARAMETER | DESCRIPTION |
|---|---|
| query | The original search query. TYPE: str |
| candidates | List of RetrievalResult candidates to rerank. TYPE: List[RetrievalResult] |
| top_k | Maximum number of results to return. TYPE: int, DEFAULT: 10 |

| RETURNS | DESCRIPTION |
|---|---|
| List[RetrievalResult] | Reranked results, truncated to top_k. |
Source code in src/openjarvis/connectors/retriever.py
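A custom second stage only needs to implement rerank with the signature documented above. The sketch below is illustrative rather than part of the library: RetrievalResult and Reranker are redefined as minimal stand-ins so the block runs on its own (the content and score fields are assumptions), and TermOverlapReranker is a hypothetical subclass that orders candidates by how many query terms they share.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievalResult:
    """Stand-in for the library's RetrievalResult; the fields are assumed."""
    content: str
    score: float = 0.0


class Reranker(ABC):
    """Stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def rerank(
        self, query: str, candidates: List[RetrievalResult], *, top_k: int = 10
    ) -> List[RetrievalResult]:
        ...


class TermOverlapReranker(Reranker):
    """Hypothetical reranker: orders candidates by shared query terms."""

    def rerank(
        self, query: str, candidates: List[RetrievalResult], *, top_k: int = 10
    ) -> List[RetrievalResult]:
        terms = set(query.lower().split())

        def overlap(result: RetrievalResult) -> int:
            return len(terms & set(result.content.lower().split()))

        # sorted() is stable, so ties keep their first-stage order.
        ranked = sorted(candidates, key=overlap, reverse=True)
        return ranked[:top_k]


candidates = [
    RetrievalResult("gradient descent basics"),
    RetrievalResult("neural networks learn representations"),
    RetrievalResult("training deep neural networks"),
]
top = TermOverlapReranker().rerank("neural networks", candidates, top_k=2)
print([r.content for r in top])
```

With the real classes imported from the retriever module, the same subclass could presumably be passed as the reranker argument of TwoStageRetriever.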
ColBERTReranker
ColBERTReranker(checkpoint: str = 'colbert-ir/colbertv2.0', embedding_store: Optional['EmbeddingStore'] = None)
Bases: Reranker
Semantic reranker backed by ColBERT MaxSim scoring.
Lazy-loads a ColBERT checkpoint on first use. If the colbert-ai
package is not installed the reranker falls back to returning the
BM25-ordered candidates unchanged (with a warning logged once).
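| PARAMETER | DESCRIPTION |
|---|---|
| checkpoint | Path or HuggingFace model ID for the ColBERT checkpoint. Defaults to colbert-ir/colbertv2.0. TYPE: str, DEFAULT: 'colbert-ir/colbertv2.0' |
| embedding_store | Optional EmbeddingStore. TYPE: Optional[EmbeddingStore], DEFAULT: None |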
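The lazy-load-with-fallback behavior described for ColBERTReranker can be illustrated with a generic pattern. This is a sketch under stated assumptions, not the library's implementation: LazyBackend and rerank_or_passthrough are hypothetical names, the optional dependency is identified by an arbitrary module name, and the fallback is assumed to truncate to top_k while keeping the incoming order.

```python
import importlib
import logging
from typing import Any, Callable, List, Optional

logger = logging.getLogger(__name__)


class LazyBackend:
    """Imports an optional module on first use and remembers the outcome."""

    def __init__(self, module_name: str) -> None:
        self._module_name = module_name
        self._module: Optional[Any] = None
        self._attempted = False

    def get(self) -> Optional[Any]:
        if not self._attempted:
            self._attempted = True  # never retry, so the warning fires once
            try:
                self._module = importlib.import_module(self._module_name)
            except ImportError:
                logger.warning(
                    "%s is not installed; keeping first-stage order",
                    self._module_name,
                )
        return self._module


def rerank_or_passthrough(
    backend: LazyBackend,
    candidates: List[str],
    top_k: int,
    score: Callable[[Any, str], float],
) -> List[str]:
    module = backend.get()
    if module is None:
        # Fallback path: keep the incoming (first-stage) order.
        return candidates[:top_k]
    # Normal path: order by the score computed with the loaded module.
    return sorted(candidates, key=lambda c: score(module, c), reverse=True)[:top_k]


docs = ["bb", "a", "cccc"]
missing = LazyBackend("module_that_is_not_installed_xyz")
fallback = rerank_or_passthrough(missing, docs, 2, lambda m, c: 0.0)

# The standard-library math module stands in for an installed backend.
present = LazyBackend("math")
scored = rerank_or_passthrough(present, docs, 2, lambda m, c: m.sqrt(len(c)))
```

Deferring the import to the first call keeps construction cheap and lets code that never reranks run without the optional dependency.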
Functions
rerank
rerank(query: str, candidates: List[RetrievalResult], *, top_k: int = 10) -> List[RetrievalResult]
Rerank candidates using ColBERT MaxSim scores.
Falls back to BM25 order if colbert-ai is unavailable.
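For readers unfamiliar with MaxSim, the scoring rule can be written in a few lines: each query token embedding is matched with its most similar document token embedding, and those maxima are summed. The sketch below uses plain Python lists and toy 2-dimensional vectors; the function names are illustrative and it does not call the colbert-ai package.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best dot product with any doc token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy unit vectors standing in for per-token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # has a match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # only matches the first query token

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a, score_b)  # 2.0 1.0
```

Candidates are then sorted by this score in descending order, so doc_a would rank ahead of doc_b for this query.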
TwoStageRetriever
TwoStageRetriever(store: KnowledgeStore, reranker: Optional[Reranker] = None, *, recall_k: int = 100)
BM25 recall + optional semantic reranking for Deep Research.
Stage 1 retrieves max(recall_k, top_k * 3) candidates from the
KnowledgeStore using FTS5/BM25. Stage 2 optionally passes those
candidates through a Reranker to produce a semantically ordered
final list of top_k results.
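| PARAMETER | DESCRIPTION |
|---|---|
| store | The KnowledgeStore queried for Stage 1 BM25 recall. TYPE: KnowledgeStore |
| reranker | An optional Reranker applied in Stage 2. TYPE: Optional[Reranker], DEFAULT: None |
| recall_k | Number of candidates to fetch in Stage 1. The actual recall size is max(recall_k, top_k * 3). TYPE: int, DEFAULT: 100 |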
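The Stage 1 recall size follows directly from the formula max(recall_k, top_k * 3). A small worked example, where stage_one_size is a hypothetical helper name used only for illustration:

```python
def stage_one_size(recall_k: int, top_k: int) -> int:
    """Number of candidates requested from the first stage."""
    return max(recall_k, top_k * 3)


print(stage_one_size(100, 10))  # 100: the default recall_k dominates
print(stage_one_size(100, 50))  # 150: a large top_k widens the recall set
```

In other words, recall_k acts as a floor, and the recall set grows once top_k exceeds a third of it.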
Functions
retrieve
retrieve(query: str, *, top_k: int = 10, source: str = '', doc_type: str = '', author: str = '', since: str = '', until: str = '') -> List[RetrievalResult]
Run the two-stage retrieval pipeline.
| PARAMETER | DESCRIPTION |
|---|---|
| query | Full-text search query. TYPE: str |
| top_k | Maximum number of results to return. TYPE: int, DEFAULT: 10 |
| source | Restrict to chunks from this source. TYPE: str, DEFAULT: '' |
| doc_type | Restrict to chunks of this doc type. TYPE: str, DEFAULT: '' |
| author | Restrict to chunks authored by this person. TYPE: str, DEFAULT: '' |
| since | Exclude chunks whose timestamp is earlier than this ISO string. TYPE: str, DEFAULT: '' |
| until | Exclude chunks whose timestamp is later than this ISO string. TYPE: str, DEFAULT: '' |

| RETURNS | DESCRIPTION |
|---|---|
| List[RetrievalResult] | Up to top_k results, reranked when a reranker is configured. |
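The overall flow of retrieve can be sketched with an in-memory stand-in. Everything here is an assumption made for illustration: Chunk, in_window and the free function retrieve are hypothetical, the first-stage score is a made-up number rather than a real FTS5/BM25 result, empty since/until strings are treated as "no limit", and the first-stage order is assumed to be returned when no reranker is supplied.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for a stored chunk."""
    text: str
    timestamp: str  # ISO 8601, e.g. "2024-03-01T00:00:00"
    bm25: float     # pretend first-stage score, higher is better


def in_window(chunk: Chunk, since: str = "", until: str = "") -> bool:
    # Same-format ISO 8601 strings sort chronologically, so plain string
    # comparison is enough here. Empty bounds are treated as "no limit".
    if since and chunk.timestamp < since:
        return False
    if until and chunk.timestamp > until:
        return False
    return True


def retrieve(
    chunks: List[Chunk],
    *,
    top_k: int = 10,
    recall_k: int = 100,
    since: str = "",
    until: str = "",
    rerank: Optional[Callable[[List[Chunk]], List[Chunk]]] = None,
) -> List[Chunk]:
    recall_size = max(recall_k, top_k * 3)
    # Stage 1: filter, order by first-stage score, keep the recall set.
    pool = sorted(
        (c for c in chunks if in_window(c, since, until)),
        key=lambda c: c.bm25,
        reverse=True,
    )[:recall_size]
    # Stage 2: optional reordering of the recall set.
    if rerank is not None:
        pool = rerank(pool)
    return pool[:top_k]


chunks = [
    Chunk("a", "2024-01-10T00:00:00", 3.0),
    Chunk("b", "2024-02-15T00:00:00", 2.0),
    Chunk("c", "2024-03-20T00:00:00", 1.0),
]
recent = retrieve(chunks, top_k=2, since="2024-02-01T00:00:00")
flipped = retrieve(
    chunks, top_k=2, since="2024-02-01T00:00:00",
    rerank=lambda pool: list(reversed(pool)),
)
```

Filtering before recall keeps the reranker from spending time on candidates that the metadata filters would discard anyway.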