Memory¶
The memory system provides persistent, searchable document storage for retrieval-augmented generation (RAG). It supports multiple retrieval backends, a configurable chunking pipeline, document ingestion from files and directories, and automatic context injection into prompts.
Architecture¶
Documents --> Chunking Pipeline --> Memory Backend --> Context Injection --> Prompt
(files) (split + overlap) (store + index) (retrieve + format) (to LLM)
MemoryBackend ABC¶
All memory backends implement the MemoryBackend abstract base class.
class MemoryBackend(ABC):
backend_id: str
def store(self, content: str, *, source: str = "", metadata: dict | None = None) -> str:
"""Persist content and return a unique document ID."""
def retrieve(self, query: str, *, top_k: int = 5, **kwargs) -> list[RetrievalResult]:
"""Search for query and return the top-k results."""
def delete(self, doc_id: str) -> bool:
"""Delete a document by ID. Return True if it existed."""
def clear(self) -> None:
"""Remove all stored documents."""
RetrievalResult¶
Each retrieval returns a list of RetrievalResult objects:
| Field | Type | Description |
|---|---|---|
content |
str |
The retrieved text chunk |
score |
float |
Relevance score (higher is better) |
source |
str |
Originating file path or identifier |
metadata |
dict[str, Any] |
Additional metadata (chunk index, etc.) |
Backends¶
SQLite / FTS5 (Default)¶
Registry key: sqlite
The default backend using SQLite's built-in FTS5 full-text search extension. Zero external dependencies -- uses Python's standard sqlite3 module.
- Scoring: BM25 ranking via FTS5 MATCH queries
- Persistence: SQLite database file (default:
~/.openjarvis/memory.db) - Dependencies: None (built into Python)
from openjarvis.core.registry import MemoryRegistry
backend = MemoryRegistry.create("sqlite", db_path="./memory.db")
doc_id = backend.store("Hello world", source="test.txt")
results = backend.retrieve("hello")
backend.close()
When to use SQLite/FTS5
Use this backend when you want zero-configuration setup, keyword-based search is sufficient, and you need persistent storage across restarts. It works well for small to medium document collections.
FAISS¶
Registry key: faiss
Dense neural retrieval using Facebook AI Similarity Search. Embeds documents and queries into dense vectors and retrieves by cosine similarity.
- Scoring: Cosine similarity via inner-product search on L2-normalized vectors
- Persistence: In-memory only (data is lost on restart)
- Dependencies:
faiss-cpu(orfaiss-gpu),sentence-transformers
backend = MemoryRegistry.create("faiss")
doc_id = backend.store("Neural networks are computational models")
results = backend.retrieve("deep learning architectures")
When to use FAISS
Use this backend when you need semantic search (finding conceptually similar content even without exact keyword matches). Best for use cases where you can re-index on each run since data is not persisted.
ColBERTv2¶
Registry key: colbert
Late-interaction retrieval using ColBERT's token-level embeddings with MaxSim scoring. Provides the highest retrieval quality among the available backends.
- Scoring: MaxSim -- for each query token, take the maximum cosine similarity across all document tokens, then sum
- Persistence: In-memory only
- Dependencies:
colbert-ai,torch
| Parameter | Default | Description |
|---|---|---|
checkpoint |
"colbert-ir/colbertv2.0" |
ColBERT model checkpoint |
device |
"cpu" |
Computation device (cpu or cuda) |
When to use ColBERTv2
Use this backend when retrieval quality is the top priority and you have the compute resources for it. The checkpoint is lazily loaded on first use to avoid slow imports. Best for research and evaluation workloads.
BM25¶
Registry key: bm25
Classic probabilistic ranking using the BM25 Okapi algorithm. In-memory implementation using the rank_bm25 library.
- Scoring: BM25 Okapi term-frequency scoring
- Persistence: In-memory only
- Dependencies:
rank-bm25
backend = MemoryRegistry.create("bm25")
backend.store("Python is a programming language", source="intro.txt")
results = backend.retrieve("programming language")
When to use BM25
Use this backend when you want classic keyword-based retrieval without database dependencies. Useful as the sparse component in a hybrid retrieval setup.
Hybrid (RRF Fusion)¶
Registry key: hybrid
Combines a sparse retriever and a dense retriever using Reciprocal Rank Fusion (RRF). Documents are stored in both sub-backends, and retrieval results are merged.
- Scoring:
RRF_score(d) = sum(weight_i / (k + rank_i(d)))across both ranked lists - Persistence: Depends on sub-backends
- Dependencies: Depends on sub-backends
from openjarvis.tools.storage.bm25 import BM25Memory
from openjarvis.tools.storage.faiss_backend import FAISSMemory
sparse = BM25Memory()
dense = FAISSMemory()
backend = MemoryRegistry.create(
"hybrid",
sparse=sparse,
dense=dense,
k=60,
sparse_weight=1.0,
dense_weight=1.0,
)
Backward compatibility
The old from openjarvis.memory.bm25 import BM25Memory still works via backward-compatibility shims, but new code should use the canonical openjarvis.tools.storage.* imports.
| Parameter | Default | Description |
|---|---|---|
sparse |
-- | Sparse retrieval backend (e.g., BM25) |
dense |
-- | Dense retrieval backend (e.g., FAISS) |
k |
60 |
RRF constant |
sparse_weight |
1.0 |
Weight for sparse retriever results |
dense_weight |
1.0 |
Weight for dense retriever results |
The hybrid backend over-fetches (3x top_k) from each sub-backend before applying fusion to improve result quality.
When to use Hybrid
Use this backend when you want the best of both keyword matching and semantic similarity. The RRF fusion approach is robust and does not require tuning score distributions across different retrieval methods.
Backend Comparison¶
| Backend | Search Type | Persistence | Dependencies | Quality | Speed |
|---|---|---|---|---|---|
| SQLite/FTS5 | Keyword (BM25) | Yes | None | Good | Fast |
| FAISS | Dense (cosine) | No | faiss, transformers | Better | Fast |
| ColBERTv2 | Late interaction | No | colbert-ai, torch | Best | Slower |
| BM25 | Keyword (Okapi) | No | rank-bm25 | Good | Fast |
| Hybrid | Fusion (RRF) | Mixed | Sub-backend deps | Better | Medium |
Chunking Pipeline¶
Documents are split into chunks before storage using a configurable pipeline. The chunker respects paragraph boundaries when possible.
ChunkConfig¶
| Field | Type | Default | Description |
|---|---|---|---|
chunk_size |
int |
512 |
Target chunk size in whitespace tokens |
chunk_overlap |
int |
64 |
Overlap between consecutive chunks |
min_chunk_size |
int |
50 |
Minimum chunk size (smaller chunks are discarded) |
How Chunking Works¶
- The document is split into paragraphs (separated by double newlines).
- Paragraphs are accumulated until the token count exceeds
chunk_size. - The accumulated content is emitted as a chunk.
- The last
chunk_overlaptokens are retained as context for the next chunk. - Paragraphs exceeding
chunk_sizeare split into fixed-size windows with overlap.
Chunk Output¶
Each chunk is a Chunk object with:
| Field | Type | Description |
|---|---|---|
content |
str |
The chunk text |
source |
str |
Originating file path |
offset |
int |
Token offset within the document |
index |
int |
Sequential chunk index |
metadata |
dict[str, Any] |
Additional metadata |
Document Ingestion¶
The ingest_path() function reads files or recursively walks directories, producing chunks ready for storage.
Supported File Types¶
| Type | Extensions |
|---|---|
| Text | .txt and other plain text files |
| Markdown | .md, .markdown, .mdx |
| Code | .py, .js, .ts, .rs, .go, .java, .c, .cpp, .rb, .sh, .yaml, .json, .html, .css, and more |
.pdf (requires pdfplumber: uv sync --extra memory-pdf) |
Automatic Skipping¶
The ingestion pipeline automatically skips:
- Hidden files and directories (starting with
.) - Common non-content directories:
__pycache__,node_modules,.venv,.git, etc. - Binary files: images, audio, video, archives, compiled files
- Files that cannot be read (permission errors, encoding issues)
Usage¶
from pathlib import Path
from openjarvis.tools.storage.chunking import ChunkConfig
from openjarvis.tools.storage.ingest import ingest_path
# Default chunking
chunks = ingest_path(Path("./docs/"))
# Custom chunking
config = ChunkConfig(chunk_size=256, chunk_overlap=32)
chunks = ingest_path(Path("./notes.md"), config=config)
print(f"Produced {len(chunks)} chunks")
for chunk in chunks[:3]:
print(f" [{chunk.index}] {chunk.source}: {chunk.content[:60]}...")
Context Injection¶
When memory context injection is enabled (the default), queries are automatically augmented with relevant retrieved documents before being sent to the model. Each retrieved passage includes source attribution.
ContextConfig¶
| Field | Type | Default | Description |
|---|---|---|---|
enabled |
bool |
True |
Whether context injection is active |
top_k |
int |
5 |
Number of results to retrieve |
min_score |
float |
0.1 |
Minimum relevance score threshold |
max_context_tokens |
int |
2048 |
Maximum total tokens in injected context |
How It Works¶
- The user's query is searched against the memory backend.
- Results below
min_scoreare filtered out. - Results are truncated to fit within
max_context_tokens. - A system message is prepended to the conversation with the formatted context:
The following context was retrieved from the knowledge base.
Use it to inform your response, citing sources where applicable:
[Source: docs/intro.md] OpenJarvis is a modular AI framework...
[Source: docs/config.md] Configuration is stored in TOML format...
Disabling Context Injection¶
CLI Usage¶
# Index a directory
jarvis memory index ./docs/
# Index with custom chunking
jarvis memory index ./notes/ --chunk-size 256 --chunk-overlap 32
# Search the memory store
jarvis memory search "machine learning"
# Search with more results
jarvis memory search -k 10 "neural networks"
# Show memory statistics
jarvis memory stats
SDK Usage¶
from openjarvis import Jarvis
j = Jarvis()
# Index documents
result = j.memory.index("./docs/", chunk_size=512, chunk_overlap=64)
print(f"Indexed {result['chunks']} chunks")
# Search
results = j.memory.search("configuration", top_k=3)
for r in results:
print(f" [{r['score']:.4f}] {r['source']}: {r['content'][:80]}...")
# Statistics
stats = j.memory.stats()
print(f"Backend: {stats['backend']}, Documents: {stats.get('count', 'N/A')}")
# Clean up
j.close()