# Design Principles
OpenJarvis follows a set of design principles that guide every architectural decision. These principles ensure the framework remains extensible, portable, and easy to work with.
## 1. Pluggable Everything
Every major component in OpenJarvis is defined as an abstract base class (ABC) with concrete implementations registered at runtime. This means you can swap, extend, or replace any part of the system without modifying existing code.
```mermaid
graph LR
    subgraph "ABC Interface"
        ABC["InferenceEngine ABC<br/><code>generate(), stream(),<br/>list_models(), health()</code>"]
    end
    subgraph "Implementations"
        A["OllamaEngine"]
        B["VLLMEngine"]
        C["SGLangEngine"]
        D["LlamaCppEngine"]
        E["CloudEngine"]
        F["YourCustomEngine"]
    end
    ABC --> A
    ABC --> B
    ABC --> C
    ABC --> D
    ABC --> E
    ABC -.->|"extend"| F
```
This pattern applies across all five primitives:
| Primitive | ABC | Implementations |
|---|---|---|
| Engine | `InferenceEngine` | Ollama, vLLM, SGLang, llama.cpp, Cloud |
| Memory | `MemoryBackend` | SQLite, FAISS, ColBERT, BM25, Hybrid |
| Agents | `BaseAgent` | Simple, Orchestrator, NativeReAct, NativeOpenHands, RLM, OpenHands, ClaudeCode, Operative, MonitorOperative |
| Learning | `RouterPolicy` | Heuristic, TraceDriven, GRPO |
| Tools | `BaseTool` | Calculator, Think, Retrieval, LLM, FileRead |
Adding a new implementation requires two things: implement the ABC and register it. The rest of the system discovers and uses it automatically.
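The shape of such an interface can be sketched as follows. This is a simplified illustration of the pattern, not the actual `InferenceEngine` source; the toy `EchoEngine` is hypothetical:

```python
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    """Simplified sketch of an engine interface with the four methods above."""

    @abstractmethod
    def generate(self, messages, *, model, **kwargs):
        """Return a complete response for the given messages."""

    @abstractmethod
    def stream(self, messages, *, model, **kwargs):
        """Yield response chunks as they are produced."""

    @abstractmethod
    def list_models(self):
        """Return the model identifiers this engine can serve."""

    @abstractmethod
    def health(self):
        """Report whether the backing server is reachable."""


class EchoEngine(InferenceEngine):
    """Toy implementation: echoes the last user message back."""

    def generate(self, messages, *, model, **kwargs):
        return messages[-1]["content"]

    def stream(self, messages, *, model, **kwargs):
        yield messages[-1]["content"]

    def list_models(self):
        return ["echo"]

    def health(self):
        return True
```

Because callers depend only on the abstract interface, `EchoEngine` can stand in anywhere a real engine is expected.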
## 2. Registry-Driven
All extensible components use the `@XRegistry.register("name")` decorator pattern. Registration happens at import time, and no factory function or configuration file needs modification.
```python
from openjarvis.core.registry import EngineRegistry
from openjarvis.engine._stubs import InferenceEngine

@EngineRegistry.register("my-engine")
class MyEngine(InferenceEngine):
    engine_id = "my-engine"

    def generate(self, messages, *, model, **kwargs):
        ...

    def stream(self, messages, *, model, **kwargs):
        ...

    def list_models(self):
        ...

    def health(self):
        ...
```
The `RegistryBase[T]` generic base class provides:

- **Class-specific isolation** -- Each typed subclass (`EngineRegistry`, `MemoryRegistry`, etc.) has its own entry storage, so registrations never leak between registries
- **Duplicate detection** -- Registering the same key twice raises `ValueError`
- **Runtime instantiation** -- `Registry.create(key, *args)` looks up and instantiates in one step
- **Introspection** -- `keys()`, `items()`, `contains()` for discovering available components
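A minimal registry with these four behaviors might look like the following. This is an illustrative sketch under stated assumptions, not the actual `RegistryBase` source:

```python
from typing import Callable, Dict, Generic, Type, TypeVar

T = TypeVar("T")


class RegistryBase(Generic[T]):
    """Sketch of a typed registry: per-subclass storage, duplicate
    detection, one-step instantiation, and introspection."""

    _entries: Dict[str, Type[T]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._entries = {}  # each typed subclass gets its own storage

    @classmethod
    def register(cls, key: str) -> Callable[[Type[T]], Type[T]]:
        def decorator(impl: Type[T]) -> Type[T]:
            if key in cls._entries:
                raise ValueError(f"duplicate registration: {key!r}")
            cls._entries[key] = impl
            return impl
        return decorator

    @classmethod
    def create(cls, key: str, *args, **kwargs) -> T:
        return cls._entries[key](*args, **kwargs)

    @classmethod
    def keys(cls):
        return list(cls._entries)

    @classmethod
    def items(cls):
        return list(cls._entries.items())

    @classmethod
    def contains(cls, key: str) -> bool:
        return key in cls._entries
```

The `__init_subclass__` hook is what gives each typed registry its own dictionary, so entries registered with one registry are invisible to the others.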
**Why decorators instead of configuration files?**
The decorator pattern means that adding a new component is a single-file change. There is no central registry file to edit, no YAML to update, and no factory to modify. The component self-registers simply by being imported.
## 3. Offline-First
OpenJarvis is designed to work entirely without network access. All core functionality -- inference, memory, agents, tools, telemetry -- operates locally. Cloud APIs are optional extensions, never requirements.
| Feature | Offline Behavior |
|---|---|
| Inference | Ollama, vLLM, SGLang, llama.cpp all run locally |
| Memory | SQLite/FTS5 uses built-in Python sqlite3 module |
| Embeddings | sentence-transformers models run locally |
| Telemetry | SQLite-based, fully local |
| Traces | SQLite-based, fully local |
| Tools | Calculator, Think, FileRead all local |
| Configuration | TOML file on disk |
Cloud engines (OpenAI, Anthropic, Google) are available through the optional cloud backend, but they are:
- Only registered if the corresponding SDK packages are installed
- Only activated if API keys are set as environment variables
- Never required for any core functionality
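The conditional-registration pattern described above can be sketched like this. The stand-in `register` helper and `OpenAIEngine` class are hypothetical; only the overall shape (SDK present, key present, otherwise absent) reflects the design:

```python
import importlib.util
import os

ENGINES = {}  # stand-in for the real registry


def register(key):
    """Minimal decorator that records an implementation under a key."""
    def decorator(cls):
        ENGINES[key] = cls
        return cls
    return decorator


# The cloud engine self-registers only if its SDK is importable.
if importlib.util.find_spec("openai") is not None:

    @register("openai")
    class OpenAIEngine:
        def __init__(self):
            # Activation additionally requires credentials in the environment.
            if "OPENAI_API_KEY" not in os.environ:
                raise RuntimeError("OPENAI_API_KEY is not set")


# Core functionality never depends on the cloud entry being present:
available = sorted(ENGINES)  # may or may not include "openai"
```

If the SDK is missing, the `if` branch never runs, the key never appears in the registry, and everything else continues to work offline.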
```python
# This works without any network connection
from openjarvis import Jarvis

j = Jarvis(engine_key="ollama")  # Local Ollama server
response = j.ask("Hello")
```
## 4. Hardware-Aware
OpenJarvis auto-detects system hardware at startup and recommends the optimal inference engine. The `detect_hardware()` function probes:
| Hardware | Detection Method |
|---|---|
| NVIDIA GPUs | nvidia-smi (name, VRAM, count) |
| AMD GPUs | rocm-smi (product name) |
| Apple Silicon | system_profiler SPDisplaysDataType |
| CPU | /proc/cpuinfo or sysctl (brand string) |
| RAM | /proc/meminfo or sysctl hw.memsize |
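A best-effort probe along these lines can be sketched as follows. This is a simplified illustration, not the real `detect_hardware()`, which gathers more detail (GPU names, VRAM, counts):

```python
import platform
import shutil
import subprocess


def detect_hardware() -> dict:
    """Best-effort probe: check which vendor tools exist, then read RAM."""
    info = {"gpu": "none", "ram_bytes": None, "os": platform.system()}

    # GPU: presence of the vendor CLI is a cheap first signal.
    if shutil.which("nvidia-smi"):
        info["gpu"] = "nvidia"
    elif shutil.which("rocm-smi"):
        info["gpu"] = "amd"
    elif platform.system() == "Darwin" and platform.machine() == "arm64":
        info["gpu"] = "apple-silicon"

    # RAM: /proc/meminfo on Linux, sysctl hw.memsize on macOS.
    try:
        if platform.system() == "Linux":
            with open("/proc/meminfo") as f:
                for line in f:
                    if line.startswith("MemTotal:"):
                        info["ram_bytes"] = int(line.split()[1]) * 1024
                        break
        elif platform.system() == "Darwin":
            out = subprocess.check_output(["sysctl", "-n", "hw.memsize"])
            info["ram_bytes"] = int(out)
    except (OSError, subprocess.SubprocessError):
        pass  # probing is best-effort; missing info is tolerated

    return info
```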
The `recommend_engine()` function maps hardware to engines:
| Hardware | Recommended Engine |
|---|---|
| No GPU | llamacpp (CPU-optimized) |
| Apple Silicon | ollama (Metal acceleration) |
| NVIDIA datacenter (A100, H100, etc.) | vllm (high throughput) |
| NVIDIA consumer | ollama (easy setup) |
| AMD GPU | vllm (ROCm support) |
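The mapping above can be expressed as a simple decision function. This is a sketch of the logic, not the actual implementation, and the datacenter-GPU list is illustrative:

```python
DATACENTER_GPUS = ("A100", "H100", "L40")  # illustrative, not exhaustive


def recommend_engine(gpu_vendor: str, gpu_name: str = "") -> str:
    """Map detected hardware to a default engine, per the table above."""
    if gpu_vendor == "none":
        return "llamacpp"   # CPU-optimized
    if gpu_vendor == "apple-silicon":
        return "ollama"     # Metal acceleration
    if gpu_vendor == "amd":
        return "vllm"       # ROCm support
    if gpu_vendor == "nvidia":
        if any(name in gpu_name for name in DATACENTER_GPUS):
            return "vllm"   # high throughput on datacenter cards
        return "ollama"     # easy setup on consumer cards
    return "llamacpp"       # conservative fallback
```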
This recommendation is written to `config.toml` during `jarvis init` and used as the default engine:

```shell
jarvis init --force
# Detects hardware, writes ~/.openjarvis/config.toml with:
# [engine]
# default = "vllm"  # (for A100)
```
## 5. Telemetry-Native
Every inference call automatically records timing, token counts, energy usage, and cost to a local SQLite database. Telemetry is a first-class concern, not an afterthought.
```python
from dataclasses import dataclass

@dataclass(slots=True)
class TelemetryRecord:
    timestamp: float
    model_id: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    latency_seconds: float
    ttft: float  # Time to first token
    cost_usd: float
    energy_joules: float
    power_watts: float
    engine: str
    agent: str
```
The `instrumented_generate()` wrapper handles all telemetry transparently:

- Records start time
- Calls the engine's `generate()` method
- Records end time and extracts token counts
- Publishes a `TELEMETRY_RECORD` event on the EventBus
- The `TelemetryStore` (subscribed to the bus) persists the record
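The steps above can be sketched as a wrapper. This is illustrative only; the `bus.publish(topic, payload)` signature and the record fields shown are assumptions, not the real API:

```python
import time


def instrumented_generate(engine, bus, messages, *, model, **kwargs):
    """Call engine.generate() and publish a telemetry record on the bus."""
    start = time.perf_counter()
    response = engine.generate(messages, model=model, **kwargs)
    latency = time.perf_counter() - start

    record = {
        "timestamp": time.time(),
        "model_id": model,
        "latency_seconds": latency,
        # Token counts depend on what the engine response exposes.
        "prompt_tokens": getattr(response, "prompt_tokens", 0),
        "completion_tokens": getattr(response, "completion_tokens", 0),
    }
    try:
        bus.publish("TELEMETRY_RECORD", record)  # a subscribed store persists it
    except Exception:
        pass  # telemetry is best-effort: never block the query flow
    return response
```

Note the `try/except` around publishing: a telemetry failure is swallowed so the caller still gets its response.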
The `TelemetryAggregator` provides read-only queries over stored records.
**Telemetry is best-effort**
If telemetry setup fails (e.g., database is locked), the system continues without telemetry rather than raising an error. Telemetry never blocks the query flow.
## 6. Python-First
OpenJarvis provides a clean Python API through the `Jarvis` class. There is no framework lock-in -- the SDK is a standard Python package with dataclass-based types and no required web framework.
```python
from openjarvis import Jarvis

j = Jarvis()
response = j.ask("Hello")

# Full control
result = j.ask_full(
    "Explain quantum computing",
    model="qwen3:8b",
    agent="orchestrator",
    tools=["think"],
    temperature=0.5,
    max_tokens=2048,
)

# Memory operations
j.memory.index("./docs/")
results = j.memory.search("quantum computing")

# Resource cleanup
j.close()
```
Design choices that support this principle:
- Dataclasses for all structured types (`Message`, `ModelSpec`, `Trace`, etc.)
- Type hints throughout the codebase
- No magic -- explicit initialization, clear method signatures
- Optional dependencies via extras (`openjarvis[server]`, `openjarvis[memory-colbert]`, etc.)
- Standard packaging with `hatchling` build backend and `uv` package manager
## 7. OpenAI-Compatible
The API server (`jarvis serve`) implements the OpenAI chat completions API format, making OpenJarvis a drop-in replacement for OpenAI in existing applications.
Supported endpoints:
| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | POST | Chat completions (streaming and non-streaming) |
| `/v1/models` | GET | List available models |
| `/health` | GET | Health check |
Request and response formats match the OpenAI API specification:
```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:8b",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": false
  }'
```
Streaming responses use Server-Sent Events (SSE) with `data: [DONE]` termination, matching the OpenAI streaming protocol.
Any OpenAI client library can connect to OpenJarvis:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="qwen3:8b",
    messages=[{"role": "user", "content": "Hello"}],
)
```
## 8. Standalone
OpenJarvis requires no external services for core functionality. Everything needed to run the system is included or uses standard system libraries.
| Component | Dependency |
|---|---|
| Configuration | TOML file, built-in `tomllib` (Python 3.11+) or `tomli` |
| Memory (default) | Built-in `sqlite3` module |
| Telemetry | Built-in `sqlite3` module |
| Traces | Built-in `sqlite3` module |
| HTTP client | `httpx` (lightweight, pure Python) |
| CLI | `click` + `rich` |
| Event bus | Built-in `threading` module |
The only external requirement is a running inference engine (Ollama, vLLM, etc.), which is the model server itself -- not a dependency of OpenJarvis.
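As an example of the standard-library approach, a thread-safe publish/subscribe bus needs nothing beyond `threading`. This is a minimal sketch of such a bus, not the actual EventBus implementation:

```python
import threading
from collections import defaultdict


class EventBus:
    """Minimal thread-safe publish/subscribe bus using only the stdlib."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        with self._lock:
            self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        with self._lock:
            handlers = list(self._subscribers[topic])
        for handler in handlers:  # invoke outside the lock to avoid deadlock
            handler(payload)
```

Copying the handler list inside the lock and calling handlers outside it keeps publication thread-safe without holding the lock during subscriber callbacks.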
Optional features that require additional packages:
| Feature | Extra | Packages |
|---|---|---|
| FAISS memory | `openjarvis[memory-faiss]` | faiss-cpu, sentence-transformers |
| ColBERT memory | `openjarvis[memory-colbert]` | colbert-ai, torch |
| BM25 memory | `openjarvis[memory-bm25]` | rank-bm25 |
| API server | `openjarvis[server]` | fastapi, uvicorn |
| Cloud inference | `openjarvis[inference-cloud]` | openai, anthropic, google-genai |
| vLLM engine | `openjarvis[inference-vllm]` | vllm |
| PDF ingestion | `openjarvis[memory-pdf]` | pdfplumber |
| WhatsApp Baileys | `openjarvis[channel-whatsapp-baileys]` | Node.js 22+ |
This design ensures that a minimal installation (`uv sync`) gives you a fully functional system with SQLite memory, local inference, and the complete CLI -- no Docker, no external databases, no cloud accounts required.