Quick Start¶
What You Can Build¶
OpenJarvis is a modular AI assistant framework. Developers use it to build local chat UIs, tool-calling agents, memory-backed assistants, and OpenAI-compatible API services.
For complete copy-paste patterns, see Code Snippets.
This guide walks through the core workflows of OpenJarvis: the browser app, CLI, Python SDK, agents with tools, memory, benchmarks, and the API server.
Prerequisites
Make sure you have installed OpenJarvis and have at least one inference backend running (e.g., `ollama serve`).
Browser App¶
The quickest way to experience OpenJarvis is the full chat UI running in your browser:
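From a checkout of the repository, run the bundled quickstart script (the same script referenced in the environment-variable note below):

```bash
./scripts/quickstart.sh
```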
This launches the backend API server and a React frontend at http://localhost:5173. You get a ChatGPT-like interface with streaming responses, tool use, energy monitoring, and a telemetry dashboard — all running locally on your hardware.
To stop all services, press Ctrl+C in the terminal.
Environment variable
Set `OPENJARVIS_MODEL` to change the default model: `OPENJARVIS_MODEL=deepseek-r1:14b ./scripts/quickstart.sh`
Initialize Configuration¶
Start by detecting your hardware and generating a configuration file:
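A hypothetical invocation — the `init` subcommand name is an assumption; consult your CLI help for the exact spelling:

```bash
jarvis init
```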
This runs hardware auto-detection (GPU vendor, VRAM, CPU, RAM) and writes a config file to ~/.openjarvis/config.toml with sensible defaults for your system. It also selects the recommended inference engine.
```
Detecting hardware...
Platform : linux
CPU      : AMD EPYC 7763 (128 cores)
RAM      : 512.0 GB
GPU      : NVIDIA A100 (80.0 GB VRAM, x8)
Config written successfully.
```
To overwrite an existing config:
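Assuming the CLI follows the common convention of a `--force` flag (an assumption, not confirmed by this guide):

```bash
jarvis init --force
```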
See Configuration for the full config reference.
Your First Question¶
Via CLI¶
The simplest way to interact with OpenJarvis is the ask command:
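For example:

```bash
jarvis ask "What is the capital of France?"
```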
OpenJarvis will auto-detect a running engine, select a model using the configured router policy, and return the response.
CLI Options¶
| Option | Description | Example |
|---|---|---|
| `-m, --model` | Override model selection | `jarvis ask -m qwen3:8b "Hello"` |
| `-e, --engine` | Force a specific engine | `jarvis ask -e ollama "Hello"` |
| `-t, --temperature` | Sampling temperature (default: 0.7) | `jarvis ask -t 0.2 "Hello"` |
| `--max-tokens` | Max tokens to generate (default: 1024) | `jarvis ask --max-tokens 2048 "Hello"` |
| `--json` | Output raw JSON result | `jarvis ask --json "Hello"` |
| `--no-stream` | Disable streaming | `jarvis ask --no-stream "Hello"` |
| `--no-context` | Disable memory context injection | `jarvis ask --no-context "Hello"` |
| `-a, --agent` | Use an agent | `jarvis ask -a orchestrator "Hello"` |
| `--tools` | Comma-separated tools | `jarvis ask --tools calculator,think "2+2"` |
| `--router` | Router policy for model selection | `jarvis ask --router heuristic "Hello"` |
Via Python SDK¶
The `Jarvis` class provides a high-level Python interface:
```python
from openjarvis import Jarvis

j = Jarvis()
response = j.ask("What is the capital of France?")
print(response)
j.close()
```
For detailed results including token usage and model info:
```python
result = j.ask_full("What is the capital of France?")
print(result["content"])  # The response text
print(result["model"])    # Model that handled the query
print(result["engine"])   # Engine that ran inference
print(result["usage"])    # Token usage statistics
```
SDK Constructor Options¶
```python
# Use default config (auto-detected hardware, ~/.openjarvis/config.toml)
j = Jarvis()

# Override the model
j = Jarvis(model="qwen3:8b")

# Override the engine
j = Jarvis(engine_key="ollama")

# Use a custom config file
j = Jarvis(config_path="/path/to/config.toml")
```
Always call close()
The `Jarvis` instance holds references to telemetry stores and memory backends. Call `j.close()` when you are done to release resources.
Using Agents with Tools¶
Agents add multi-turn reasoning and tool-calling capabilities. The orchestrator agent runs a tool-calling loop, invoking tools as needed to answer the query.
Available Agents¶
| Agent | Description |
|---|---|
| `simple` | Single-turn, no tools. Sends the query directly to the model. |
| `orchestrator` | Multi-turn tool-calling loop. Invokes tools iteratively until it has an answer. |
| `custom` | Template for user-defined agent logic. |
| `operative` | Task-oriented agent with structured planning and execution. |
Available Built-in Tools¶
| Tool | Description |
|---|---|
| `calculator` | Safe mathematical expression evaluation (`ast`-based). |
| `think` | Reasoning scratchpad for chain-of-thought. |
| `retrieval` | Search the memory store for relevant context. |
| `llm` | Make sub-queries to another model. |
| `file_read` | Read files with path validation. |
| `web_search` | Web search via the Tavily API (requires the `tools-search` extra). |
CLI Example¶
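Combining the `-a` and `--tools` options from the table above:

```bash
jarvis ask -a orchestrator --tools calculator,think "What is the square root of 144?"
```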
SDK Example¶
```python
from openjarvis import Jarvis

j = Jarvis()
result = j.ask_full(
    "What is the square root of 144?",
    agent="orchestrator",
    tools=["calculator", "think"],
)
print(result["content"])
print(result["tool_results"])  # List of tool invocations and results
print(result["turns"])         # Number of agent turns
j.close()
```
Memory: Indexing and Search¶
The memory system lets you index documents and inject relevant context into queries automatically.
Index Documents¶
Index a file or directory. OpenJarvis chunks the content and stores it in the configured memory backend (SQLite/FTS5 by default).
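A sketch of the command — the `memory index` subcommand and `--chunk-size` flag are assumptions modeled on the SDK's `j.memory.index(path, chunk_size=...)`:

```bash
jarvis memory index ./docs/ --chunk-size 512
```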
Search Memory¶
Query the memory store to find relevant chunks:
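Assuming a `memory search` subcommand mirroring the SDK's `j.memory.search()`:

```bash
jarvis memory search "how to configure engines"
```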
Check Memory Statistics¶
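A hypothetical `stats` subcommand would report what has been indexed:

```bash
jarvis memory stats
```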
Automatic Context Injection¶
When you have indexed documents, OpenJarvis automatically injects relevant context into your queries. The memory system searches for chunks matching your query and prepends them as system context before sending to the model.
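Conceptually the flow looks like the sketch below — a simplified, illustrative stand-in (naive word-overlap scoring) rather than the actual OpenJarvis retrieval logic:

```python
# Illustrative sketch of context injection, NOT the real implementation.
# Chunks scoring above a threshold are prepended as a system message
# before the query is sent to the model.

def inject_context(query, chunks, top_k=3, min_score=0.2):
    """Pick the best-matching chunks and build a chat message list."""
    words = set(query.lower().split())
    # Naive scoring: fraction of query words that appear in the chunk
    scored = [
        (len(words & set(chunk.lower().split())) / len(words), chunk)
        for chunk in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    context = [chunk for score, chunk in scored[:top_k] if score >= min_score]

    messages = []
    if context:
        messages.append({
            "role": "system",
            "content": "Relevant context:\n" + "\n---\n".join(context),
        })
    messages.append({"role": "user", "content": query})
    return messages

msgs = inject_context(
    "how to configure engines",
    ["Engines are configured in config.toml.", "Totally unrelated text."],
)
```

The real system uses the configured memory backend and its own scoring; the shape of the resulting message list is the part that matters here.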
To disable this behavior:
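Pass the `--no-context` flag shown in the CLI options table:

```bash
jarvis ask --no-context "How do I configure the Ollama engine host?"
```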
Context injection is controlled by `agent.context_from_memory` in `config.toml`. The retrieval parameters (`context_top_k`, `context_min_score`, `context_max_tokens`) live under `[tools.storage]`. See Configuration for details.
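The relevant fragment of `~/.openjarvis/config.toml` might look like this (the keys come from the paragraph above; the values shown are illustrative, not authoritative defaults):

```toml
[agent]
context_from_memory = true

[tools.storage]
context_top_k = 5
context_min_score = 0.3
context_max_tokens = 1024
```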
Model Management¶
List Available Models¶
See all models available on running engines:
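A hypothetical listing command (the exact subcommand name is an assumption):

```bash
jarvis models
```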
This produces a table showing each model, its engine, parameter count, context length, and VRAM requirements.
Get Model Details¶
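A hypothetical `show` subcommand taking a model name:

```bash
jarvis models show qwen3:8b
```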
Pull a Model (Ollama)¶
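For Ollama-backed models you can always pull with the `ollama` CLI directly:

```bash
ollama pull qwen3:8b
```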
SDK Model Listing¶
```python
from openjarvis import Jarvis

j = Jarvis()
models = j.list_models()
engines = j.list_engines()
print(f"Models: {models}")
print(f"Engines: {engines}")
j.close()
```
Running Benchmarks¶
The benchmarking framework measures inference latency and throughput against your engine.
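A hypothetical invocation consistent with the sample output below (the subcommand and flags are assumptions):

```bash
jarvis bench latency throughput -m qwen3:8b --samples 10
```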
Example output:
```
Running 2 benchmark(s) on ollama/qwen3:8b (10 samples)...

latency (10 samples, 0 errors)
  mean_ms: 245.3200
  p50_ms: 238.1000
  p95_ms: 312.4500
  min_ms: 201.2000
  max_ms: 345.6000

throughput (10 samples, 0 errors)
  tokens_per_second: 42.1500
  total_tokens: 4215
  total_seconds: 100.0000
```
Starting the API Server¶
OpenJarvis provides an OpenAI-compatible API server for integration with existing tools and frontends.
Start the Server¶
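Assuming a `serve` subcommand (the name is an assumption; the default port 8000 matches the client examples later in this section):

```bash
jarvis serve
```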
With custom options:
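For example, binding to all interfaces (flag names are assumptions following common server-CLI conventions):

```bash
jarvis serve --host 0.0.0.0 --port 8000
```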
API Endpoints¶
| Endpoint | Method | Description |
|---|---|---|
| `/v1/chat/completions` | `POST` | Chat completions (streaming and non-streaming) |
| `/v1/models` | `GET` | List available models |
| `/health` | `GET` | Health check |
Use with Any OpenAI-Compatible Client¶
Once the server is running, point any OpenAI-compatible client at it:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="qwen3:8b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
Or with curl:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
Telemetry¶
OpenJarvis records telemetry for every inference call (timing, tokens, cost). View aggregated statistics:
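Assuming a `telemetry` command group (the subcommand name is an assumption):

```bash
jarvis telemetry stats
```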
Export telemetry data:
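For instance, to a JSON file (the subcommand and redirection are assumptions):

```bash
jarvis telemetry export > telemetry.json
```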
Clear all telemetry records:
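A hypothetical `clear` subcommand:

```bash
jarvis telemetry clear
```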
Complete Working Example¶
Here is a complete end-to-end session combining multiple features:
```python
from openjarvis import Jarvis

# Initialize with defaults (auto-detect hardware and engine)
j = Jarvis()

# 1. Index some documentation
index_result = j.memory.index("./docs/", chunk_size=512)
print(f"Indexed {index_result['chunks']} chunks from {index_result['path']}")

# 2. Search memory
results = j.memory.search("how to configure engines")
for r in results:
    print(f"  [{r['score']:.3f}] {r['source']}")

# 3. Ask a question (memory context is injected automatically)
answer = j.ask("How do I configure the Ollama engine host?")
print(f"\nAnswer: {answer}")

# 4. Use an agent with tools
calc_result = j.ask_full(
    "Calculate the compound interest on $10,000 at 5% for 10 years",
    agent="orchestrator",
    tools=["calculator", "think"],
)
print(f"\nCalculation: {calc_result['content']}")
print(f"Tools used: {[t['tool_name'] for t in calc_result['tool_results']]}")
print(f"Agent turns: {calc_result['turns']}")

# 5. List available models
models = j.list_models()
print(f"\nAvailable models: {models}")

# 6. Clean up
j.close()
```
Next Steps¶
- Configuration — Fine-tune engine hosts, model routing, memory settings, and more
- CLI Reference — Full reference for all CLI commands and options
- Python SDK — Detailed SDK documentation
- Architecture Overview — Understand the five-primitive design