CLI Reference¶
OpenJarvis provides a command-line interface through the jarvis command. Built on Click, it offers subcommands for querying models, managing memory, running benchmarks, and serving an OpenAI-compatible API.
Global Options¶
jarvis --version # Print the OpenJarvis version
jarvis --help # Show top-level help with all subcommands
jarvis init¶
Detect local hardware (CPU, GPU, RAM) and generate a configuration file at ~/.openjarvis/config.toml.
jarvis init # Interactive — refuses to overwrite existing config
jarvis init --force # Overwrite existing config without prompting
| Option | Description |
|---|---|
--force |
Overwrite existing configuration without prompting |
The init command auto-detects:
- Platform (Linux, macOS, Windows)
- CPU brand and core count
- RAM in GB
- GPU vendor, model, VRAM, and count (via
nvidia-smi,rocm-smi, orsystem_profiler)
Based on the detected hardware, it recommends an appropriate inference engine and writes a pre-configured TOML file.
Example output:
Detecting hardware...
Platform : linux
CPU : AMD Ryzen 9 7950X (32 cores)
RAM : 64 GB
GPU : NVIDIA RTX 4090 (24.0 GB VRAM, x1)
Config written successfully.
jarvis ask¶
Send a query to the inference engine (directly or through an agent) and print the response.
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
-m, --model MODEL |
string | auto | Model to use for inference |
-e, --engine ENGINE |
string | auto | Engine backend (ollama, vllm, llamacpp, etc.) |
-t, --temperature TEMP |
float | 0.7 |
Sampling temperature |
--max-tokens N |
int | 1024 |
Maximum tokens to generate |
--json |
flag | off | Output raw JSON result instead of plain text |
--no-stream |
flag | off | Disable streaming (synchronous mode) |
--no-context |
flag | off | Disable memory context injection |
-a, --agent AGENT |
string | none | Agent to use (simple, orchestrator) |
--tools TOOLS |
string | none | Comma-separated tool names to enable |
Direct Mode vs Agent Mode¶
Direct mode (default) sends the query straight to the inference engine:
Agent mode routes the query through an agent that can use tools and manage multi-turn interactions:
jarvis ask --agent orchestrator "What is 2+2?"
jarvis ask --agent orchestrator --tools calculator,think "Calculate sqrt(144) + 3^2"
jarvis ask --agent simple "Hello"
Usage Examples¶
# Basic query
jarvis ask "What is machine learning?"
# Specify a model
jarvis ask -m qwen3:8b "Summarize this concept"
# Use the orchestrator agent with tools
jarvis ask --agent orchestrator --tools calculator "What is 15% of 340?"
# Get JSON output
jarvis ask --json "Hello"
# Disable memory context injection
jarvis ask --no-context "Tell me about Python"
# Set maximum token generation
jarvis ask --max-tokens 2048 "Write a detailed essay about AI"
JSON Output Format¶
When using --json in direct mode, the output includes:
{
"content": "The response text...",
"usage": {
"prompt_tokens": 12,
"completion_tokens": 85,
"total_tokens": 97
}
}
When using --json in agent mode, the output includes:
{
"content": "The response text...",
"turns": 3,
"tool_results": [
{
"tool_name": "calculator",
"content": "51.0",
"success": true
}
]
}
jarvis model¶
Manage and inspect language models available on running engines.
jarvis model list¶
List all models available from running inference engines, displayed as a Rich table with model parameters, context length, and VRAM requirements.
Example output:
Available Models
┌─────────┬────────────────┬────────┬─────────┬──────┐
│ Engine │ Model │ Params │ Context │ VRAM │
├─────────┼────────────────┼────────┼─────────┼──────┤
│ ollama │ qwen3:8b │ 8B │ 32,768 │ 6GB │
│ ollama │ llama3.2:3b │ 3B │ 8,192 │ 3GB │
└─────────┴────────────────┴────────┴─────────┴──────┘
jarvis model info <model>¶
Show detailed information about a specific model.
Example output:
┌─ Qwen 3 8B ──────────────────────────────┐
│ Model ID: qwen3:8b │
│ Name: Qwen 3 8B │
│ Parameters: 8B │
│ Context: 32,768 │
│ Quantization: none │
│ Min VRAM: 6GB │
│ Engines: ollama, vllm │
│ Provider: Alibaba │
│ API Key: not required │
└───────────────────────────────────────────┘
jarvis model pull <model>¶
Download a model via Ollama. Shows a progress bar during download.
Note
The pull command requires a running Ollama instance. It connects to the Ollama API at the host configured in your config.toml.
jarvis memory¶
Manage the document memory store for retrieval-augmented generation.
jarvis memory index <path>¶
Index documents from a file or directory into the memory store.
jarvis memory index ./docs/
jarvis memory index ./notes.md
jarvis memory index ./data/ --chunk-size 256 --chunk-overlap 32
jarvis memory index ./docs/ --backend sqlite
| Option | Type | Default | Description |
|---|---|---|---|
--backend, -b |
string | config | Override the default memory backend |
--chunk-size |
int | 512 |
Chunk size in tokens |
--chunk-overlap |
int | 64 |
Overlap between chunks in tokens |
The ingestion pipeline supports text, markdown, code files, and PDF (with pdfplumber installed). Binary files and hidden directories are automatically skipped.
jarvis memory search <query>¶
Search the memory store for relevant document chunks.
jarvis memory search "machine learning basics"
jarvis memory search -k 10 "neural networks"
jarvis memory search --backend faiss "embeddings"
| Option | Type | Default | Description |
|---|---|---|---|
--top-k, -k |
int | 5 |
Number of results to return |
--backend, -b |
string | config | Override the default memory backend |
Results are displayed in a table with rank, score, source file, and a content preview.
jarvis memory stats¶
Show memory store statistics including document count and database size.
| Option | Type | Default | Description |
|---|---|---|---|
--backend, -b |
string | config | Override the default memory backend |
jarvis telemetry¶
Query and manage inference telemetry data stored in SQLite.
jarvis telemetry stats¶
Show aggregated telemetry statistics including total calls, tokens, cost, and latency, broken down by model and engine.
| Option | Type | Default | Description |
|---|---|---|---|
-n, --top |
int | 10 |
Number of top models to show |
jarvis telemetry export¶
Export raw telemetry records in JSON or CSV format.
jarvis telemetry export # JSON to stdout
jarvis telemetry export --format csv # CSV to stdout
jarvis telemetry export --format json -o data.json # JSON to file
jarvis telemetry export -f csv -o metrics.csv # CSV to file
| Option | Type | Default | Description |
|---|---|---|---|
-f, --format |
choice | json |
Output format: json or csv |
-o, --output |
path | stdout | Output file path |
jarvis telemetry clear¶
Delete all telemetry records from the database.
| Option | Type | Default | Description |
|---|---|---|---|
-y, --yes |
flag | off | Skip confirmation prompt |
Warning
This permanently deletes all stored telemetry data. Use --yes to skip the confirmation prompt in automated scripts.
jarvis bench¶
Run inference benchmarks against a running engine.
jarvis bench run¶
Execute benchmarks and report results.
jarvis bench run # Run all benchmarks, 10 samples
jarvis bench run -n 20 # 20 samples per benchmark
jarvis bench run -b latency # Only the latency benchmark
jarvis bench run -b throughput -n 50 --json # Throughput, 50 samples, JSON output
jarvis bench run -o results.jsonl # Write JSONL results to file
jarvis bench run -m qwen3:8b -e ollama # Specific model and engine
| Option | Type | Default | Description |
|---|---|---|---|
-m, --model MODEL |
string | auto | Model to benchmark |
-e, --engine ENGINE |
string | auto | Engine backend |
-n, --samples N |
int | 10 |
Number of samples per benchmark |
-b, --benchmark NAME |
string | all | Specific benchmark to run |
-o, --output PATH |
path | none | Write JSONL results to file |
--json |
flag | off | Output JSON summary to stdout |
Available benchmarks:
- latency -- Measures per-call inference latency (mean, p50, p95, min, max)
- throughput -- Measures tokens-per-second throughput
jarvis channel¶
Manage messaging channels for multi-platform communication. Channels connect directly to platform APIs (Telegram, Discord, Slack, etc.) -- no gateway required.
jarvis channel list¶
List registered channel backends and their connection status.
jarvis channel send¶
Send a message to a specific channel.
| Argument | Type | Description |
|---|---|---|
TARGET |
string | Channel name to send to |
MESSAGE |
string | Message content |
jarvis channel status¶
Show connection status for configured channels.
Channel Dependencies
Each channel requires its platform-specific credentials (bot tokens, API keys) configured in the [channel.<platform>] section of your config. See Configuration for details.
jarvis serve¶
Start an OpenAI-compatible API server.
jarvis serve # Default host/port from config
jarvis serve --port 8000 # Custom port
jarvis serve --host 0.0.0.0 --port 9000 # Bind to all interfaces
jarvis serve --model qwen3:8b # Specify default model
jarvis serve --agent orchestrator # Route requests through an agent
| Option | Type | Default | Description |
|---|---|---|---|
--host HOST |
string | config | Bind address |
--port PORT |
int | config | Port number |
-e, --engine ENGINE |
string | auto | Engine backend |
-m, --model MODEL |
string | config | Default model for inference |
-a, --agent AGENT |
string | none | Agent for non-streaming requests |
Server Dependencies
The serve command requires the server extra:
This installs FastAPI, uvicorn, and related dependencies.
API Endpoints¶
The server exposes the following OpenAI-compatible endpoints:
| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions |
Chat completions (streaming & non-streaming) |
| GET | /v1/models |
List available models |
| GET | /health |
Health check |
| GET | /v1/channels |
List available messaging channels |
| POST | /v1/channels/send |
Send a message to a channel |
| GET | /v1/channels/status |
Channel bridge connection status |
Example with curl:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3:8b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
When an agent is configured (e.g., --agent orchestrator), non-streaming requests are routed through the agent with access to all registered tools. For tool-capable agents (orchestrator, react, openhands), all registered tools are automatically loaded and made available.