macOS Installation Guide¶
This guide walks through a complete OpenJarvis installation on macOS using llama.cpp as the inference engine. It covers every step from scratch — including pitfalls not documented elsewhere — and is suitable for both Apple Silicon and Intel Macs.
Prefer Ollama?
If you want the fastest possible setup, use Ollama instead. This guide is for users who want to run GGUF models directly with llama.cpp, or who want a deeper understanding of the full stack.
What You'll Install¶
| Tool | Purpose |
|---|---|
| Homebrew | macOS package manager — installs everything else |
| uv | Python version and dependency manager |
| Git | Clones the OpenJarvis repo |
| Node.js | Required for the browser UI |
| Rust | Compiles the OpenJarvis security and memory extension |
| llama.cpp | Local inference engine that runs GGUF model files |
| OpenJarvis | The framework itself |
| A GGUF model | The actual AI model (downloaded separately) |
Step-by-Step Installation¶
Step 1 — Install Homebrew¶
Homebrew is the standard macOS package manager. Everything else in this guide is installed through it.
If you already have Homebrew, skip this step.
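Otherwise, run the official installer from brew.sh:

```shell
# Official Homebrew installer (from https://brew.sh)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# On Apple Silicon, add brew to your PATH (the installer prints this hint)
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
```

On Intel Macs, Homebrew installs to /usr/local and is usually on PATH already.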
Step 2 — Install uv¶
uv replaces pip, virtualenv, and pyenv in one tool. OpenJarvis uses it to manage Python
versions, virtual environments, and project dependencies.
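The simplest route here is Homebrew (uv also ships its own standalone installer if you prefer):

```shell
brew install uv
uv --version
```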
Step 3 — Install Git¶
Git is used to clone the OpenJarvis source code. It may already be present if you have Xcode Command Line Tools installed.
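Install (or update) it via Homebrew and confirm it is on PATH:

```shell
brew install git
git --version
```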
Step 4 — Install Node.js¶
Node.js is required to build and run the browser frontend. Without it you can still use the CLI, but not the web UI.
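Install it via Homebrew and check that both node and npm resolve:

```shell
brew install node
node --version
npm --version
```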
Step 5 — Install Rust¶
OpenJarvis includes a Rust extension that provides security scanning, memory indexing, rate limiting, and tool execution. It must be compiled from source.
After the installer finishes, reload your shell environment so rustc is available, then verify the toolchain:
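Using the standard rustup installer, the full sequence is:

```shell
# Install the Rust toolchain via rustup
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Reload the shell environment so rustc/cargo are on PATH
source "$HOME/.cargo/env"

# Verify
rustc --version
cargo --version
```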
Step 6 — Install llama.cpp¶
llama.cpp is the inference engine that loads and runs GGUF model files. It is not a model
itself — think of it as a media player and the .gguf file as the content.
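llama.cpp is available as a Homebrew formula, which installs the llama-server and llama-cli binaries:

```shell
brew install llama.cpp
llama-server --version
```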
Step 7 — Clone the OpenJarvis repo¶
Run this from your home directory or any neutral parent folder.
Do not clone from inside an existing OpenJarvis folder
A common mistake is running git clone while already inside the repo, creating deeply
nested duplicates (OpenJarvis/OpenJarvis/OpenJarvis). Always clone from ~ or a
neutral parent directory.
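For example (the GitHub URL below is a placeholder — substitute the real repository URL):

```shell
cd ~   # neutral parent directory
git clone https://github.com/openjarvis/OpenJarvis.git   # placeholder URL
cd OpenJarvis
```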
Step 8 — Pin Python to 3.12¶
Critical step — do not skip
OpenJarvis requires Python 3.10–3.13. Its Rust extension uses PyO3, which does not yet
support Python 3.14. If uv has Python 3.14 available, it will use it by default,
causing the Rust extension build to fail silently and resulting in ~250 test failures
with ModuleNotFoundError: No module named 'openjarvis_rust'.
Pin the project to Python 3.12:
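With uv installed, this is a single command run from the repo root (it writes a `.python-version` file):

```shell
cd ~/OpenJarvis
uv python pin 3.12
```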
Restart your terminal, then verify:
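In a fresh terminal session:

```shell
cd ~/OpenJarvis
uv run python --version
# Expected: Python 3.12.x
```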
Why restart the terminal?
Without restarting, the shell may still reference the old virtual environment. This is the most common reason the version pin appears not to work.
Step 9 — Install Python dependencies¶
The --extra server flag adds the FastAPI backend required for the browser UI.
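From the repo root:

```shell
cd ~/OpenJarvis
uv sync --extra server
```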
Step 10 — Build the Rust extension¶
This compiles the Rust extension and installs it into the virtual environment. It provides security scanning, memory indexing, MCP tool execution, and rate limiting. This step takes a few minutes on first run.
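Assuming maturin is available as a project dev dependency (the same command appears in the troubleshooting section below):

```shell
cd ~/OpenJarvis
uv run maturin develop -m rust/crates/openjarvis-python/Cargo.toml
```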
Verify it built correctly:
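A quick import check confirms the extension landed in the active virtual environment:

```shell
uv run python -c "import openjarvis_rust; print('ok')"
```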
Step 11 — Install frontend dependencies¶
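Assuming a standard npm project in the `frontend/` directory, install the Node packages for the web UI:

```shell
cd ~/OpenJarvis/frontend
npm install
```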
Step 12 — Download a model¶
OpenJarvis needs a GGUF model file to run inference. First install the Hugging Face CLI, then download your chosen model.
The CLI command is hf, not huggingface-cli
When installed via uv tool, the Hugging Face CLI is invoked as hf.
The 4B model is faster and needs less RAM, making it a good fit for most everyday tasks.
Use the Qwen_ prefix
bartowski's Qwen3 repos use the Qwen_ prefix (e.g. Qwen_Qwen3-4B-GGUF). Using
the shorter name without the prefix returns a "repository not found" error.
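A sketch of the full download, assuming a recent huggingface_hub release (which ships the hf entry point) and the repo/filename used elsewhere in this guide:

```shell
# Install the Hugging Face CLI as a standalone uv tool
uv tool install huggingface_hub

# Download the 4B model — note the Qwen_ prefix in the repo name
hf download bartowski/Qwen_Qwen3-4B-GGUF \
  Qwen_Qwen3-4B-Q4_K_M.gguf \
  --local-dir ~/models
```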
Apple Silicon vs Intel
On Apple Silicon, both models benefit from Metal GPU acceleration when using the MLX engine. On Intel, inference runs on CPU — the 4B model is recommended for speed.
Step 13 — Configure OpenJarvis¶
Run the init command to detect your hardware and generate a config file:
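Using the same CLI entry point as the rest of this guide:

```shell
cd ~/OpenJarvis
uv run jarvis init
```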
Then open the config and set the default model to match the filename you downloaded:
Find the default_model line and update it, for example:
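Assuming the config lives at the path shown in the verification step below:

```toml
# ~/.openjarvis/config.toml
default_model = "Qwen_Qwen3-4B-Q4_K_M.gguf"
```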
Step 14 — Verify the installation¶
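Run the built-in health check. The exact subcommand name (`status` here) is an assumption — check `uv run jarvis --help` for the real name:

```shell
cd ~/OpenJarvis
uv run jarvis status
```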
A healthy setup looks like this:
```
✓ Python version   3.12.x
✓ Config file      ~/.openjarvis/config.toml
✓ Config parsing   Config loaded successfully
✓ Engine: llamacpp Reachable
✓ Models: llamacpp Qwen_Qwen3-4B-Q4_K_M.gguf
✓ Default model    Qwen_Qwen3-4B-Q4_K_M.gguf (on llamacpp)
```
Warnings for other engines are normal
The ! warnings for engines like ollama, vllm, and lmstudio simply mean those
backends are not running. You only need llamacpp to be reachable.
Running OpenJarvis¶
CLI¶
Start llama-server in one terminal, then run queries in another:
```shell
# Terminal 1 — start the inference engine
llama-server -m ~/models/Qwen_Qwen3-4B-Q4_K_M.gguf -c 4096 -t 8

# Terminal 2 — ask a question
cd ~/OpenJarvis
uv run jarvis ask "What is the capital of France?"
```
Browser UI¶
```shell
# Terminal 1 — inference engine
llama-server -m ~/models/Qwen_Qwen3-4B-Q4_K_M.gguf -c 4096 -t 8

# Terminal 2 — backend
cd ~/OpenJarvis && uv run jarvis serve --port 8000

# Terminal 3 — frontend
cd ~/OpenJarvis/frontend && npm run dev
```
Then open http://localhost:5173.
Skip typing uv run every time¶
Activate the virtual environment for your current terminal session:
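From the repo root:

```shell
cd ~/OpenJarvis
source .venv/bin/activate
```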
Your prompt will show (openjarvis) when active, and you can type jarvis ask "..." directly.
Performance Tips¶
These tips apply when using llama.cpp for CPU inference.
| Flag | Effect |
|---|---|
| `-c 4096` | Reduces context window from the 32,768 default, freeing RAM for faster inference |
| `-t 8` | Uses all available CPU threads (default is only 4) — adjust to your machine's thread count |
| `Q4_K_M` quantization | Best balance of size, speed, and quality for CPU inference |
On Apple Silicon, switching to the MLX engine gives significantly better performance than llama.cpp for most models.
Common Errors¶
No such file or directory when loading model¶
The path path/to/model.gguf in examples is a placeholder. Replace it with your actual
model path, e.g.:
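Using the model downloaded in Step 12:

```shell
llama-server -m ~/models/Qwen_Qwen3-4B-Q4_K_M.gguf -c 4096 -t 8
```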
No module named 'openjarvis_rust'¶
The Rust extension did not build correctly, or was built against the wrong Python version.
- Confirm Python 3.12 is active: `uv run python --version`
- Rebuild: `uv run maturin develop -m rust/crates/openjarvis-python/Cargo.toml`
If the version shows 3.14, go back to Step 8.
PyO3 version error — Python 3.14 too new¶
```
error: the configured Python interpreter version (3.14) is newer than
PyO3's maximum supported version (3.13)
```
PyO3 0.23.5 supports Python up to 3.13. Follow Step 8 to
pin to 3.12, then delete .venv, recreate it, and restart your terminal before retrying.
Repository not found when downloading model¶
bartowski's Qwen3 repos use the Qwen_ prefix. Use:
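For example, with the 4B model from Step 12:

```shell
hf download bartowski/Qwen_Qwen3-4B-GGUF Qwen_Qwen3-4B-Q4_K_M.gguf --local-dir ~/models
```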
No inference engine available¶
llama-server is not running. Start it in a separate terminal before running any jarvis
commands, and wait until you see model loaded in the output.
Python version still shows 3.14 after recreating the venv¶
Close the terminal completely and reopen it. The old venv path is cached in the shell environment and persists across commands until the session ends.
zsh: command not found: huggingface-cli¶
When installed via uv tool, the CLI is invoked as hf, not huggingface-cli:
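If the hf command itself is missing, (re)install the CLI as a uv tool and confirm it resolves:

```shell
uv tool install huggingface_hub
hf --help
```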
Next Steps¶
- Quick Start — Run your first query and explore agents and tools
- Configuration — Customize engine hosts, model routing, memory, and more
- Architecture — Understand how OpenJarvis is structured