Personal AI, On Personal Devices¶
OpenJarvis is a research framework for composable, on-device AI systems. Build personal AI that runs on your hardware. Cloud APIs are optional.
Why OpenJarvis?¶
Personal AI agents are exploding in popularity, yet nearly all of them still route intelligence through cloud APIs: your "personal" AI depends on someone else's server. At the same time, our Intelligence Per Watt research showed that local language models already handle 88.7% of single-turn chat and reasoning queries, with intelligence efficiency improving 5.3× from 2023 to 2025. The models and the hardware are increasingly ready. What has been missing is the software stack that makes local-first personal AI practical.
OpenJarvis is that stack. It is an opinionated framework for local-first personal AI, built around three core ideas: shared primitives for building on-device agents; evaluations that treat energy, FLOPs, latency, and dollar cost as first-class constraints alongside accuracy; and a learning loop that improves models using local trace data. The goal is simple: make it possible to build personal AI agents that run locally by default, calling the cloud only when truly necessary. OpenJarvis aims to be both a research platform and a production foundation for local AI, in the spirit of PyTorch.
Get Started¶
Run the full chat UI locally with one script:
This installs dependencies, starts Ollama + a local model, launches the backend
and frontend, and opens http://localhost:5173 in your browser.
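As a rough sketch, the manual equivalent of that script might look like the following. Every command here is an assumption based on the stack described above (Ollama plus a backend and a frontend on port 5173); the actual script and its commands are not shown on this page.

```shell
# Hypothetical manual equivalent of the quick-start script.
ollama serve &            # start the local model runtime
ollama pull llama3.1      # fetch a model (model name is an assumption)
jarvis serve &            # backend (command assumed from the CLI in the feature list)
# then start the frontend dev server and browse to http://localhost:5173
```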
The desktop app is a native window for the OpenJarvis UI. The backend (Ollama + inference) runs on your machine — start it first, then open the app.
Step 1. Start the backend:
Step 2. Download and open the desktop app:
Also available for Windows, Linux (DEB), and Linux (RPM). See the Downloads page for details.
The app connects to http://localhost:8000 automatically.
macOS: run xattr -cr /Applications/OpenJarvis.app if the app shows as "damaged".
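In shell terms, the two steps above amount to something like this. The backend command is an assumption (based on the CLI named in the feature list); only the xattr fix is quoted from this page.

```shell
# Step 1 (hypothetical): start the Ollama + inference backend locally.
jarvis serve   # assumed command; serves http://localhost:8000
# Step 2: open the installed desktop app (macOS shown).
open /Applications/OpenJarvis.app
# If macOS reports the app as "damaged", clear the quarantine attribute:
xattr -cr /Applications/OpenJarvis.app
```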
Five Primitives¶
- Intelligence — The LM: model catalog, generation defaults, quantization, preferred engine.
- Agents — The agentic harness: system prompt, tools, context, retry and exit logic. Seven agent types.
- Tools — MCP interface: web search, calculator, file I/O, code interpreter, retrieval, and any external MCP server.
- Engine — The inference runtime: Ollama, vLLM, SGLang, llama.cpp, cloud APIs. All behind the same InferenceEngineABC.
- Learning — Improvement loop: SFT weight updates, agent advisor, ICL updater. Trace-driven feedback.
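The decorator-based registry behind these primitives can be sketched generically. The names here (register_engine, ENGINE_REGISTRY, OllamaEngine) are illustrative assumptions, not the actual OpenJarvis API; the point is the pattern of registering implementations under string keys so they can be selected by configuration.

```python
from typing import Callable, Dict

# Illustrative registry, not the real OpenJarvis code.
ENGINE_REGISTRY: Dict[str, type] = {}

def register_engine(name: str) -> Callable[[type], type]:
    """Record an engine class under a string key so callers can
    look it up by name instead of importing the concrete class."""
    def wrap(cls: type) -> type:
        ENGINE_REGISTRY[name] = cls
        return cls
    return wrap

@register_engine("ollama")
class OllamaEngine:
    def generate(self, prompt: str) -> str:
        return f"(ollama) {prompt}"

# A caller selects an engine by name, never by concrete class:
engine = ENGINE_REGISTRY["ollama"]()
print(engine.generate("hello"))  # (ollama) hello
```

Swapping backends then becomes a one-line config change: the string key changes, the calling code does not.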
Key Features¶
- Five Composable Primitives — Intelligence, Agents, Tools, Engine, and Learning, each with a clear ABC interface and a decorator-based registry.
- Five Engine Backends — Ollama, vLLM, SGLang, llama.cpp, and cloud (OpenAI/Anthropic/Google), all behind the same InferenceEngineABC.
- Hardware-Aware — Auto-detects GPU vendor, model, and VRAM, and recommends the optimal engine for your hardware.
- Offline-First — All core functionality works without a network connection. Cloud APIs are optional extras.
- OpenAI-Compatible API — jarvis serve starts a FastAPI server with SSE streaming, a drop-in replacement for OpenAI clients.
- Trace-Driven Learning — Every interaction is traced. The learning system improves models (SFT) and agents (prompt, tools, logic).
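Because the server advertises OpenAI compatibility, any OpenAI-style client should work against it. A minimal sketch with curl, assuming the conventional /v1/chat/completions path and a placeholder model id (neither is confirmed by this page):

```shell
# Hypothetical request against a running `jarvis serve` instance.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "Hello from localhost"}],
        "stream": true
      }'
```

With "stream": true, the response should arrive as SSE chunks, matching the streaming behavior OpenAI clients already expect.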
Documentation¶
- Install OpenJarvis, configure your first engine, and run your first query.
- CLI, Python SDK, agents, memory, tools, telemetry, and benchmarks.
- Five-primitive design, registry pattern, query flow, and cross-cutting learning.
- Auto-generated reference for every module.
- Docker, systemd, launchd. GPU-accelerated container images.
- Contributing guide, extension patterns, roadmap, and changelog.
Sponsors¶
Laude Institute • Stanford Marlowe • Google Cloud Platform • Lambda Labs • Ollama • IBM Research • Stanford HAI