Docker Deployment¶
OpenJarvis provides Docker images for both CPU-only and GPU-accelerated deployments, along with a Docker Compose configuration that bundles the API server with an Ollama inference backend.
Quick Start¶
The container binds to 0.0.0.0, so an API key is required: the server
refuses to start on a non-loopback address without one. Set it first:
cd deploy/docker
cp .env.example .env
echo "OPENJARVIS_API_KEY=$(jarvis auth generate-key)" >> .env # appends, so the copied defaults survive; or paste your own
Then start both the API server and an Ollama backend with Docker Compose:
docker compose up -d
docker compose reads OPENJARVIS_API_KEY from .env (or your shell
environment) and fails fast if it is unset. Clients must then send
Authorization: Bearer <key> on /v1/* and /api/* requests.
This brings up two services:
| Service | Port | Description |
|---|---|---|
| jarvis | 8000 | OpenJarvis API server |
| ollama | 11434 | Ollama inference engine |
Verify the server is running:
curl http://localhost:8000/health
Expected response:
Docker Images¶
CPU-Only Image (Dockerfile)¶
The default Dockerfile uses a multi-stage build based on python:3.12-slim to produce a minimal image.
Build stages:
- Builder stage -- installs uv and the openjarvis[server] package (which includes FastAPI, uvicorn, and all server dependencies) from the project source.
- Runtime stage -- copies only the installed Python packages and application code from the builder, keeping the final image small.
FROM python:3.12-slim AS builder
WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/
RUN pip install --no-cache-dir uv && \
uv pip install --system ".[server]"
FROM python:3.12-slim
COPY --from=builder /usr/local /usr/local
COPY --from=builder /app /app
WORKDIR /app
EXPOSE 8000
ENTRYPOINT ["jarvis"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8000"]
Build it manually:
Run it standalone:
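A minimal sketch of both steps, under a few assumptions: the build runs from the repository root (the Dockerfile copies pyproject.toml and src/), the Dockerfile lives under deploy/docker/, and the server picks up OPENJARVIS_API_KEY from its environment. The function names are illustrative; the tag openjarvis:latest matches the one used in the command-override example on this page.

```shell
# Assumed path and tag: adjust -f and -t to match your checkout.
build_cpu_image() {
  docker build -t openjarvis:latest -f deploy/docker/Dockerfile .
}

# Starts the image detached and publishes the API port.
# Assumption: the server reads OPENJARVIS_API_KEY from its environment,
# so the key is passed through from the host shell.
run_cpu_image() {
  docker run -d -p 8000:8000 \
    -e OPENJARVIS_API_KEY="${OPENJARVIS_API_KEY:?set OPENJARVIS_API_KEY first}" \
    openjarvis:latest
}
```

Call build_cpu_image once, then run_cpu_image. A standalone container has no Ollama service next to it, so the engine it talks to has to be reachable some other way, for example through OPENJARVIS_OLLAMA_HOST.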
GPU Image (Dockerfile.gpu)¶
The GPU image is built on nvidia/cuda:12.4.0-runtime-ubuntu22.04 and includes the CUDA 12.4 runtime libraries, enabling GPU-accelerated inference when paired with a GPU-capable engine like vLLM or SGLang.
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04 AS builder
RUN apt-get update && \
apt-get install -y --no-install-recommends python3 python3-pip python3-venv && \
rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/
RUN pip install --no-cache-dir uv && \
uv pip install --system ".[server]"
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04
RUN apt-get update && \
apt-get install -y --no-install-recommends python3 python3-pip && \
rm -rf /var/lib/apt/lists/*
COPY --from=builder /usr/local /usr/local
COPY --from=builder /app /app
WORKDIR /app
EXPOSE 8000
ENTRYPOINT ["jarvis"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8000"]
Build the GPU image:
Run with GPU access (requires the NVIDIA Container Toolkit):
NVIDIA Container Toolkit required
The host machine must have the NVIDIA Container Toolkit installed for --gpus to work. See the NVIDIA installation guide for setup instructions.
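A sketch of the GPU workflow, with the same caveats as any hand-written helper: the function names and the openjarvis:gpu tag are hypothetical, the Dockerfile.gpu path under deploy/docker/ is assumed, and passing OPENJARVIS_API_KEY through the environment assumes the server reads it from there. The first function is a common smoke test for the toolkit itself, run against the same CUDA base image.

```shell
# Smoke test: with a working toolkit, nvidia-smi lists the host GPUs.
check_gpu_runtime() {
  docker run --rm --gpus all nvidia/cuda:12.4.0-runtime-ubuntu22.04 nvidia-smi
}

# Assumed path and tag: adjust to match your checkout.
build_gpu_image() {
  docker build -t openjarvis:gpu -f deploy/docker/Dockerfile.gpu .
}

# --gpus all exposes every host GPU to the container.
run_gpu_image() {
  docker run -d --gpus all -p 8000:8000 \
    -e OPENJARVIS_API_KEY="${OPENJARVIS_API_KEY:?set OPENJARVIS_API_KEY first}" \
    openjarvis:gpu
}
```

If check_gpu_runtime fails, the problem is in the host setup rather than in the OpenJarvis image.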
Docker Compose Configuration¶
The docker-compose.yml defines a complete deployment with the OpenJarvis API server and an Ollama backend:
version: "3.9"
services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama
    restart: unless-stopped
volumes:
  ollama-models:
Environment Variables¶
The jarvis service is configured through environment variables:
| Variable | Description | Default |
|---|---|---|
| OPENJARVIS_ENGINE_DEFAULT | Inference engine backend to use | ollama |
| OPENJARVIS_OLLAMA_HOST | URL of the Ollama server (uses Docker service name) | http://ollama:11434 |
Volumes¶
The ollama-models named volume persists downloaded models across container restarts, so models do not need to be re-pulled after a docker compose down / docker compose up cycle. Named volumes are removed only if you pass -v (--volumes) to docker compose down.
Service Dependencies¶
The jarvis service declares depends_on: ollama, so Compose starts the Ollama container before the API server. This controls start order only; it does not wait for Ollama to be ready to accept requests. Both services use restart: unless-stopped to automatically recover from crashes.
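Since the short form of depends_on only orders container start-up, a script that talks to the stack right after docker compose up may get there before Ollama answers. One way to handle that is a small readiness wait. In this sketch, wait_for_url is a hypothetical helper, and Ollama's model-listing endpoint /api/tags is used purely as a cheap probe on the published port.

```shell
# wait_for_url URL [TRIES] [DELAY_SECONDS]
# Polls URL until it returns a successful HTTP status or TRIES runs out.
wait_for_url() {
  local url=$1
  local tries=${2:-30}
  local delay=${3:-2}
  local i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl exit non-zero on HTTP errors; the body is discarded.
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

For example, wait_for_url http://localhost:11434/api/tags blocks for up to a minute with the defaults and returns non-zero if Ollama never comes up.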
Custom Configuration¶
Mounting a Configuration File¶
To use a custom config.toml, mount it into the container at the expected path (~/.openjarvis/config.toml, which is /root/.openjarvis/config.toml in the container):
services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - ./my-config.toml:/root/.openjarvis/config.toml:ro
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped
Persisting Data¶
To persist telemetry data, memory databases, and trace records across container restarts, mount the entire OpenJarvis data directory:
services:
  jarvis:
    # ... other config ...
    volumes:
      - openjarvis-data:/root/.openjarvis
volumes:
  ollama-models:
  openjarvis-data:
This preserves:
- telemetry.db -- inference call telemetry records
- memory.db -- the default SQLite memory backend
- traces.db -- interaction trace records
- config.toml -- user configuration
Using the GPU Image with Compose¶
To use the GPU Dockerfile in your Compose setup, change the dockerfile field and add GPU resource reservations:
services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile.gpu
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped
Health Check¶
The API server exposes a GET /health endpoint that checks whether the underlying inference engine is responsive:
A healthy response returns HTTP 200:
An unhealthy engine returns HTTP 503:
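The two status codes can be told apart from a script. The following is a sketch, assuming the server is published on localhost:8000 as in the Compose file; health_status is a hypothetical helper that looks only at the HTTP status code, not at the response body.

```shell
# Prints "healthy", "unhealthy", or "unreachable" for the given base URL.
# Return codes: 0 healthy, 1 unhealthy engine, 2 anything else.
health_status() {
  local base=${1:-http://localhost:8000}
  local code
  # -w prints only the status code; the response body is discarded.
  code=$(curl -s -o /dev/null -w '%{http_code}' "$base/health") || code=000
  case "$code" in
    200) echo "healthy" ;;
    503) echo "unhealthy"; return 1 ;;
    *)   echo "unreachable (HTTP $code)"; return 2 ;;
  esac
}
```

The distinct return codes make it usable in conditionals, for example health_status || docker compose logs jarvis.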
You can integrate this into your Docker Compose healthcheck:
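services:
  jarvis:
    # ... other config ...
    healthcheck:
      # python:3.12-slim does not ship curl, so probe with the Python
      # standard library instead; urlopen raises on a 503 response,
      # which makes the check exit non-zero.
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s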
Building Custom Images¶
Adding Extra Dependencies¶
To include optional extras (such as the vLLM engine backend or ColBERT memory), modify the install command in the Dockerfile:
RUN pip install --no-cache-dir uv && \
uv pip install --system ".[server,inference-vllm,memory-colbert]"
Overriding the Default Command¶
The entrypoint is jarvis and the default command is serve --host 0.0.0.0 --port 8000. Override the command to change server options:
docker run -d -p 9000:9000 openjarvis:latest \
serve --host 0.0.0.0 --port 9000 --engine ollama --model qwen3:8b
Or in Docker Compose:
services:
  jarvis:
    build: .
    command: ["serve", "--host", "0.0.0.0", "--port", "9000", "--model", "qwen3:8b"]
    ports:
      - "9000:9000"
Available CLI Options for jarvis serve¶
| Option | Description |
|---|---|
| --host | Bind address (default: from config, typically 0.0.0.0) |
| --port | Port number (default: from config, typically 8000) |
| -e / --engine | Engine backend (ollama, vllm, llamacpp, sglang) |
| -m / --model | Default model name |
| -a / --agent | Agent for non-streaming requests (simple, orchestrator, react, openhands) |
Pulling Models¶
After starting the Ollama container, you need to pull at least one model before the API server can serve requests:
Verify models are available through the API:
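A sketch of both steps, assuming the Compose stack from this page is running and the commands are issued from the directory that holds docker-compose.yml. The pull runs ollama pull inside the ollama service. The listing assumes an OpenAI-style /v1/models route, which is an assumption about the API rather than something this page confirms; because it falls under /v1/*, it carries the bearer token. The function names are illustrative, and qwen3:8b is the model tag used in the command-override example.

```shell
# pull_model MODEL
# "ollama" is the Compose service name; "ollama pull" runs inside it.
pull_model() {
  local model=${1:?usage: pull_model MODEL}
  docker compose exec ollama ollama pull "$model"
}

# Lists models through the API server (assumed route: /v1/models).
list_models() {
  curl -fsS \
    -H "Authorization: Bearer ${OPENJARVIS_API_KEY:?set OPENJARVIS_API_KEY first}" \
    http://localhost:8000/v1/models
}
```

Typical use is pull_model qwen3:8b followed by list_models, after loading the key into the shell with set -a; . ./.env; set +a.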