# Docker Deployment
OpenJarvis provides Docker images for both CPU-only and GPU-accelerated deployments, along with a Docker Compose configuration that bundles the API server with an Ollama inference backend.
## Quick Start
The fastest way to get OpenJarvis running in Docker is with Docker Compose, which starts both the API server and an Ollama backend:
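From the directory containing `docker-compose.yml`:

```bash
docker compose up -d
```

The `-d` flag runs the stack in the background.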
This brings up two services:
| Service | Port | Description |
|---|---|---|
| `jarvis` | 8000 | OpenJarvis API server |
| `ollama` | 11434 | Ollama inference engine |
Verify the server is running:
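With the default port mapping, a quick check from the host looks like:

```bash
curl http://localhost:8000/health
```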
A successful request returns HTTP 200 with a JSON status payload; see the Health Check section below for details.
## Docker Images
### CPU-Only Image (`Dockerfile`)
The default `Dockerfile` uses a multi-stage build based on `python:3.12-slim` to produce a minimal image.
Build stages:
- Builder stage -- installs `uv` and the `openjarvis[server]` package (which includes FastAPI, uvicorn, and all server dependencies) from the project source.
- Runtime stage -- copies only the installed Python packages and application code from the builder, keeping the final image small.
```dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/
RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server]"

FROM python:3.12-slim
COPY --from=builder /usr/local /usr/local
COPY --from=builder /app /app
WORKDIR /app
EXPOSE 8000
ENTRYPOINT ["jarvis"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8000"]
```
Build it manually:
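For example, from the repository root (the `openjarvis:latest` tag matches the one used in later examples):

```bash
docker build -t openjarvis:latest .
```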
Run it standalone:
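A minimal invocation publishes the API port and points the server at an Ollama instance running on the host (the host address is illustrative; `host.docker.internal` works on Docker Desktop, while on Linux you may need `--add-host=host.docker.internal:host-gateway`):

```bash
docker run -d -p 8000:8000 \
  -e OPENJARVIS_ENGINE_DEFAULT=ollama \
  -e OPENJARVIS_OLLAMA_HOST=http://host.docker.internal:11434 \
  openjarvis:latest
```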
### GPU Image (`Dockerfile.gpu`)
The GPU image is built on `nvidia/cuda:12.4.0-runtime-ubuntu22.04` and includes the CUDA 12.4 runtime libraries, enabling GPU-accelerated inference when paired with a GPU-capable engine like vLLM or SGLang.
```dockerfile
FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04 AS builder
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip python3-venv && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/
RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server]"

FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
COPY --from=builder /usr/local /usr/local
COPY --from=builder /app /app
WORKDIR /app
EXPOSE 8000
ENTRYPOINT ["jarvis"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8000"]
```
Build the GPU image:
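For example (the `openjarvis:gpu` tag is arbitrary):

```bash
docker build -f Dockerfile.gpu -t openjarvis:gpu .
```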
Run with GPU access (requires the NVIDIA Container Toolkit):
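Assuming the image was tagged `openjarvis:gpu`, pass all host GPUs through to the container:

```bash
docker run -d --gpus all -p 8000:8000 openjarvis:gpu
```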
**NVIDIA Container Toolkit required.** The host machine must have the NVIDIA Container Toolkit installed for `--gpus` to work. See the NVIDIA installation guide for setup instructions.
## Docker Compose Configuration
The `docker-compose.yml` defines a complete deployment with the OpenJarvis API server and an Ollama backend:
```yaml
version: "3.9"

services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama
    restart: unless-stopped

volumes:
  ollama-models:
```
### Environment Variables
The jarvis service is configured through environment variables:
| Variable | Description | Default |
|---|---|---|
| `OPENJARVIS_ENGINE_DEFAULT` | Inference engine backend to use | `ollama` |
| `OPENJARVIS_OLLAMA_HOST` | URL of the Ollama server (uses the Docker service name) | `http://ollama:11434` |
### Volumes
The ollama-models named volume persists downloaded models across container restarts, so models do not need to be re-pulled after a docker compose down / docker compose up cycle.
### Service Dependencies
The jarvis service declares depends_on: ollama, ensuring the Ollama container starts before the API server. Both services use restart: unless-stopped to automatically recover from crashes.
## Custom Configuration
### Mounting a Configuration File
To use a custom config.toml, mount it into the container at the expected path (~/.openjarvis/config.toml, which is /root/.openjarvis/config.toml in the container):
```yaml
services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - ./my-config.toml:/root/.openjarvis/config.toml:ro
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped
```
### Persisting Data
To persist telemetry data, memory databases, and trace records across container restarts, mount the entire OpenJarvis data directory:
```yaml
services:
  jarvis:
    # ... other config ...
    volumes:
      - openjarvis-data:/root/.openjarvis

volumes:
  ollama-models:
  openjarvis-data:
```
This preserves:
- `telemetry.db` -- inference call telemetry records
- `memory.db` -- the default SQLite memory backend
- `traces.db` -- interaction trace records
- `config.toml` -- user configuration
### Using the GPU Image with Compose
To use the GPU Dockerfile in your Compose setup, change the dockerfile field and add GPU resource reservations:
```yaml
services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile.gpu
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped
```
## Health Check
The API server exposes a GET /health endpoint that checks whether the underlying inference engine is responsive:
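With the Compose setup above running, you can probe it from the host (the `-i` flag prints the HTTP status line along with the body):

```bash
curl -i http://localhost:8000/health
```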
A healthy engine yields HTTP 200 with a JSON status body; an unresponsive engine yields HTTP 503.
You can integrate this into your Docker Compose healthcheck:
```yaml
services:
  jarvis:
    # ... other config ...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s
```

Note that this healthcheck runs inside the `jarvis` container, so `curl` must be present in the image; `python:3.12-slim` does not include it by default, so you may need to install it or switch to a different probe.
## Building Custom Images
### Adding Extra Dependencies
To include additional engine backends (such as vLLM or ColBERT memory), modify the install command in the Dockerfile:
```dockerfile
RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server,inference-vllm,memory-colbert]"
```
### Overriding the Default Command
The entrypoint is `jarvis` and the default command is `serve --host 0.0.0.0 --port 8000`. Override the command to change server options:
```bash
docker run -d -p 9000:9000 openjarvis:latest \
  serve --host 0.0.0.0 --port 9000 --engine ollama --model qwen3:8b
```
Or in Docker Compose:
```yaml
services:
  jarvis:
    build: .
    command: ["serve", "--host", "0.0.0.0", "--port", "9000", "--model", "qwen3:8b"]
    ports:
      - "9000:9000"
```
### Available CLI Options for `jarvis serve`
| Option | Description |
|---|---|
| `--host` | Bind address (default: from config, typically `0.0.0.0`) |
| `--port` | Port number (default: from config, typically `8000`) |
| `-e` / `--engine` | Engine backend (`ollama`, `vllm`, `llamacpp`, `sglang`) |
| `-m` / `--model` | Default model name |
| `-a` / `--agent` | Agent for non-streaming requests (`simple`, `orchestrator`, `react`, `openhands`) |
## Pulling Models
After starting the Ollama container, you need to pull at least one model before the API server can serve requests:
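For example, using the Compose service name (the model is illustrative; any model from the Ollama library works):

```bash
docker compose exec ollama ollama pull qwen3:8b
```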
Verify models are available through the API:
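One option is to ask the Ollama service directly; its `GET /api/tags` endpoint lists locally available models:

```bash
curl http://localhost:11434/api/tags
```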