Docker Deployment

OpenJarvis provides Docker images for both CPU-only and GPU-accelerated deployments, along with a Docker Compose configuration that bundles the API server with an Ollama inference backend.

Quick Start

The container binds 0.0.0.0, so an API key is required: the server refuses to start on a non-loopback address without one. Set it first:

cd deploy/docker
cp .env.example .env
echo "OPENJARVIS_API_KEY=$(jarvis auth generate-key)" >> .env   # append so the copied .env is not overwritten; or paste your own key
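
If the jarvis CLI is not installed on the host, a key for the "paste your own" route can be produced with Python's standard library. This is a minimal sketch: whether OpenJarvis expects any particular key format is an assumption, and the function names generate_api_key and env_line are illustrative only.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random token built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


def env_line(key: str) -> str:
    """Format the key as a line for the .env file that Docker Compose reads."""
    return f"OPENJARVIS_API_KEY={key}"


if __name__ == "__main__":
    # Print a line that can be appended to deploy/docker/.env by hand.
    print(env_line(generate_api_key()))
```

The printed line can be appended to .env in place of the echo command above.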

Then start both the API server and an Ollama backend with Docker Compose:

docker compose up -d

Docker Compose reads OPENJARVIS_API_KEY from .env (or your shell environment) and fails fast if it is unset. Clients must then send an Authorization: Bearer <key> header on every /v1/* and /api/* request.
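
As an illustration of the header requirement, the sketch below builds an authenticated GET request with Python's standard library. It assumes the server is reachable at http://localhost:8000 and that OPENJARVIS_API_KEY is exported in the calling shell. The helper names build_request and get_json are illustrative, not part of OpenJarvis.

```python
import json
import os
import urllib.request


def build_request(base_url: str, path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the bearer token in the Authorization header."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def get_json(base_url: str, path: str, api_key: str, timeout: float = 10.0):
    """Send the request and decode the JSON body; urlopen raises HTTPError on 4xx/5xx."""
    request = build_request(base_url, path, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    key = os.environ["OPENJARVIS_API_KEY"]  # assumed to be exported in the shell
    print(get_json("http://localhost:8000", "/v1/models", key))
```

A request without the header, or with a wrong key, should surface as an urllib.error.HTTPError from get_json.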

This brings up two services:

Service | Port | Description
jarvis | 8000 | OpenJarvis API server
ollama | 11434 | Ollama inference engine

Verify the server is running:

curl http://localhost:8000/health

Expected response:

{"status": "ok"}

Docker Images

CPU-Only Image (Dockerfile)

The default Dockerfile uses a multi-stage build based on python:3.12-slim to produce a minimal image.

Build stages:

  1. Builder stage -- installs uv and the openjarvis[server] package (which includes FastAPI, uvicorn, and all server dependencies) from the project source.
  2. Runtime stage -- copies only the installed Python packages and application code from the builder, keeping the final image small.

FROM python:3.12-slim AS builder

WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/

RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server]"

FROM python:3.12-slim

COPY --from=builder /usr/local /usr/local
COPY --from=builder /app /app
WORKDIR /app

EXPOSE 8000

ENTRYPOINT ["jarvis"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8000"]

Build it manually:

docker build -t openjarvis:latest .

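Run it standalone. Because the container binds 0.0.0.0, pass the API key into the container (this assumes OPENJARVIS_API_KEY is set in your shell):

docker run -d -p 8000:8000 -e OPENJARVIS_API_KEY="$OPENJARVIS_API_KEY" openjarvis:latest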

GPU Image (Dockerfile.gpu)

The GPU image is built on nvidia/cuda:12.4.0-runtime-ubuntu22.04 and includes the CUDA 12.4 runtime libraries, enabling GPU-accelerated inference when paired with a GPU-capable engine like vLLM or SGLang.

FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04 AS builder

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip python3-venv && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/

RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server]"

FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

COPY --from=builder /usr/local /usr/local
COPY --from=builder /app /app
WORKDIR /app

EXPOSE 8000

ENTRYPOINT ["jarvis"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8000"]

Build the GPU image:

docker build -f Dockerfile.gpu -t openjarvis:gpu .

Run with GPU access (as with the CPU image, the API key must be passed into the container):

docker run -d --gpus all -p 8000:8000 -e OPENJARVIS_API_KEY="$OPENJARVIS_API_KEY" openjarvis:gpu

Note: NVIDIA Container Toolkit required

The host machine must have the NVIDIA Container Toolkit installed for --gpus to work. See the NVIDIA installation guide for setup instructions.

Docker Compose Configuration

The docker-compose.yml defines a complete deployment with the OpenJarvis API server and an Ollama backend:

version: "3.9"

services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama
    restart: unless-stopped

volumes:
  ollama-models:

Environment Variables

The jarvis service is configured through environment variables:

Variable | Description | Default
OPENJARVIS_ENGINE_DEFAULT | Inference engine backend to use | ollama
OPENJARVIS_OLLAMA_HOST | URL of the Ollama server (uses Docker service name) | http://ollama:11434

Volumes

The ollama-models named volume persists downloaded models across container restarts, so models do not need to be re-pulled after a docker compose down / docker compose up cycle. Running docker compose down -v deletes the volume, and the models with it.

Service Dependencies

The jarvis service declares depends_on: ollama, so Compose starts the Ollama container before the API server. This controls start order only; it does not wait for Ollama to be ready to accept requests. Both services use restart: unless-stopped to automatically recover from crashes.

Custom Configuration

Mounting a Configuration File

To use a custom config.toml, mount it into the container at the expected path (~/.openjarvis/config.toml, which is /root/.openjarvis/config.toml in the container):

services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - ./my-config.toml:/root/.openjarvis/config.toml:ro
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

Persisting Data

To persist telemetry data, memory databases, and trace records across container restarts, mount the entire OpenJarvis data directory:

services:
  jarvis:
    # ... other config ...
    volumes:
      - openjarvis-data:/root/.openjarvis

volumes:
  ollama-models:
  openjarvis-data:

This preserves:

  • telemetry.db -- inference call telemetry records
  • memory.db -- the default SQLite memory backend
  • traces.db -- interaction trace records
  • config.toml -- user configuration
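
To confirm that the database files really persist, their tables can be listed with Python's built-in sqlite3 module. The sketch below assumes the files have first been copied out of the volume to a local directory (for example with docker cp). The directory name ./openjarvis-data and the helper list_tables are hypothetical, and no particular schema is assumed.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    # A file: URI with mode=ro prevents accidental writes or file creation.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    data_dir = Path("./openjarvis-data")  # hypothetical local copy of /root/.openjarvis
    for filename in ("telemetry.db", "memory.db", "traces.db"):
        path = data_dir / filename
        if path.exists():
            print(filename, list_tables(path))
        else:
            print(filename, "not found")
```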

Using the GPU Image with Compose

To use the GPU Dockerfile in your Compose setup, change the dockerfile field and add GPU resource reservations:

services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile.gpu
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

Health Check

The API server exposes a GET /health endpoint that checks whether the underlying inference engine is responsive:

curl http://localhost:8000/health

A healthy response returns HTTP 200:

{"status": "ok"}

An unhealthy engine returns HTTP 503:

{"detail": "Engine unhealthy"}

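You can integrate this into your Docker Compose healthcheck. The Dockerfiles shown above do not install curl, so the probe below uses the Python interpreter already present in both images; urlopen raises an exception on a 503 response, which makes the command exit non-zero:

services:
  jarvis:
    # ... other config ...
    healthcheck:
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)"]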
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s
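
Outside Compose, for example in a deployment script, the same endpoint can be polled until the server reports healthy. The following is a minimal sketch assuming the server is published on http://localhost:8000. The names probe_health and wait_until_healthy, and the retry counts, are illustrative choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_health(url: str, timeout: float = 5.0) -> bool:
    """Return True on HTTP 200; False on 503, refused connections, or timeouts."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (raised for a 503) is a subclass of URLError.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 20,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    healthy = wait_until_healthy(lambda: probe_health("http://localhost:8000/health"))
    raise SystemExit(0 if healthy else 1)
```

Passing the probe and sleep functions as parameters keeps the retry loop easy to exercise without a running server.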

Building Custom Images

Adding Extra Dependencies

To include additional backends (such as the vLLM engine or the ColBERT memory backend), modify the install command in the Dockerfile:

RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server,inference-vllm,memory-colbert]"

Overriding the Default Command

The entrypoint is jarvis and the default command is serve --host 0.0.0.0 --port 8000. Override the command to change server options:

docker run -d -p 9000:9000 -e OPENJARVIS_API_KEY="$OPENJARVIS_API_KEY" openjarvis:latest \
  serve --host 0.0.0.0 --port 9000 --engine ollama --model qwen3:8b

Or in Docker Compose:

services:
  jarvis:
    build: .
    command: ["serve", "--host", "0.0.0.0", "--port", "9000", "--model", "qwen3:8b"]
    ports:
      - "9000:9000"

Available CLI Options for jarvis serve

Option | Description
--host | Bind address (default: from config, typically 0.0.0.0)
--port | Port number (default: from config, typically 8000)
-e / --engine | Engine backend (ollama, vllm, llamacpp, sglang)
-m / --model | Default model name
-a / --agent | Agent for non-streaming requests (simple, orchestrator, react, openhands)

Pulling Models

After starting the Ollama container, you need to pull at least one model before the API server can serve requests:

docker compose exec ollama ollama pull qwen3:8b

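Verify models are available through the API. The /v1/* routes require the API key, so send it as a bearer token (this assumes OPENJARVIS_API_KEY is set in your shell):

curl -H "Authorization: Bearer $OPENJARVIS_API_KEY" http://localhost:8000/v1/models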
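
The pull can also be checked against Ollama directly, since the Compose file publishes port 11434 on the host. The sketch below reads Ollama's GET /api/tags endpoint, which lists locally available models. The helper names are illustrative, and the model tag qwen3:8b is taken from the pull command above.

```python
import json
import urllib.request


def pulled_model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body of Ollama's GET /api/tags."""
    return [model["name"] for model in (tags_payload.get("models") or [])]


def is_model_pulled(tags_payload: dict, wanted: str) -> bool:
    """Return True if the exact model tag appears in the payload."""
    return wanted in pulled_model_names(tags_payload)


def fetch_tags(ollama_url: str = "http://localhost:11434", timeout: float = 10.0) -> dict:
    """Fetch the list of local models from a running Ollama server."""
    url = ollama_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = fetch_tags()
    print("qwen3:8b pulled:", is_model_pulled(payload, "qwen3:8b"))
```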