Skip to content

Docker Deployment

OpenJarvis provides Docker images for both CPU-only and GPU-accelerated deployments, along with a Docker Compose configuration that bundles the API server with an Ollama inference backend.

Quick Start

The fastest way to get OpenJarvis running in Docker is with Docker Compose, which starts both the API server and an Ollama backend:

docker compose up -d

This brings up two services:

Service Port Description
jarvis 8000 OpenJarvis API server
ollama 11434 Ollama inference engine

Verify the server is running:

curl http://localhost:8000/health

Expected response:

{"status": "ok"}

Docker Images

CPU-Only Image (Dockerfile)

The default Dockerfile uses a multi-stage build based on python:3.12-slim to produce a minimal image.

Build stages:

  1. Builder stage -- installs uv and the openjarvis[server] package (which includes FastAPI, uvicorn, and all server dependencies) from the project source.
  2. Runtime stage -- copies only the installed Python packages and application code from the builder, keeping the final image small.
FROM python:3.12-slim AS builder

WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/

RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server]"

FROM python:3.12-slim

COPY --from=builder /usr/local /usr/local
COPY --from=builder /app /app
WORKDIR /app

EXPOSE 8000

ENTRYPOINT ["jarvis"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8000"]

Build it manually:

docker build -t openjarvis:latest .

Run it standalone:

docker run -d -p 8000:8000 openjarvis:latest

GPU Image (Dockerfile.gpu)

The GPU image is built on nvidia/cuda:12.4.0-runtime-ubuntu22.04 and includes the CUDA 12.4 runtime libraries, enabling GPU-accelerated inference when paired with a GPU-capable engine like vLLM or SGLang.

FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04 AS builder

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip python3-venv && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/

RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server]"

FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

COPY --from=builder /usr/local /usr/local
COPY --from=builder /app /app
WORKDIR /app

EXPOSE 8000

ENTRYPOINT ["jarvis"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8000"]

Build the GPU image:

docker build -f Dockerfile.gpu -t openjarvis:gpu .

Run with GPU access (requires the NVIDIA Container Toolkit):

docker run -d --gpus all -p 8000:8000 openjarvis:gpu

NVIDIA Container Toolkit required

The host machine must have the NVIDIA Container Toolkit installed for --gpus to work. See the NVIDIA installation guide for setup instructions.

Docker Compose Configuration

The docker-compose.yml defines a complete deployment with the OpenJarvis API server and an Ollama backend:

version: "3.9"

services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama
    restart: unless-stopped

volumes:
  ollama-models:

Environment Variables

The jarvis service is configured through environment variables:

Variable Description Default
OPENJARVIS_ENGINE_DEFAULT Inference engine backend to use ollama
OPENJARVIS_OLLAMA_HOST URL of the Ollama server (uses Docker service name) http://ollama:11434

Volumes

The ollama-models named volume persists downloaded models across container restarts, so models do not need to be re-pulled after a docker compose down / docker compose up cycle.

Service Dependencies

The jarvis service declares depends_on: ollama, ensuring the Ollama container starts before the API server. Both services use restart: unless-stopped to automatically recover from crashes.

Custom Configuration

Mounting a Configuration File

To use a custom config.toml, mount it into the container at the expected path (~/.openjarvis/config.toml, which is /root/.openjarvis/config.toml in the container):

services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - ./my-config.toml:/root/.openjarvis/config.toml:ro
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

Persisting Data

To persist telemetry data, memory databases, and trace records across container restarts, mount the entire OpenJarvis data directory:

services:
  jarvis:
    # ... other config ...
    volumes:
      - openjarvis-data:/root/.openjarvis

volumes:
  ollama-models:
  openjarvis-data:

This preserves:

  • telemetry.db -- inference call telemetry records
  • memory.db -- the default SQLite memory backend
  • traces.db -- interaction trace records
  • config.toml -- user configuration

Using the GPU Image with Compose

To use the GPU Dockerfile in your Compose setup, change the dockerfile field and add GPU resource reservations:

services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile.gpu
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

Health Check

The API server exposes a GET /health endpoint that checks whether the underlying inference engine is responsive:

curl http://localhost:8000/health

A healthy response returns HTTP 200:

{"status": "ok"}

An unhealthy engine returns HTTP 503:

{"detail": "Engine unhealthy"}

You can integrate this into your Docker Compose healthcheck:

services:
  jarvis:
    # ... other config ...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s

Building Custom Images

Adding Extra Dependencies

To include additional engine backends (such as vLLM or ColBERT memory), modify the install command in the Dockerfile:

RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server,inference-vllm,memory-colbert]"

Overriding the Default Command

The entrypoint is jarvis and the default command is serve --host 0.0.0.0 --port 8000. Override the command to change server options:

docker run -d -p 9000:9000 openjarvis:latest \
  serve --host 0.0.0.0 --port 9000 --engine ollama --model qwen3:8b

Or in Docker Compose:

services:
  jarvis:
    build: .
    command: ["serve", "--host", "0.0.0.0", "--port", "9000", "--model", "qwen3:8b"]
    ports:
      - "9000:9000"

Available CLI Options for jarvis serve

Option Description
--host Bind address (default: from config, typically 0.0.0.0)
--port Port number (default: from config, typically 8000)
-e / --engine Engine backend (ollama, vllm, llamacpp, sglang)
-m / --model Default model name
-a / --agent Agent for non-streaming requests (simple, orchestrator, react, openhands)

Pulling Models

After starting the Ollama container, you need to pull at least one model before the API server can serve requests:

docker compose exec ollama ollama pull qwen3:8b

Verify models are available through the API:

curl http://localhost:8000/v1/models