Docker Deployment¶

OpenJarvis provides Docker images for both CPU-only and GPU-accelerated deployments, along with a Docker Compose configuration that bundles the API server with an Ollama inference backend.

Quick Start¶

The fastest way to get OpenJarvis running in Docker is with Docker Compose, which starts both the API server and an Ollama backend:

docker compose up -d

This brings up two services:

Service	Port	Description
`jarvis`	8000	OpenJarvis API server
`ollama`	11434	Ollama inference engine

Verify the server is running:

curl http://localhost:8000/health

Expected response:

{"status": "ok"}

Docker Images¶

CPU-Only Image (`Dockerfile`)¶

The default Dockerfile uses a multi-stage build based on python:3.12-slim to produce a minimal image.

Build stages:

Builder stage -- installs uv and the openjarvis[server] package (which includes FastAPI, uvicorn, and all server dependencies) from the project source.
Runtime stage -- copies only the installed Python packages and application code from the builder, keeping the final image small.

FROM python:3.12-slim AS builder

WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/

RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server]"

FROM python:3.12-slim

COPY --from=builder /usr/local /usr/local
COPY --from=builder /app /app
WORKDIR /app

EXPOSE 8000

ENTRYPOINT ["jarvis"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8000"]

Build it manually:

docker build -t openjarvis:latest .

Run it standalone:

docker run -d -p 8000:8000 openjarvis:latest

GPU Image (`Dockerfile.gpu`)¶

The GPU image is built on nvidia/cuda:12.4.0-runtime-ubuntu22.04 and includes the CUDA 12.4 runtime libraries, enabling GPU-accelerated inference when paired with a GPU-capable engine like vLLM or SGLang.

FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04 AS builder

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip python3-venv && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/

RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server]"

FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

COPY --from=builder /usr/local /usr/local
COPY --from=builder /app /app
WORKDIR /app

EXPOSE 8000

ENTRYPOINT ["jarvis"]
CMD ["serve", "--host", "0.0.0.0", "--port", "8000"]

Build the GPU image:

docker build -f Dockerfile.gpu -t openjarvis:gpu .

Run with GPU access (requires the NVIDIA Container Toolkit):

docker run -d --gpus all -p 8000:8000 openjarvis:gpu

NVIDIA Container Toolkit required

The host machine must have the NVIDIA Container Toolkit installed for --gpus to work. See the NVIDIA installation guide for setup instructions.

Docker Compose Configuration¶

The docker-compose.yml defines a complete deployment with the OpenJarvis API server and an Ollama backend:

version: "3.9"

services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama
    restart: unless-stopped

volumes:
  ollama-models:

Environment Variables¶

The jarvis service is configured through environment variables:

Variable	Description	Default
`OPENJARVIS_ENGINE_DEFAULT`	Inference engine backend to use	`ollama`
`OPENJARVIS_OLLAMA_HOST`	URL of the Ollama server (uses Docker service name)	`http://ollama:11434`

Volumes¶

The ollama-models named volume persists downloaded models across container restarts, so models do not need to be re-pulled after a docker compose down / docker compose up cycle.

Service Dependencies¶

The jarvis service declares depends_on: ollama, ensuring the Ollama container starts before the API server. Both services use restart: unless-stopped to automatically recover from crashes.

Custom Configuration¶

Mounting a Configuration File¶

To use a custom config.toml, mount it into the container at the expected path (~/.openjarvis/config.toml, which is /root/.openjarvis/config.toml in the container):

services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    volumes:
      - ./my-config.toml:/root/.openjarvis/config.toml:ro
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

Persisting Data¶

To persist telemetry data, memory databases, and trace records across container restarts, mount the entire OpenJarvis data directory:

services:
  jarvis:
    # ... other config ...
    volumes:
      - openjarvis-data:/root/.openjarvis

volumes:
  ollama-models:
  openjarvis-data:

This preserves:

telemetry.db -- inference call telemetry records
memory.db -- the default SQLite memory backend
traces.db -- interaction trace records
config.toml -- user configuration

Using the GPU Image with Compose¶

To use the GPU Dockerfile in your Compose setup, change the dockerfile field and add GPU resource reservations:

services:
  jarvis:
    build:
      context: .
      dockerfile: Dockerfile.gpu
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OPENJARVIS_ENGINE_DEFAULT=ollama
      - OPENJARVIS_OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped

Health Check¶

The API server exposes a GET /health endpoint that checks whether the underlying inference engine is responsive:

curl http://localhost:8000/health

A healthy response returns HTTP 200:

{"status": "ok"}

An unhealthy engine returns HTTP 503:

{"detail": "Engine unhealthy"}

You can integrate this into your Docker Compose healthcheck:

services:
  jarvis:
    # ... other config ...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s

Building Custom Images¶

Adding Extra Dependencies¶

To include additional engine backends (such as vLLM or ColBERT memory), modify the install command in the Dockerfile:

RUN pip install --no-cache-dir uv && \
    uv pip install --system ".[server,inference-vllm,memory-colbert]"

Overriding the Default Command¶

The entrypoint is jarvis and the default command is serve --host 0.0.0.0 --port 8000. Override the command to change server options:

docker run -d -p 9000:9000 openjarvis:latest \
  serve --host 0.0.0.0 --port 9000 --engine ollama --model qwen3:8b

Or in Docker Compose:

services:
  jarvis:
    build: .
    command: ["serve", "--host", "0.0.0.0", "--port", "9000", "--model", "qwen3:8b"]
    ports:
      - "9000:9000"

Available CLI Options for `jarvis serve`¶

Option	Description
`--host`	Bind address (default: from config, typically `0.0.0.0`)
`--port`	Port number (default: from config, typically `8000`)
`-e` / `--engine`	Engine backend (`ollama`, `vllm`, `llamacpp`, `sglang`)
`-m` / `--model`	Default model name
`-a` / `--agent`	Agent for non-streaming requests (`simple`, `orchestrator`, `react`, `openhands`)

Pulling Models¶

After starting the Ollama container, you need to pull at least one model before the API server can serve requests:

docker compose exec ollama ollama pull qwen3:8b

Verify models are available through the API:

curl http://localhost:8000/v1/models