Changelog

All notable changes to OpenJarvis are documented in this file.

Unreleased -- Phase 11 (NanoClaw Subsumption)

27 new files, ~3,565 lines, 147+ new tests. Full suite: 2059+ tests pass.

Added

ClaudeCodeAgent (agents/claude_code.py) -- Wraps the @anthropic-ai/claude-code SDK via a bundled Node.js subprocess bridge. Communicates over stdin/stdout using sentinel-delimited JSON (---OPENJARVIS_OUTPUT_START--- / ---OPENJARVIS_OUTPUT_END---). The bundled runner is auto-installed to ~/.openjarvis/claude_code_runner/ via npm install --production on first use. Registered as "claude_code" with accepts_tools = False. Requires Node.js 22+ and ANTHROPIC_API_KEY.
WhatsAppBaileysChannel (channels/whatsapp_baileys.py) -- Bidirectional WhatsApp messaging using the Baileys protocol. Spawns a Node.js bridge subprocess (whatsapp_baileys_bridge/) for QR-code authentication, incoming message forwarding, and outbound delivery via JID addressing. Registered as "whatsapp_baileys" in ChannelRegistry. Authentication state is persisted to ~/.openjarvis/whatsapp_baileys_bridge/auth/. New config section: [channel.whatsapp_baileys].
ContainerRunner (sandbox/runner.py) -- Manages Docker (or Podman) container lifecycle for sandboxed agent execution. Builds docker run --rm --network none -i commands with allowlist-validated read-only bind mounts. Supports configurable image, timeout, concurrent container limit, and runtime binary. Uses the same sentinel-delimited JSON protocol as ClaudeCodeAgent.
SandboxedAgent (sandbox/runner.py) -- Transparent wrapper that runs any BaseAgent inside a container via ContainerRunner. Follows the GuardrailsEngine wrapper pattern. accepts_tools = False.
MountAllowlist / validate_mounts() (sandbox/mount_security.py) -- Port of NanoClaw's mount-security.ts. Validates bind mounts against a JSON allowlist (allowed root directories + blocked filename patterns). Raises ValueError for blocked or out-of-root paths before the container starts. Default blocked patterns include .ssh, .env, *.pem, *.key, credential files, and cloud config directories.
TaskScheduler (scheduler/scheduler.py) -- Background polling scheduler supporting three schedule types: cron (via croniter or built-in fallback), interval (seconds), and once (ISO 8601 datetime). Runs a daemon thread (jarvis-scheduler) polling SQLite every 60 seconds (configurable). Executes due tasks via JarvisSystem.ask() with optional agent and tool selection. Publishes scheduler_task_start / scheduler_task_end events on the EventBus. New config section: [scheduler].
SchedulerStore (scheduler/store.py) -- SQLite CRUD backend for scheduled tasks and run logs. Two tables: scheduled_tasks (task state) and task_run_logs (execution history). Supports task filtering by status and due-time polling via get_due_tasks().
Scheduler MCP tools (scheduler/tools.py) -- Five new MCP-discoverable tools registered in ToolRegistry:
- schedule_task -- Create a new scheduled task
- list_scheduled_tasks -- List tasks filtered by status
- pause_scheduled_task -- Pause an active task
- resume_scheduled_task -- Resume a paused task (recomputes next_run)
- cancel_scheduled_task -- Permanently cancel a task
Scheduler CLI commands -- jarvis scheduler subcommand group:
- jarvis scheduler create -- Create a new scheduled task
- jarvis scheduler list -- List all or filtered tasks
- jarvis scheduler pause <id> -- Pause a task
- jarvis scheduler resume <id> -- Resume a task
- jarvis scheduler cancel <id> -- Cancel a task
- jarvis scheduler logs <id> -- Show run history for a task
- jarvis scheduler start -- Start the background scheduler daemon
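
The sentinel-delimited JSON protocol shared by ClaudeCodeAgent and ContainerRunner lets the parent process ignore any log noise the subprocess writes to stdout. A minimal sketch of the parsing side, assuming one payload per run; the helper name `extract_payload` and the payload shape are hypothetical and not taken from the OpenJarvis source:

```python
import json

START = "---OPENJARVIS_OUTPUT_START---"
END = "---OPENJARVIS_OUTPUT_END---"


def extract_payload(stdout: str) -> dict:
    """Return the JSON object found between the two sentinels (hypothetical helper)."""
    start = stdout.find(START)
    if start == -1:
        raise ValueError("start sentinel not found")
    body_start = start + len(START)
    end = stdout.find(END, body_start)
    if end == -1:
        raise ValueError("end sentinel not found")
    # json.loads tolerates the newlines that surround the payload.
    return json.loads(stdout[body_start:end])


# Simulated subprocess output: log noise followed by the delimited payload.
raw = "npm WARN deprecated something\n" + START + '\n{"content": "hi"}\n' + END + "\n"
result = extract_payload(raw)
```

Anything printed outside the sentinels is discarded, so the bridge can log freely without corrupting the result.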

Changed

ChannelRegistry now includes WhatsAppBaileysChannel.
AgentRegistry now includes ClaudeCodeAgent ("claude_code").
Architecture overview and source directory layout updated to reflect new sandbox/ and scheduler/ modules.
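
The allowlist check described for validate_mounts() under Added (allowed root directories plus blocked filename patterns) can be sketched roughly as follows. This is an illustration only: `check_mount` and its parameters are hypothetical names, and the real module reads its rules from a JSON allowlist and may resolve paths differently.

```python
import posixpath
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List


def check_mount(path: str, allowed_roots: List[str], blocked_patterns: List[str]) -> str:
    """Return the normalized path, or raise ValueError if it must not be mounted."""
    # Collapse ".." segments first so "/root/../secret" cannot slip past the root check.
    # A real implementation should also resolve symlinks (for example Path.resolve()).
    normalized = PurePosixPath(posixpath.normpath(path))
    if not any(normalized.is_relative_to(root) for root in allowed_roots):
        raise ValueError(f"{normalized} is outside every allowed root")
    # Reject the mount if any path component matches a blocked pattern.
    for part in normalized.parts:
        if any(fnmatchcase(part, pattern) for pattern in blocked_patterns):
            raise ValueError(f"{normalized} matches a blocked pattern")
    return str(normalized)


roots = ["/home/u/projects"]
blocked = [".ssh", ".env", "*.pem", "*.key"]
ok = check_mount("/home/u/projects/app", roots, blocked)
```

Validating before the container starts means a rejected mount never reaches the `docker run` command line.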

Unreleased -- Phase 10 Tooling Updates

Added

build_tool_descriptions() shared builder -- Single source of truth for generating enriched tool descriptions in agent system prompts. Produces Markdown sections with name, description, category, and parameter schemas.
Enriched agent prompts -- NativeReActAgent, NativeOpenHandsAgent, RLMAgent, and OrchestratorAgent (structured mode) now inject detailed tool descriptions into their system prompts via the shared builder.
Case-insensitive parsing -- ReAct (Action: / Final Answer:) and Orchestrator structured-mode parsing (TOOL: / FINAL_ANSWER:) are now case-insensitive.
Multi-provider tool_calls extraction -- CloudEngine now extracts tool_calls from Anthropic (tool_use content blocks) and Google (function_call parts), normalizing to the flat {id, name, arguments} format. LiteLLM engine handles the flat-format tool calls returned by the LiteLLM proxy.
RLM tool awareness -- RLMAgent injects an ## Available Tools section into its system prompt when tools are provided.
Orchestrator structured tool descriptions -- Structured mode passes tools=self._tools to build_system_prompt() for enriched descriptions.
Telemetry modules -- EfficiencyMetrics, GPUMonitor, VLLMMetrics for energy, GPU utilization, and vLLM server-side metrics collection.
Eval TOML config -- TOML-based eval suite configuration system for defining models x benchmarks matrices.

Changed

Agent prompt generation now uses build_tool_descriptions() instead of inline tool name listing.
build_system_prompt() in prompt_registry.py accepts an optional tools parameter for enriched descriptions from BaseTool instances.
ReAct and OpenHands regex patterns updated for case-insensitive matching.
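
As an illustration of the case-insensitive matching, the sketch below parses ReAct-style `Action:` and `Final Answer:` markers with `re.IGNORECASE`. The patterns and the `parse_step` helper are assumptions made for this example; the actual expressions in the agents may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns: one marker per line, any letter case.
ACTION_RE = re.compile(r"^[ \t]*action[ \t]*:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE)
FINAL_RE = re.compile(r"^[ \t]*final answer[ \t]*:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE)


def parse_step(text: str) -> Optional[Tuple[str, str]]:
    """Return ("final", answer), ("action", tool_name), or None if no marker is found."""
    final = FINAL_RE.search(text)
    if final:
        return ("final", final.group(1).strip())
    action = ACTION_RE.search(text)
    if action:
        return ("action", action.group(1).strip())
    return None


step = parse_step("Thought: I need arithmetic\naction: calculator")
answer = parse_step("FINAL ANSWER: 42")
```

Checking for the final answer first keeps the loop from running another tool call once the model has already finished.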

Fixed

Engine tool_calls normalization -- Anthropic tool_use blocks and Google function_call parts are now correctly extracted and converted to the standard flat format used by agents.
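
A rough sketch of this normalization on plain dictionaries is shown below. It assumes Anthropic-style `tool_use` blocks carrying `id`, `name`, and `input`, and Google-style parts carrying a `function_call` with `name` and `args`. Serializing `arguments` as a JSON string and generating an id for Google calls are assumptions of this example, not confirmed CloudEngine behavior; both function names are hypothetical.

```python
import json
import uuid
from typing import Any, Dict, List


def from_anthropic(content_blocks: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Flatten tool_use content blocks into {id, name, arguments} dicts."""
    return [
        {"id": block["id"], "name": block["name"], "arguments": json.dumps(block.get("input", {}))}
        for block in content_blocks
        if block.get("type") == "tool_use"
    ]


def from_google(parts: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Flatten function_call parts; an id is generated here because this sketch assumes none is supplied."""
    calls = []
    for part in parts:
        call = part.get("function_call")
        if call is None:
            continue  # skip text parts
        calls.append({
            "id": "call_" + uuid.uuid4().hex[:8],
            "name": call["name"],
            "arguments": json.dumps(call.get("args", {})),
        })
    return calls


anthropic_calls = from_anthropic([
    {"type": "text", "text": "Let me compute that."},
    {"type": "tool_use", "id": "toolu_01", "name": "calculator", "input": {"expression": "2+2"}},
])
google_calls = from_google([{"function_call": {"name": "calculator", "args": {"expression": "2+2"}}}])
```

With both providers reduced to the same shape, agent code can dispatch tool calls without knowing which backend produced them.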

v1.0.0

Phase 5 -- SDK, Production Readiness, and Documentation

Added

Python SDK -- Jarvis class providing a high-level sync API for programmatic use
- ask() / ask_full() methods for direct engine and agent mode queries
- MemoryHandle proxy for lazy memory backend initialization
- list_models() and list_engines() for runtime introspection
- Router policy selection via config (learning.default_policy)
- Lazy engine initialization with automatic discovery and health probing
- Resource cleanup via close()
Benchmarking framework
- BaseBenchmark ABC and BenchmarkSuite runner
- LatencyBenchmark measuring per-call latency (mean, p50, p95, min, max)
- ThroughputBenchmark measuring tokens-per-second throughput
- BenchmarkResult dataclass with JSONL export
- jarvis bench run CLI with options for model, engine, sample count, benchmark selection, and JSON/JSONL output
Docker deployment
- Dockerfile -- Multi-stage Python 3.12-slim build with [server] extra
- Dockerfile.gpu -- NVIDIA CUDA 12.4 runtime variant
- docker-compose.yml -- Services for jarvis (port 8000) and ollama (port 11434)
- deploy/systemd/openjarvis.service -- systemd unit file for Linux
- deploy/launchd/com.openjarvis.plist -- launchd plist for macOS
Documentation site -- MkDocs Material with mkdocstrings, covering getting started, user guide, architecture, API reference, deployment, and development
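
For reference, the latency statistics listed for LatencyBenchmark (mean, p50, p95, min, max) could be computed along these lines. The nearest-rank percentile method and the `summarize_latencies` name are assumptions of this sketch; the benchmark itself may interpolate differently.

```python
import math
import statistics
from typing import Dict, List


def summarize_latencies(samples: List[float]) -> Dict[str, float]:
    """Summarize per-call latencies in seconds using nearest-rank percentiles."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)

    def percentile(p: float) -> float:
        # Nearest-rank: the smallest value with at least p% of samples at or below it.
        index = max(0, math.ceil(p / 100 * len(ordered)) - 1)
        return ordered[index]

    return {
        "mean": statistics.fmean(ordered),
        "p50": percentile(50),
        "p95": percentile(95),
        "min": ordered[0],
        "max": ordered[-1],
    }


summary = summarize_latencies([0.10, 0.20, 0.30, 0.40, 1.00])
```

With only five samples the p95 equals the maximum, which is one reason to run a larger sample count when tail latency matters.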

v0.5.0

Phase 4 -- Learning, Telemetry, and Router Policies

Added

Learning system
- RouterPolicy ABC and RoutingContext dataclass
- RewardFunction ABC for scoring inference results
- HeuristicRewardFunction scoring on latency, cost, and efficiency
- RouterPolicyRegistry for pluggable routing strategies
- HeuristicRouter registered as "heuristic" policy (6 priority rules: code detection, math detection, short/long queries, urgency override, default fallback)
- TraceDrivenPolicy registered as "learned" policy with batch updates via update_from_traces() and online updates via observe()
- GRPORouterPolicy stub registered as "grpo" for future RL training
- ensure_registered() pattern for lazy, test-safe registration
Telemetry aggregation
- TelemetryAggregator with per_model_stats(), per_engine_stats(), top_models(), summary(), export_records(), and clear() methods
- Time-range filtering via since / until parameters
- ModelStats and EngineStats dataclasses
- AggregatedStats summary dataclass
CLI enhancements
- --router flag on jarvis ask for explicit policy selection
- jarvis telemetry stats -- display aggregated telemetry statistics
- jarvis telemetry export --format json|csv -- export telemetry records
- jarvis telemetry clear --yes -- delete all telemetry records

v0.4.0

Phase 3 -- Agents, Tools, and API Server

Added

Agent system
- BaseAgent ABC with run() method returning AgentResult
- AgentContext dataclass with conversation, tools, and memory results
- AgentResult dataclass with content, tool results, turns, and metadata
- AgentRegistry for pluggable agent implementations
- SimpleAgent -- single-turn query-to-response, no tool calling
- OrchestratorAgent -- multi-turn tool-calling loop with ToolExecutor, configurable max_turns
- CustomAgent -- template for user-defined agent behavior
Tool system
- BaseTool ABC with spec property and execute() method
- ToolSpec dataclass describing tool interface and characteristics
- ToolExecutor dispatch engine with JSON argument parsing, latency tracking, and event bus integration (TOOL_CALL_START / TOOL_CALL_END)
- ToolRegistry for tool discovery
- to_openai_function() method for OpenAI function calling format
- Built-in tools:
  - CalculatorTool -- safe math evaluation via AST parsing
  - ThinkTool -- reasoning scratchpad for chain-of-thought
  - RetrievalTool -- memory search integration
  - LLMTool -- sub-model calls within agent loops
  - FileReadTool -- safe file reading with path validation
OpenAI-compatible API server (jarvis serve)
- FastAPI + Uvicorn with optional [server] extra
- POST /v1/chat/completions -- non-streaming and SSE streaming
- GET /v1/models -- list available models
- GET /health -- health check endpoint
- Pydantic request/response models matching OpenAI API format

v0.3.0

Phase 2 -- Memory System

Added

Memory backends
- MemoryBackend ABC with store(), retrieve(), delete(), clear()
- RetrievalResult dataclass with content, score, source, and metadata
- MemoryRegistry for backend discovery
- SQLiteMemory -- zero-dependency default using SQLite FTS5 with BM25 ranking and FTS5 query escaping
- FAISSMemory -- vector search using FAISS with sentence-transformers embeddings (optional [memory-faiss] extra)
- ColBERTMemory -- ColBERTv2 neural retrieval backend (optional [memory-colbert] extra)
- BM25Memory -- BM25 ranking backend using rank-bm25 (optional [memory-bm25] extra)
- HybridMemory -- Reciprocal Rank Fusion combining multiple backends
Document processing
- ChunkConfig dataclass for chunk size and overlap settings
- chunk_text() for splitting documents into overlapping chunks
- ingest_path() for recursively indexing files and directories
- read_document() with support for plain text, Markdown, and PDF (optional [memory-pdf] extra)
Context injection
- ContextConfig with top-k, minimum score, and max context token settings
- inject_context() for prepending memory results as system messages with source attribution
- --no-context flag on jarvis ask to disable injection
CLI commands
- jarvis memory index <path> -- index documents into memory
- jarvis memory search <query> -- search memory for relevant chunks
- jarvis memory stats -- show backend statistics
Event bus integration -- MEMORY_STORE and MEMORY_RETRIEVE events

v0.2.0

Phase 1 -- Intelligence and Inference

Added

Intelligence primitive
- ModelSpec dataclass with parameter count, context length, quantization, VRAM requirements, and supported engines
- ModelRegistry for model metadata storage
- BUILTIN_MODELS catalog with pre-defined model specifications
- register_builtin_models() and merge_discovered_models() helpers
- HeuristicRouter with rule-based model selection
- build_routing_context() for query analysis (code detection, math detection, length classification)
Inference engines
- InferenceEngine ABC with generate(), stream(), list_models(), and health() methods
- EngineRegistry for engine discovery
- OllamaEngine -- Ollama backend via native HTTP API with tool call extraction
- VllmEngine -- vLLM backend via OpenAI-compatible API
- LlamaCppEngine -- llama.cpp server backend
- EngineConnectionError for unreachable engines
- messages_to_dicts() for Message-to-OpenAI-format conversion
Engine discovery
- discover_engines() -- probe all registered engines for health
- discover_models() -- aggregate model lists across engines
- get_engine() -- get configured default with automatic fallback
Hardware detection
- NVIDIA GPU detection via nvidia-smi
- AMD GPU detection via rocm-smi
- Apple Silicon detection via system_profiler
- CPU brand detection via /proc/cpuinfo and sysctl
- recommend_engine() mapping hardware to best engine
Telemetry
- TelemetryRecord dataclass with timing, tokens, energy, and cost
- TelemetryStore with SQLite persistence and EventBus subscription
- instrumented_generate() wrapper for automatic telemetry recording
CLI
- jarvis ask <query> -- query via discovered engine
- jarvis ask --agent simple <query> -- route through SimpleAgent
- jarvis model list -- list models from running engines
- jarvis model info <model> -- show model details

v0.1.0

Phase 0 -- Project Scaffolding

Added

Project structure -- hatchling build backend, uv package manager, pyproject.toml with extras for optional backends
Registry system -- RegistryBase[T] generic base class with class-specific entry isolation, register() decorator, get(), create(), items(), keys(), contains(), clear() methods
Typed registries -- ModelRegistry, EngineRegistry, MemoryRegistry, AgentRegistry, ToolRegistry, RouterPolicyRegistry, BenchmarkRegistry
Core types -- Role enum, Message, Conversation (with sliding window), ModelSpec, Quantization enum, ToolCall, ToolResult, TelemetryRecord, StepType enum, TraceStep, Trace
Configuration -- JarvisConfig dataclass hierarchy, TOML loader with overlay semantics, hardware auto-detection, generate_default_toml() for jarvis init
Event bus -- Synchronous pub/sub EventBus with EventType enum for inter-primitive communication
CLI skeleton -- Click-based jarvis command group with --version, --help, and init subcommand
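
One way to get the class-specific entry isolation described for RegistryBase[T] is to give every subclass its own dictionary in `__init_subclass__`. The sketch below illustrates that idea with a subset of the listed methods. It is an assumption about the mechanism, not the project's implementation, and the `AgentRegistry`, `ToolRegistry`, and `SimpleAgent` classes defined here are stand-ins for the example.

```python
from typing import Any, Callable, ClassVar, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


class RegistryBase(Generic[T]):
    """Generic registry; each subclass stores its entries separately."""

    _entries: ClassVar[Dict[str, Any]]

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        cls._entries = {}  # a fresh dict per subclass keeps registries isolated

    @classmethod
    def register(cls, key: str) -> Callable[[Any], Any]:
        def decorator(obj: Any) -> Any:
            if key in cls._entries:
                raise ValueError(f"{key!r} is already registered")
            cls._entries[key] = obj
            return obj
        return decorator

    @classmethod
    def get(cls, key: str) -> Any:
        return cls._entries[key]

    @classmethod
    def create(cls, key: str, *args: Any, **kwargs: Any) -> Any:
        return cls._entries[key](*args, **kwargs)

    @classmethod
    def keys(cls) -> Tuple[str, ...]:
        return tuple(cls._entries)

    @classmethod
    def contains(cls, key: str) -> bool:
        return key in cls._entries

    @classmethod
    def clear(cls) -> None:
        cls._entries.clear()


class AgentRegistry(RegistryBase[type]):
    pass


class ToolRegistry(RegistryBase[type]):
    pass


@AgentRegistry.register("simple")
class SimpleAgent:
    pass


agent = AgentRegistry.create("simple")
```

Registering "simple" in `AgentRegistry` leaves `ToolRegistry` empty, and `clear()` on one registry does not touch the other, which is convenient for test isolation.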