Changelog
All notable changes to OpenJarvis are documented in this file.
Unreleased -- Phase 11 (NanoClaw Subsumption)
27 new files, ~3,565 lines, 147+ new tests. Full suite: 2059+ tests pass.
Added
- `ClaudeCodeAgent` (`agents/claude_code.py`) -- Wraps the `@anthropic-ai/claude-code` SDK via a bundled Node.js subprocess bridge. Communicates over stdin/stdout using sentinel-delimited JSON (`---OPENJARVIS_OUTPUT_START---` / `---OPENJARVIS_OUTPUT_END---`). The bundled runner is auto-installed to `~/.openjarvis/claude_code_runner/` via `npm install --production` on first use. Registered as `"claude_code"` with `accepts_tools = False`. Requires Node.js 22+ and `ANTHROPIC_API_KEY`.
- `WhatsAppBaileysChannel` (`channels/whatsapp_baileys.py`) -- Bidirectional WhatsApp messaging using the Baileys protocol. Spawns a Node.js bridge subprocess (`whatsapp_baileys_bridge/`) for QR-code authentication, incoming message forwarding, and outbound delivery via JID addressing. Registered as `"whatsapp_baileys"` in `ChannelRegistry`. Authentication state is persisted to `~/.openjarvis/whatsapp_baileys_bridge/auth/`. New config section: `[channel.whatsapp_baileys]`.
- `ContainerRunner` (`sandbox/runner.py`) -- Manages Docker (or Podman) container lifecycle for sandboxed agent execution. Builds `docker run --rm --network none -i` commands with allowlist-validated read-only bind mounts. Supports configurable image, timeout, concurrent container limit, and runtime binary. Uses the same sentinel-delimited JSON protocol as `ClaudeCodeAgent`.
- `SandboxedAgent` (`sandbox/runner.py`) -- Transparent wrapper that runs any `BaseAgent` inside a container via `ContainerRunner`. Follows the `GuardrailsEngine` wrapper pattern. `accepts_tools = False`.
- `MountAllowlist` / `validate_mounts()` (`sandbox/mount_security.py`) -- Port of NanoClaw's `mount-security.ts`. Validates bind mounts against a JSON allowlist (allowed root directories + blocked filename patterns). Raises `ValueError` for blocked or out-of-root paths before the container starts. Default blocked patterns include `.ssh`, `.env`, `*.pem`, `*.key`, credential files, and cloud config directories.
- `TaskScheduler` (`scheduler/scheduler.py`) -- Background polling scheduler supporting three schedule types: `cron` (via `croniter` or built-in fallback), `interval` (seconds), and `once` (ISO 8601 datetime). Runs a daemon thread (`jarvis-scheduler`) polling SQLite every 60 seconds (configurable). Executes due tasks via `JarvisSystem.ask()` with optional agent and tool selection. Publishes `scheduler_task_start` / `scheduler_task_end` events on the `EventBus`. New config section: `[scheduler]`.
- `SchedulerStore` (`scheduler/store.py`) -- SQLite CRUD backend for scheduled tasks and run logs. Two tables: `scheduled_tasks` (task state) and `task_run_logs` (execution history). Supports task filtering by status and due-time polling via `get_due_tasks()`.
- Scheduler MCP tools (`scheduler/tools.py`) -- Five new MCP-discoverable tools registered in `ToolRegistry`:
    - `schedule_task` -- Create a new scheduled task
    - `list_scheduled_tasks` -- List tasks filtered by status
    - `pause_scheduled_task` -- Pause an active task
    - `resume_scheduled_task` -- Resume a paused task (recomputes `next_run`)
    - `cancel_scheduled_task` -- Permanently cancel a task
- Scheduler CLI commands -- `jarvis scheduler` subcommand group:
    - `jarvis scheduler create` -- Create a new scheduled task
    - `jarvis scheduler list` -- List all or filtered tasks
    - `jarvis scheduler pause <id>` -- Pause a task
    - `jarvis scheduler resume <id>` -- Resume a task
    - `jarvis scheduler cancel <id>` -- Cancel a task
    - `jarvis scheduler logs <id>` -- Show run history for a task
    - `jarvis scheduler start` -- Start the background scheduler daemon
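
The sentinel-delimited JSON protocol used by `ClaudeCodeAgent` and `ContainerRunner` can be illustrated with a minimal sketch. The two sentinel strings come from the entry above; the helper name `extract_payload` and the shape of the JSON payload are assumptions made for illustration, not the project's actual parser.

```python
import json

# Sentinel strings as quoted in the ClaudeCodeAgent changelog entry.
START = "---OPENJARVIS_OUTPUT_START---"
END = "---OPENJARVIS_OUTPUT_END---"


def extract_payload(stdout_text: str) -> dict:
    """Return the JSON object found between the two sentinels.

    Hypothetical helper: the real bridge may parse its output differently.
    """
    start = stdout_text.find(START)
    if start == -1:
        raise ValueError("start sentinel not found")
    end = stdout_text.find(END, start + len(START))
    if end == -1:
        raise ValueError("end sentinel not found")
    # Everything between the sentinels is treated as one JSON document.
    body = stdout_text[start + len(START):end].strip()
    return json.loads(body)
```

Wrapping the payload in sentinels lets the parent process ignore any log lines the subprocess prints before or after the result.
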
Changed
- `ChannelRegistry` now includes `WhatsAppBaileysChannel`.
- `AgentRegistry` now includes `ClaudeCodeAgent` (`"claude_code"`).
- Architecture overview and source directory layout updated to reflect new `sandbox/` and `scheduler/` modules.
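
The mount validation described for `sandbox/mount_security.py` (allowed roots plus blocked filename patterns, raising `ValueError` before the container starts) might look roughly like the following. This is a sketch under stated assumptions: the function name `validate_mount`, its signature, and the short pattern list are illustrative and do not reproduce the shipped allowlist format.

```python
from fnmatch import fnmatch
from pathlib import Path

# Illustrative subset of blocked patterns; the shipped defaults may differ.
BLOCKED_PATTERNS = [".ssh", ".env", "*.pem", "*.key"]


def validate_mount(host_path: str, allowed_roots: list[str]) -> Path:
    """Raise ValueError unless host_path is under an allowed root and unblocked."""
    path = Path(host_path).resolve()
    roots = [Path(root).resolve() for root in allowed_roots]
    # The path must be an allowed root itself or live somewhere beneath one.
    if not any(path == root or root in path.parents for root in roots):
        raise ValueError(f"{path} is outside the allowed roots")
    # Reject the mount if any path component matches a blocked pattern.
    for part in path.parts:
        if any(fnmatch(part, pattern) for pattern in BLOCKED_PATTERNS):
            raise ValueError(f"{path} matches a blocked pattern")
    return path
```

Resolving paths before comparison means `..` segments and symlinks cannot be used to step outside an allowed root.
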
Unreleased -- Phase 10 Tooling Updates
Added
- `build_tool_descriptions()` shared builder -- Single source of truth for generating enriched tool descriptions in agent system prompts. Produces Markdown sections with name, description, category, and parameter schemas.
- Enriched agent prompts -- `NativeReActAgent`, `NativeOpenHandsAgent`, `RLMAgent`, and `OrchestratorAgent` (structured mode) now inject detailed tool descriptions into their system prompts via the shared builder.
- Case-insensitive parsing -- ReAct (`Action:` / `Final Answer:`) and Orchestrator structured-mode parsing (`TOOL:` / `FINAL_ANSWER:`) are now case-insensitive.
- Multi-provider `tool_calls` extraction -- `CloudEngine` now extracts `tool_calls` from Anthropic (`tool_use` content blocks) and Google (`function_call` parts), normalizing to the flat `{id, name, arguments}` format. `LiteLLM` engine handles the flat-format tool calls returned by the LiteLLM proxy.
- RLM tool awareness -- `RLMAgent` injects an `## Available Tools` section into its system prompt when tools are provided.
- Orchestrator structured tool descriptions -- Structured mode passes `tools=self._tools` to `build_system_prompt()` for enriched descriptions.
- Telemetry modules -- `EfficiencyMetrics`, `GPUMonitor`, `VLLMMetrics` for energy, GPU utilization, and vLLM server-side metrics collection.
- Eval TOML config -- TOML-based eval suite configuration system for defining models x benchmarks matrices.
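
To make the case-insensitive parsing entry concrete, here is a minimal sketch of matching ReAct-style `Action:` and `Final Answer:` markers regardless of case. The regular expressions and the `parse_step` helper are hypothetical; the changelog does not show the project's actual patterns.

```python
import re

# Hypothetical patterns: re.IGNORECASE accepts "action:", "ACTION:", etc.
ACTION_RE = re.compile(r"^\s*Action:\s*(.+)$", re.IGNORECASE | re.MULTILINE)
FINAL_RE = re.compile(r"^\s*Final Answer:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def parse_step(text: str) -> tuple[str, str]:
    """Classify one model response as a final answer, an action, or neither."""
    final = FINAL_RE.search(text)
    if final:
        return ("final", final.group(1).strip())
    action = ACTION_RE.search(text)
    if action:
        return ("action", action.group(1).strip())
    return ("none", "")
```

Checking for the final answer first keeps a response that contains both markers from triggering another tool call.
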
Changed
- Agent prompt generation now uses `build_tool_descriptions()` instead of inline tool name listing.
- `build_system_prompt()` in `prompt_registry.py` accepts an optional `tools` parameter for enriched descriptions from `BaseTool` instances.
- ReAct and OpenHands regex patterns updated for case-insensitive matching.
Fixed
- Engine `tool_calls` normalization -- Anthropic `tool_use` blocks and Google `function_call` parts are now correctly extracted and converted to the standard flat format used by agents.
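
As a rough illustration of the normalization described above, the sketch below converts Anthropic-style `tool_use` content blocks into the flat `{id, name, arguments}` shape. The function name `normalize_anthropic` is hypothetical, and serializing `arguments` as a JSON string is an assumption; the changelog does not say whether the flat format stores a string or a dict.

```python
import json


def normalize_anthropic(content_blocks: list[dict]) -> list[dict]:
    """Convert tool_use content blocks into flat {id, name, arguments} dicts.

    Assumption: arguments are stored as a JSON string. Non-tool blocks
    (for example plain text blocks) are skipped.
    """
    calls = []
    for block in content_blocks:
        if block.get("type") == "tool_use":
            calls.append(
                {
                    "id": block["id"],
                    "name": block["name"],
                    "arguments": json.dumps(block.get("input", {})),
                }
            )
    return calls
```

Google `function_call` parts would need a similar adapter, with an identifier generated locally if the provider does not supply one.
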
v1.0.0
Phase 5 -- SDK, Production Readiness, and Documentation
Added
- Python SDK -- `Jarvis` class providing a high-level sync API for programmatic use
    - `ask()` / `ask_full()` methods for direct engine and agent mode queries
    - `MemoryHandle` proxy for lazy memory backend initialization
    - `list_models()` and `list_engines()` for runtime introspection
    - Router policy selection via config (`learning.default_policy`)
    - Lazy engine initialization with automatic discovery and health probing
    - Resource cleanup via `close()`
- Benchmarking framework
    - `BaseBenchmark` ABC and `BenchmarkSuite` runner
    - `LatencyBenchmark` measuring per-call latency (mean, p50, p95, min, max)
    - `ThroughputBenchmark` measuring tokens-per-second throughput
    - `BenchmarkResult` dataclass with JSONL export
    - `jarvis bench run` CLI with options for model, engine, sample count, benchmark selection, and JSON/JSONL output
- Docker deployment
    - `Dockerfile` -- Multi-stage Python 3.12-slim build with `[server]` extra
    - `Dockerfile.gpu` -- NVIDIA CUDA 12.4 runtime variant
    - `docker-compose.yml` -- Services for `jarvis` (port 8000) and `ollama` (port 11434)
    - `deploy/systemd/openjarvis.service` -- systemd unit file for Linux
    - `deploy/launchd/com.openjarvis.plist` -- launchd plist for macOS
- Documentation site -- MkDocs Material with mkdocstrings, covering getting started, user guide, architecture, API reference, deployment, and development
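
The latency statistics listed for `LatencyBenchmark` (mean, p50, p95, min, max) can be computed from raw per-call timings as sketched below. The helper `summarize_latencies` and the nearest-rank percentile definition are assumptions; the changelog does not state which percentile method the benchmark uses.

```python
import math
import statistics


def summarize_latencies(samples: list[float]) -> dict[str, float]:
    """Summarize per-call latencies using nearest-rank percentiles."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)

    def percentile(p: int) -> float:
        # Nearest-rank: the smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "mean": statistics.fmean(ordered),
        "p50": percentile(50),
        "p95": percentile(95),
        "min": ordered[0],
        "max": ordered[-1],
    }
```

Reporting p95 alongside the mean makes occasional slow calls visible even when the average looks healthy.
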
v0.5.0
Phase 4 -- Learning, Telemetry, and Router Policies
Added
- Learning system
    - `RouterPolicy` ABC and `RoutingContext` dataclass
    - `RewardFunction` ABC for scoring inference results
    - `HeuristicRewardFunction` scoring on latency, cost, and efficiency
    - `RouterPolicyRegistry` for pluggable routing strategies
    - `HeuristicRouter` registered as `"heuristic"` policy (6 priority rules: code detection, math detection, short/long queries, urgency override, default fallback)
    - `TraceDrivenPolicy` registered as `"learned"` policy with batch updates via `update_from_traces()` and online updates via `observe()`
    - `GRPORouterPolicy` stub registered as `"grpo"` for future RL training
    - `ensure_registered()` pattern for lazy, test-safe registration
- Telemetry aggregation
    - `TelemetryAggregator` with `per_model_stats()`, `per_engine_stats()`, `top_models()`, `summary()`, `export_records()`, and `clear()` methods
    - Time-range filtering via `since` / `until` parameters
    - `ModelStats` and `EngineStats` dataclasses
    - `AggregatedStats` summary dataclass
- CLI enhancements
    - `--router` flag on `jarvis ask` for explicit policy selection
    - `jarvis telemetry stats` -- display aggregated telemetry statistics
    - `jarvis telemetry export --format json|csv` -- export telemetry records
    - `jarvis telemetry clear --yes` -- delete all telemetry records
v0.4.0
Phase 3 -- Agents, Tools, and API Server
Added
- Agent system
    - `BaseAgent` ABC with `run()` method returning `AgentResult`
    - `AgentContext` dataclass with conversation, tools, and memory results
    - `AgentResult` dataclass with content, tool results, turns, and metadata
    - `AgentRegistry` for pluggable agent implementations
    - `SimpleAgent` -- single-turn query-to-response, no tool calling
    - `OrchestratorAgent` -- multi-turn tool-calling loop with `ToolExecutor`, configurable `max_turns`
    - `CustomAgent` -- template for user-defined agent behavior
- Tool system
    - `BaseTool` ABC with `spec` property and `execute()` method
    - `ToolSpec` dataclass describing tool interface and characteristics
    - `ToolExecutor` dispatch engine with JSON argument parsing, latency tracking, and event bus integration (`TOOL_CALL_START` / `TOOL_CALL_END`)
    - `ToolRegistry` for tool discovery
    - `to_openai_function()` method for OpenAI function calling format
    - Built-in tools:
        - `CalculatorTool` -- safe math evaluation via AST parsing
        - `ThinkTool` -- reasoning scratchpad for chain-of-thought
        - `RetrievalTool` -- memory search integration
        - `LLMTool` -- sub-model calls within agent loops
        - `FileReadTool` -- safe file reading with path validation
- OpenAI-compatible API server (`jarvis serve`)
    - FastAPI + Uvicorn with optional `[server]` extra
    - `POST /v1/chat/completions` -- non-streaming and SSE streaming
    - `GET /v1/models` -- list available models
    - `GET /health` -- health check endpoint
    - Pydantic request/response models matching OpenAI API format
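
The "safe math evaluation via AST parsing" idea behind `CalculatorTool` can be sketched as follows: parse the expression into a syntax tree and evaluate only an explicit whitelist of node types. The function `safe_eval` and the set of supported operators are assumptions for illustration and are not taken from the project's implementation.

```python
import ast
import operator

# Whitelisted operators; anything else (calls, names, attributes) is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str) -> float:
    """Evaluate basic arithmetic without ever calling eval()."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            # Accept plain numbers only (bool is a subclass of int, so exclude it).
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return float(_eval(ast.parse(expression, mode="eval")))
```

Because function calls and attribute access are never evaluated, input such as `__import__('os')` is rejected instead of executed.
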
v0.3.0
Phase 2 -- Memory System
Added
- Memory backends
    - `MemoryBackend` ABC with `store()`, `retrieve()`, `delete()`, `clear()`
    - `RetrievalResult` dataclass with content, score, source, and metadata
    - `MemoryRegistry` for backend discovery
    - `SQLiteMemory` -- zero-dependency default using SQLite FTS5 with BM25 ranking and FTS5 query escaping
    - `FAISSMemory` -- vector search using FAISS with sentence-transformers embeddings (optional `[memory-faiss]` extra)
    - `ColBERTMemory` -- ColBERTv2 neural retrieval backend (optional `[memory-colbert]` extra)
    - `BM25Memory` -- BM25 ranking backend using rank-bm25 (optional `[memory-bm25]` extra)
    - `HybridMemory` -- Reciprocal Rank Fusion combining multiple backends
- Document processing
    - `ChunkConfig` dataclass for chunk size and overlap settings
    - `chunk_text()` for splitting documents into overlapping chunks
    - `ingest_path()` for recursively indexing files and directories
    - `read_document()` with support for plain text, Markdown, and PDF (optional `[memory-pdf]` extra)
- Context injection
    - `ContextConfig` with top-k, minimum score, and max context token settings
    - `inject_context()` for prepending memory results as system messages with source attribution
    - `--no-context` flag on `jarvis ask` to disable injection
- CLI commands
    - `jarvis memory index <path>` -- index documents into memory
    - `jarvis memory search <query>` -- search memory for relevant chunks
    - `jarvis memory stats` -- show backend statistics
- Event bus integration -- `MEMORY_STORE` and `MEMORY_RETRIEVE` events
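
Reciprocal Rank Fusion, which `HybridMemory` uses to combine backends, scores each document as the sum of `1 / (k + rank)` over every ranking it appears in. The sketch below is a generic version of that formula. The function name and the default `k = 60` (a commonly used constant) are assumptions; the changelog does not state which value `HybridMemory` uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Since only ranks are used, backends with incomparable score scales (for example BM25 scores and cosine similarities) can be fused without normalization.
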
v0.2.0
Phase 1 -- Intelligence and Inference
Added
- Intelligence primitive
    - `ModelSpec` dataclass with parameter count, context length, quantization, VRAM requirements, and supported engines
    - `ModelRegistry` for model metadata storage
    - `BUILTIN_MODELS` catalog with pre-defined model specifications
    - `register_builtin_models()` and `merge_discovered_models()` helpers
    - `HeuristicRouter` with rule-based model selection
    - `build_routing_context()` for query analysis (code detection, math detection, length classification)
- Inference engines
    - `InferenceEngine` ABC with `generate()`, `stream()`, `list_models()`, and `health()` methods
    - `EngineRegistry` for engine discovery
    - `OllamaEngine` -- Ollama backend via native HTTP API with tool call extraction
    - `VllmEngine` -- vLLM backend via OpenAI-compatible API
    - `LlamaCppEngine` -- llama.cpp server backend
    - `EngineConnectionError` for unreachable engines
    - `messages_to_dicts()` for Message-to-OpenAI-format conversion
- Engine discovery
    - `discover_engines()` -- probe all registered engines for health
    - `discover_models()` -- aggregate model lists across engines
    - `get_engine()` -- get configured default with automatic fallback
- Hardware detection
    - NVIDIA GPU detection via `nvidia-smi`
    - AMD GPU detection via `rocm-smi`
    - Apple Silicon detection via `system_profiler`
    - CPU brand detection via `/proc/cpuinfo` and `sysctl`
    - `recommend_engine()` mapping hardware to best engine
- Telemetry
    - `TelemetryRecord` dataclass with timing, tokens, energy, and cost
    - `TelemetryStore` with SQLite persistence and EventBus subscription
    - `instrumented_generate()` wrapper for automatic telemetry recording
- CLI
    - `jarvis ask <query>` -- query via discovered engine
    - `jarvis ask --agent simple <query>` -- route through SimpleAgent
    - `jarvis model list` -- list models from running engines
    - `jarvis model info <model>` -- show model details
v0.1.0
Phase 0 -- Project Scaffolding
Added
- Project structure -- `hatchling` build backend, `uv` package manager, `pyproject.toml` with extras for optional backends
- Registry system -- `RegistryBase[T]` generic base class with class-specific entry isolation, `register()` decorator, `get()`, `create()`, `items()`, `keys()`, `contains()`, `clear()` methods
- Typed registries -- `ModelRegistry`, `EngineRegistry`, `MemoryRegistry`, `AgentRegistry`, `ToolRegistry`, `RouterPolicyRegistry`, `BenchmarkRegistry`
- Core types -- `Role` enum, `Message`, `Conversation` (with sliding window), `ModelSpec`, `Quantization` enum, `ToolCall`, `ToolResult`, `TelemetryRecord`, `StepType` enum, `TraceStep`, `Trace`
- Configuration -- `JarvisConfig` dataclass hierarchy, TOML loader with overlay semantics, hardware auto-detection, `generate_default_toml()` for `jarvis init`
- Event bus -- Synchronous pub/sub `EventBus` with `EventType` enum for inter-primitive communication
- CLI skeleton -- Click-based `jarvis` command group with `--version`, `--help`, and `init` subcommand
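
One way to get the "class-specific entry isolation" described for the registry system is to give every registry subclass its own dictionary in `__init_subclass__`. The sketch below reuses the names `RegistryBase`, `AgentRegistry`, and `ToolRegistry` from the entries above, but the implementation is an assumption and only covers a few of the listed methods.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class RegistryBase(Generic[T]):
    """Illustrative generic registry; not the project's actual code."""

    _entries: dict[str, T]

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Each subclass gets a fresh dict, so registries never share entries.
        cls._entries = {}

    @classmethod
    def register(cls, key: str) -> Callable[[T], T]:
        def decorator(obj: T) -> T:
            if key in cls._entries:
                raise ValueError(f"{key!r} is already registered")
            cls._entries[key] = obj
            return obj

        return decorator

    @classmethod
    def get(cls, key: str) -> T:
        return cls._entries[key]

    @classmethod
    def keys(cls) -> tuple[str, ...]:
        return tuple(cls._entries)

    @classmethod
    def clear(cls) -> None:
        cls._entries.clear()


class AgentRegistry(RegistryBase[type]):
    pass


class ToolRegistry(RegistryBase[type]):
    pass


@AgentRegistry.register("simple")
class SimpleAgent:
    pass
```

Registering `SimpleAgent` under `AgentRegistry` leaves `ToolRegistry` empty, which is the isolation property the changelog entry refers to.
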