Intelligence Primitive¶
The Intelligence primitive represents the model — its identity, weights, quantization format, fallback chain, and the catalog of well-known models with detailed metadata. It no longer contains routing logic; query analysis and model selection have moved to the Learning primitive.
Purpose¶
The Intelligence primitive answers a single question: what is the model? It maintains a catalog of known models with metadata (parameter count, context length, VRAM requirements, supported engines) and provides helpers for registering built-in models and merging models discovered from running engines at runtime.
The primitive provides three key capabilities:
- Model catalog -- a registry of well-known models with metadata (parameter count, context length, VRAM requirements, supported engines)
- Auto-discovery -- merging models discovered from running engines into the catalog
- Model configuration -- IntelligenceConfig captures the local model's identity, weight paths, quantization, and preferred engine
Routing has moved
Query analysis (build_routing_context) and model selection (HeuristicRouter, RouterPolicy ABC) now live in the Learning primitive. Backward-compatible re-exports remain in intelligence/_stubs.py and intelligence/router.py so existing code continues to work.
ModelSpec¶
Every model in the system is described by a ModelSpec dataclass, defined in core/types.py:
@dataclass(slots=True)
class ModelSpec:
    model_id: str                              # Unique identifier (e.g., "qwen3:8b")
    name: str                                  # Human-readable name
    parameter_count_b: float                   # Total parameters in billions
    context_length: int                        # Maximum context window (tokens)
    active_parameter_count_b: Optional[float]  # MoE active params (None for dense)
    quantization: Quantization                 # Quantization format (none, fp8, int4, etc.)
    min_vram_gb: float                         # Minimum VRAM required
    supported_engines: Sequence[str]           # Which engines can run this model
    provider: str                              # Model provider (e.g., "alibaba", "meta")
    requires_api_key: bool                     # Whether cloud API key is needed
    metadata: Dict[str, Any]                   # Additional metadata (pricing, architecture)
Models are registered in the ModelRegistry:
from openjarvis.core.registry import ModelRegistry

# Register a model
ModelRegistry.register_value("qwen3:8b", ModelSpec(
    model_id="qwen3:8b",
    name="Qwen3 8B",
    parameter_count_b=8.2,
    context_length=32768,
    supported_engines=("vllm", "ollama", "llamacpp", "sglang"),
    provider="alibaba",
))
Model Catalog¶
The built-in model catalog is defined in intelligence/model_catalog.py as the BUILTIN_MODELS list. It includes models across three categories:
Local Models -- Dense¶
| Model ID | Name | Parameters | Context | Supported Engines |
|---|---|---|---|---|
| `qwen3:8b` | Qwen3 8B | 8.2B | 32K | vLLM, Ollama, llama.cpp, SGLang |
| `qwen3:32b` | Qwen3 32B | 32B | 32K | Ollama, vLLM |
| `llama3.3:70b` | Llama 3.3 70B | 70B | 128K | Ollama, vLLM |
| `llama3.2:3b` | Llama 3.2 3B | 3B | 128K | Ollama, vLLM, llama.cpp |
| `deepseek-coder-v2:16b` | DeepSeek Coder V2 16B | 16B | 128K | Ollama, vLLM |
| `mistral:7b` | Mistral 7B | 7B | 32K | Ollama, vLLM, llama.cpp |
Local Models -- Mixture of Experts (MoE)¶
| Model ID | Name | Total / Active Params | Context | Min VRAM |
|---|---|---|---|---|
| `gpt-oss:120b` | GPT-OSS 120B | 117B / 5.1B | 128K | 12 GB |
| `glm-4.7-flash` | GLM 4.7 Flash | 30B / 3B | 128K | 8 GB |
| `trinity-mini` | Trinity Mini | 26B / 3B | 128K | 8 GB |
Cloud Models¶
| Model ID | Provider | Context | Pricing (input / output per 1M tokens) |
|---|---|---|---|
| `gpt-4o` | OpenAI | 128K | $2.50 / $10.00 |
| `gpt-4o-mini` | OpenAI | 128K | $0.15 / $0.60 |
| `gpt-5-mini` | OpenAI | 400K | $0.25 / $2.00 |
| `claude-sonnet-4-20250514` | Anthropic | 200K | $3.00 / $15.00 |
| `claude-opus-4-20250514` | Anthropic | 200K | $15.00 / $75.00 |
| `claude-opus-4-6` | Anthropic | 200K | $5.00 / $25.00 |
| `gemini-2.5-pro` | Google | 1M | $1.25 / $10.00 |
| `gemini-2.5-flash` | Google | 1M | $0.30 / $2.50 |
Registering Built-in Models¶
The register_builtin_models() function populates the ModelRegistry with all built-in models. It skips models that are already registered, making it safe to call multiple times:
from openjarvis.intelligence import register_builtin_models
register_builtin_models()
# All BUILTIN_MODELS are now in ModelRegistry
Auto-Discovery: Merging Runtime Models¶
When engines are discovered at runtime, they report models that may not be in the built-in catalog. The merge_discovered_models() function creates minimal ModelSpec entries for these:
from openjarvis.intelligence import merge_discovered_models
# Models reported by Ollama that aren't in the catalog
merge_discovered_models("ollama", ["phi3:3.8b", "codellama:7b"])
For each model ID not already in the registry, a ModelSpec is created with the model ID as both the model_id and name, with zero-value defaults for unknown fields. This ensures the routing system can still select from all available models, even ones it has no metadata for.
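The merge behavior described above can be sketched as follows. This is an illustrative stand-in, not the actual implementation: the simplified `REGISTRY` dict and trimmed-down `ModelSpec` here mirror the real `ModelRegistry` and `ModelSpec` only in the fields shown.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Sequence

# Simplified stand-in for ModelSpec (illustrative only).
@dataclass
class ModelSpec:
    model_id: str
    name: str
    parameter_count_b: float = 0.0               # Zero-value default: size unknown
    context_length: int = 0                      # Zero-value default: window unknown
    active_parameter_count_b: Optional[float] = None
    min_vram_gb: float = 0.0
    supported_engines: Sequence[str] = ()
    provider: str = ""
    requires_api_key: bool = False
    metadata: Dict[str, Any] = field(default_factory=dict)

# Simplified stand-in for the ModelRegistry.
REGISTRY: Dict[str, ModelSpec] = {}

def merge_discovered_models(engine_key: str, model_ids: Sequence[str]) -> None:
    """Create minimal specs for models reported by a running engine."""
    for model_id in model_ids:
        if model_id in REGISTRY:
            continue  # Never overwrite catalog entries that carry real metadata
        REGISTRY[model_id] = ModelSpec(
            model_id=model_id,
            name=model_id,                    # The ID doubles as the display name
            supported_engines=(engine_key,),  # Only the reporting engine is known
        )
```

The key property is idempotence with respect to the catalog: discovered models never clobber richer built-in metadata.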
IntelligenceConfig¶
The IntelligenceConfig dataclass (in core/config.py) captures the full identity of the model the system is configured to use, as well as the default sampling parameters for generation:
@dataclass(slots=True)
class IntelligenceConfig:
    """The model — identity, paths, quantization, fallback chain, and generation defaults."""
    default_model: str = ""         # Primary model key (e.g., "qwen3:8b")
    fallback_model: str = ""        # Fallback when default is unavailable
    model_path: str = ""            # Local weights (HF repo, GGUF file, etc.)
    checkpoint_path: str = ""       # Checkpoint/adapter path (e.g., LoRA)
    quantization: str = "none"      # none, fp8, int8, int4, gguf_q4, gguf_q8
    preferred_engine: str = ""      # Override engine for this model (e.g., "vllm")
    provider: str = ""              # local, openai, anthropic, google
    # Generation defaults (overridable per-call)
    temperature: float = 0.7
    max_tokens: int = 1024
    top_p: float = 0.9
    top_k: int = 40
    repetition_penalty: float = 1.0
    stop_sequences: str = ""        # Comma-separated stop strings
Model Identity Fields¶
| Field | Type | Default | Description |
|---|---|---|---|
| `default_model` | `str` | `""` | Primary model registry key. Resolved at startup; overrides any engine default. |
| `fallback_model` | `str` | `""` | Used when the default model is not available on any running engine. |
| `model_path` | `str` | `""` | Path or HuggingFace repo ID for local weights (e.g., `"./models/qwen3-8b.gguf"` or `"Qwen/Qwen3-8B"`). |
| `checkpoint_path` | `str` | `""` | Path to a fine-tuned checkpoint or LoRA adapter directory. |
| `quantization` | `str` | `"none"` | Quantization format. Accepted values: `none`, `fp8`, `int8`, `int4`, `gguf_q4`, `gguf_q8`. |
| `preferred_engine` | `str` | `""` | When set, SystemBuilder, `sdk.py`, and `cli/ask.py` use this engine key instead of `config.engine.default`. |
| `provider` | `str` | `""` | Model provider hint: `local`, `openai`, `anthropic`, `google`. Used by the Cloud engine backend to route API calls. |
Generation Default Fields¶
These fields set the default sampling parameters for every inference call. Individual calls can override them by passing keyword arguments to engine.generate().
| Field | Type | Default | Description |
|---|---|---|---|
| `temperature` | `float` | `0.7` | Sampling temperature. Lower values produce more deterministic output; higher values increase diversity. |
| `max_tokens` | `int` | `1024` | Maximum number of tokens to generate per call. |
| `top_p` | `float` | `0.9` | Nucleus sampling probability mass. At each step, only tokens comprising the top-p probability mass are considered. |
| `top_k` | `int` | `40` | Top-k sampling: only consider the top-k most likely tokens at each step. |
| `repetition_penalty` | `float` | `1.0` | Penalize repeated token sequences. Values greater than 1.0 reduce repetition. |
| `stop_sequences` | `str` | `""` | Comma-separated stop strings. Generation halts when any stop string appears in the output. |
Moved from Agent
Generation parameters (temperature, max_tokens) previously lived under [agent] in the config file. They now live under [intelligence]. Old configs with these fields under [agent] are automatically migrated at load time. See the configuration migration guide for details.
TOML Configuration¶
[intelligence]
default_model = "qwen3:8b"
fallback_model = "llama3.2:3b"
temperature = 0.7
max_tokens = 1024
# top_p = 0.9
# top_k = 40
# repetition_penalty = 1.0
# stop_sequences = ""
# Local weight overrides (optional)
# model_path = "./models/qwen3-8b-instruct.gguf"
# checkpoint_path = "./checkpoints/my-lora"
# quantization = "gguf_q4"
# Engine selection for this model (takes priority over [engine].default)
# preferred_engine = "vllm"
# Provider for cloud models
# provider = "openai"
Engine Selection Priority¶
When resolving which engine to use, SystemBuilder, sdk.py, and cli/ask.py check config.intelligence.preferred_engine before config.engine.default:
1. Explicit --engine CLI flag or engine_key= SDK parameter
2. config.intelligence.preferred_engine ← new field
3. config.engine.default
4. First healthy engine discovered at runtime
This lets you pin a specific model to a specific engine without changing the global engine default. For example, a GGUF quantized model can be pinned to llamacpp while the global default remains ollama:
[engine]
default = "ollama"
[intelligence]
default_model = "llama3.2:3b"
model_path = "./models/llama-3.2-3b.Q4_K_M.gguf"
quantization = "gguf_q4"
preferred_engine = "llamacpp"
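The four-step priority order can be sketched as a small resolver. This is illustrative only: the parameter names and the `resolve_engine_key` helper are assumptions for exposition, not the actual SystemBuilder code.

```python
from typing import Optional, Sequence

def resolve_engine_key(
    explicit_engine: Optional[str],   # --engine CLI flag or engine_key= SDK parameter
    preferred_engine: str,            # config.intelligence.preferred_engine
    engine_default: str,              # config.engine.default
    healthy_engines: Sequence[str],   # engines discovered at runtime, in order found
) -> Optional[str]:
    """Apply the documented priority order, highest first."""
    if explicit_engine:
        return explicit_engine        # 1. Explicit CLI/SDK choice always wins
    if preferred_engine:
        return preferred_engine       # 2. Model-level pin from [intelligence]
    if engine_default:
        return engine_default         # 3. Global default from [engine]
    return healthy_engines[0] if healthy_engines else None  # 4. First healthy engine
```

With the TOML above, `resolve_engine_key(None, "llamacpp", "ollama", [...])` returns `"llamacpp"`: the model-level pin beats the global default.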
Public API¶
intelligence/__init__.py exports exactly three names:
from openjarvis.intelligence import (
    BUILTIN_MODELS,           # List[ModelSpec] — the full built-in catalog
    merge_discovered_models,  # (engine_key, model_ids) -> None
    register_builtin_models,  # () -> None
)
Backward-Compatibility Shims¶
The following names are still importable from openjarvis.intelligence via shim modules, but their canonical locations have moved:
| Name | Old location | Canonical location |
|---|---|---|
| `RouterPolicy` | `intelligence/_stubs.py` | `learning/_stubs.py` |
| `QueryAnalyzer` | `intelligence/_stubs.py` | `learning/_stubs.py` |
| `HeuristicRouter` | `intelligence/router.py` | `learning/router.py` |
| `build_routing_context` | `intelligence/router.py` | `learning/router.py` |
| `DefaultQueryAnalyzer` | `intelligence/router.py` | `learning/router.py` |
New code should import from the canonical learning.* locations. The shims in intelligence/_stubs.py and intelligence/router.py are retained for backward compatibility only.
Integration with Learning¶
The Learning primitive consumes the model catalog to make routing decisions. The HeuristicRouter and TraceDrivenPolicy both read ModelRegistry to compare model sizes when selecting between candidates. See the Learning & Traces documentation for full details on routing policies, the RouterPolicy ABC, and the trace-driven feedback loop.