learning¶
Learning primitive — router policies, reward functions, and trace-driven learning.
Classes¶
RewardFunction¶
RouterPolicy¶
RoutingContext dataclass¶
RoutingContext(query: str = '', query_length: int = 0, has_code: bool = False, has_math: bool = False, language: str = 'en', urgency: float = 0.5, metadata: Dict[str, Any] = dict())
Context describing a query for model routing decisions.
AgentConfigEvolver¶
AgentConfigEvolver(trace_store: TraceStore, *, config_dir: Union[str, Path], min_quality: float = 0.5)
Analyze traces to evolve agent TOML configs with versioning.
| PARAMETER | DESCRIPTION |
|---|---|
| `trace_store` | A `TraceStore` providing the traces to analyze. TYPE: `TraceStore` |
| `config_dir` | Directory where agent TOML configs are written. TYPE: `Union[str, Path]` |
| `min_quality` | Minimum average feedback score for a recommendation to be emitted. TYPE: `float` |
Source code in src/openjarvis/learning/agent_evolver.py
Functions¶
analyze¶
Analyze traces, return recommendations per query class.
Returns a list of dicts, each containing:
- query_class: the classified query category
- recommended_tools: list of tool names sorted by frequency
- recommended_agent: the best-performing agent for this class
- recommended_max_turns: suggested max_turns value
- sample_count: number of traces analyzed for this class
Source code in src/openjarvis/learning/agent_evolver.py
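A recommendation list like this can be fed straight into `write_config`. The sketch below is hypothetical glue code (the `min_samples` gate is an assumption, not library behaviour), assuming `evolver` is an `AgentConfigEvolver`:

```python
def apply_recommendations(evolver, min_samples=5):
    """Write a versioned TOML config for each sufficiently supported class."""
    written = []
    for rec in evolver.analyze():
        if rec["sample_count"] < min_samples:   # skip thinly supported classes
            continue
        path = evolver.write_config(
            rec["recommended_agent"],
            tools=rec["recommended_tools"],
            max_turns=rec["recommended_max_turns"],
        )
        written.append((rec["query_class"], path))
    return written
```

The keys accessed here are exactly those listed above for `analyze`.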
write_config¶
write_config(agent_name: str, *, tools: List[str], max_turns: int = 10, temperature: float = 0.3, system_prompt: str = '') -> Path
Write agent TOML config, archiving previous version first.
Returns the `Path` to the written config file.
Source code in src/openjarvis/learning/agent_evolver.py
list_versions¶
List all versions (including current) for `agent_name`.
Returns a list of dicts with `version`, `path`, and `modified`.
Versions are numbered starting from 1 (oldest archived) through to
the current (highest version number).
Source code in src/openjarvis/learning/agent_evolver.py
rollback¶
Rollback to a specific version.
Raises `ValueError` if the requested version does not exist.
Source code in src/openjarvis/learning/agent_evolver.py
HeuristicRewardFunction¶
HeuristicRewardFunction(*, weight_latency: float = 0.4, weight_cost: float = 0.3, weight_efficiency: float = 0.3, max_latency: float = 30.0, max_cost: float = 0.01)
Bases: RewardFunction
Computes a scalar reward based on latency, cost, and token efficiency.
Each component is normalised to [0, 1] and combined via a weighted sum.
Source code in src/openjarvis/learning/heuristic_reward.py
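A minimal sketch of the weighted-sum scoring described above, using the default weights and caps from the constructor signature. The exact component definitions (linear decay to the cap, output tokens over total tokens for efficiency) are assumptions, not the library's formulas:

```python
def compute_reward(latency_s, cost_usd, tokens_out, tokens_in,
                   w_latency=0.4, w_cost=0.3, w_efficiency=0.3,
                   max_latency=30.0, max_cost=0.01):
    """Weighted sum of three components, each normalised to [0, 1]."""
    # Lower latency/cost is better: 1.0 at zero, 0.0 at or beyond the cap.
    latency_score = max(0.0, 1.0 - latency_s / max_latency)
    cost_score = max(0.0, 1.0 - cost_usd / max_cost)
    # Token efficiency (assumed): share of tokens that are output tokens.
    total = tokens_in + tokens_out
    efficiency_score = tokens_out / total if total else 0.0
    return (w_latency * latency_score
            + w_cost * cost_score
            + w_efficiency * efficiency_score)
```

For example, a 3 s, $0.001 call with 200 output and 300 input tokens scores 0.4·0.9 + 0.3·0.9 + 0.3·0.4 = 0.75 under these assumptions.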
LearningOrchestrator¶
LearningOrchestrator(*, trace_store: Any, config_dir: Union[str, Path], eval_fn: Optional[Callable[[], float]] = None, min_improvement: float = 0.02, min_sft_pairs: int = 10, min_quality: float = 0.7, lora_config: Optional[Any] = None, model_name: Optional[str] = None)
Orchestrate a single trace->learn->eval cycle.
| PARAMETER | DESCRIPTION |
|---|---|
| `trace_store` | Object providing the stored traces to mine. TYPE: `Any` |
| `config_dir` | Directory where agent TOML configs are written / evolved. TYPE: `Union[str, Path]` |
| `eval_fn` | Optional callable returning a float score (higher = better). Called before and after learning to gate acceptance. TYPE: `Optional[Callable[[], float]]` |
| `min_improvement` | Minimum improvement in eval score required to accept the update. TYPE: `float` |
| `min_sft_pairs` | Minimum number of SFT pairs required to trigger LoRA training. TYPE: `int` |
| `min_quality` | Minimum feedback quality threshold for trace mining. TYPE: `float` |
| `lora_config` | Optional LoRA training configuration. TYPE: `Optional[Any]` |
| `model_name` | Model name for LoRA training (passed to the LoRA trainer). TYPE: `Optional[str]` |
Source code in src/openjarvis/learning/learning_orchestrator.py
Functions¶
run¶
Execute one learning cycle.
| PARAMETER | DESCRIPTION |
|---|---|
| `agent_id` | When provided, only traces from this agent are considered. |
Steps
- Mine traces: extract sft_pairs, routing_pairs, agent_pairs
- If no data: return skipped
- Run baseline eval (if eval_fn provided)
- Update routing recommendations
- Evolve agent configs
- Run LoRA training (if lora_config provided AND enough pairs AND torch available)
- Run post-learning eval (if eval_fn provided)
- Accept/reject based on improvement threshold
Source code in src/openjarvis/learning/learning_orchestrator.py
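The steps above can be sketched as a simplified skeleton. Here `miner` and `evolver` are duck-typed stand-ins for `TrainingDataMiner` and `AgentConfigEvolver`, `train_fn` stands in for the LoRA step, and the routing-update step is omitted; this is not the orchestrator's actual code:

```python
def learning_cycle(miner, evolver, *, eval_fn=None, min_improvement=0.02,
                   min_sft_pairs=10, train_fn=None):
    """Skeleton of one trace -> learn -> eval cycle gated by eval improvement."""
    sft_pairs = miner.extract_sft_pairs()              # 1. mine traces
    routing = miner.extract_routing_pairs()
    if not sft_pairs and not routing:
        return {"status": "skipped", "reason": "no data"}   # 2. nothing to learn
    baseline = eval_fn() if eval_fn else None          # 3. baseline eval
    for rec in evolver.analyze():                      # 5. evolve agent configs
        evolver.write_config(rec["recommended_agent"],
                             tools=rec["recommended_tools"])
    if train_fn and len(sft_pairs) >= min_sft_pairs:
        train_fn(sft_pairs)                            # 6. optional LoRA training
    if eval_fn is None:
        return {"status": "accepted"}                  # no eval gate configured
    after = eval_fn()                                  # 7. post-learning eval
    accepted = after - baseline >= min_improvement     # 8. accept/reject
    return {"status": "accepted" if accepted else "rejected",
            "baseline": baseline, "after": after}
```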
LLMOptimizer¶
LLMOptimizer(search_space: SearchSpace, optimizer_model: str = 'claude-sonnet-4-6', optimizer_backend: Optional[InferenceBackend] = None)
Uses a cloud LLM to propose optimal OpenJarvis configs.
Inspired by DSPy's GEPA: uses textual feedback from execution traces rather than just scalar rewards.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
Functions¶
propose_initial¶
propose_initial() -> TrialConfig
Propose a reasonable starting config from the search space.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
propose_next¶
propose_next(history: List[TrialResult], traces: Optional[List[Trace]] = None, frontier_ids: Optional[set] = None) -> TrialConfig
Ask the LLM to propose the next config to evaluate.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
analyze_trial¶
analyze_trial(trial: TrialConfig, summary: RunSummary, traces: Optional[List[Trace]] = None, sample_scores: Optional[List[SampleScore]] = None, per_benchmark: Optional[List[BenchmarkScore]] = None) -> TrialFeedback
Ask the LLM to analyze a completed trial. Returns structured feedback.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
propose_targeted¶
propose_targeted(history: List[TrialResult], base_config: TrialConfig, target_primitive: str, frontier_ids: Optional[set] = None) -> TrialConfig
Propose a config that only changes one primitive.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
propose_merge¶
propose_merge(candidates: List[TrialResult], history: List[TrialResult], frontier_ids: Optional[set] = None) -> TrialConfig
Combine best aspects of frontier members into one config.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
OptimizationEngine¶
OptimizationEngine(search_space: SearchSpace, llm_optimizer: LLMOptimizer, trial_runner: TrialRunner, store: Optional[OptimizationStore] = None, max_trials: int = 20, early_stop_patience: int = 5)
Orchestrates the optimize loop: propose -> evaluate -> analyze -> repeat.
Source code in src/openjarvis/learning/optimize/optimizer.py
Functions¶
run¶
run(progress_callback: Optional[Callable[[int, int], None]] = None) -> OptimizationRun
Execute the full optimization loop.
- Generate a `run_id` via uuid.
- `llm_optimizer.propose_initial()` → first config.
- Loop up to `max_trials`:
  a. `trial_runner.run_trial(config)` → TrialResult
  b. `llm_optimizer.analyze_trial(config, summary, traces)`
  c. Update TrialResult with analysis text
  d. Append to history
  e. If store, `store.save_trial(result)`
  f. Update best_trial if accuracy improved
  g. Check early stopping (no improvement for patience trials)
  h. If not stopped, `llm_optimizer.propose_next(history)`
- Set run status to "completed".
- If store, `store.save_run(optimization_run)`.
- Return the `OptimizationRun`.
Args:
progress_callback: Optional `(trial_num, max_trials) -> None` called after each trial completes.
Source code in src/openjarvis/learning/optimize/optimizer.py
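The loop can be sketched as a simplified skeleton, where `optimizer`, `runner`, and `store` are duck-typed stand-ins for `LLMOptimizer`, `TrialRunner`, and `OptimizationStore` (the method signatures here are simplified relative to the real ones):

```python
import uuid

def optimize(optimizer, runner, store=None, *, max_trials=20,
             early_stop_patience=5, progress_callback=None):
    """Skeleton of propose -> evaluate -> analyze -> repeat with early stopping."""
    run_id = uuid.uuid4().hex                      # fresh run id
    history, best, stale = [], None, 0
    config = optimizer.propose_initial()           # first config
    for trial_num in range(1, max_trials + 1):
        result = runner.run_trial(config)                          # a. evaluate
        result.analysis = optimizer.analyze_trial(config, result)  # b/c. analyze
        history.append(result)                                     # d. record
        if store:
            store.save_trial(run_id, result)                       # e. persist
        if best is None or result.accuracy > best.accuracy:
            best, stale = result, 0                                # f. new best
        else:
            stale += 1
        if progress_callback:
            progress_callback(trial_num, max_trials)
        if stale >= early_stop_patience:                           # g. early stop
            break
        config = optimizer.propose_next(history)                   # h. next config
    return {"run_id": run_id, "status": "completed",
            "best": best, "trials": history}
```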
export_best_recipe¶
export_best_recipe(run: OptimizationRun, path: Path) -> Path
Export the best trial's config as a TOML recipe file.
Args:
run: A completed `OptimizationRun`.
path: Destination path for the TOML file.
Returns: The path written to.
Raises: `ValueError` if there is no best trial in the run.
Source code in src/openjarvis/learning/optimize/optimizer.py
OptimizationStore¶
SQLite-backed storage for optimization runs and trials.
Source code in src/openjarvis/learning/optimize/store.py
Functions¶
save_run¶
save_run(run: OptimizationRun) -> None
Persist an optimization run (insert or update).
Source code in src/openjarvis/learning/optimize/store.py
get_run¶
get_run(run_id: str) -> Optional[OptimizationRun]
Retrieve an optimization run by id, or None.
Source code in src/openjarvis/learning/optimize/store.py
list_runs¶
Return summary dicts of recent optimization runs.
Source code in src/openjarvis/learning/optimize/store.py
save_trial¶
save_trial(run_id: str, trial: TrialResult) -> None
Persist a single trial result.
Source code in src/openjarvis/learning/optimize/store.py
get_trials¶
get_trials(run_id: str) -> List[TrialResult]
Retrieve all trial results for a given run.
Source code in src/openjarvis/learning/optimize/store.py
HeuristicRouter¶
HeuristicRouter(available_models: List[str] | None = None, *, default_model: str = '', fallback_model: str = '')
Bases: RouterPolicy
Rule-based model router.
Rules (applied in order):
1. Code detected → prefer model with "code"/"coder" in name
2. Math detected → prefer larger model
3. Short query (<50 chars, no code/math) → prefer smaller/faster model
4. Long/complex query (>500 chars OR reasoning keywords) → prefer larger model
5. High urgency (>0.8) → override to smaller model
6. Default fallback → `default_model` → `fallback_model` → first available
Source code in src/openjarvis/learning/router.py
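The rule chain above can be sketched as follows. Everything here is illustrative: the code/math detection, the "larger model means bigger number in the name" heuristic, and the function shape are all assumptions, not the library's implementation:

```python
import re

def route(query, models, *, urgency=0.5, default_model=""):
    """Illustrative sketch of the HeuristicRouter rule order."""
    def size_of(name):  # crude: first number in the name, e.g. "llama-70b" -> 70
        nums = re.findall(r"\d+", name)
        return int(nums[0]) if nums else 0

    larger = max(models, key=size_of)
    smaller = min(models, key=size_of)
    has_code = "```" in query or "def " in query
    has_math = any(tok in query for tok in ("solve", "integral", "="))

    if urgency > 0.8:                              # rule 5: urgency overrides
        return smaller
    if has_code:                                   # rule 1: prefer coder models
        coder = next((m for m in models if "code" in m.lower()), None)
        if coder:
            return coder
    if has_math or len(query) > 500:               # rules 2 and 4: larger model
        return larger
    if len(query) < 50:                            # rule 3: short -> fast model
        return smaller
    return default_model or models[0]              # rule 6: fallback chain
```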
TrainingDataMiner¶
Extract supervised training pairs from stored traces.
| PARAMETER | DESCRIPTION |
|---|---|
| `trace_store` | Any object exposing the stored traces to mine. |
| `min_quality` | Minimum feedback score a trace must have to qualify. TYPE: `float` |
| `min_samples_per_class` | Minimum number of samples a query class must have to appear in routing/agent-config results. TYPE: `int` |
Source code in src/openjarvis/learning/training/data.py
Functions¶
extract_sft_pairs¶
Return SFT training pairs from high-quality traces.
Each entry is a dict with keys: `input`, `output`, `query_class`, `model`, `feedback`.
Duplicate (`input`, `output`) pairs are collapsed; the first occurrence is kept.
Source code in src/openjarvis/learning/training/data.py
extract_routing_pairs¶
Return per-query-class routing recommendations.
Returns a dict mapping query class to:
- `best_model`: model with the highest average feedback for the class.
- `avg_feedback`: average feedback across all models for the class.
- `sample_count`: total number of qualifying traces in the class.
- `all_models`: dict of `{model: {"avg_feedback": float, "count": int}}`.
Source code in src/openjarvis/learning/training/data.py
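A minimal sketch of the aggregation this method describes, assuming traces are dicts with `query_class`, `model`, and `feedback` fields (the real miner reads from a trace store and applies quality filtering):

```python
from collections import defaultdict

def routing_pairs(traces, min_samples=3):
    """Group traces by query class and pick the best-scoring model per class."""
    by_class = defaultdict(lambda: defaultdict(list))
    for t in traces:
        by_class[t["query_class"]][t["model"]].append(t["feedback"])
    result = {}
    for qclass, models in by_class.items():
        counts = {m: {"avg_feedback": sum(v) / len(v), "count": len(v)}
                  for m, v in models.items()}
        total = sum(c["count"] for c in counts.values())
        if total < min_samples:         # drop classes with too few samples
            continue
        all_scores = [f for v in models.values() for f in v]
        result[qclass] = {
            "best_model": max(counts, key=lambda m: counts[m]["avg_feedback"]),
            "avg_feedback": sum(all_scores) / len(all_scores),
            "sample_count": total,
            "all_models": counts,
        }
    return result
```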
extract_agent_config_pairs¶
Return per-query-class agent and tool recommendations.
Returns a dict mapping query class to:
- `best_agent`: agent with the highest average feedback.
- `best_tools`: most frequently used tools by the best agent.
- `avg_feedback`: average feedback across all agents for the class.
- `sample_count`: total number of qualifying traces in the class.
Source code in src/openjarvis/learning/training/data.py
LoRATrainer¶
LoRATrainer(config: LoRATrainingConfig, *, model_name: str = 'Qwen/Qwen3-0.6B', device: Optional[str] = None)
Fine-tune a local causal LM with LoRA (or QLoRA) adapters.
| PARAMETER | DESCRIPTION |
|---|---|
| `config` | LoRA training configuration. TYPE: `LoRATrainingConfig` |
| `model_name` | HuggingFace model identifier or local path. TYPE: `str` |
| `device` | PyTorch device string. TYPE: `Optional[str]` |

| RAISES | DESCRIPTION |
|---|---|
| `ImportError` | If the required training dependencies (such as `torch`) are not installed. |
Source code in src/openjarvis/learning/training/lora.py
Functions¶
prepare_dataset¶
Convert SFT pairs to tokenized examples.
Each returned dict contains `input_ids`, `attention_mask`, and `text` (the raw formatted string before tokenization).
| PARAMETER | DESCRIPTION |
|---|---|
| `pairs` | List of dicts with at least `input` and `output` keys. TYPE: `List[dict]` |
Source code in src/openjarvis/learning/training/lora.py
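A sketch of the pair-to-example conversion. Here `tokenize` stands in for a real HuggingFace tokenizer call, and the prompt template is an illustrative assumption, not the library's actual format:

```python
def prepare_examples(pairs, tokenize, max_seq_length=2048):
    """Format each SFT pair into a training string and tokenize it."""
    examples = []
    for pair in pairs:
        text = f"### Input:\n{pair['input']}\n### Output:\n{pair['output']}"
        ids = tokenize(text)[:max_seq_length]     # truncate to max_seq_length
        examples.append({
            "input_ids": ids,
            "attention_mask": [1] * len(ids),     # no padding in this sketch
            "text": text,                         # raw string before tokenization
        })
    return examples
```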
train¶
Run LoRA fine-tuning on the given SFT pairs.
| PARAMETER | DESCRIPTION |
|---|---|
| `pairs` | List of dicts with at least `input` and `output` keys. TYPE: `List[dict]` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Training summary. |
Source code in src/openjarvis/learning/training/lora.py
LoRATrainingConfig dataclass¶
LoRATrainingConfig(lora_rank: int = 16, lora_alpha: int = 32, lora_dropout: float = 0.05, target_modules: List[str] = (lambda: ['q_proj', 'v_proj'])(), num_epochs: int = 3, batch_size: int = 4, learning_rate: float = 2e-05, weight_decay: float = 0.01, warmup_ratio: float = 0.1, max_grad_norm: float = 1.0, max_seq_length: int = 2048, use_4bit: bool = False, output_dir: str = 'checkpoints/lora', save_every_n_epochs: int = 1, gradient_checkpointing: bool = True)
Configuration for LoRA / QLoRA fine-tuning.
Functions¶
build_routing_context¶
build_routing_context(query: str, *, urgency: float = 0.5) -> RoutingContext
Populate a RoutingContext from a raw query string.
Source code in src/openjarvis/learning/router.py
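A sketch of what this helper might do, mirroring the `RoutingContext` fields from the dataclass signature above. The detection regexes are illustrative assumptions, not the library's actual heuristics:

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict

# Mirror of the RoutingContext fields shown in the signature above.
@dataclass
class RoutingContext:
    query: str = ""
    query_length: int = 0
    has_code: bool = False
    has_math: bool = False
    language: str = "en"
    urgency: float = 0.5
    metadata: Dict[str, Any] = field(default_factory=dict)

def build_routing_context(query, *, urgency=0.5):
    """Populate a RoutingContext from a raw query (illustrative detection)."""
    has_code = bool(re.search(r"```|\bdef\b|\bclass\b|[{};]", query))
    has_math = bool(re.search(r"\d\s*[-+*/^=]\s*\d|\bintegral\b|\bequation\b",
                              query))
    return RoutingContext(query=query, query_length=len(query),
                          has_code=has_code, has_math=has_math,
                          urgency=urgency)
```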
ensure_registered¶
Ensure all learning policies are registered in RouterPolicyRegistry.
Imported lazily to avoid circular imports with the intelligence primitive.