optimize¶
Optimization framework for OpenJarvis configuration tuning.
Classes¶
LLMOptimizer¶
LLMOptimizer(search_space: SearchSpace, optimizer_model: str = 'claude-sonnet-4-6', optimizer_backend: Optional[InferenceBackend] = None)
Uses a cloud LLM to propose optimal OpenJarvis configs.
Inspired by DSPy's GEPA: uses textual feedback from execution traces rather than just scalar rewards.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
Functions¶
propose_initial¶
propose_initial() -> TrialConfig
Propose a reasonable starting config from the search space.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
propose_next¶
propose_next(history: List[TrialResult], traces: Optional[List[Trace]] = None, frontier_ids: Optional[set] = None) -> TrialConfig
Ask the LLM to propose the next config to evaluate.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
analyze_trial¶
analyze_trial(trial: TrialConfig, summary: RunSummary, traces: Optional[List[Trace]] = None, sample_scores: Optional[List[SampleScore]] = None, per_benchmark: Optional[List[BenchmarkScore]] = None) -> TrialFeedback
Ask the LLM to analyze a completed trial. Returns structured feedback.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
propose_targeted¶
propose_targeted(history: List[TrialResult], base_config: TrialConfig, target_primitive: str, frontier_ids: Optional[set] = None) -> TrialConfig
Propose a config that only changes one primitive.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
propose_merge¶
propose_merge(candidates: List[TrialResult], history: List[TrialResult], frontier_ids: Optional[set] = None) -> TrialConfig
Combine best aspects of frontier members into one config.
Source code in src/openjarvis/learning/optimize/llm_optimizer.py
OptimizationEngine¶
OptimizationEngine(search_space: SearchSpace, llm_optimizer: LLMOptimizer, trial_runner: TrialRunner, store: Optional[OptimizationStore] = None, max_trials: int = 20, early_stop_patience: int = 5)
Orchestrates the optimize loop: propose -> evaluate -> analyze -> repeat.
Source code in src/openjarvis/learning/optimize/optimizer.py
Functions¶
run¶
run(progress_callback: Optional[Callable[[int, int], None]] = None) -> OptimizationRun
Execute the full optimization loop.
- Generate a run_id via uuid.
- llm_optimizer.propose_initial() -> first config.
- Loop up to max_trials:
  a. trial_runner.run_trial(config) -> TrialResult
  b. llm_optimizer.analyze_trial(config, summary, traces)
  c. Update TrialResult with analysis text
  d. Append to history
  e. If store, store.save_trial(result)
  f. Update best_trial if accuracy improved
  g. Check early stopping (no improvement for patience trials)
  h. If not stopped, llm_optimizer.propose_next(history)
- Set run status to "completed".
- If store, store.save_run(optimization_run).
- Return the :class:OptimizationRun.
Args:
progress_callback: Optional (trial_num, max_trials) -> None
called after each trial completes.
Source code in src/openjarvis/learning/optimize/optimizer.py
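The loop above can be sketched as plain Python with hypothetical stand-ins for the proposer, trial runner, and store (the real classes carry much richer TrialConfig/TrialResult types; everything here is illustrative, not the actual implementation):

```python
# Illustrative sketch of the propose -> evaluate -> analyze loop.
# The callables mirror the documented method names, but their minimal
# dict/float payloads are stand-ins for TrialConfig / TrialResult.
import uuid
from typing import Callable, List, Optional


def optimization_loop(
    propose_initial: Callable[[], dict],
    run_trial: Callable[[dict], float],        # returns accuracy for a config
    propose_next: Callable[[List[dict]], dict],
    max_trials: int = 20,
    early_stop_patience: int = 5,
    progress_callback: Optional[Callable[[int, int], None]] = None,
) -> dict:
    run_id = str(uuid.uuid4())                 # step 1: generate a run_id
    history: List[dict] = []
    best_accuracy = float("-inf")
    trials_without_improvement = 0

    config = propose_initial()                 # step 2: first config
    for trial_num in range(1, max_trials + 1):
        accuracy = run_trial(config)           # a. evaluate
        history.append({"config": config, "accuracy": accuracy})  # d. history
        if accuracy > best_accuracy:           # f. track the best trial
            best_accuracy = accuracy
            trials_without_improvement = 0
        else:
            trials_without_improvement += 1
        if progress_callback:
            progress_callback(trial_num, max_trials)
        if trials_without_improvement >= early_stop_patience:
            break                              # g. early stopping
        if trial_num < max_trials:
            config = propose_next(history)     # h. next candidate
    return {"run_id": run_id, "status": "completed",
            "best_accuracy": best_accuracy, "trials": history}
```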
export_best_recipe¶
export_best_recipe(run: OptimizationRun, path: Path) -> Path
Export the best trial's config as a TOML recipe file.
Args:
run: A completed :class:OptimizationRun.
path: Destination path for the TOML file.
Returns: The path written to.
Raises: ValueError: If there is no best trial in the run.
Source code in src/openjarvis/learning/optimize/optimizer.py
OptimizationStore¶
SQLite-backed storage for optimization runs and trials.
Source code in src/openjarvis/learning/optimize/store.py
Functions¶
save_run¶
save_run(run: OptimizationRun) -> None
Persist an optimization run (insert or update).
Source code in src/openjarvis/learning/optimize/store.py
get_run¶
get_run(run_id: str) -> Optional[OptimizationRun]
Retrieve an optimization run by id, or None.
Source code in src/openjarvis/learning/optimize/store.py
list_runs¶
Return summary dicts of recent optimization runs.
Source code in src/openjarvis/learning/optimize/store.py
save_trial¶
save_trial(run_id: str, trial: TrialResult) -> None
Persist a single trial result.
Source code in src/openjarvis/learning/optimize/store.py
get_trials¶
get_trials(run_id: str) -> List[TrialResult]
Retrieve all trial results for a given run.
Source code in src/openjarvis/learning/optimize/store.py
BenchmarkSpec dataclass¶
Specification for one benchmark in a multi-benchmark optimization.
MultiBenchTrialRunner¶
MultiBenchTrialRunner(benchmark_specs: List[BenchmarkSpec], judge_model: str = 'gpt-5-mini-2025-08-07', output_dir: str = 'results/optimize/')
Evaluates a proposed config across multiple benchmarks.
Delegates to :class:TrialRunner per benchmark, then aggregates
results into a single composite :class:TrialResult with weighted
metrics and per-benchmark breakdowns.
Source code in src/openjarvis/learning/optimize/trial_runner.py
Functions¶
run_trial¶
run_trial(trial: TrialConfig) -> TrialResult
Run trial against all benchmarks and return a composite result.
Source code in src/openjarvis/learning/optimize/trial_runner.py
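One way the "weighted metrics" aggregation might look, sketched over plain (accuracy, weight) pairs rather than the real BenchmarkScore objects. The weighted-mean formula here is an assumption for illustration, not necessarily what MultiBenchTrialRunner implements:

```python
# Hypothetical weighted-mean aggregation across per-benchmark scores.
from typing import List, Tuple


def aggregate(scores: List[Tuple[float, float]]) -> float:
    """Weighted mean accuracy over (accuracy, weight) pairs."""
    total_weight = sum(w for _, w in scores)
    if total_weight == 0:
        return 0.0  # no benchmarks configured (or all zero-weighted)
    return sum(acc * w for acc, w in scores) / total_weight
```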
TrialRunner¶
TrialRunner(benchmark: str, max_samples: int = 50, judge_model: str = 'gpt-5-mini-2025-08-07', output_dir: str = 'results/optimize/')
Evaluates a proposed config against a benchmark.
Bridges the optimization types (:class:TrialConfig) to the eval
framework (:class:EvalRunner) so the optimizer can score candidate
configurations end-to-end.
Source code in src/openjarvis/learning/optimize/trial_runner.py
Functions¶
run_trial¶
run_trial(trial: TrialConfig) -> TrialResult
Run trial against the configured benchmark and return a result.
Steps:
1. Convert trial to a :class:Recipe and extract params.
2. Build a :class:RunConfig from recipe + benchmark settings.
3. Lazily import eval-framework registries to resolve the
benchmark -> dataset + scorer, and build the backend.
4. Execute via EvalRunner.run() -> :class:RunSummary.
5. Map the summary into a :class:TrialResult.
Source code in src/openjarvis/learning/optimize/trial_runner.py
BenchmarkScore dataclass¶
BenchmarkScore(benchmark: str, accuracy: float = 0.0, mean_latency_seconds: float = 0.0, total_cost_usd: float = 0.0, total_energy_joules: float = 0.0, total_tokens: int = 0, samples_evaluated: int = 0, errors: int = 0, weight: float = 1.0, summary: Optional[Any] = None, sample_scores: List['SampleScore'] = list())
Per-benchmark metrics from a multi-benchmark evaluation trial.
ObjectiveSpec dataclass¶
A single optimization objective.
OptimizationRun dataclass¶
OptimizationRun(run_id: str, search_space: SearchSpace, trials: List[TrialResult] = list(), best_trial: Optional[TrialResult] = None, best_recipe_path: Optional[str] = None, status: str = 'running', optimizer_model: str = '', benchmark: str = '', benchmarks: List[str] = list(), pareto_frontier: List[TrialResult] = list(), objectives: List[ObjectiveSpec] = (lambda: list(DEFAULT_OBJECTIVES))())
Complete optimization session.
SampleScore dataclass¶
SampleScore(record_id: str, is_correct: Optional[bool] = None, score: Optional[float] = None, latency_seconds: float = 0.0, prompt_tokens: int = 0, completion_tokens: int = 0, cost_usd: float = 0.0, error: Optional[str] = None, ttft: float = 0.0, energy_joules: float = 0.0, power_watts: float = 0.0, gpu_utilization_pct: float = 0.0, throughput_tok_per_sec: float = 0.0, mfu_pct: float = 0.0, mbu_pct: float = 0.0, ipw: float = 0.0, ipj: float = 0.0, energy_per_output_token_joules: float = 0.0, throughput_per_watt: float = 0.0, mean_itl_ms: float = 0.0)
Per-sample metrics from an evaluation trial.
SearchDimension dataclass¶
SearchDimension(name: str, dim_type: str, values: List[Any] = list(), low: Optional[float] = None, high: Optional[float] = None, description: str = '', primitive: str = '')
One tunable dimension in the config space.
SearchSpace dataclass¶
SearchSpace(dimensions: List[SearchDimension] = list(), fixed: Dict[str, Any] = dict(), constraints: List[str] = list())
The full space of configs the optimizer can propose.
Functions¶
to_prompt_description¶
Render search space as structured text for the LLM optimizer.
Source code in src/openjarvis/learning/optimize/types.py
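A hypothetical sketch of what such a per-dimension rendering could produce; the exact wording the real to_prompt_description emits is not shown in this reference:

```python
# Illustrative rendering of one SearchDimension as a line of prompt text.
from typing import Any, List, Optional


def describe_dimension(
    name: str,
    dim_type: str,
    values: Optional[List[Any]] = None,
    low: Optional[float] = None,
    high: Optional[float] = None,
    description: str = "",
) -> str:
    """Render a single dimension as '- name: range (description)'."""
    if dim_type == "categorical":
        rng = f"one of {values}"
    else:
        rng = f"a float in [{low}, {high}]"
    return f"- {name}: {rng} ({description})"
```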
TrialConfig dataclass¶
A single candidate configuration proposed by the optimizer.
Functions¶
to_recipe¶
to_recipe() -> Recipe
Map params back to Recipe fields.
Source code in src/openjarvis/learning/optimize/types.py
TrialFeedback dataclass¶
TrialFeedback(summary_text: str = '', failure_patterns: List[str] = list(), primitive_ratings: Dict[str, str] = dict(), suggested_changes: List[str] = list(), target_primitive: str = '')
Structured feedback from trial analysis.
TrialResult dataclass¶
TrialResult(trial_id: str, config: TrialConfig, accuracy: float = 0.0, mean_latency_seconds: float = 0.0, total_cost_usd: float = 0.0, total_energy_joules: float = 0.0, total_tokens: int = 0, samples_evaluated: int = 0, analysis: str = '', failure_modes: List[str] = list(), per_sample_feedback: List[Dict[str, Any]] = list(), summary: Optional[RunSummary] = None, sample_scores: List[SampleScore] = list(), structured_feedback: Optional[TrialFeedback] = None, per_benchmark: List[BenchmarkScore] = list())
Result of evaluating a trial, with both scalar and textual feedback.
Functions¶
load_benchmark_specs¶
Extract benchmark specs from a loaded optimization config.
Supports two formats:
- Multi-benchmark: [[optimize.benchmarks]] array of tables
- Single-benchmark fallback: optimize.benchmark string
Returns a list of :class:BenchmarkSpec (from trial_runner).
Returns an empty list if no benchmarks are configured (caller
should fall back to CLI --benchmark).
Source code in src/openjarvis/learning/optimize/config.py
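For illustration, the two formats might look like this in TOML. Benchmark names and the per-benchmark keys (weight, max_samples) are assumptions for the sketch, not confirmed BenchmarkSpec fields:

```toml
# Multi-benchmark: [[optimize.benchmarks]] array of tables
[[optimize.benchmarks]]
name = "gsm8k"          # illustrative benchmark name
weight = 2.0
max_samples = 50

[[optimize.benchmarks]]
name = "humaneval"      # illustrative benchmark name
weight = 1.0

# Single-benchmark fallback: a plain string instead of the array
# [optimize]
# benchmark = "gsm8k"
```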
load_objectives¶
load_objectives(data: Dict[str, Any]) -> List[ObjectiveSpec]
Extract objectives from a loaded optimization config.
Reads optimize.objectives (a list of tables) and returns
a list of :class:ObjectiveSpec. Falls back to
:data:DEFAULT_OBJECTIVES if the key is absent.
Source code in src/openjarvis/learning/optimize/config.py
load_optimize_config¶
Load an optimization config TOML file.
Returns the raw dict with keys such as optimize.max_trials,
optimize.benchmark, optimize.search, optimize.fixed,
optimize.constraints, etc.
Raises: FileNotFoundError: If path does not exist.
Source code in src/openjarvis/learning/optimize/config.py
compute_pareto_frontier¶
compute_pareto_frontier(trials: List[TrialResult], objectives: List[ObjectiveSpec]) -> List[TrialResult]
Compute the Pareto frontier: trials not dominated by any other.
A trial A dominates trial B if A is at least as good as B on every objective and strictly better on at least one, with each objective's direction (maximize or minimize) determining which way the comparison runs.
Source code in src/openjarvis/learning/optimize/optimizer.py
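The dominance test above can be sketched self-contained, using stand-in types in place of the real TrialResult and ObjectiveSpec:

```python
# Direction-aware Pareto dominance over trials represented as metric dicts.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Objective:           # stand-in for ObjectiveSpec
    name: str
    direction: str         # "maximize" or "minimize"


def dominates(a: Dict[str, float], b: Dict[str, float],
              objectives: List[Objective]) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better once."""
    at_least_as_good = True
    strictly_better = False
    for obj in objectives:
        av, bv = a[obj.name], b[obj.name]
        if obj.direction == "maximize":
            av, bv = -av, -bv  # flip sign so lower is always better
        if av > bv:
            at_least_as_good = False
        elif av < bv:
            strictly_better = True
    return at_least_as_good and strictly_better


def pareto_frontier(trials: List[Dict[str, float]],
                    objectives: List[Objective]) -> List[Dict[str, float]]:
    """Keep every trial that no other trial dominates."""
    return [t for t in trials
            if not any(dominates(o, t, objectives) for o in trials if o is not t)]
```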
build_search_space¶
build_search_space(config: Dict[str, Any]) -> SearchSpace
Build a SearchSpace from a TOML-style config dict.
Expected format::
{
"optimize": {
"search": [
{
"name": "agent.type",
"type": "categorical",
"values": ["orchestrator", "native_react"],
"description": "Agent architecture",
},
{
"name": "intelligence.temperature",
"type": "continuous",
"low": 0.0,
"high": 1.0,
"description": "Generation temperature",
},
],
"fixed": {"engine": "ollama", "model": "qwen3:8b"},
"constraints": {
"rules": ["SimpleAgent should only have max_turns = 1"],
},
}
}
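A minimal sketch of how entries under optimize.search map onto SearchDimension-style records, using a stand-in dataclass rather than the real type from src/openjarvis/learning/optimize/types.py:

```python
# Parse the "search" list of the config dict above into dimension records.
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Dimension:           # stand-in for SearchDimension
    name: str
    dim_type: str
    values: List[Any] = field(default_factory=list)
    low: Optional[float] = None
    high: Optional[float] = None
    description: str = ""


def parse_search(config: Dict[str, Any]) -> List[Dimension]:
    """Turn each optimize.search entry into a Dimension record."""
    dims = []
    for entry in config.get("optimize", {}).get("search", []):
        dims.append(Dimension(
            name=entry["name"],
            dim_type=entry["type"],          # "categorical" or "continuous"
            values=entry.get("values", []),  # only set for categorical dims
            low=entry.get("low"),            # only set for continuous dims
            high=entry.get("high"),
            description=entry.get("description", ""),
        ))
    return dims
```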