trial_runner
TrialRunner -- evaluates a proposed config against a benchmark.
Classes
BenchmarkSpec (dataclass)
Specification for one benchmark in a multi-benchmark optimization.
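The fields of `BenchmarkSpec` are not listed on this page; as a minimal sketch, a spec for one benchmark in a weighted multi-benchmark run might look like the following (the field names `name`, `weight`, and `max_samples` are assumptions for illustration, not the actual API):

```python
from dataclasses import dataclass

@dataclass
class BenchmarkSpec:
    # Hypothetical fields -- the real class may differ.
    name: str              # registry key used to resolve the dataset + scorer
    weight: float = 1.0    # contribution to the composite metric
    max_samples: int = 50  # cap on evaluated samples per trial

spec = BenchmarkSpec(name="gsm8k", weight=0.5)
```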
TrialRunner
TrialRunner(benchmark: str, max_samples: int = 50, judge_model: str = 'gpt-5-mini-2025-08-07', output_dir: str = 'results/optimize/')
Evaluates a proposed config against a benchmark.
Bridges the optimization types (`TrialConfig`) to the eval
framework (`EvalRunner`) so the optimizer can score candidate
configurations end-to-end.
Source code in src/openjarvis/learning/optimize/trial_runner.py
Functions
run_trial
run_trial(trial: TrialConfig) -> TrialResult
Run the trial against the configured benchmark and return a result.
Steps:
1. Convert the trial to a `Recipe` and extract params.
2. Build a `RunConfig` from recipe + benchmark settings.
3. Lazily import eval-framework registries to resolve the benchmark -> dataset + scorer, and build the backend.
4. Execute via `EvalRunner.run()` -> `RunSummary`.
5. Map the summary into a `TrialResult`.
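The final mapping step can be sketched with stand-in types (all field and type names here are assumptions for illustration; the real `RunSummary` and `TrialResult` may carry different fields):

```python
from dataclasses import dataclass

@dataclass
class RunSummary:
    # Hypothetical stand-in for the eval framework's run summary.
    mean_score: float
    n_samples: int

@dataclass
class TrialResult:
    # Hypothetical stand-in for the optimizer's result type.
    trial_id: str
    score: float
    n_samples: int

def summary_to_result(trial_id: str, summary: RunSummary) -> TrialResult:
    # Step 5: project the eval summary onto the optimizer's result type.
    return TrialResult(trial_id=trial_id,
                       score=summary.mean_score,
                       n_samples=summary.n_samples)

result = summary_to_result("t-001", RunSummary(mean_score=0.82, n_samples=50))
```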
Source code in src/openjarvis/learning/optimize/trial_runner.py
MultiBenchTrialRunner
MultiBenchTrialRunner(benchmark_specs: List[BenchmarkSpec], judge_model: str = 'gpt-5-mini-2025-08-07', output_dir: str = 'results/optimize/')
Evaluates a proposed config across multiple benchmarks.
Delegates to `TrialRunner` per benchmark, then aggregates
results into a single composite `TrialResult` with weighted
metrics and per-benchmark breakdowns.
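The weighted aggregation can be sketched as a weighted mean over per-benchmark scores (a simplified illustration, not the library's actual aggregation code; weights need not sum to 1):

```python
def composite_score(per_benchmark: dict[str, float],
                    weights: dict[str, float]) -> float:
    # Weighted mean of per-benchmark scores, normalized by total weight.
    total_weight = sum(weights[name] for name in per_benchmark)
    return sum(score * weights[name]
               for name, score in per_benchmark.items()) / total_weight

score = composite_score({"gsm8k": 0.8, "mmlu": 0.6},
                        {"gsm8k": 1.0, "mmlu": 1.0})  # -> 0.7
```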
Source code in src/openjarvis/learning/optimize/trial_runner.py
Functions
run_trial
run_trial(trial: TrialConfig) -> TrialResult
Run the trial against all configured benchmarks and return a composite result.