scorer
Backward-compatibility shim: the implementation has moved to learning.optimize.
Classes
PersonalBenchmarkScorer
PersonalBenchmarkScorer(judge_backend: InferenceBackend, judge_model: str)
Bases: LLMJudgeScorer
Judges a candidate response against the best-known response from traces.
Source code in src/openjarvis/learning/optimize/personal/scorer.py
Functions
score
score(record: EvalRecord, model_answer: str) -> Tuple[Optional[bool], Dict[str, Any]]
Compare model_answer against record.reference using the judge LLM.
Returns (is_correct, metadata), where is_correct indicates whether
the candidate answer is at least as good as the reference.
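The return contract described above can be illustrated with a minimal, self-contained sketch. It does not import openjarvis: EvalRecord, StubJudgeBackend, and SketchPersonalScorer below are hypothetical stand-ins, and the generate() method, the prompt wording, the YES/NO verdict format, and the metadata keys are all assumptions made for illustration. The sketch also assumes that is_correct is None when the judge's verdict cannot be parsed, which the documentation above does not state.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class EvalRecord:
    """Hypothetical stand-in holding only the fields this sketch needs."""
    problem: str
    reference: str  # best-known response, e.g. taken from traces


class StubJudgeBackend:
    """Hypothetical stand-in for an InferenceBackend; returns a canned verdict."""

    def __init__(self, verdict: str) -> None:
        self.verdict = verdict

    def generate(self, prompt: str, model: str) -> str:
        # A real backend would call an LLM; this one ignores its inputs.
        return self.verdict


class SketchPersonalScorer:
    """Illustrative scorer mirroring the documented constructor and score() shape."""

    def __init__(self, judge_backend: StubJudgeBackend, judge_model: str) -> None:
        self.judge_backend = judge_backend
        self.judge_model = judge_model

    def score(
        self, record: EvalRecord, model_answer: str
    ) -> Tuple[Optional[bool], Dict[str, Any]]:
        # Assumed prompt format: ask whether the candidate is at least as good.
        prompt = (
            f"Task: {record.problem}\n"
            f"Reference: {record.reference}\n"
            f"Candidate: {model_answer}\n"
            "Is the candidate at least as good as the reference? Answer YES or NO."
        )
        raw = self.judge_backend.generate(prompt, self.judge_model).strip().upper()
        if raw.startswith("YES"):
            is_correct: Optional[bool] = True
        elif raw.startswith("NO"):
            is_correct = False
        else:
            is_correct = None  # assumption: unparseable verdict
        return is_correct, {"judge_model": self.judge_model, "raw_verdict": raw}


record = EvalRecord(problem="Summarize my notes.", reference="Best-known summary.")
scorer = SketchPersonalScorer(StubJudgeBackend("YES"), judge_model="judge-model")
is_correct, metadata = scorer.score(record, "Candidate summary.")
print(is_correct, metadata["raw_verdict"])  # True YES
```

Returning the raw verdict in the metadata dictionary keeps the judge's output available for inspection when is_correct is False or None.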