reasoning_judge
Reasoning judge scorer -- LLM-as-judge for math and reasoning tasks.
Attempts a normalized exact match first, and falls back to an LLM judge for semantic comparison only when that match fails. Adapted from IPW's reasoning evaluation handlers.
Classes
ReasoningJudgeScorer
ReasoningJudgeScorer(judge_backend: InferenceBackend, judge_model: str)
Bases: LLMJudgeScorer
LLM-as-judge evaluation for reasoning tasks.
Fast path: normalized exact match (no API call). Slow path: LLM judge for semantic equivalence.
Source code in src/openjarvis/evals/core/scorer.py
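The fast-path/slow-path behavior described for this class can be illustrated with a minimal sketch. This is not the code in scorer.py: `score_with_fallback`, `simple_exact_match`, and the `judge` callable are hypothetical names introduced here, and the judge is modeled as a plain function standing in for whatever call the real scorer makes through its `judge_backend`.

```python
from typing import Callable, Tuple

# Hypothetical signature: takes (prediction, reference), returns a verdict.
Comparator = Callable[[str, str], bool]


def simple_exact_match(prediction: str, reference: str) -> bool:
    """Stand-in fast path: compare after trimming and case-folding."""
    return prediction.strip().casefold() == reference.strip().casefold()


def score_with_fallback(
    prediction: str,
    reference: str,
    exact_match: Comparator,
    judge: Comparator,
) -> Tuple[bool, str]:
    """Return (is_correct, path_taken).

    The judge is only invoked when the cheap comparison fails, so answers
    that match after normalization never cost an API call.
    """
    if exact_match(prediction, reference):
        return True, "exact_match"
    return judge(prediction, reference), "llm_judge"
```

In this sketch a failed exact match is never treated as a wrong answer on its own; it only means the decision is deferred to the judge, which is what allows semantically equivalent answers such as "one half" and "0.5" to be accepted.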
Functions
reasoning_exact_match
Normalized exact match for reasoning answers.
Handles numbers, LaTeX boxed answers, and plain strings.
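To make the three cases concrete, here is a rough sketch of what such a normalization might look like. It is an illustration under assumptions, not the actual `reasoning_exact_match` implementation: the helper names (`extract_boxed`, `normalize_answer`, `sketch_exact_match`) and the specific rules (using the last `\boxed{...}`, stripping thousands separators, lowercasing strings) are choices made for this example.

```python
import math
import re
from typing import Optional

# Matches numbers written with thousands separators, e.g. "1,000" or "-12,345.6".
_THOUSANDS = re.compile(r"^-?\d{1,3}(,\d{3})+(\.\d+)?$")


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced


def normalize_answer(text: str) -> str:
    """Reduce an answer to a canonical string (assumed rules, see lead-in)."""
    boxed = extract_boxed(text)
    if boxed is not None:
        text = boxed
    text = text.strip().strip("$").strip()
    candidate = text.replace(",", "") if _THOUSANDS.match(text) else text
    try:
        value = float(candidate)
    except ValueError:
        value = math.nan
    if math.isfinite(value):
        # Numbers: "42.0" and "42" collapse to the same canonical form.
        return str(int(value)) if value == int(value) else repr(value)
    # Plain strings: lowercase and collapse internal whitespace.
    return " ".join(text.lower().split())


def sketch_exact_match(prediction: str, reference: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(reference)
```

Under these rules `\boxed{42}` matches `42.0` and `1,000` matches `1000`, but `\boxed{\frac{1}{2}}` does not match `0.5`. Cases like the last one are where a purely string-based comparison stops and a semantic judge becomes useful.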