hle_judge
hle_judge
¶
HLE scorer -- LLM-as-judge for Humanity's Last Exam.
Uses the same exact-match-then-LLM-fallback pattern as the reasoning judge but with an HLE-specific grading template. Adapted from IPW's evaluation handlers.
Classes¶
HLEScorer
¶
HLEScorer(judge_backend: InferenceBackend, judge_model: str)
Bases: LLMJudgeScorer
LLM-as-judge evaluation for HLE benchmark.
Fast path: normalized exact match (no API call). Slow path: LLM judge for semantic equivalence.