judge
TraceJudge -- LLM-as-judge scoring for agent traces.
Classes
TraceJudge
TraceJudge(backend: InferenceBackend, model: str)
LLM-as-judge for scoring traces when no ground truth exists.
Given a Trace, the judge constructs a prompt showing the
query, agent steps, and final result, then asks an LLM to rate the
quality on a 0-1 scale.
Source code in src/openjarvis/learning/optimize/feedback/judge.py
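The prompt-construction step described above can be illustrated with a minimal sketch. It is not the code in judge.py: SketchTrace, its fields (query, steps, result), build_judge_prompt, and the prompt wording are all hypothetical stand-ins chosen for illustration, and the real Trace class may expose different attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchTrace:
    """Hypothetical stand-in for Trace; the real class may use different fields."""

    query: str
    steps: List[str] = field(default_factory=list)
    result: str = ""


def build_judge_prompt(trace: SketchTrace) -> str:
    """Assemble a prompt showing the query, agent steps, and final result."""
    # Number the steps so the judge can refer to them in its reasoning.
    step_lines = "\n".join(
        f"{i}. {step}" for i, step in enumerate(trace.steps, start=1)
    )
    return (
        "You are evaluating an agent's work.\n\n"
        f"Query:\n{trace.query}\n\n"
        f"Agent steps:\n{step_lines or '(no steps recorded)'}\n\n"
        f"Final result:\n{trace.result}\n\n"
        "Rate the overall quality on a scale from 0 to 1.\n"
        "Reply with 'Score: <number>' on the first line, then your reasoning."
    )


example = SketchTrace(
    query="What is 2 + 2?",
    steps=["Parse the arithmetic expression", "Compute the sum"],
    result="4",
)
prompt = build_judge_prompt(example)
print(prompt)
```

Asking for a fixed 'Score: <number>' line is one possible design choice: it keeps the reply easy to parse while leaving the rest of the reply free for the judge's reasoning.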
Functions
score_trace
score_trace(trace: Trace) -> Tuple[float, str]
Score a single trace.
Returns:
(score, feedback) where score is in [0, 1] and
feedback is the judge's textual reasoning.
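One way a judge reply could be turned into the (score, feedback) pair is sketched below. This is an assumption-laden illustration, not the library's parsing logic: parse_judge_reply is a hypothetical helper, the 'Score: <number>' reply format is assumed, and falling back to 0.0 when no score is found is an arbitrary choice made for the example.

```python
import re
from typing import Tuple


def parse_judge_reply(reply: str) -> Tuple[float, str]:
    """Extract (score, feedback) from a judge reply (hypothetical helper)."""
    match = re.search(
        r"score\s*[:=]\s*(-?\d+(?:\.\d+)?)", reply, flags=re.IGNORECASE
    )
    if match is None:
        # Assumption: with no parseable score, return 0.0 and keep the
        # whole reply as feedback so nothing the judge said is lost.
        return 0.0, reply.strip()
    # Clamp to [0, 1] in case the model answers outside the requested scale.
    score = min(1.0, max(0.0, float(match.group(1))))
    # Everything except the matched score text is treated as the reasoning.
    feedback = (reply[: match.start()] + reply[match.end():]).strip()
    return score, feedback


score, feedback = parse_judge_reply("Score: 0.8\nThe answer is correct.")
print(score, feedback)
```

Clamping keeps the returned score inside the documented [0, 1] range even when the model ignores the requested scale.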