feedback¶
Feedback subsystem: LLM-as-judge scoring and signal aggregation.
Classes¶
FeedbackCollector¶
Collects feedback signals: explicit user scores and LLM-judge evaluations.
Signals are stored in memory as dicts with at least trace_id,
score, source, and timestamp keys.
Source code in src/openjarvis/learning/optimize/feedback/collector.py
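To make the record shape concrete, here is a minimal sketch of one stored signal. Only trace_id, score, source, and timestamp are documented keys; the example values and the exact source labels are assumptions.

```python
import time

# Illustrative feedback record (shape assumed from the description above).
record = {
    "trace_id": "trace-123",   # identifier of the scored trace
    "score": 0.8,              # numeric score in [0, 1]
    "source": "explicit",      # signal origin, e.g. "explicit" or "judge"
    "timestamp": time.time(),  # when the signal was recorded
}
```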
Functions¶
record_explicit¶
Record an explicit numeric score (0-1) for a trace.
Source code in src/openjarvis/learning/optimize/feedback/collector.py
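A minimal sketch of what recording an explicit score might involve, written as a free function for illustration; the real method lives on FeedbackCollector, and the records-list argument and validation behavior are assumptions.

```python
import time

def record_explicit(records, trace_id, score):
    """Validate a 0-1 score and append a feedback record (sketch)."""
    if not 0.0 <= score <= 1.0:
        # Reject out-of-range scores (assumed behavior).
        raise ValueError(f"score must be in [0, 1], got {score}")
    rec = {
        "trace_id": trace_id,
        "score": score,
        "source": "explicit",
        "timestamp": time.time(),
    }
    records.append(rec)
    return rec
```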
record_thumbs¶
Record a thumbs-up / thumbs-down signal (converted to 1.0/0.0).
Source code in src/openjarvis/learning/optimize/feedback/collector.py
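The thumbs-to-score conversion described above is simple enough to show directly; the helper name here is hypothetical.

```python
def thumbs_to_score(thumbs_up: bool) -> float:
    # Thumbs-up maps to 1.0, thumbs-down to 0.0, per the description above.
    return 1.0 if thumbs_up else 0.0
```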
evaluate_traces¶
evaluate_traces(traces: List[Trace], judge: TraceJudge) -> List[Dict[str, Any]]
Score traces via the LLM judge and record the results.
Returns the list of newly created feedback records.
Source code in src/openjarvis/learning/optimize/feedback/collector.py
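A sketch of the judge-scoring loop this signature implies. The judge only needs a score_trace(trace) -> (score, feedback) method, matching TraceJudge's documented interface; the record fields beyond the documented four (e.g. "feedback") are assumptions.

```python
import time

def evaluate_traces(traces, judge):
    """Score each trace via the judge and return the new records (sketch)."""
    new_records = []
    for trace in traces:
        score, feedback = judge.score_trace(trace)
        new_records.append({
            "trace_id": getattr(trace, "trace_id", None),
            "score": score,
            "source": "judge",
            "feedback": feedback,      # judge's textual reasoning (assumed field)
            "timestamp": time.time(),
        })
    return new_records
```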
get_records¶
Return stored records, optionally filtered by trace_id.
Source code in src/openjarvis/learning/optimize/feedback/collector.py
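The optional trace_id filter can be sketched as a simple list comprehension; the free-function form is an illustration, not the actual method body.

```python
def get_records(records, trace_id=None):
    """Return all records, or only those matching trace_id when given (sketch)."""
    if trace_id is None:
        return list(records)
    return [r for r in records if r["trace_id"] == trace_id]
```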
stats¶
Return aggregate statistics over all recorded feedback.
Returns a dict with count, mean_score, and a simple
distribution bucket (low / medium / high).
Source code in src/openjarvis/learning/optimize/feedback/collector.py
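A sketch of the documented aggregate: count, mean_score, and a low/medium/high distribution. The bucket thresholds (thirds of the 0-1 range) and the empty-collection behavior are assumptions.

```python
def stats(records):
    """Aggregate count, mean score, and a coarse score distribution (sketch)."""
    dist = {"low": 0, "medium": 0, "high": 0}
    if not records:
        return {"count": 0, "mean_score": None, "distribution": dist}
    scores = [r["score"] for r in records]
    for s in scores:
        if s < 1 / 3:
            dist["low"] += 1
        elif s < 2 / 3:
            dist["medium"] += 1
        else:
            dist["high"] += 1
    return {
        "count": len(scores),
        "mean_score": sum(scores) / len(scores),
        "distribution": dist,
    }
```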
TraceJudge¶
TraceJudge(backend: InferenceBackend, model: str)
LLM-as-judge for scoring traces when no ground truth exists.
Given a Trace, the judge constructs a prompt showing the
query, agent steps, and final result, then asks an LLM to rate the
quality on a 0-1 scale.
Source code in src/openjarvis/learning/optimize/feedback/judge.py
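The prompt construction described above might look roughly like this; the exact wording is an assumption, not the library's actual template.

```python
def build_judge_prompt(query, steps, result):
    """Assemble a judge prompt from query, agent steps, and result (sketch)."""
    step_text = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(steps))
    return (
        "Rate the quality of this agent run on a 0-1 scale.\n"
        f"Query: {query}\n"
        f"Steps:\n{step_text}\n"
        f"Final result: {result}\n"
        "Reply with a score between 0 and 1 and a short justification."
    )
```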
Functions¶
score_trace¶
score_trace(trace: Trace) -> Tuple[float, str]
Score a single trace.
Returns:
(score, feedback) where score is in [0, 1] and
feedback is the judge's textual reasoning.
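Producing the documented (score, feedback) tuple requires extracting a number from the LLM's free-text reply. This parsing strategy is an assumption about how score_trace might work, not the library's actual implementation.

```python
import re

def parse_judge_reply(reply: str):
    """Extract a 0-1 score from a judge reply and return (score, feedback) (sketch)."""
    match = re.search(r"\d*\.?\d+", reply)
    # Clamp into [0, 1]; fall back to 0.0 when no number is found (assumed).
    score = min(max(float(match.group()), 0.0), 1.0) if match else 0.0
    return score, reply.strip()
```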