taubench
TauBench scorer: wraps tau2-bench's evaluation results.
TauBench runs its own simulation loop (agent, user simulator, tools, and evaluation), so the scorer only reads the reward that was computed during task execution and stored in record.metadata.
Classes
TauBenchScorer
Bases: Scorer
TauBench scorer: reads pre-computed rewards from tau2-bench.
The actual evaluation (DB state checks, action matching, communication checks, NL assertions) is done by tau2-bench's evaluator during simulation. This scorer only extracts the stored result.
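The read-only pattern described above can be illustrated with a minimal sketch. It is not the actual TauBenchScorer implementation: the `Record` and `Scorer` classes below are hypothetical stand-ins for the framework's types, and the `"reward"` metadata key, the `score` method name, and the fallback value of 0.0 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """Hypothetical stand-in for the framework's record type."""
    task_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


class Scorer:
    """Hypothetical stand-in for the framework's Scorer base class."""

    def score(self, record: Record) -> float:
        raise NotImplementedError


class RewardReadingScorer(Scorer):
    """Reads a reward that was already computed during task execution."""

    def __init__(self, reward_key: str = "reward", default: float = 0.0) -> None:
        # Both the key name and the default are assumptions, not tau2-bench facts.
        self.reward_key = reward_key
        self.default = default

    def score(self, record: Record) -> float:
        # No evaluation happens here; the value is only looked up and coerced.
        raw = record.metadata.get(self.reward_key)
        if raw is None:
            # The simulation did not store a reward (for example, it crashed).
            return self.default
        return float(raw)


finished = Record("task-1", {"reward": 1.0})
crashed = Record("task-2")
scorer = RewardReadingScorer()
print(scorer.score(finished))  # 1.0
print(scorer.score(crashed))   # 0.0
```

Keeping the scorer free of evaluation logic means the reward has a single source of truth, the simulation run itself, and scoring a record a second time cannot produce a different result.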