terminalbench_native_structural
terminalbench_native_structural
¶
TerminalBench Native scorer — test-result-based evaluation.
Reads is_resolved and test_results from the record's metadata
(populated by the native terminal-bench harness) and returns a
deterministic pass/fail without any LLM judging.
Classes¶
TerminalBenchNativeScorer
¶
Bases: Scorer
Test-result-based scorer for TerminalBench Native tasks.
The native terminal-bench package produces is_resolved and
test_results fields after executing a task. This scorer reads
those fields from record.metadata and translates them into the
standard (is_correct, meta) tuple.