
terminalbench

TerminalBench dataset (terminal-bench/terminal-bench).

Agentic benchmark for terminal and command-line tasks.

Classes

TerminalBenchDataset

TerminalBenchDataset()

Bases: DatasetProvider

TerminalBench agentic terminal benchmark (Hugging Face variant).

Source code in src/openjarvis/evals/datasets/terminalbench.py
def __init__(self) -> None:
    self._records: List[EvalRecord] = []
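The excerpt above shows only the constructor: a new TerminalBenchDataset starts with an empty, per-instance list of EvalRecord objects. The sketch below places that constructor in a self-contained context. It is an illustration under stated assumptions, not the library's code: the EvalRecord fields, the DatasetProvider stand-in, and the size() helper are all hypothetical, because this page does not document them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EvalRecord:
    # Hypothetical stand-in: the real EvalRecord fields are not shown on this page.
    record_id: str
    problem: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class DatasetProvider:
    # Hypothetical stand-in for the DatasetProvider base class.
    def size(self) -> int:
        raise NotImplementedError


class TerminalBenchDataset(DatasetProvider):
    def __init__(self) -> None:
        # Mirrors the documented constructor: each instance gets its own
        # empty record list (how records are loaded is not documented here).
        self._records: List[EvalRecord] = []

    def size(self) -> int:
        # Hypothetical helper for illustration: number of loaded records.
        return len(self._records)


a = TerminalBenchDataset()
b = TerminalBenchDataset()
a._records.append(EvalRecord(record_id="demo-1", problem="list files"))
print(a.size(), b.size())  # 1 0
```

Assigning the list inside __init__, rather than declaring it as a class attribute, gives every dataset instance independent storage, so appending records to one instance does not leak into another.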