TerminalBench Native dataset: loads tasks from the terminal-bench pip package (v2 API).
An agentic benchmark that uses the native terminal-bench SDK for task loading
and test-based evaluation.
Classes
TerminalBenchNativeDataset
TerminalBenchNativeDataset(name: str = 'terminal-bench-core', version: str = '0.1.1', path: Optional[str] = None, task_ids: Optional[List[str]] = None, n_tasks: Optional[int] = None)
Bases: DatasetProvider
TerminalBench dataset provider backed by the native terminal-bench pip package (v2 API).
Source code in src/openjarvis/evals/datasets/terminalbench_native.py
    def __init__(
        self,
        name: str = "terminal-bench-core",
        version: str = "0.1.1",
        path: Optional[str] = None,
        task_ids: Optional[List[str]] = None,
        n_tasks: Optional[int] = None,
    ) -> None:
        self._name = name
        self._version = version
        self._path = Path(path) if path else None
        self._task_ids = task_ids
        self._n_tasks = n_tasks
        self._records: List[EvalRecord] = []
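The constructor stores `path` as a `pathlib.Path` only when a non-empty string is passed. The following is a minimal sketch of that normalization in isolation; the helper name `normalize_path` is hypothetical and is not part of openjarvis.

```python
from pathlib import Path
from typing import Optional


def normalize_path(path: Optional[str]) -> Optional[Path]:
    """Mirror the constructor's expression: Path(path) if path else None."""
    # Any falsy value (None or an empty string) maps to None.
    return Path(path) if path else None


print(normalize_path(None))
print(normalize_path(""))
print(normalize_path("tasks/local"))
```

Because the check is a truthiness test, an empty string is treated the same as `None` rather than as the current directory.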
Functions
create_task_env
Return a TerminalBenchTaskEnv for the given record, or None if the execution environment module cannot be imported.
Source code in src/openjarvis/evals/datasets/terminalbench_native.py
    def create_task_env(self, record):
        """Return a TerminalBenchTaskEnv for the given record."""
        try:
            from openjarvis.evals.execution.terminalbench_env import (
                TerminalBenchTaskEnv,
            )
            return TerminalBenchTaskEnv(record.metadata)
        except ImportError:
            return None
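Since `create_task_env` imports its environment class lazily and returns `None` when the import fails, callers should check the result before using it. The sketch below shows the same optional-import pattern with a generic helper. `load_optional` and the module names passed to it are illustrative assumptions, not openjarvis APIs.

```python
import importlib
from typing import Any, Optional


def load_optional(module_name: str, attr: str) -> Optional[Any]:
    """Return module_name.attr, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Same fallback as create_task_env: signal absence with None.
        return None
    return getattr(module, attr, None)


# A module that exists resolves to the requested attribute.
dumps = load_optional("json", "dumps")

# A module that does not exist yields None instead of raising.
missing = load_optional("no_such_module_for_demo", "Anything")

if missing is None:
    print("optional dependency unavailable; skipping environment setup")
```

Returning `None` keeps the import failure from propagating, at the cost of requiring an explicit check at every call site.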
verify_requirements
Check that terminal-bench and docker are available. Returns a list of issue strings, which is empty when both are present.
Source code in src/openjarvis/evals/datasets/terminalbench_native.py
    def verify_requirements(self):
        """Check that terminal-bench and docker are available."""
        issues = []
        if not _HAS_TERMINALBENCH:
            issues.append("terminal-bench package not installed")
        import shutil
        if not shutil.which("docker"):
            issues.append("docker not found in PATH")
        return issues
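`verify_requirements` reports problems as a list of human-readable strings rather than raising. A standalone sketch of the same check follows. It assumes the terminal-bench package is importable as `terminal_bench`, and it replaces the module-level `_HAS_TERMINALBENCH` flag with `importlib.util.find_spec`; the function name `check_requirements` is hypothetical.

```python
import importlib.util
import shutil
from typing import List


def check_requirements(
    module_name: str = "terminal_bench",  # assumed import name of terminal-bench
    binary: str = "docker",
) -> List[str]:
    """Return a list of missing requirements; empty means all are present."""
    issues: List[str] = []
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        issues.append(f"{module_name} package not installed")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(binary) is None:
        issues.append(f"{binary} not found in PATH")
    return issues


issues = check_requirements()
if issues:
    print("Cannot run TerminalBench:")
    for issue in issues:
        print(f"  - {issue}")
else:
    print("All requirements satisfied")
```

Collecting every issue before reporting lets a user fix all missing prerequisites in one pass instead of discovering them one at a time.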