terminalbench_v2_1_env
TerminalBench V2.1 task environment.
Manages the per-task Docker container and its scoring lifecycle. The eval
runner is intended to use it as a context manager, so that the agent has a
live container to interact with through :mod:`openjarvis.tools.docker_shell_exec`.
On __enter__:
* Pulls and runs the task's Docker image with sleep infinity, so the container stays alive.
* Mounts the task's tests/ directory read-only at /tests.
* Creates /logs/verifier/ for reward output.
* Binds the container name into :mod:`docker_shell_exec`'s thread-local
  state so the agent's docker_shell_exec tool targets this container.
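The thread-local binding in the last step can be illustrated with a minimal sketch. The names below (`_state`, `bind_container`, `clear_container`, `current_container`) are hypothetical and are not taken from docker_shell_exec; the sketch only shows why thread-local state lets concurrent eval threads each target their own container.

```python
import threading
from typing import Optional

# Hypothetical stand-in for the thread-local state kept by docker_shell_exec.
_state = threading.local()


def bind_container(name: str) -> None:
    """Record the container name for the calling thread only."""
    _state.container = name


def clear_container() -> None:
    """Drop the calling thread's binding, if any."""
    if hasattr(_state, "container"):
        del _state.container


def current_container() -> Optional[str]:
    """Return the container bound in the calling thread, or None."""
    return getattr(_state, "container", None)
```

Because the value lives on a `threading.local` object, a binding made in one worker thread is invisible to the others, so tasks evaluated in parallel do not overwrite each other's target container.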
On __exit__:
* Runs /tests/test.sh to produce /logs/verifier/reward.txt.
* Reads the reward and stores it on record.metadata.
* Clears the docker_shell_exec thread-local.
* Tears down the container.
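The lifecycle above could be sketched roughly as follows. This is an illustration under stated assumptions, not the module's actual implementation: the class name `TaskEnvSketch`, its constructor parameters, the `"reward"` metadata key, and the injectable `run` callable are all hypothetical, and the thread-local binding step is omitted. Only standard docker CLI subcommands (`run`, `exec`, `rm`) are used.

```python
import subprocess
from typing import Callable, Dict, List

Runner = Callable[[List[str]], str]


def _docker(args: List[str]) -> str:
    """Run a docker CLI command and return its stdout (raises on failure)."""
    result = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


class TaskEnvSketch:
    """Hypothetical context manager mirroring the documented lifecycle."""

    def __init__(self, image: str, tests_dir: str, name: str,
                 metadata: Dict[str, object], run: Runner = _docker) -> None:
        self.image = image
        self.tests_dir = tests_dir  # must be an absolute host path for -v
        self.name = name
        self.metadata = metadata    # stands in for record.metadata
        self.run = run              # injectable so the sketch can be tested

    def __enter__(self) -> "TaskEnvSketch":
        # Start a long-lived container with tests/ mounted read-only.
        self.run(["run", "-d", "--name", self.name,
                  "-v", f"{self.tests_dir}:/tests:ro",
                  self.image, "sleep", "infinity"])
        # Create the directory the verifier writes its reward into.
        self.run(["exec", self.name, "mkdir", "-p", "/logs/verifier"])
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        try:
            # Score the task, then read the reward back out of the container.
            # In this sketch a nonzero exit from test.sh raises and skips the read.
            self.run(["exec", self.name, "bash", "/tests/test.sh"])
            raw = self.run(["exec", self.name, "cat",
                            "/logs/verifier/reward.txt"])
            self.metadata["reward"] = float(raw.strip())
        finally:
            # Tear down the container even if scoring failed.
            self.run(["rm", "-f", self.name])
        return False  # never swallow exceptions raised by the agent
```

Putting teardown in a `finally` clause means the container is removed even when scoring raises, which matches the ordering in the list above: score, read the reward, then tear down.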