dataset
Abstract base class for dataset providers.
Classes
DatasetProvider
Bases: ABC
Base class for all evaluation dataset providers.
Functions
load
abstractmethod
load(*, max_samples: Optional[int] = None, split: Optional[str] = None, seed: Optional[int] = None) -> None
Load the dataset, possibly downloading it from Hugging Face.
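As an illustration, the sketch below shows one way a subclass might honor the keyword-only arguments of load. It is not taken from the library: the base class is a local stand-in that mirrors the signatures on this page, EvalRecord's fields, the InMemoryProvider class, the "test" default split, and the no-argument int-returning size() are all assumptions, and the reading of seed (deterministic shuffle) and max_samples (truncate after shuffling) is only one plausible interpretation.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class EvalRecord:
    """Hypothetical stand-in; the real EvalRecord fields are not documented here."""
    record_id: str
    prompt: str


class DatasetProvider(ABC):
    """Local stand-in mirroring the abstract methods listed on this page."""

    @abstractmethod
    def load(self, *, max_samples: Optional[int] = None,
             split: Optional[str] = None, seed: Optional[int] = None) -> None: ...

    @abstractmethod
    def iter_records(self) -> Iterable[EvalRecord]: ...

    @abstractmethod
    def size(self) -> int: ...  # assumed signature: no arguments, returns a count


class InMemoryProvider(DatasetProvider):
    """Hypothetical provider backed by a dict of split name -> (id, prompt) rows."""

    def __init__(self, splits):
        self._splits = splits
        self._records: List[EvalRecord] = []

    def load(self, *, max_samples=None, split=None, seed=None) -> None:
        # Assumption: fall back to a "test" split when none is requested.
        rows = list(self._splits[split or "test"])
        if seed is not None:
            # A private Random instance keeps the shuffle reproducible
            # without touching the global random state.
            random.Random(seed).shuffle(rows)
        if max_samples is not None:
            rows = rows[:max_samples]
        self._records = [EvalRecord(record_id=str(i), prompt=p) for i, p in rows]

    def iter_records(self) -> Iterable[EvalRecord]:
        return iter(self._records)

    def size(self) -> int:
        return len(self._records)


provider = InMemoryProvider({"test": [(0, "a"), (1, "b"), (2, "c")]})
provider.load(max_samples=2, seed=0)
```

Shuffling before truncating means that a fixed seed selects the same subset on every run, which is usually what makes a capped evaluation comparable across runs.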
iter_records
abstractmethod
iter_records() -> Iterable[EvalRecord]
size
abstractmethod
create_task_env
create_task_env(record: EvalRecord) -> Optional[AbstractContextManager]
verify_requirements
iter_episodes
iter_episodes() -> Iterable[List[EvalRecord]]
Iterate over episodes (groups of sequential records).
Default: each record is its own single-record episode. Override for benchmarks requiring sequential processing with shared agent state within an episode.
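The default and an override can be sketched as follows. This is a minimal illustration rather than the library's code: the base class is a local stand-in that reproduces only the documented default (one record per episode) and omits load and size for brevity, and the episode_id field, FlatProvider, and EpisodicProvider are hypothetical names introduced for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List


@dataclass
class EvalRecord:
    """Hypothetical stand-in; episode_id is an assumed field, not a documented one."""
    record_id: str
    episode_id: str


class DatasetProvider(ABC):
    """Local stand-in: only the methods needed for this example."""

    @abstractmethod
    def iter_records(self) -> Iterable[EvalRecord]: ...

    def iter_episodes(self) -> Iterable[List[EvalRecord]]:
        # Documented default: each record is its own single-record episode.
        for record in self.iter_records():
            yield [record]


class FlatProvider(DatasetProvider):
    """Uses the default episode behavior."""

    def __init__(self, records):
        self._records = list(records)

    def iter_records(self) -> Iterable[EvalRecord]:
        return iter(self._records)


class EpisodicProvider(FlatProvider):
    """Groups consecutive records that share an episode_id into one episode."""

    def iter_episodes(self) -> Iterable[List[EvalRecord]]:
        # groupby only merges adjacent items, so record order is preserved
        # and the records are assumed to arrive already ordered by episode.
        for _, group in groupby(self.iter_records(), key=lambda r: r.episode_id):
            yield list(group)


records = [EvalRecord("r1", "e1"), EvalRecord("r2", "e1"), EvalRecord("r3", "e2")]
flat_sizes = [len(ep) for ep in FlatProvider(records).iter_episodes()]
episodic_sizes = [len(ep) for ep in EpisodicProvider(records).iter_episodes()]
```

With these three records, the default yields three single-record episodes, while the override yields one two-record episode followed by one single-record episode. A caller that resets agent state between episodes would therefore carry state across r1 and r2 only in the second case.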