toolcall15

ToolCall-15 dataset provider: a lightweight tool calling benchmark.

Provides 15 scenarios across 5 categories (3 per category) that test whether a model can call the right tool with the right arguments.

Reference: https://github.com/stevibe/ToolCall-15

Classes

ToolCall15Dataset

ToolCall15Dataset()

Bases: DatasetProvider

ToolCall-15 tool calling benchmark.

Provides 15 scenarios across 5 categories that test whether a model can call the right tool with the right arguments. All tool outputs are pre-defined (mocked) per the benchmark specification.
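As a rough illustration of what "right tool with the right arguments" and "mocked outputs" can mean in practice, here is a minimal sketch. The `Scenario` dataclass, its fields, the helper functions, and the `get_weather` example are all hypothetical and are not taken from the ToolCall-15 specification or from this module's implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Scenario:
    """Hypothetical scenario shape; not the benchmark's actual schema."""

    prompt: str
    expected_tool: str
    expected_args: Dict[str, Any]
    mocked_output: str  # canned result returned instead of running a real tool


def is_correct_call(scenario: Scenario, tool: str, args: Dict[str, Any]) -> bool:
    """Return True when both the tool name and the arguments match exactly."""
    return tool == scenario.expected_tool and args == scenario.expected_args


def run_mocked_tool(scenario: Scenario, tool: str, args: Dict[str, Any]) -> str:
    """Return the pre-defined output for a correct call, else an error string."""
    if is_correct_call(scenario, tool, args):
        return scenario.mocked_output
    return f"error: unexpected call to {tool}"


# Illustrative scenario (assumed, not from the benchmark).
weather = Scenario(
    prompt="What is the weather in Paris?",
    expected_tool="get_weather",
    expected_args={"city": "Paris"},
    mocked_output='{"temp_c": 18}',
)
```

Because no real tool is executed, a check like this is deterministic: the same model call always receives the same canned result. A real scorer may well be more lenient than the exact equality used here, for example by normalizing argument values before comparing.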

Source code in src/openjarvis/evals/datasets/toolcall15.py

def __init__(self) -> None:
    # Start with an empty record list.
    self._records: List[EvalRecord] = []