toolorchestra

ToolScale (nvidia) — external tool-use corpus.

NOT USED FOR EVALUATION. Surfaces tool-use trajectories from nvidia/ToolScale (the dataset underlying the ToolOrchestra paper) to the LLM-guided spec search proposer via openjarvis.learning.distillation.external_adapter so the diagnose phase can reason over a broad pool of tool-use traces.

The dataset_id is kept as "toolorchestra" (matching the published paper name) even though the HuggingFace dataset is published as nvidia/ToolScale.

HF dataset: nvidia/ToolScale (single train split).

Schema:
- id: record identifier (string)
- description: task/policy metadata dict (or string repr)
- user_scenario: dict with persona and instructions
  - instructions.task_instructions: the user's tool-use request (problem)
  - instructions.reason_for_call: optional context / motivation
- initial_state: environment state at task start (may be None/empty)
- evaluation_criteria: dict with an actions list, the expected sequence of tool calls (used as reference)
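Because the schema notes that description may arrive either as a dict or as a string repr, a consumer may want to normalize it before use. The following is a minimal sketch, not the module's actual code: the helper name coerce_to_dict is hypothetical, and it assumes the string form is a Python literal that ast.literal_eval can parse.

```python
import ast
from typing import Any, Dict


def coerce_to_dict(value: Any) -> Dict[str, Any]:
    """Hypothetical helper: normalize a field that may be a dict,
    a string repr of a dict, or None/empty."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        try:
            # Assumption: the string is a Python literal, e.g. "{'k': 'v'}".
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError, TypeError):
            # Not a parseable literal: keep the original text under "raw".
            return {"raw": value}
        # Parsed fine but is not a dict (e.g. a list): also keep the raw text.
        return parsed if isinstance(parsed, dict) else {"raw": value}
    # None or any other type: treat as empty metadata.
    return {}
```

Keeping unparseable text under a "raw" key, rather than raising, is one possible design choice; it avoids discarding records whose metadata is free-form text.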

Conversion to EvalRecord:
- problem: user_scenario.instructions.task_instructions
- reference: string representation of evaluation_criteria.actions, truncated to 2000 chars
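The mapping above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the provider's implementation: the function name, the returned tuple (in place of an EvalRecord), the fallback to empty values for missing fields, and the name/arguments keys inside the example action are all hypothetical.

```python
from typing import Any, Dict, Tuple

MAX_REFERENCE_CHARS = 2000  # truncation limit stated in the conversion rules


def to_problem_and_reference(row: Dict[str, Any]) -> Tuple[str, str]:
    """Hypothetical helper: map one ToolScale row to (problem, reference)."""
    scenario = row.get("user_scenario")
    instructions = scenario.get("instructions") if isinstance(scenario, dict) else None
    # problem: user_scenario.instructions.task_instructions (empty if missing)
    problem = ""
    if isinstance(instructions, dict):
        problem = instructions.get("task_instructions") or ""

    criteria = row.get("evaluation_criteria")
    actions = criteria.get("actions", []) if isinstance(criteria, dict) else []
    # reference: string representation of the actions list, truncated
    reference = str(actions)[:MAX_REFERENCE_CHARS]
    return str(problem), reference


# Made-up record that follows the documented schema; values are illustrative.
example_row = {
    "id": "example-0",
    "user_scenario": {
        "persona": "frequent traveller",
        "instructions": {
            "task_instructions": "Cancel my reservation.",
            "reason_for_call": "Plans changed.",
        },
    },
    "initial_state": None,
    "evaluation_criteria": {
        "actions": [
            {"name": "cancel_reservation", "arguments": {"reservation_id": "ABC123"}}
        ]
    },
}

problem, reference = to_problem_and_reference(example_row)
```

Truncating the string form rather than the list itself means a long reference may end mid-action, so the result is suitable as context for a reader or model but not for parsing back into structured calls.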

Classes

ToolOrchestraDataset

ToolOrchestraDataset()

Bases: DatasetProvider

ToolScale (nvidia/ToolScale) external corpus for LLM-guided spec search.

Published as part of the ToolOrchestra paper. dataset_id is kept as "toolorchestra" (the paper's name).

Source code in src/openjarvis/evals/datasets/toolorchestra.py
def __init__(self) -> None:
    self._records: List[EvalRecord] = []

Functions