toolorchestra

ToolOrchestraAgent — port of NVlabs ToolOrchestra (arXiv:2511.21689).

Two modes, gated by method_cfg.orchestrator_mode:

  • "prompted" (default, legacy): a cloud model (Opus etc.) plays the orchestrator, dispatching to a numbered worker pool via JSON {"action": "call_worker"|"final_answer", ...} actions. Useful as a prompted upper-bound reference point — NOT the paper's setup.

  • "rl" (paper-faithful): the RL-trained nvidia/Orchestrator-8B served on a local vLLM is the orchestrator. It emits OpenAI-style tool_calls (or <tool_call>{...}</tool_call> text blocks when vLLM's tool parser doesn't catch them) for three expert tools — enhance_reasoning, answer, search — exactly as in the upstream evaluation/tools.json. Each tool's model arg (answer-1, reasoner-2, search-3, …) is mapped to a real backend through EXPERT_MODEL_MAPPING — by default the frontier Anthropic worker for *-1 slots, gpt-5-mini for *-2, local Qwen for *-3. Search routes to the configured provider's server-side web-search helper when available.
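The text-block fallback and the slot mapping described for "rl" mode can be sketched as follows. This is a minimal illustration, not the module's implementation: `parse_text_tool_calls`, `resolve_backend`, and `SLOT_BACKENDS` are hypothetical names, the backend labels are placeholders, and the `{"name": ..., "arguments": {...}}` payload shape is an assumption about what the orchestrator emits inside `<tool_call>` blocks.

```python
import json
import re
from typing import Any, Dict, List

# Hypothetical stand-in for EXPERT_MODEL_MAPPING: slot suffix -> backend label.
# The labels are placeholders; the real mapping may be keyed differently.
SLOT_BACKENDS = {"1": "frontier-anthropic", "2": "gpt-5-mini", "3": "local-qwen"}

# Matches <tool_call>{...}</tool_call>, including nested JSON objects.
_TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_text_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract tool calls from <tool_call>{...}</tool_call> text blocks."""
    calls: List[Dict[str, Any]] = []
    for match in _TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the turn
        if isinstance(payload, dict) and "name" in payload:
            calls.append(payload)
    return calls


def resolve_backend(model_arg: str) -> str:
    """Map a slot name such as 'answer-1' to a backend label by its suffix."""
    slot = model_arg.rsplit("-", 1)[-1]
    return SLOT_BACKENDS[slot]
```

For example, `resolve_backend("search-3")` yields the local-Qwen placeholder label, and a block containing invalid JSON is skipped so that the caller can treat an empty result as a parse failure.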

We do NOT reproduce the upstream Tavily / FAISS-wiki retriever, the code-interpreter sandbox, or the multi-vLLM mix (Llama-3.3-70B, Qwen-Math, Qwen-Coder); the expert pool collapses onto our existing worker types. Energy-wise, "expert" answers are cloud calls.

Pipeline per task (RL mode):

  1. Orchestrator-8B reads the prompt "Problem: ...\n\n{context}\n\nChoose an appropriate tool." with the three tools declared.
  2. It emits one tool_call per turn: search updates the context; enhance_reasoning appends its output to the context (upstream this is code/exec output, but we run the tool as a plain LLM call with no sandbox, so the model just gets prose back); answer produces the final answer and stops the loop.
  3. The loop runs for up to max_turns (default 8) turns; on a parse failure we fall back to the strongest expert worker.
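The RL-mode steps above can be sketched as a small loop. This is an illustration under stated assumptions rather than the agent's actual code: `run_rl_loop` and its callable parameters are hypothetical, the tools are plain functions standing in for expert LLM calls, and falling back when the turn budget runs out or an unknown tool is named is an assumption (the description only specifies the fallback on parse failure).

```python
from typing import Callable, Dict, Optional

ToolFn = Callable[[str, str], str]  # (problem, context) -> tool output


def run_rl_loop(
    problem: str,
    next_tool_call: Callable[[str], Optional[dict]],  # hypothetical orchestrator hook
    tools: Dict[str, ToolFn],
    fallback: Callable[[str], str],  # stands in for the strongest expert worker
    max_turns: int = 8,
) -> str:
    context = ""
    for _ in range(max_turns):
        prompt = f"Problem: {problem}\n\n{context}\n\nChoose an appropriate tool."
        call = next_tool_call(prompt)
        # None models a parse failure; an unknown tool name is treated the
        # same way here (assumption).
        if call is None or call.get("name") not in tools:
            return fallback(problem)
        name = call["name"]
        output = tools[name](problem, context)
        if name == "answer":
            return output  # the answer tool ends the loop
        # search and enhance_reasoning both extend the running context.
        context += f"\n[{name}] {output}"
    # Turn budget exhausted without an answer call (assumed behaviour).
    return fallback(problem)
```

Passing the orchestrator and tools as callables keeps the loop testable without a vLLM server or cloud credentials.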

Prompted-mode pipeline:

  1. Orchestrator (cloud) reads question + numbered worker pool.
  2. Each turn it emits {"action": "call_worker", "worker_id": int, "input": str} or {"action": "final_answer", "answer": str}.
  3. Up to max_turns (default 6) worker calls are allowed before a final-answer prompt is forced; on a parse failure we fall back to the strongest worker.
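The two JSON action shapes in the prompted-mode steps can be validated along these lines. This is a sketch only: `parse_action` is a hypothetical helper, and treating `worker_id` as a 0-based index into the pool is an assumption, since the numbering scheme is not stated above. A `None` result is what would trigger the strongest-worker fallback.

```python
import json
from typing import Any, Dict, Optional


def parse_action(raw: str, pool_size: int) -> Optional[Dict[str, Any]]:
    """Return a validated action dict, or None on any parse failure."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(action, dict):
        return None

    kind = action.get("action")
    if kind == "final_answer":
        return action if isinstance(action.get("answer"), str) else None
    if kind == "call_worker":
        worker_id = action.get("worker_id")
        valid_id = (
            isinstance(worker_id, int)
            and not isinstance(worker_id, bool)  # bool is a subclass of int
            and 0 <= worker_id < pool_size  # assumed 0-based numbering
        )
        if valid_id and isinstance(action.get("input"), str):
            return action
    return None
```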

Workers come from cfg["workers"] or a sensible default pool (local Qwen if vLLM is up, plus provider-native web search, the configured frontier cloud model, and gpt-5-mini).

Classes

ToolOrchestraAgent

ToolOrchestraAgent(*args: Any, **kwargs: Any)

Bases: LocalCloudAgent

Multi-turn dispatcher over a mixed worker pool.

Two modes (see module docstring): method_cfg.orchestrator_mode is "prompted" (default, cloud-as-orchestrator) or "rl" (paper-faithful, drives nvidia/Orchestrator-8B on a local vLLM).

Source code in src/openjarvis/agents/hybrid/toolorchestra.py
def __init__(self, *args: Any, **kwargs: Any) -> None:
    super().__init__(*args, **kwargs)
    # Validate `method_cfg.worker_pool` early — surfaces config errors
    # at agent construction rather than on the first task. No-op when
    # the override is absent.
    if self._cfg.get("worker_pool") is not None:
        _resolve_worker_pool(
            self._cfg,
            self._local_model,
            self._local_endpoint,
            self._cloud_model,
            self._cloud_endpoint,
        )
    # Validate `orchestrator_mode` (typo-checked here, not on first task).
    mode = str(self._cfg.get("orchestrator_mode", "prompted")).lower()
    if mode not in ("prompted", "rl"):
        raise ValueError(
            f"toolorchestra: orchestrator_mode must be 'prompted' or 'rl'; "
            f"got {mode!r}"
        )
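The mode check in the constructor can be exercised in isolation. The sketch below copies the same logic into a hypothetical free function, `validate_mode`, which is not part of the module; it only illustrates that the value is lower-cased before comparison and that a missing key defaults to "prompted".

```python
from typing import Any, Dict


def validate_mode(cfg: Dict[str, Any]) -> str:
    """Mirror of the constructor's orchestrator_mode check (illustrative)."""
    mode = str(cfg.get("orchestrator_mode", "prompted")).lower()
    if mode not in ("prompted", "rl"):
        raise ValueError(
            "toolorchestra: orchestrator_mode must be 'prompted' or 'rl'; "
            f"got {mode!r}"
        )
    return mode


print(validate_mode({}))                           # default mode
print(validate_mode({"orchestrator_mode": "RL"}))  # case-insensitive
try:
    validate_mode({"orchestrator_mode": "rll"})    # typo is rejected early
except ValueError as exc:
    print(exc)
```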

Functions