toolorchestra
ToolOrchestraAgent — port of NVlabs ToolOrchestra (arXiv:2511.21689).
Two modes, gated by method_cfg.orchestrator_mode:
- "prompted" (default, legacy): a cloud model (Opus etc.) plays the orchestrator, dispatching to a numbered worker pool via JSON {"action": "call_worker"|"final_answer", ...} actions. Useful as a prompted upper-bound reference point, NOT the paper's setup.
- "rl" (paper-faithful): the RL-trained nvidia/Orchestrator-8B served on a local vLLM is the orchestrator. It emits OpenAI-style tool_calls (or <tool_call>{...}</tool_call> text blocks when vLLM's tool parser doesn't catch them) for three expert tools, enhance_reasoning, answer and search, exactly as in the upstream evaluation/tools.json. Each tool's model arg (answer-1, reasoner-2, search-3, …) is mapped to a real backend through EXPERT_MODEL_MAPPING: by default the frontier Anthropic worker for *-1 slots, gpt-5-mini for *-2, and local Qwen for *-3. Search routes to the configured provider's server-side web-search helper when available.
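The text-block fallback in "rl" mode can be sketched as below. This is a minimal illustration, not the agent's actual parser: the helper name parse_text_tool_calls is hypothetical, and it assumes each <tool_call> block wraps a JSON object with "name" and "arguments" keys (the Qwen/Hermes-style convention), keeping only the three expert tools.

```python
import json
import re

# Non-greedy and DOTALL: each match stops at the first </tool_call>, so one
# malformed block cannot swallow the next, and the JSON may span lines.
_TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

EXPERT_TOOLS = {"enhance_reasoning", "answer", "search"}


def parse_text_tool_calls(text: str) -> list:
    """Return [{"name": ..., "arguments": {...}}, ...] found in raw text.

    Blocks that are not valid JSON, or that name a tool outside the three
    expert tools, are skipped rather than raising.
    """
    calls = []
    for match in _TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1).strip())
        except json.JSONDecodeError:
            continue
        if not isinstance(payload, dict):
            continue
        args = payload.get("arguments", {})
        if isinstance(args, str):  # some models serialise arguments as a string
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                continue
        if payload.get("name") in EXPERT_TOOLS and isinstance(args, dict):
            calls.append({"name": payload["name"], "arguments": args})
    return calls
```

Skipping bad blocks instead of raising lets the caller treat an empty result as a parse failure and take the strongest-expert fallback.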
We do NOT reproduce the upstream Tavily / FAISS-wiki retriever, the code-interpreter sandbox, or the multi-vLLM mix (Llama-3.3-70B, Qwen-Math, Qwen-Coder); the expert pool collapses onto our existing worker types. Energy-wise, "expert" answers are cloud calls.
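Because the expert pool collapses onto existing worker types, resolving a tool's model arg reduces to a slot lookup. A minimal sketch, assuming the numeric suffix alone selects the default backend; resolve_expert, the overrides parameter and the backend labels are hypothetical stand-ins for whatever EXPERT_MODEL_MAPPING actually holds.

```python
from typing import Dict, Optional

# Hypothetical backend labels standing in for the real worker objects.
DEFAULT_SLOT_BACKENDS = {
    "1": "anthropic-frontier",  # *-1 slots -> frontier Anthropic worker
    "2": "gpt-5-mini",          # *-2 slots -> gpt-5-mini
    "3": "local-qwen",          # *-3 slots -> local Qwen on vLLM
}


def resolve_expert(model_arg: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Map a tool's model arg (e.g. "answer-1") to a backend label.

    An exact key in `overrides` wins; otherwise the suffix after the last
    "-" picks the default slot.
    """
    if overrides and model_arg in overrides:
        return overrides[model_arg]
    _, sep, slot = model_arg.rpartition("-")
    if not sep or slot not in DEFAULT_SLOT_BACKENDS:
        raise ValueError(f"unrecognised expert model arg: {model_arg!r}")
    return DEFAULT_SLOT_BACKENDS[slot]
```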
Pipeline per task (RL mode):
- Orchestrator-8B reads "Problem: ...\n\n{context}\n\nChoose an appropriate tool." with the three tools declared.
- It emits one tool_call per turn: search updates the context, enhance_reasoning appends code/exec output (we run the tool as a plain LLM call with no sandbox, so the model just gets prose back), and answer produces the final answer and the loop stops.
- Up to max_turns (default 8) turns; on parse failure we fall back to the strongest expert worker.
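The RL-mode loop can be sketched with the orchestrator, tool runner and fallback passed in as plain callables. This is an illustrative shape only: run_rl_loop, ToolCall and the callable signatures are hypothetical, {problem} stands in for the "..." in the prompt, and falling back when the turn budget runs out without an answer is an assumption (the docstring only specifies the fallback for parse failures).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Reconstructed from the docstring; {problem} fills the "Problem: ..." slot.
PROMPT_TEMPLATE = "Problem: {problem}\n\n{context}\n\nChoose an appropriate tool."


@dataclass
class ToolCall:
    name: str  # "search", "enhance_reasoning", or "answer"
    arguments: dict = field(default_factory=dict)


def run_rl_loop(
    problem: str,
    orchestrator: Callable[[str], Optional[ToolCall]],  # None means "could not parse"
    run_tool: Callable[[ToolCall], str],
    fallback: Callable[[str], str],  # strongest expert worker
    max_turns: int = 8,
) -> str:
    context = ""
    for _ in range(max_turns):
        call = orchestrator(PROMPT_TEMPLATE.format(problem=problem, context=context))
        if call is None:
            return fallback(problem)  # parse failure
        output = run_tool(call)
        if call.name == "answer":
            return output  # the final answer ends the loop
        # Search results and enhance_reasoning prose both extend the context.
        context = f"{context}\n\n{output}" if context else output
    return fallback(problem)  # assumption: budget exhausted without an answer
```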
Prompted-mode pipeline:
- Orchestrator (cloud) reads question + numbered worker pool.
- Each turn it emits {"action": "call_worker", "worker_id": int, "input": str} or {"action": "final_answer", "answer": str}.
- Up to max_turns (default 6) calls before forcing a final-answer prompt; on parse failure it falls back to the strongest worker.
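A sketch of how the prompted-mode actions might be validated and driven. The names parse_action and run_prompted_loop, the orchestrator callable's (question, transcript, force_final) signature, and the workers dict keyed by worker_id are all hypothetical; slicing from the first "{" to the last "}" is an assumed tolerance for prose around the JSON, not necessarily what the agent does.

```python
import json
from typing import Callable, Collection, Dict, List, Optional, Tuple


def parse_action(raw: str, worker_ids: Collection[int]) -> Optional[dict]:
    """Validate one orchestrator turn; return None on any parse failure."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        action = json.loads(raw[start : end + 1])  # tolerate prose around the JSON
    except json.JSONDecodeError:
        return None
    if not isinstance(action, dict):
        return None
    if action.get("action") == "final_answer" and isinstance(action.get("answer"), str):
        return action
    if action.get("action") == "call_worker":
        wid = action.get("worker_id")
        if (
            isinstance(wid, int)
            and not isinstance(wid, bool)  # True would otherwise match worker 1
            and wid in worker_ids
            and isinstance(action.get("input"), str)
        ):
            return action
    return None


Transcript = List[Tuple[int, str, str]]  # (worker_id, input, output)


def run_prompted_loop(
    question: str,
    orchestrator: Callable[[str, Transcript, bool], str],  # returns raw model text
    workers: Dict[int, Callable[[str], str]],
    fallback: Callable[[str], str],  # strongest worker
    max_turns: int = 6,
) -> str:
    transcript: Transcript = []
    for _ in range(max_turns):
        action = parse_action(orchestrator(question, transcript, False), workers.keys())
        if action is None:
            return fallback(question)
        if action["action"] == "final_answer":
            return action["answer"]
        output = workers[action["worker_id"]](action["input"])
        transcript.append((action["worker_id"], action["input"], output))
    # Call budget spent: issue one forced final-answer prompt.
    action = parse_action(orchestrator(question, transcript, True), workers.keys())
    if action is not None and action["action"] == "final_answer":
        return action["answer"]
    return fallback(question)
```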
Workers come from cfg["workers"] or a sensible default pool (local
Qwen if vLLM is up, plus provider-native web search, the configured frontier
cloud model, and gpt-5-mini).
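The default-pool rule can be sketched as follows. WorkerSpec, its kind labels, default_worker_pool and vllm_is_up are hypothetical; probing GET {base_url}/models (with base_url ending in /v1) is one assumed way to decide whether vLLM is up, using the model-listing route that vLLM's OpenAI-compatible server exposes. Routing web search through the frontier provider is also an assumption.

```python
import urllib.request
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass(frozen=True)
class WorkerSpec:
    kind: str   # hypothetical labels: "local_vllm", "web_search", "cloud"
    model: str


def vllm_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Probe an OpenAI-compatible server by listing models at {base_url}/models."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):  # unreachable, HTTP error, or malformed URL
        return False


def default_worker_pool(
    cfg: Mapping[str, Any], frontier_model: str, local_model: str, local_up: bool
) -> List[Any]:
    if cfg.get("workers"):
        return list(cfg["workers"])  # an explicit cfg["workers"] wins
    pool: List[Any] = []
    if local_up:
        pool.append(WorkerSpec("local_vllm", local_model))
    pool += [
        WorkerSpec("web_search", frontier_model),  # assumption: frontier provider's search
        WorkerSpec("cloud", frontier_model),
        WorkerSpec("cloud", "gpt-5-mini"),
    ]
    return pool
```

Keeping the probe separate from the pool builder means the builder stays deterministic and the network check can be stubbed in tests.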
Classes
ToolOrchestraAgent
Bases: LocalCloudAgent
Multi-turn dispatcher over a mixed worker pool.
Two modes (see module docstring): method_cfg.orchestrator_mode
is "prompted" (default, cloud-as-orchestrator) or "rl"
(paper-faithful, drives nvidia/Orchestrator-8B on a local vLLM).
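Mode selection can be sketched as a small config resolver. resolve_mode_settings is a hypothetical helper; the per-mode turn budgets (6 for "prompted", 8 for "rl") come from the module docstring, while reading a max_turns override from the same method_cfg mapping is an assumption.

```python
from typing import Any, Mapping, Optional, Tuple

# Per-mode turn budgets quoted in the module docstring.
DEFAULT_MAX_TURNS = {"prompted": 6, "rl": 8}


def resolve_mode_settings(method_cfg: Optional[Mapping[str, Any]]) -> Tuple[str, int]:
    """Return (orchestrator_mode, max_turns), defaulting to prompted mode."""
    cfg = method_cfg or {}
    mode = cfg.get("orchestrator_mode", "prompted")
    if mode not in DEFAULT_MAX_TURNS:
        raise ValueError(f"orchestrator_mode must be 'prompted' or 'rl', got {mode!r}")
    # Assumption: a max_turns key in method_cfg overrides the per-mode default.
    max_turns = int(cfg.get("max_turns", DEFAULT_MAX_TURNS[mode]))
    return mode, max_turns
```

Rejecting unknown modes up front avoids silently running the prompted reference when a typo was meant to select the paper-faithful path.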