toolorchestra
ToolOrchestraAgent — prompted port of NVlabs ToolOrchestra (arXiv:2511.21689).
The paper RL-trains an 8B Orchestrator (nvidia/Nemotron-Orchestrator-8B)
to coordinate basic tools + specialist LLMs + generalist LLMs in
multi-turn agentic loops; the resulting model ranked #1 on GAIA at release.
The hybrid harness adapter for ToolOrchestra is a documented stub — running the real thing needs a separate vLLM srun for the Orchestrator-8B checkpoint, a FAISS wiki retriever, a Tavily API key, and a refactor of the upstream eval scripts. None of that fits in our cluster allocation.
This port keeps the same scope discipline: inference-time only,
prompted, no RL. A cloud model plays the role of the orchestrator,
dispatching to a pool of (tool | specialist_llm | generalist_llm)
workers in a reactive loop. The loop is the paradigm; the orchestrator
weights are not.
Why ship this at all if it's not the "real" thing? Because the prompted upper-bound is useful as a reference point alongside the other paradigms, and because the OpenJarvis registry needs all six entries for the distillation pipeline to slot ToolOrchestra in alongside the rest.
Pipeline per task:
- Orchestrator (cloud) reads question + numbered worker pool.
- Each turn it emits {"action": "call_worker", "worker_id": int, "input": str} or {"action": "final_answer", "answer": str}.
- Up to max_turns (default 6) calls before forcing a final-answer prompt; falls back to the strongest worker on parse failure.
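The per-task pipeline above can be sketched as a small reactive loop. This is an illustrative sketch, not the module's actual code: `parse_action`, `run_loop`, the transcript format, and the `strongest` index are hypothetical names, the orchestrator and workers are stood in for by plain callables, and "fallback to strongest worker" is interpreted here as sending the original question straight to that worker.

```python
import json
from typing import Callable, List, Optional

Worker = Callable[[str], str]  # hypothetical: a worker maps an input string to a reply


def parse_action(raw: str) -> Optional[dict]:
    """Return a validated action dict, or None if the reply is unusable."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(action, dict):
        return None
    if action.get("action") == "call_worker":
        wid = action.get("worker_id")
        ok = (
            isinstance(wid, int)
            and not isinstance(wid, bool)
            and isinstance(action.get("input"), str)
        )
        return action if ok else None
    if action.get("action") == "final_answer":
        return action if isinstance(action.get("answer"), str) else None
    return None


def run_loop(
    orchestrator: Callable[[str], str],
    workers: List[Worker],
    question: str,
    max_turns: int = 6,
    strongest: int = 0,  # assumption: index of the fallback worker
) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_turns):
        action = parse_action(orchestrator("\n".join(transcript)))
        bad_id = (
            action is not None
            and action["action"] == "call_worker"
            and not 0 <= action["worker_id"] < len(workers)
        )
        if action is None or bad_id:
            # Parse failure: hand the question to the strongest worker.
            return workers[strongest](question)
        if action["action"] == "final_answer":
            return action["answer"]
        result = workers[action["worker_id"]](action["input"])
        transcript.append(f"Worker {action['worker_id']} -> {result}")
    # Turn budget spent: force a final answer with one more prompt.
    transcript.append("Turn budget exhausted. Reply with a final_answer action now.")
    action = parse_action(orchestrator("\n".join(transcript)))
    if action is not None and action["action"] == "final_answer":
        return action["answer"]
    return workers[strongest](question)
```

Validating the action shape before dispatch keeps a malformed orchestrator reply from raising inside the loop; every failure path ends in the same fallback.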
Workers come from cfg["workers"] or a sensible default pool (local
Qwen if vLLM up, plus a web-search tool via Anthropic, Opus 4.7,
gpt-5-mini).
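As a rough illustration of how a worker pool might be resolved and presented to the orchestrator as a numbered list, the sketch below uses an assumed schema. The real layout of cfg["workers"] is not documented here, so `WorkerSpec`, its fields, `load_workers`, and `render_pool` are all hypothetical; only the three worker kinds come from the text above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# The three worker kinds named in the module description.
KINDS = ("tool", "specialist_llm", "generalist_llm")


@dataclass(frozen=True)
class WorkerSpec:
    """Hypothetical worker entry; field names are assumptions."""
    name: str
    kind: str
    description: str

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown worker kind: {self.kind!r}")


def load_workers(cfg: Dict[str, Any], default: List[WorkerSpec]) -> List[WorkerSpec]:
    """Use cfg["workers"] when present and non-empty, else the default pool."""
    raw = cfg.get("workers")
    if not raw:
        return list(default)
    return [WorkerSpec(**entry) for entry in raw]


def render_pool(workers: List[WorkerSpec]) -> str:
    """Format the pool as numbered lines; the number doubles as worker_id."""
    return "\n".join(
        f"{i}. [{w.kind}] {w.name}: {w.description}"
        for i, w in enumerate(workers)
    )


# Illustrative default pool; descriptions are placeholders.
DEFAULT_POOL = [
    WorkerSpec("web-search", "tool", "search the web"),
    WorkerSpec("gpt-5-mini", "generalist_llm", "general-purpose reasoning"),
]
```

Numbering the rendered pool from the list index means the worker_id the orchestrator emits can be used directly to index the pool.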
Not yet validated end-to-end in the hybrid harness: the hybrid adapter
raises NotImplementedError. Treat results from this paradigm as
preliminary until we have a real ToolOrchestra-8B deployment.
Classes
ToolOrchestraAgent
ToolOrchestraAgent(engine: InferenceEngine, model: str, *, local_model: Optional[str] = None, local_endpoint: Optional[str] = None, cloud_endpoint: str = 'anthropic', cfg: Optional[Dict[str, Any]] = None, bus: Optional[Any] = None, temperature: Optional[float] = None, max_tokens: Optional[int] = None)
Bases: LocalCloudAgent
Prompted multi-turn dispatcher over a mixed worker pool.
Inference-only port — does NOT use the RL-trained Nemotron-Orchestrator-8B. See module docstring for what's missing relative to the published paper.