skillorchestra
SkillOrchestraAgent — inference-time router (Wang et al., 2026).
Paper: arXiv:2602.19672. The published pipeline has four phases: explore (run every pool model and collect traces), learn (induce a skill handbook with per-agent Beta competences and per-skill cost statistics), select (choose a Pareto-optimal handbook subset on a live validation set), and test. At deployment, the orchestrator reads the user query, infers its skill demands, then picks the agent that maximizes weighted competence minus λ·cost.
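As a rough illustration of what a "per-agent Beta competence" could look like, the sketch below keeps a Beta(alpha, beta) belief over one agent's success rate on one skill and updates it from pass/fail traces. The class name `BetaCompetence`, the uniform Beta(1, 1) prior, and the one-count-per-trace update are assumptions made for illustration; the paper's actual learn phase may differ.

```python
from dataclasses import dataclass


@dataclass
class BetaCompetence:
    """Hypothetical Beta(alpha, beta) belief over success rate on one skill."""

    alpha: float = 1.0  # pseudo-count of successes (assumed uniform prior)
    beta: float = 1.0   # pseudo-count of failures (assumed uniform prior)

    def update(self, success: bool) -> None:
        """Fold one observed trace outcome into the belief."""
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        """Posterior mean of a Beta distribution: alpha / (alpha + beta)."""
        return self.alpha / (self.alpha + self.beta)


# Three successful traces and one failure for a single (agent, skill) pair.
competence = BetaCompetence()
for outcome in (True, True, False, True):
    competence.update(outcome)
```

The posterior mean is one natural scalar to plug into a weighted-competence score; other summaries (for example a lower quantile) would be equally plausible choices.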
What we reproduce here: the deployment-time step only. The full explore/learn/select pipeline requires multi-model serving, the FRAMES wiki retriever, and a multi-hour LLM-driven learning loop, all of which are out of scope for the OpenJarvis port (and were out of scope in the hybrid harness).
Instead, this agent applies the orchestrator's inference logic to a small handbook that is synthesized per task on the fly: the cloud model (Opus) reads the question, identifies which skills it needs (from a fixed catalog), assigns weights, scores each of our two agents (local Qwen-27B vs cloud Opus) under a cost-discounted weighted-competence rule, then routes. The chosen agent answers the question.
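The cost-discounted weighted-competence rule described above can be sketched as follows. This is a minimal illustration, not the adapter's actual code: the function names `agent_score` and `route`, the skill names, the competence values, the per-task costs, and the choice to normalize weights are all hypothetical.

```python
from typing import Dict


def agent_score(
    skill_weights: Dict[str, float],
    competence: Dict[str, float],
    cost: float,
    lam: float,
) -> float:
    """Weighted competence minus lam * cost for one agent (assumed form)."""
    total = sum(skill_weights.values())
    if total <= 0:
        raise ValueError("skill weights must sum to a positive value")
    # Normalizing the weights is an assumption; unknown skills count as 0.
    weighted = sum(
        w * competence.get(skill, 0.0) for skill, w in skill_weights.items()
    ) / total
    return weighted - lam * cost


def route(
    skill_weights: Dict[str, float],
    handbook: Dict[str, Dict[str, float]],
    costs: Dict[str, float],
    lam: float,
) -> str:
    """Return the name of the agent with the highest score."""
    scores = {
        agent: agent_score(skill_weights, handbook[agent], costs[agent], lam)
        for agent in handbook
    }
    return max(scores, key=scores.get)


# Hypothetical per-task handbook for a two-agent pool.
skill_weights = {"web_search": 0.5, "arithmetic": 0.5}
handbook = {
    "local": {"web_search": 0.4, "arithmetic": 0.8},
    "cloud": {"web_search": 0.9, "arithmetic": 0.9},
}
costs = {"local": 0.0, "cloud": 0.66}

cost_sensitive_choice = route(skill_weights, handbook, costs, lam=0.5)
accuracy_leaning_choice = route(skill_weights, handbook, costs, lam=0.1)
```

With these made-up numbers, λ = 0.5 routes to the local agent (0.60 vs 0.90 − 0.33 = 0.57), while λ = 0.1 routes to the cloud agent, which shows how λ trades accuracy against spend.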
Hybrid harness result (n=30 GAIA): skillorchestra-gaia-qwen27b-opus-30 = 0.500 acc, $0.02/task. That is roughly 30× cheaper than baseline-cloud (0.567 acc, $0.66/task) for ~7pp lower accuracy, making it the most cost-efficient GAIA paradigm by a wide margin.
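The headline comparison can be checked directly from the reported figures. The snippet below uses only the rounded numbers quoted above, so the results are approximate; the task counts are inferred from n=30 and are an assumption about how the accuracies were computed.

```python
# Reported (rounded) figures from the hybrid harness run.
baseline_acc, baseline_cost = 0.567, 0.66  # baseline-cloud
router_acc, router_cost = 0.500, 0.02      # skillorchestra-gaia-qwen27b-opus-30
n_tasks = 30

cost_ratio = baseline_cost / router_cost            # how many times cheaper
acc_gap_pp = (baseline_acc - router_acc) * 100      # percentage points

# Inferred number of correct tasks, assuming accuracy = correct / n_tasks.
baseline_correct = round(baseline_acc * n_tasks)
router_correct = round(router_acc * n_tasks)
```

On these rounded inputs the cost ratio comes out near 33 and the accuracy gap near 6.7pp, which at n=30 corresponds to two tasks (17 vs 15 correct).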
Ported from hybrid-local-cloud-compute/adapters/skillorchestra_adapter.py.
Classes
SkillOrchestraAgent
SkillOrchestraAgent(engine: InferenceEngine, model: str, *, local_model: Optional[str] = None, local_endpoint: Optional[str] = None, cloud_endpoint: str = 'anthropic', cfg: Optional[Dict[str, Any]] = None, bus: Optional[Any] = None, temperature: Optional[float] = None, max_tokens: Optional[int] = None)
Bases: LocalCloudAgent
Inference-time skill-aware router. See module docstring.