archon
ArchonAgent — port of ScalingIntelligence/Archon.
Inference-time architecture search: layered (generator → ranker → fuser) sampling where a generator proposes K candidates, a ranker scores them, and a fuser synthesizes a final answer. Paper: arXiv:2409.15254.
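The three layers described above can be sketched with plain callables standing in for the models. This is a minimal illustration of the generator → ranker → fuser flow, not Archon's own implementation; `layered_search`, its callable parameters, and the `top_n` cutoff are hypothetical names introduced for this example.

```python
from typing import Callable, List


def layered_search(
    prompt: str,
    generate: Callable[[str], str],
    rank: Callable[[str, List[str]], List[float]],
    fuse: Callable[[str, List[str]], str],
    k: int = 5,
    top_n: int = 3,
) -> str:
    """Run one generator -> ranker -> fuser pass (illustrative only)."""
    # Generator layer: draw K independent candidates for the same prompt.
    candidates = [generate(prompt) for _ in range(k)]

    # Ranker layer: one score per candidate, higher is better.
    scores = rank(prompt, candidates)
    if len(scores) != len(candidates):
        raise ValueError("ranker must return one score per candidate")

    # Keep the top_n candidates by score (top_n is an assumption of this sketch).
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    best = [candidates[i] for i in order[:top_n]]

    # Fuser layer: synthesize a single answer from the surviving candidates.
    return fuse(prompt, best)
```

In a real deployment each callable would wrap a model call; here they can be any function with the matching signature, which makes the control flow easy to test in isolation.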
How the hybrid harness wires it (and what we mirror here):
- Local proposers (generator layer): K samples from vLLM via an OpenAI-compatible client at `local_endpoint`. Injected as a custom `vllm_local` `model_type` into Archon's `GENERATE_MAP`, which is the only way Ranker/Fuser can pick up custom backends (they re-instantiate Generator without `custom_generators`).
- Cloud ranker + fuser: Archon's built-in `OpenAI_API` / `Anthropic_API`. Patched at import time to strip `temperature` for Opus 4.7+ and to tally token usage (Archon ignores `usage` by default).
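The cloud-side patch described above (drop `temperature` for models that reject it, and accumulate token usage) could look roughly like the wrapper below. It is a sketch under stated assumptions: `UsageTally`, `patched`, and the `rejects_temperature` predicate are hypothetical names, and the actual attribute or function that the adapter patches inside Archon is not shown in this page. The only API facts relied on are that Anthropic responses report `usage.input_tokens` / `usage.output_tokens` and OpenAI responses report `usage.prompt_tokens` / `usage.completion_tokens`.

```python
from functools import wraps
from typing import Any, Callable


class UsageTally:
    """Accumulates token counts across calls (hypothetical helper)."""

    def __init__(self) -> None:
        self.input_tokens = 0
        self.output_tokens = 0

    def add(self, usage: Any) -> None:
        if usage is None:
            return
        # Anthropic uses input_tokens/output_tokens;
        # OpenAI uses prompt_tokens/completion_tokens.
        self.input_tokens += (
            getattr(usage, "input_tokens", None)
            or getattr(usage, "prompt_tokens", 0)
            or 0
        )
        self.output_tokens += (
            getattr(usage, "output_tokens", None)
            or getattr(usage, "completion_tokens", 0)
            or 0
        )


def patched(
    create: Callable[..., Any],
    tally: UsageTally,
    rejects_temperature: Callable[[str], bool],
) -> Callable[..., Any]:
    """Wrap a client 'create' call: strip temperature if needed, tally usage."""

    @wraps(create)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Drop the sampling temperature for models that do not accept it.
        if rejects_temperature(kwargs.get("model", "")):
            kwargs.pop("temperature", None)
        response = create(*args, **kwargs)
        # Record usage if the response carries it; ignore it otherwise.
        tally.add(getattr(response, "usage", None))
        return response

    return wrapper
```

Passing the model check in as a predicate keeps model-ID matching out of the wrapper, so the rule for which models reject `temperature` stays in one place.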
cfg knobs:
- `n_samples` (int, default 5): K proposers
- `architecture` (str, default `"ensemble_rank_fuse"`)
  - `"ensemble_rank_fuse"` → [K local generators, 1 cloud ranker, 1 cloud fuser]
  - `"single_local"` → [1 local generator] (debug)
- `ranker_model` / `fuser_model` (default: `cloud_model` for both)
- `max_tokens` (default 2048), `temperature` (default 0.7)
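To make the defaults concrete, the sketch below merges a user-supplied `cfg` with the documented defaults and expands the chosen architecture into a layer plan. `resolve_cfg`, `layer_plan`, and the layer labels are hypothetical helpers written for illustration; the agent's real resolution logic is not shown on this page and may differ.

```python
from typing import Any, Dict, List, Optional, Tuple

# Defaults as documented in the cfg knobs list.
DEFAULTS: Dict[str, Any] = {
    "n_samples": 5,
    "architecture": "ensemble_rank_fuse",
    "max_tokens": 2048,
    "temperature": 0.7,
}


def resolve_cfg(cfg: Optional[Dict[str, Any]], cloud_model: str) -> Dict[str, Any]:
    """Merge user cfg over the defaults; ranker/fuser fall back to cloud_model."""
    merged = {**DEFAULTS, **(cfg or {})}
    merged.setdefault("ranker_model", cloud_model)
    merged.setdefault("fuser_model", cloud_model)
    if merged["architecture"] not in ("ensemble_rank_fuse", "single_local"):
        raise ValueError(f"unknown architecture: {merged['architecture']!r}")
    return merged


def layer_plan(cfg: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Expand the architecture name into (layer, count) pairs."""
    if cfg["architecture"] == "single_local":
        return [("local_generator", 1)]
    return [
        ("local_generator", cfg["n_samples"]),
        ("cloud_ranker", 1),
        ("cloud_fuser", 1),
    ]
```

For example, `layer_plan(resolve_cfg({"n_samples": 8}, "some-cloud-model"))` yields eight local generators followed by one ranker and one fuser, while `{"architecture": "single_local"}` collapses the plan to a single local generator.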
Requires the Archon library (cloned at
hybrid-local-cloud-compute/external/Archon — add its src to
PYTHONPATH or pip-install editable). Import is lazy.
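A lazy import of this kind is commonly written as a small helper that defers the import until first use and optionally extends `sys.path`. The version below is a generic sketch, not the adapter's code: `lazy_import` is a hypothetical name, and the module name to pass for Archon (and the exact location of its `src` directory on your machine) is left to the caller rather than assumed here.

```python
import importlib
import sys
from types import ModuleType
from typing import Optional


def lazy_import(module_name: str, src_dir: Optional[str] = None) -> ModuleType:
    """Import a module on demand, optionally adding a source dir to sys.path."""
    # Make a cloned checkout importable without requiring a pip install.
    if src_dir is not None and src_dir not in sys.path:
        sys.path.insert(0, src_dir)
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a hint so the failure points at the setup step.
        raise ImportError(
            f"Could not import {module_name!r}. Add its src directory to "
            "PYTHONPATH or install it in editable mode."
        ) from exc
```

Calling the helper inside the method that first needs the library, rather than at module top level, keeps the rest of the package importable when the optional dependency is absent.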
Ported from hybrid-local-cloud-compute/adapters/archon_adapter.py.
Classes
ArchonAgent
ArchonAgent(engine: InferenceEngine, model: str, *, local_model: Optional[str] = None, local_endpoint: Optional[str] = None, cloud_endpoint: str = 'anthropic', cfg: Optional[Dict[str, Any]] = None, bus: Optional[Any] = None, temperature: Optional[float] = None, max_tokens: Optional[int] = None)
Bases: LocalCloudAgent
Layered (generator → ranker → fuser) inference-time search.