baseline_cloud
baseline_cloud
¶
BaselineCloudAgent — cloud-only reference for the hybrid ablation.
Used as the "what does the cloud do alone?" row in the n=100 ablation
matrix (see .openjarvis/experiments/hybrid/docs/results-table.md).
No local model is involved — local_* settings are ignored.
On GAIA the agent makes one cloud call with the formatted prompt (which
already carries the FINAL ANSWER: format reminder from
_prompts.format_gaia) and returns the text. On SWE-bench-Verified
the agent delegates to :func:run_swe_agent_loop with backbone="cloud"
so the model gets to run bash and read the repo — same wiring as the
mini-swe-agent-swebenchverified-opus-* cells. As of 2026-05-15
_loop_cloud dispatches to per-endpoint loops for Anthropic, OpenAI,
and Gemini so all three cloud backbones get the proper bash-agent loop
on SWE (previously OpenAI / Gemini SWE cells silently fell back to a
one-shot blind patch — fixed).
Construction args mirror :class:LocalCloudAgent. The cloud block
in the cell registry determines the cloud model + endpoint; local
is accepted for schema compatibility but unused.
Classes¶
BaselineCloudAgent
¶
BaselineCloudAgent(engine: InferenceEngine, model: str, *, local_model: Optional[str] = None, local_endpoint: Optional[str] = None, cloud_endpoint: str = 'anthropic', cfg: Optional[Dict[str, Any]] = None, bus: Optional[Any] = None, temperature: Optional[float] = None, max_tokens: Optional[int] = None)
Bases: LocalCloudAgent
Cloud-only baseline used as a reference in the n=100 ablation.
Configurable knobs via cfg:
cloud_max_tokens(int, default 4096 / 16384 for reasoning models): max_tokens per GAIA call and per turn of the SWE agent loop. Default jumps to 16384 for GPT-5 family and Gemini 2.5 Pro because those models burn the budget on hidden chain-of-thought before emitting visible answer text; at 4096 they silently truncated 18–26% of GAIA cells with empty answers. Override per-cell viamethod_cfgto opt out.swe_max_turns(int, default 30): SWE-bench loop turn cap.swe_bash_timeout_s(int, default 120): SWE-bench bash timeout.