baseline_cloud

BaselineCloudAgent — cloud-only reference for the hybrid ablation.

Used as the "what does the cloud do alone?" row in the n=100 ablation matrix (see .openjarvis/experiments/hybrid/docs/results-table.md). No local model is involved — local_* settings are ignored.

On GAIA the agent makes a single cloud call with the formatted prompt (which already carries the FINAL ANSWER: format reminder from _prompts.format_gaia) and returns the text. On SWE-bench-Verified the agent delegates to run_swe_agent_loop with backbone="cloud" so the model can run bash and read the repo, the same wiring as the mini-swe-agent-swebenchverified-opus-* cells. As of 2026-05-15, _loop_cloud dispatches to per-endpoint loops for Anthropic, OpenAI, and Gemini, so all three cloud backbones get the proper bash-agent loop on SWE. Before that fix, OpenAI and Gemini SWE cells silently fell back to a one-shot blind patch.
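The routing described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the stub loop functions, `one_shot_call`, `loop_cloud`, `run`, and the benchmark labels `"gaia"` / `"swe"` are all hypothetical names, and raising on an unknown endpoint is an assumed design choice meant to avoid the kind of silent fallback mentioned in the text.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the per-endpoint bash-agent loops.
def _loop_anthropic(prompt: str) -> str:
    return f"[anthropic bash loop] {prompt}"


def _loop_openai(prompt: str) -> str:
    return f"[openai bash loop] {prompt}"


def _loop_gemini(prompt: str) -> str:
    return f"[gemini bash loop] {prompt}"


_ENDPOINT_LOOPS: Dict[str, Callable[[str], str]] = {
    "anthropic": _loop_anthropic,
    "openai": _loop_openai,
    "gemini": _loop_gemini,
}


def loop_cloud(endpoint: str, prompt: str) -> str:
    """Dispatch to the loop for this endpoint; fail loudly if none exists."""
    try:
        loop = _ENDPOINT_LOOPS[endpoint.lower()]
    except KeyError:
        # Assumption: an explicit error is preferable to a silent one-shot fallback.
        raise ValueError(f"no agent loop for endpoint {endpoint!r}") from None
    return loop(prompt)


def one_shot_call(prompt: str) -> str:
    """Hypothetical stand-in for the single GAIA cloud call."""
    return f"[single cloud call] {prompt}"


def run(benchmark: str, prompt: str, endpoint: str = "anthropic") -> str:
    """SWE tasks get the multi-turn loop; everything else gets one call."""
    if benchmark == "swe":
        return loop_cloud(endpoint, prompt)
    return one_shot_call(prompt)
```

Keeping the endpoint table explicit means that adding a fourth backbone is a one-line change, and a missing entry surfaces as an error rather than as a degraded result.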

Construction args mirror LocalCloudAgent. The cloud block in the cell registry determines the cloud model and endpoint; local is accepted for schema compatibility but unused.

Classes

BaselineCloudAgent

BaselineCloudAgent(engine: InferenceEngine, model: str, *, local_model: Optional[str] = None, local_endpoint: Optional[str] = None, cloud_endpoint: str = 'anthropic', cfg: Optional[Dict[str, Any]] = None, bus: Optional[Any] = None, temperature: Optional[float] = None, max_tokens: Optional[int] = None)

Bases: LocalCloudAgent

Cloud-only baseline used as a reference in the n=100 ablation.

Configurable knobs via cfg:

  • cloud_max_tokens (int, default 4096, or 16384 for reasoning models): max_tokens per GAIA call and per turn of the SWE agent loop. The default rises to 16384 for the GPT-5 family and Gemini 2.5 Pro because those models spend the budget on hidden chain-of-thought before emitting visible answer text; at 4096, 18–26% of their GAIA cells were silently truncated to empty answers. Override per-cell via method_cfg to opt out.
  • swe_max_turns (int, default 30): SWE-bench loop turn cap.
  • swe_bash_timeout_s (int, default 120): SWE-bench bash timeout.
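
One way the three knobs above could be resolved from cfg is sketched here. The defaults (4096, 16384, 30, 120) come from the list; everything else is an assumption: `resolve_knobs` and `is_reasoning_model` are hypothetical helpers, and the prefix match on model names is a guess at how reasoning models might be detected, since the real check is not shown on this page.

```python
from typing import Any, Dict, Optional

DEFAULT_MAX_TOKENS = 4096
REASONING_MAX_TOKENS = 16384

# Hypothetical name prefixes; the actual detection logic may differ.
REASONING_PREFIXES = ("gpt-5", "gemini-2.5-pro")


def is_reasoning_model(model: str) -> bool:
    """Return True if the model name looks like a hidden-reasoning model."""
    return model.lower().startswith(REASONING_PREFIXES)


def resolve_knobs(model: str, cfg: Optional[Dict[str, Any]] = None) -> Dict[str, int]:
    """Merge per-cell overrides from cfg with the documented defaults."""
    cfg = dict(cfg or {})
    default_tokens = (
        REASONING_MAX_TOKENS if is_reasoning_model(model) else DEFAULT_MAX_TOKENS
    )
    return {
        # An explicit cloud_max_tokens in cfg always wins over the model default.
        "cloud_max_tokens": int(cfg.get("cloud_max_tokens", default_tokens)),
        "swe_max_turns": int(cfg.get("swe_max_turns", 30)),
        "swe_bash_timeout_s": int(cfg.get("swe_bash_timeout_s", 120)),
    }
```

Under these assumptions, passing `{"cloud_max_tokens": 4096}` for a GPT-5 model opts that cell back into the smaller budget, which matches the per-cell override described in the list.
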
Source code in src/openjarvis/agents/hybrid/_base.py
def __init__(
    self,
    engine: InferenceEngine,
    model: str,
    *,
    local_model: Optional[str] = None,
    local_endpoint: Optional[str] = None,
    cloud_endpoint: str = "anthropic",
    cfg: Optional[Dict[str, Any]] = None,
    bus: Optional[Any] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> None:
    super().__init__(
        engine,
        model,
        bus=bus,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    self._cloud_model = model
    self._cloud_endpoint = (cloud_endpoint or "anthropic").lower()
    self._local_model = local_model
    self._local_endpoint = local_endpoint
    self._cfg: Dict[str, Any] = dict(cfg or {})
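
The two normalisation steps in this constructor (the lowercased endpoint with an "anthropic" fallback, and the copied cfg) can be checked in isolation. The class below is a hypothetical stand-in that repeats only those two lines; it is not the real agent and needs no inference engine.

```python
from typing import Any, Dict, Optional


class AgentSettings:
    """Hypothetical stand-in repeating only the constructor's normalisation."""

    def __init__(
        self,
        cloud_endpoint: Optional[str] = "anthropic",
        cfg: Optional[Dict[str, Any]] = None,
    ) -> None:
        # None or "" falls back to "anthropic"; the result is always lowercase.
        self.cloud_endpoint = (cloud_endpoint or "anthropic").lower()
        # dict(...) makes a shallow copy, so later top-level changes to the
        # caller's dict do not affect the stored settings.
        self.cfg: Dict[str, Any] = dict(cfg or {})


caller_cfg = {"swe_max_turns": 10}
settings = AgentSettings(cloud_endpoint="OpenAI", cfg=caller_cfg)
caller_cfg["swe_max_turns"] = 99  # mutate the caller's dict after construction

print(settings.cloud_endpoint)
print(settings.cfg["swe_max_turns"])
```

Because the copy is shallow, nested values inside cfg would still be shared with the caller.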

Functions