mini_swe_agent
mini_swe_agent
¶
MiniSWEAgent — vendored, ~330-line port of mini-SWE-agent v2.
Single-LLM agent loop with a bash tool, run inside a per-task git
clone. The model iterates: read files, grep, run tests, edit, retry —
the environment-interaction loop that turns SWE-bench from "predict the
patch blind" (~0.30) into "actually fix the bug" (~0.77 for frontier
models).
Two ways to use this module:
- Standalone agent — :class:
MiniSWEAgentregistered asmini_swe_agent. Use it directly as the agent for a cell. - As a worker subroutine inside another paradigm — call
:func:
run_swe_agent_loop(task, ...). Returns a dict with the final patch, token totals, cost, etc. This is how Minions / Conductor / Advisors / SkillOrchestra / ToolOrchestra / Archon swap their one-shot worker call for a real agent loop when running SWE-bench.
Differences vs. the upstream (https://github.com/swe-agent/mini-swe-agent):
- No Docker sandbox. We clone the SWE-bench repo into a tempdir and
exec bash there. Network is available (pip etc.). Treat outputs as
untrusted — model can run
rm -rfagainst its own workdir, but the workdir is disposable. Don't run this on a host with secrets in the CWD. - One tool,
bash. No separatesubmit— the loop ends when the model produces a turn with no tool calls. We extract the patch fromgit diffin the workdir at that point. - Trace events captured via the LocalCloudAgent thread-local trace
buffer so every bash invocation + result lands in
experiments/<cell>/logs/<task_id>.json.
Classes¶
MiniSWEAgent
¶
MiniSWEAgent(engine: InferenceEngine, model: str, *, local_model: Optional[str] = None, local_endpoint: Optional[str] = None, cloud_endpoint: str = 'anthropic', cfg: Optional[Dict[str, Any]] = None, bus: Optional[Any] = None, temperature: Optional[float] = None, max_tokens: Optional[int] = None)
Bases: LocalCloudAgent
Single-model bash-loop agent for SWE-bench-shaped tasks.
Configurable knobs via cfg:
backbone(str, default"cloud"):"cloud"or"local".max_turns(int, default 30): hard cap on tool turns.bash_timeout_s(int, default 120): per-command timeout.output_cap(int, default 10_000): per-command stdout/stderr cap.turn_max_tokens(int, default 4096): max_tokens per LLM turn.
Source code in src/openjarvis/agents/hybrid/_base.py
Functions¶
run_swe_agent_loop
¶
run_swe_agent_loop(task: Dict[str, Any], *, backbone: str, backbone_model: str, cloud_endpoint: str = 'anthropic', local_endpoint: Optional[str] = None, initial_prompt: Optional[str] = None, max_turns: int = 30, bash_timeout: int = 120, bash_timeout_s: Optional[int] = None, output_cap: int = 10000, turn_max_tokens: int = 4096, trace_prefix: str = 'mini_swe', workdir: Optional[Path] = None, compact_at_tokens: int = 24000, compact_keep_last: int = 4) -> Dict[str, Any]
Run a mini-SWE-agent loop for one SWE-bench task. Returns:
.. code-block:: python
{
"answer": str, # final framed answer with ```diff fence
"patch": str, # raw unified diff from git diff
"final_summary": str, # the no-tool-call assistant text (may be empty)
"tokens_in": int,
"tokens_out": int,
"tokens_local": int, # bookkeeping split for paradigms
"tokens_cloud": int,
"cost_usd": float,
"turns": int,
"max_turns_hit": bool,
"workdir": str,
}
Captures every bash invocation + LLM turn into the active trace buffer
via :func:_record_event from the LocalCloudAgent base, so callers
don't have to do their own per-call instrumentation.
Args:
task: SWE-bench-shaped dict with repo + base_commit + task_id
+ (optional) problem_statement / hints_text.
backbone: "cloud" to drive the loop with the cloud model
(Anthropic only today), "local" for vLLM.
backbone_model: model id for the loop's backbone.
cloud_endpoint / local_endpoint: SDK targets.
initial_prompt: if set, used as the first user message (paradigms
embed orchestrator context in here). If None, falls back to the
task's problem_statement.
workdir: pre-cloned repo path. If None, this function clones the
repo into a tempdir and cleans it up at the end. Paradigms that
want to chain multiple subloops over the same working tree can
manage their own workdir.
Source code in src/openjarvis/agents/hybrid/mini_swe_agent.py
323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 | |