mini_swe_agent
mini_swe_agent
¶
MiniSWEAgent — vendored, ~330-line port of mini-SWE-agent v2.
Single-LLM agent loop with a bash tool, run inside a per-task git
clone. The model iterates: read files, grep, run tests, edit, retry —
the environment-interaction loop that turns SWE-bench from "predict the
patch blind" (~0.30) into "actually fix the bug" (~0.77 for frontier
models).
Two ways to use this module:
- Standalone agent — :class:
MiniSWEAgentregistered asmini_swe_agent. Use it directly as the agent for a cell. - As a worker subroutine inside another paradigm — call
:func:
run_swe_agent_loop(task, ...). Returns a dict with the final patch, token totals, cost, etc. This is how Minions / Conductor / Advisors / SkillOrchestra / ToolOrchestra / Archon swap their one-shot worker call for a real agent loop when running SWE-bench.
Differences vs. the upstream (https://github.com/swe-agent/mini-swe-agent):
- No Docker sandbox. We clone the SWE-bench repo into a tempdir and
exec bash there. Network is available (pip etc.). Treat outputs as
untrusted — model can run
rm -rfagainst its own workdir, but the workdir is disposable. Don't run this on a host with secrets in the CWD. - One tool,
bash. No separatesubmit— the loop ends when the model produces a turn with no tool calls. We extract the patch fromgit diffin the workdir at that point. - Trace events captured via the LocalCloudAgent thread-local trace
buffer so every bash invocation + result lands in
experiments/<cell>/logs/<task_id>.json.
Classes¶
MiniSWEAgent
¶
MiniSWEAgent(engine: InferenceEngine, model: str, *, local_model: Optional[str] = None, local_endpoint: Optional[str] = None, cloud_endpoint: str = 'anthropic', cfg: Optional[Dict[str, Any]] = None, bus: Optional[Any] = None, temperature: Optional[float] = None, max_tokens: Optional[int] = None)
Bases: LocalCloudAgent
Single-model bash-loop agent for SWE-bench-shaped tasks.
Configurable knobs via cfg:
backbone(str, default"cloud"):"cloud"or"local".max_turns(int, default 30): hard cap on tool turns.bash_timeout_s(int, default 120): per-command timeout.output_cap(int, default 10_000): per-command stdout/stderr cap.turn_max_tokens(int, default 4096): max_tokens per LLM turn.
Source code in src/openjarvis/agents/hybrid/_base.py
Functions¶
run_swe_agent_loop
¶
run_swe_agent_loop(task: Dict[str, Any], *, backbone: str, backbone_model: str, cloud_endpoint: str = 'anthropic', local_endpoint: Optional[str] = None, initial_prompt: Optional[str] = None, max_turns: int = 30, bash_timeout: int = 120, output_cap: int = 10000, turn_max_tokens: int = 4096, trace_prefix: str = 'mini_swe', workdir: Optional[Path] = None) -> Dict[str, Any]
Run a mini-SWE-agent loop for one SWE-bench task. Returns:
.. code-block:: python
{
"answer": str, # final framed answer with ```diff fence
"patch": str, # raw unified diff from git diff
"final_summary": str, # the no-tool-call assistant text (may be empty)
"tokens_in": int,
"tokens_out": int,
"tokens_local": int, # bookkeeping split for paradigms
"tokens_cloud": int,
"cost_usd": float,
"turns": int,
"max_turns_hit": bool,
"workdir": str,
}
Captures every bash invocation + LLM turn into the active trace buffer
via :func:_record_event from the LocalCloudAgent base, so callers
don't have to do their own per-call instrumentation.
Args:
task: SWE-bench-shaped dict with repo + base_commit + task_id
+ (optional) problem_statement / hints_text.
backbone: "cloud" to drive the loop with the cloud model
(Anthropic only today), "local" for vLLM.
backbone_model: model id for the loop's backbone.
cloud_endpoint / local_endpoint: SDK targets.
initial_prompt: if set, used as the first user message (paradigms
embed orchestrator context in here). If None, falls back to the
task's problem_statement.
workdir: pre-cloned repo path. If None, this function clones the
repo into a tempdir and cleans it up at the end. Paradigms that
want to chain multiple subloops over the same working tree can
manage their own workdir.
Source code in src/openjarvis/agents/hybrid/mini_swe_agent.py
205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 | |