swefficiency_structural


SWEfficiency scorer: structural patch validation.

Full SWEfficiency evaluation requires running performance benchmarks inside the repository environment. This scorer performs lightweight structural checks on the model output (e.g. whether it looks like a valid patch) and defers the authoritative pass/fail to external benchmark execution.

Classes

SWEfficiencyScorer

SWEfficiencyScorer(judge_backend: object = None, judge_model: str = '')

Bases: Scorer

Structural validation scorer for SWEfficiency patches.

Because true SWEfficiency scoring requires running performance benchmarks in a sandboxed repository checkout, this scorer only checks whether the model produced something that looks like a valid unified diff. When a patch-like response is detected, the is_correct field is set to None (indeterminate); downstream harnesses should then measure the actual speedup.

Source code in src/openjarvis/evals/scorers/swefficiency_structural.py

def __init__(
    self,
    judge_backend: object = None,
    judge_model: str = "",
) -> None:
    # Accept judge_backend/judge_model so the CLI factory pattern works,
    # but they are unused — scoring is purely structural.
    self._judge_backend = judge_backend
    self._judge_model = judge_model
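
The structural check described above could be sketched roughly as follows. This is not the scorer's actual implementation, which is not shown on this page. The helper names `looks_like_unified_diff` and `structural_score`, the regular expressions, the metadata keys, and the choice to return False for non-patch output are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical heuristics: a "--- old" / "+++ new" file header pair
# followed somewhere by at least one "@@ -a,b +c,d @@" hunk header.
_FILE_HEADER = re.compile(r"^--- \S.*\n\+\+\+ \S.*$", re.MULTILINE)
_HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@", re.MULTILINE)


def looks_like_unified_diff(text: str) -> bool:
    """Return True if text has a file header pair and a hunk header."""
    return bool(_FILE_HEADER.search(text)) and bool(_HUNK_HEADER.search(text))


def structural_score(model_output: str) -> Tuple[Optional[bool], dict]:
    """Illustrative scoring rule.

    None  -> patch-like output; correctness is indeterminate until an
             external harness measures the speedup.
    False -> no patch detected (an assumption; the documented behaviour
             only covers the patch-like case).
    """
    if looks_like_unified_diff(model_output):
        return None, {"reason": "patch_detected"}
    return False, {"reason": "no_patch_detected"}
```

Under these assumptions, a downstream harness would treat a None result as "needs benchmark execution" rather than as a failure, and only then decide pass or fail from the measured speedup.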