swefficiency_structural
swefficiency_structural
¶
SWEfficiency scorer — structural patch validation.
Full SWEfficiency evaluation requires running performance benchmarks inside the repository environment. This scorer performs lightweight structural checks on the model output (e.g. whether it looks like a valid patch) and defers the authoritative pass/fail to external benchmark execution.
Classes¶
SWEfficiencyScorer
¶
Bases: Scorer
Structural validation scorer for SWEfficiency patches.
Since true SWEfficiency scoring requires running performance
benchmarks in a sandboxed repository checkout, this scorer only
checks whether the model produced something that looks like a valid
unified diff. The is_correct field is set to None
(indeterminate) when a patch-like response is detected — downstream
harnesses should measure the actual speedup.