LiveCodeBench scorer — sandboxed code execution with test cases.
Extracts code from model output, runs it against test cases in a
sandboxed subprocess with timeout and resource limits, and scores
based on pass/fail of each test case.
Reference: https://livecodebench.github.io/
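The extraction step described above might look roughly like the following. This is a minimal sketch, not the module's actual implementation: the helper name `extract_code` is hypothetical, and it assumes the model wraps its answer in a Markdown code fence, with the last fenced block taken as the final answer.

```python
import re

# Build the fence marker programmatically so this example can itself be
# embedded inside a fenced block without confusing Markdown renderers.
FENCE = "`" * 3

# Matches an opening fence with an optional "python"/"py" tag, then captures
# everything up to the next closing fence.
_FENCED_BLOCK = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL,
)


def extract_code(model_output: str) -> str:
    """Hypothetical helper: pull a candidate program out of model output.

    Assumption: the last fenced block is the model's final answer. If no
    fenced block is found, the whole output is treated as code.
    """
    blocks = _FENCED_BLOCK.findall(model_output)
    if blocks:
        return blocks[-1].strip()
    return model_output.strip()
```

Falling back to the raw output keeps the scorer usable when a model answers with bare code, at the cost of occasionally trying to execute prose.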
Classes
LiveCodeBenchScorer
LiveCodeBenchScorer(judge_backend=None, judge_model: str = '')
Bases: Scorer
Score LiveCodeBench problems by running code against test cases.
Executes model-generated code in a sandboxed subprocess with stdin/stdout
test cases. Each test case is run independently with a timeout.
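Running each stdin/stdout test case independently with a timeout could be sketched as below. The function names `run_test_case` and `run_all`, the 5-second default, and the whitespace-insensitive output comparison are all assumptions made for illustration. The sketch also enforces only a timeout; it does not apply the resource limits mentioned above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_test_case(
    code: str, stdin_text: str, expected_stdout: str, timeout_s: float = 5.0
) -> bool:
    """Hypothetical helper: run `code` once and compare its stdout.

    Returns True only if the process exits cleanly within the timeout and
    its output matches after stripping surrounding whitespace.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "solution.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                input=stdin_text,      # fed to the child's stdin
                capture_output=True,   # collect stdout and stderr
                text=True,
                timeout=timeout_s,     # child is killed when this expires
                cwd=tmp,               # keep stray files in the temp dir
            )
        except subprocess.TimeoutExpired:
            return False
    return proc.returncode == 0 and proc.stdout.strip() == expected_stdout.strip()


def run_all(code: str, cases: list[tuple[str, str]]) -> list[bool]:
    """Run every (stdin, expected_stdout) pair in its own subprocess."""
    return [run_test_case(code, stdin, expected) for stdin, expected in cases]
```

A fresh subprocess per test case means a crash or hang in one case cannot affect the others. How the per-case results are combined into a final score (all-or-nothing versus a pass fraction) is not stated here, so the sketch returns the raw list.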
Source code in src/openjarvis/evals/scorers/livecodebench.py
    def __init__(self, judge_backend=None, judge_model: str = "") -> None:
        # Accept same constructor args as LLMJudgeScorer for CLI compatibility
        # but test execution does not need an LLM judge
        pass