livecodebench
LiveCodeBench scorer: sandboxed code execution with test cases.
Extracts code from model output, runs it against test cases in a sandboxed subprocess with timeout and resource limits, and scores based on pass/fail of each test case.
Reference: https://livecodebench.github.io/
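The extraction step described above can be sketched as follows. This is a minimal illustration, not the scorer's actual implementation: the function name `extract_code` and the rule "take the last fenced block, otherwise fall back to the raw text" are assumptions made for the example.

```python
import re

# Build the fence marker programmatically so this example can itself
# sit inside a fenced block.
FENCE = "`" * 3

# Matches a fenced block with an optional language tag, e.g. a fence
# followed by "python", a newline, the code, and a closing fence.
FENCE_RE = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(model_output: str) -> str:
    """Hypothetical helper: return the code portion of a model response.

    Assumption: the last fenced block is the final answer. If no fenced
    block is present, the whole response is treated as code.
    """
    blocks = FENCE_RE.findall(model_output)
    if blocks:
        return blocks[-1].strip()
    return model_output.strip()
```

Taking the last block rather than the first is a design choice for this sketch: models often show a draft before the final solution.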
Classes
LiveCodeBenchScorer
Bases: Scorer
Score LiveCodeBench problems by running code against test cases.
Executes model-generated code in a sandboxed subprocess with stdin/stdout test cases. Each test case is run independently with a timeout.
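A rough sketch of per-test-case execution with a timeout is shown below. It is illustrative only and does not reproduce `LiveCodeBenchScorer`: the names `run_test_case` and `score`, the 5-second default timeout, the whitespace-trimmed output comparison, and the fraction-passed score are all assumptions. It also omits resource limits, and a bare subprocess is not a real sandbox.

```python
import os
import subprocess
import sys
import tempfile


def run_test_case(code: str, stdin_text: str, expected_stdout: str,
                  timeout_s: float = 5.0) -> bool:
    """Hypothetical helper: run `code` once on one stdin/stdout test case."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "solution.py")
        with open(path, "w") as f:
            f.write(code)
        try:
            # Fresh interpreter per test case; -I runs Python in isolated mode.
            proc = subprocess.run(
                [sys.executable, "-I", path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            # A timed-out test case counts as a failure.
            return False
    # Assumption: outputs are compared after trimming surrounding whitespace.
    return proc.returncode == 0 and proc.stdout.strip() == expected_stdout.strip()


def score(code: str, test_cases: list[tuple[str, str]],
          timeout_s: float = 5.0) -> float:
    """Assumed scoring rule: fraction of test cases passed."""
    if not test_cases:
        return 0.0
    passed = sum(run_test_case(code, i, o, timeout_s) for i, o in test_cases)
    return passed / len(test_cases)
```

For example, `score("print(int(input()) * 2)", [("21\n", "42")])` runs the program once with `21` on stdin and compares its output with `42`.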