livecodebench

LiveCodeBench scorer: sandboxed code execution with test cases.

Extracts code from model output, runs it against test cases in a sandboxed subprocess with timeout and resource limits, and scores based on pass/fail of each test case.
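The extraction step could look roughly like the sketch below. It is an illustration only, not the scorer's actual implementation: the helper name `extract_code`, the choice of the last fenced block, and the fallback to the raw text are all assumptions.

```python
import re

# Built from single backticks so this example contains no literal code fence.
FENCE = "`" * 3

# Matches a fenced block with an optional "python"/"py" language tag.
_BLOCK_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(model_output: str) -> str:
    """Hypothetical helper: pull the program out of a model reply.

    Assumption: the last fenced block is the final answer. If the reply has
    no fenced block, the whole reply is treated as code.
    """
    blocks = _BLOCK_RE.findall(model_output)
    if blocks:
        return blocks[-1].strip()
    return model_output.strip()


reply = "Here is my solution:\n" + FENCE + "python\nprint(1)\n" + FENCE + "\nDone."
code = extract_code(reply)
```

Taking the last block rather than the first is a design choice made for this sketch, on the assumption that models often revise an earlier draft within the same reply.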

Reference: https://livecodebench.github.io/

Classes

LiveCodeBenchScorer

LiveCodeBenchScorer(judge_backend=None, judge_model: str = '')

Bases: Scorer

Score LiveCodeBench problems by running code against test cases.

Executes model-generated code in a sandboxed subprocess with stdin/stdout test cases. Each test case is run independently with a timeout.
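A minimal sketch of per-test-case execution, assuming the submission is a Python program that reads stdin and writes stdout. The names `run_test_case` and `run_all`, the 5-second default timeout, and the whitespace-insensitive comparison are assumptions for illustration and are not taken from the scorer's source. Resource limits beyond the timeout are not shown.

```python
import subprocess
import sys


def run_test_case(code: str, stdin_text: str, expected_stdout: str,
                  timeout_s: float = 5.0) -> bool:
    """Hypothetical helper: run `code` in a fresh interpreter for one test case."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],  # separate process per test case
            input=stdin_text,              # test input is fed on stdin
            capture_output=True,
            text=True,
            timeout=timeout_s,             # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False
    if proc.returncode != 0:               # crash or uncaught exception
        return False
    # Assumption: leading and trailing whitespace is ignored when comparing.
    return proc.stdout.strip() == expected_stdout.strip()


def run_all(code: str, cases: list[tuple[str, str]]) -> list[bool]:
    """Run every (stdin, expected stdout) pair independently."""
    return [run_test_case(code, stdin, expected) for stdin, expected in cases]


results = run_all("print(int(input()) * 2)", [("2\n", "4\n"), ("5\n", "11\n")])
```

Here `results` holds one pass/fail flag per test case. How the real scorer aggregates those flags into a final score is not stated in this page, so the sketch stops at the per-case results.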

Source code in src/openjarvis/evals/scorers/livecodebench.py

def __init__(self, judge_backend=None, judge_model: str = "") -> None:
    # Accepts the same constructor arguments as LLMJudgeScorer for CLI
    # compatibility. Test execution does not need an LLM judge, so both
    # arguments are ignored.
    pass