liveresearchbench

LiveResearchBench (Salesforce) dataset provider.

80 expert-curated deep research tasks with per-question evaluation checklists across three domains: daily life, enterprise, and academia. 543 checklist items total (grouped by question).

Reference: https://github.com/SalesforceAIResearch/LiveResearchBench

HuggingFace: Salesforce/LiveResearchBench (gated; accept the terms first)

Classes

LiveResearchBenchSFDataset

LiveResearchBenchSFDataset()

Bases: DatasetProvider

Salesforce LiveResearchBench — 80 expert-curated research tasks.

The HuggingFace dataset has 543 rows (multiple checklist items per question). We group by qid to produce one EvalRecord per unique question, with all checklist items aggregated in metadata.
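The grouping step described above could look roughly like the following. This is a minimal sketch, not the provider's actual implementation: `SketchRecord` is a hypothetical stand-in for `EvalRecord` (whose real fields are not shown here), and the column names `question` and `checklist_item` are assumptions. Only `qid` is named in the docstring.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchRecord:
    """Hypothetical stand-in for EvalRecord; the real class may differ."""

    record_id: str
    problem: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def group_by_qid(rows: List[Dict[str, Any]]) -> List[SketchRecord]:
    """Collapse checklist-level rows into one record per unique qid."""
    questions: Dict[str, str] = {}
    checklists: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        qid = str(row["qid"])
        # Keep the first question text seen for each qid.
        questions.setdefault(qid, row["question"])  # assumed column name
        checklists[qid].append(row["checklist_item"])  # assumed column name
    # Dicts preserve insertion order, so records follow first appearance of each qid.
    return [
        SketchRecord(
            record_id=qid,
            problem=question,
            metadata={"checklist": checklists[qid]},
        )
        for qid, question in questions.items()
    ]


# Toy rows standing in for the 543-row dataset.
rows = [
    {"qid": "q1", "question": "Compare two laptops.", "checklist_item": "Cites prices"},
    {"qid": "q1", "question": "Compare two laptops.", "checklist_item": "Lists battery life"},
    {"qid": "q2", "question": "Summarize a paper.", "checklist_item": "States the main result"},
]
records = group_by_qid(rows)
```

With the toy rows, two records come back, and the first carries both of its checklist items in `metadata["checklist"]`. Applied to the real dataset, the same idea would reduce the 543 rows to 80 records.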

Source code in src/openjarvis/evals/datasets/liveresearchbench.py
def __init__(self) -> None:
    self._records: Optional[List[EvalRecord]] = None