paperarena
PaperArena: scientific literature reasoning benchmark.
Evaluates agents on research paper comprehension with three question types: MC (multiple choice), CA (closed answer), OA (open answer) across easy/medium/hard difficulty.
Source: https://github.com/Melmaphother/PaperArena
Paper: arXiv:2510.10909
Classes
PaperArenaDataset
Bases: DatasetProvider
PaperArena scientific literature reasoning benchmark.
Three question types (MC, CA, OA) across three difficulty levels.
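To make the three-by-three layout of question types and difficulty levels concrete, here is a minimal sketch that tallies records per (type, difficulty) pair. It does not use the PaperArenaDataset API: the enums, the `count_by_cell` helper, and the record field names `question_type` and `difficulty` are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from enum import Enum


class QuestionType(Enum):
    # The three question types named in the description above.
    MC = "multiple choice"
    CA = "closed answer"
    OA = "open answer"


class Difficulty(Enum):
    EASY = "easy"
    MEDIUM = "medium"
    HARD = "hard"


def count_by_cell(records):
    """Count records per (question type, difficulty) pair.

    Assumes each record is a dict with hypothetical keys
    'question_type' (e.g. "MC") and 'difficulty' (e.g. "easy").
    """
    counts = Counter()
    for rec in records:
        qtype = QuestionType[rec["question_type"]]  # KeyError on an unknown type
        level = Difficulty(rec["difficulty"])       # ValueError on an unknown level
        counts[(qtype, level)] += 1
    return counts


# Made-up records, not taken from the benchmark.
sample = [
    {"question_type": "MC", "difficulty": "easy"},
    {"question_type": "OA", "difficulty": "hard"},
    {"question_type": "MC", "difficulty": "easy"},
]
counts = count_by_cell(sample)
```

Looking up names through the enums means a misspelled type or level fails loudly instead of silently creating a tenth cell.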