webchorearena_scorer
Scorer for WebChoreArena web chore tasks.
Uses the environment-validated scoring pattern (same as WorkArenaScorer):
the WebChoreArenaTaskEnv runs the original WebArena evaluation harness
(StringEvaluator × URLEvaluator × HTMLContentEvaluator, multiplicative)
and populates record.metadata["is_resolved"] and record.metadata["reward"].
This scorer reads those fields.
The evaluation harness inside the task env faithfully mirrors the original:
- StringEvaluator: exact_match, must_include (with |OR| support), fuzzy_match (LLM-judged via GPT-4o), ua_match (unachievable task detection)
- URLEvaluator: checks browser's current page URL against reference URLs
- HTMLContentEvaluator: navigates to URLs, runs JS locators on DOM, checks element content against expected values
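To make the multiplicative combination concrete, here is a minimal sketch. It is not the harness code: `must_include` and `combine` are hypothetical helpers, and the case-insensitive substring comparison is an assumption rather than documented behavior. It shows how a must_include check with |OR| alternatives might yield a 0/1 score, and how multiplying per-evaluator scores means any single failing evaluator zeroes the reward.

```python
from typing import Iterable


def must_include(reference: str, answer: str) -> float:
    """Hypothetical must_include check with |OR| support.

    Returns 1.0 if at least one |OR|-separated alternative occurs in the
    answer. Case-insensitive matching is an assumption of this sketch.
    """
    alternatives = [alt.strip().lower() for alt in reference.split("|OR|")]
    answer_lower = answer.lower()
    return 1.0 if any(alt in answer_lower for alt in alternatives) else 0.0


def combine(scores: Iterable[float]) -> float:
    """Multiply per-evaluator scores; one 0.0 makes the product 0.0."""
    total = 1.0
    for score in scores:
        total *= score
    return total


# Illustrative values: string check passes, URL check fails, HTML check passes.
string_score = must_include("red |OR| blue", "The Blue one")
reward = combine([string_score, 0.0, 1.0])
```

Under this scheme a task counts as resolved only when every applicable evaluator passes, which is what the multiplicative notation above implies.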
Classes
WebChoreArenaScorer
Bases: Scorer
Environment-validated scorer for WebChoreArena tasks.
Reads is_resolved and reward from record.metadata. Both fields are
populated by WebChoreArenaTaskEnv._run_evaluation(), which runs
the original WebArena evaluation harness against the live browser state.
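The read-only pattern described here could look roughly like the sketch below. `Record` is a hypothetical stand-in for the framework's record type, `read_score` is not part of the real Scorer interface, and the fallback values for missing fields (`False` and `0.0`) are assumptions of this example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """Hypothetical stand-in for the framework's record object."""
    metadata: dict[str, Any] = field(default_factory=dict)


def read_score(record: Record) -> tuple[bool, float]:
    """Read the fields the task env is expected to have populated.

    Missing fields are treated as a failed task; that default is an
    assumption of this sketch, not documented behavior.
    """
    is_resolved = bool(record.metadata.get("is_resolved", False))
    reward = float(record.metadata.get("reward", 0.0))
    return is_resolved, reward


# Usage: a record as the task env might leave it after evaluation.
resolved, reward = read_score(Record({"is_resolved": True, "reward": 1.0}))
```

Keeping the scorer to a metadata lookup means all browser interaction stays inside the task env, so scoring needs no live browser session of its own.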