
deepplanning


DeepPlanning: long-horizon planning with constraints.

Evaluates agents on complex shopping tasks with hard constraints (product attributes, ratings, stock, shipping).

Source: https://huggingface.co/datasets/Qwen/DeepPlanning
Paper: arXiv:2601.18137
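The hard constraints listed above act as a pass/fail filter over the catalog: a product is a valid candidate only if it satisfies every one of them. The sketch below illustrates that idea under stated assumptions. The `Product` and `HardConstraints` classes and their fields are hypothetical and not taken from the dataset or from openjarvis.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Product:
    # Hypothetical product schema; the real catalog fields may differ.
    product_id: str
    attributes: Dict[str, str] = field(default_factory=dict)
    rating: float = 0.0
    in_stock: bool = True
    ships_to: List[str] = field(default_factory=list)


@dataclass
class HardConstraints:
    # Hypothetical constraint set covering attributes, ratings, stock, shipping.
    required_attributes: Dict[str, str] = field(default_factory=dict)
    min_rating: float = 0.0
    require_in_stock: bool = True
    destination: str = ""


def satisfies(p: Product, c: HardConstraints) -> bool:
    """Return True only if the product meets every hard constraint."""
    # Attribute constraints: each required key must match exactly.
    for key, value in c.required_attributes.items():
        if p.attributes.get(key) != value:
            return False
    if p.rating < c.min_rating:
        return False
    if c.require_in_stock and not p.in_stock:
        return False
    if c.destination and c.destination not in p.ships_to:
        return False
    return True


def feasible(catalog: List[Product], c: HardConstraints) -> List[str]:
    """IDs of catalog products that pass all hard constraints."""
    return [p.product_id for p in catalog if satisfies(p, c)]


catalog = [
    Product("p1", {"color": "black"}, 4.6, True, ["US", "CA"]),
    Product("p2", {"color": "black"}, 3.9, True, ["US"]),   # rating too low
    Product("p3", {"color": "white"}, 4.8, True, ["US"]),   # wrong attribute
    Product("p4", {"color": "black"}, 4.7, False, ["US"]),  # out of stock
]
constraints = HardConstraints({"color": "black"}, min_rating=4.5, destination="US")
print(feasible(catalog, constraints))  # ['p1']
```

Because the constraints are hard rather than soft, a single violation disqualifies a product; there is no partial credit or weighted score in this sketch.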

The dataset contains 120 shopping planning cases spread across three difficulty levels. Each case has a natural-language query, a product catalog, and ground-truth product selections with constraint metadata.
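One way to picture a case, and a per-level evaluation over many cases, is the minimal sketch below. The `ShoppingCase` fields, the 1-3 level numbering, and the exact-match metric are all hypothetical illustrations; the benchmark's actual schema and scoring may differ.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ShoppingCase:
    # Hypothetical case layout; field names are illustrative, not the dataset's.
    case_id: str
    level: int                      # difficulty level (1-3 assumed here)
    query: str                      # natural-language shopping request
    catalog: List[Dict] = field(default_factory=list)
    ground_truth: Set[str] = field(default_factory=set)  # expected product IDs
    constraints: Dict = field(default_factory=dict)      # constraint metadata


def exact_match(predicted: Set[str], case: ShoppingCase) -> bool:
    """Hypothetical strict metric: the selection must equal the ground truth."""
    return set(predicted) == case.ground_truth


def accuracy_by_level(cases: List[ShoppingCase], predictions: Dict[str, Set[str]]) -> Dict[int, float]:
    """Fraction of exactly matched cases for each difficulty level."""
    total: Counter = Counter()
    correct: Counter = Counter()
    for case in cases:
        total[case.level] += 1
        # A missing prediction counts as an empty selection.
        if exact_match(predictions.get(case.case_id, set()), case):
            correct[case.level] += 1
    return {lvl: correct[lvl] / total[lvl] for lvl in sorted(total)}


cases = [
    ShoppingCase("c1", 1, "A black mug under $20", ground_truth={"p1"}),
    ShoppingCase("c2", 1, "Two chargers that ship today", ground_truth={"p7", "p9"}),
    ShoppingCase("c3", 3, "Full desk setup, rating >= 4.5", ground_truth={"p2", "p4", "p5"}),
]
preds = {"c1": {"p1"}, "c2": {"p7"}, "c3": {"p2", "p4", "p5"}}
print(accuracy_by_level(cases, preds))  # {1: 0.5, 3: 1.0}
```

Reporting accuracy per level, rather than one pooled number, keeps the easy cases from masking failures on the longer-horizon ones.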

Classes

DeepPlanningDataset

DeepPlanningDataset(cache_dir: Optional[str] = None)

Bases: DatasetProvider

DeepPlanning long-horizon planning benchmark.

Extracts shopping planning tasks from Qwen/DeepPlanning tar.gz archives. Each case has a query with constraints and ground-truth product selections.

Source code in src/openjarvis/evals/datasets/deepplanning.py
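def __init__(
    self,
    cache_dir: Optional[str] = None,  # optional local directory for cached dataset files
) -> None:
    self._cache_dir = cache_dir
    # Evaluation records; the list starts empty and is populated outside __init__
    self._records: List[EvalRecord] = []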