loghub

LogHub log anomaly detection dataset.

Supports HDFS, BGL, and Thunderbird log datasets from https://github.com/logpai/loghub for evaluating log analysis agents.

Classes

LogHubDataset

LogHubDataset(subset: str = 'hdfs', cache_dir: Optional[str] = None)

Bases: DatasetProvider

LogHub log anomaly detection benchmark.
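The `cache_dir` fallback in the constructor can be isolated as a small function and checked on its own. This is a minimal sketch: `resolve_cache_dir` is a hypothetical helper, not part of the openjarvis API. It only reproduces the expression used in `__init__` to pick the cache location.

```python
from pathlib import Path
from typing import Optional


def resolve_cache_dir(cache_dir: Optional[str] = None) -> Path:
    """Hypothetical helper mirroring LogHubDataset's cache_dir fallback.

    A non-empty string is used as given; None (or "") falls back to
    ~/.cache/loghub under the current user's home directory.
    """
    return Path(cache_dir) if cache_dir else Path.home() / ".cache" / "loghub"


default_dir = resolve_cache_dir()              # ~/.cache/loghub, expanded
custom_dir = resolve_cache_dir("/tmp/loghub")  # used as given
```

Because the constructor tests `cache_dir` for truthiness rather than `is None`, an empty string also falls back to the default directory.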

Source code in src/openjarvis/evals/datasets/loghub.py
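```python
# Relies on Path, List, Optional, _DATASETS and EvalRecord being defined
# or imported at module level in loghub.py (not shown in this excerpt).
def __init__(
    self,
    subset: str = "hdfs",
    cache_dir: Optional[str] = None,
) -> None:
    # Reject any subset name that has no entry in the _DATASETS registry.
    if subset not in _DATASETS:
        raise ValueError(
            f"Unknown LogHub subset: {subset}. "
            f"Choose from: {list(_DATASETS.keys())}"
        )
    self._subset = subset
    # Use the caller's cache directory if given, else ~/.cache/loghub.
    self._cache_dir = (
        Path(cache_dir) if cache_dir
        else Path.home() / ".cache" / "loghub"
    )
    # Starts empty; this excerpt does not show where records are loaded.
    self._records: List[EvalRecord] = []
```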
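The subset check above raises `ValueError` for any name that is not a key of `_DATASETS`, and the message lists the valid keys. The sketch below reproduces that check in isolation. It assumes a hypothetical `_DATASETS` registry: only `"hdfs"` is confirmed by the signature's default, while the `"bgl"` and `"thunderbird"` keys and all the values are guesses based on the dataset names in the module description.

```python
from typing import Dict

# Hypothetical registry: the real _DATASETS in loghub.py is not shown on this
# page. "hdfs" is the documented default; "bgl" and "thunderbird" are assumed.
_DATASETS: Dict[str, str] = {
    "hdfs": "HDFS",
    "bgl": "BGL",
    "thunderbird": "Thunderbird",
}


def validate_subset(subset: str) -> str:
    """Raise ValueError for unknown subsets, mirroring LogHubDataset.__init__."""
    if subset not in _DATASETS:
        raise ValueError(
            f"Unknown LogHub subset: {subset}. "
            f"Choose from: {list(_DATASETS.keys())}"
        )
    return subset


validate_subset("hdfs")  # accepted: "hdfs" is a registry key

try:
    # Dict membership is case-sensitive, so "HDFS" is rejected
    # when the registry keys are lowercase.
    validate_subset("HDFS")
except ValueError as err:
    print(err)
```

Since the constructor performs no case normalization, callers would need to pass the subset name exactly as it appears in the registry.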