LogHub log anomaly detection dataset.
Supports HDFS, BGL, and Thunderbird log datasets from
https://github.com/logpai/loghub for evaluating log analysis agents.
Classes
LogHubDataset
LogHubDataset(subset: str = 'hdfs', cache_dir: Optional[str] = None)
Bases: DatasetProvider
LogHub log anomaly detection benchmark.
Source code in src/openjarvis/evals/datasets/loghub.py
def __init__(
    self,
    subset: str = "hdfs",
    cache_dir: Optional[str] = None,
) -> None:
    if subset not in _DATASETS:
        raise ValueError(
            f"Unknown LogHub subset: {subset}. "
            f"Choose from: {list(_DATASETS.keys())}"
        )
    self._subset = subset
    self._cache_dir = (
        Path(cache_dir) if cache_dir
        else Path.home() / ".cache" / "loghub"
    )
    self._records: List[EvalRecord] = []
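The constructor does two things before any data is loaded: it validates the subset name and it resolves the cache directory. The following is a minimal standalone sketch of that logic, not the library's code. The helper names `validate_subset` and `resolve_cache_dir` are hypothetical, and the contents of `_DATASETS` are an assumption based on the three datasets named in the module description; the real mapping is not shown here.

```python
from pathlib import Path
from typing import Optional

# Assumed stand-in for the module-level _DATASETS mapping. Only the
# membership check matters for this sketch; the real keys and values
# may differ.
_DATASETS = {"hdfs": "HDFS", "bgl": "BGL", "thunderbird": "Thunderbird"}


def validate_subset(subset: str) -> str:
    """Reject unknown subset names, as the constructor does (hypothetical helper)."""
    if subset not in _DATASETS:
        raise ValueError(
            f"Unknown LogHub subset: {subset}. "
            f"Choose from: {list(_DATASETS.keys())}"
        )
    return subset


def resolve_cache_dir(cache_dir: Optional[str] = None) -> Path:
    """Fall back to ~/.cache/loghub when no directory is given (hypothetical helper)."""
    return Path(cache_dir) if cache_dir else Path.home() / ".cache" / "loghub"
```

Because the fallback uses a truthiness test rather than an `is None` check, an empty string for `cache_dir` also selects the default directory. The membership check is exact, so under the assumed lowercase keys a name such as `"HDFS"` would raise `ValueError`.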