LHRS-UM-FERI/MENTHOS-dataset-rootcause
Source: Hugging Face. Updated 2026-04-05; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/LHRS-UM-FERI/MENTHOS-dataset-rootcause
Resource description:
---
language:
- en
- sl
tags:
- menthos
- root-cause
- logs
- binary-classification
size_categories:
- 1K<n<10K
---
# MENTHOS-dataset-rootcause
## English
### About
MENTHOS-dataset-rootcause is a binary log classification dataset built for root-cause identification experiments.
### Source Data
- https://github.com/nv-morpheus/Morpheus/raw/refs/heads/branch-25.10/models/datasets/training-data/root-cause-training-data.csv
- https://github.com/nv-morpheus/Morpheus/raw/refs/heads/branch-25.10/models/datasets/validation-data/root-cause-validation-data-input.jsonlines
- https://github.com/nv-morpheus/Morpheus/raw/refs/heads/branch-25.10/models/datasets/training-data/root-cause-unseen-errors.csv
### Processing and Balancing
- The input sources are merged into a single dataframe with two columns, `label` and `log`.
- The merged data is split into 70% train, 15% validation, and 15% test.
- Each split is then balanced by downsampling both classes to the same count.
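
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: the helper names `split_rows` and `balance`, the seed, the toy rows, and the choice to downsample every class to the size of the smallest class are all hypothetical.

```python
import random
from collections import defaultdict


def split_rows(rows, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle rows and cut them into train/validation/test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }


def balance(rows, seed=0):
    """Downsample each class to the size of the smallest class (assumption)."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Toy stand-ins for the three input sources (hypothetical rows, not real data).
source_a = [{"label": 0, "log": f"info message {i}"} for i in range(60)]
source_b = [{"label": 1, "log": f"error message {i}"} for i in range(30)]
source_c = [{"label": 1, "log": f"unseen error {i}"} for i in range(10)]

# Merge, split, then balance each split independently.
merged = source_a + source_b + source_c
raw_splits = split_rows(merged)
splits = {name: balance(part) for name, part in raw_splits.items()}

for name, part in splits.items():
    ones = sum(row["label"] for row in part)
    print(name, len(part), "rows,", len(part) - ones, "label 0,", ones, "label 1")
```

Because balancing runs after the split in this sketch, each split loses a different number of rows, so the final proportions need not be exactly 70/15/15.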
### Splits and Class Distribution
The prepared train, validation, and test splits are included with the dataset release.
| split | rows | label 0 | label 1 |
| ---------- | ---: | ------: | ------: |
| train | 1266 | 633 | 633 |
| validation | 296 | 148 | 148 |
| test | 296 | 148 | 148 |
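
As a quick sanity check, the counts in the table can be verified with a few lines of Python. The numbers are copied from the table; the variable names are illustrative only.

```python
# Split sizes copied from the table above: (rows, label 0, label 1).
table = {
    "train": (1266, 633, 633),
    "validation": (296, 148, 148),
    "test": (296, 148, 148),
}

total = sum(rows for rows, _, _ in table.values())

for name, (rows, n0, n1) in table.items():
    # Each split should be the sum of its two classes, and balanced.
    assert rows == n0 + n1, f"{name}: class counts do not add up"
    assert n0 == n1, f"{name}: split is not balanced"
    print(f"{name:<10} {rows:>5} rows  {rows / total:.1%} of released rows")

print("total rows:", total)
```

The released splits sum to 1858 rows, which is consistent with the `1K<n<10K` size category. Their shares work out to roughly 68% / 16% / 16% rather than exactly 70/15/15, presumably because per-split balancing removes a different number of rows from each split.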
### Citation
```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
title = {MENTHOS-dataset-rootcause},
author = {Borovic and {Li Dobnik} and Kranjec and Ferme},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/datasets/LHRS-UM-FERI/MENTHOS-dataset-rootcause}}
}
```
---
## Slovenščina
### O datasetu
MENTHOS-dataset-rootcause je binarni dataset zapisov za eksperimente zaznavanja koreninskih vzrokov.
### Izvorni podatki
- https://github.com/nv-morpheus/Morpheus/raw/refs/heads/branch-25.10/models/datasets/training-data/root-cause-training-data.csv
- https://github.com/nv-morpheus/Morpheus/raw/refs/heads/branch-25.10/models/datasets/validation-data/root-cause-validation-data-input.jsonlines
- https://github.com/nv-morpheus/Morpheus/raw/refs/heads/branch-25.10/models/datasets/training-data/root-cause-unseen-errors.csv
### Obdelava in uravnoteženje
- Vhodni viri se združijo v enoten format (`label`, `log`).
- Razdelitev 70% train, 15% validation, 15% test.
- Uravnoteženje: vsak split je uravnotežen z downsamplingom obeh razredov.
### Delitve in porazdelitev razredov
Pripravljene train, validation in test delitve so vključene v izdajo nabora podatkov.
| split | vrstic | label 0 | label 1 |
| ---------- | -----: | ------: | ------: |
| train | 1266 | 633 | 633 |
| validation | 296 | 148 | 148 |
| test | 296 | 148 | 148 |
### Citiranje
```
@misc{borovic_li-dobnik_kranjec_ferme_2026,
title = {MENTHOS-dataset-rootcause},
author = {Borovic and {Li Dobnik} and Kranjec and Ferme},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/datasets/LHRS-UM-FERI/MENTHOS-dataset-rootcause}}
}
```
Provider:
LHRS-UM-FERI



