shroom
ModelScope community · Updated 2025-12-05 · Added 2025-08-23
Download link:
https://modelscope.cn/datasets/Helsinki-NLP/shroom
Resource overview:
# The **SHROOM** dataset for Hallucination and Overgeneration detection.
SHROOM: Shared-task on Hallucinations and Related Observable Overgeneration Mistakes
## Dataset Description
**disclaimer**: SHROOM is not, strictly speaking, a fact-checking dataset, but we mark it as such until `hallucination detection` (or a more suitable task) is added to the official list of task_ids.
### Features
## Dataset Structure
### Data Fields
### Data Splits
## How to Use
### Loading the Dataset
### Example Usage
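The abstract quoted in the citation below states that each model output was labeled by 5 annotators. A minimal sketch, assuming each record carries its annotators' labels as a list of strings, of collapsing those labels into a binary majority label and a hallucination probability. The label strings `"Hallucination"` / `"Not Hallucination"`, the output keys `label` and `p(Hallucination)`, and the function `aggregate_annotations` are hypothetical and not documented in this card.

```python
from collections import Counter

# Hypothetical label strings; this card does not document the actual field values.
HALLUCINATION = "Hallucination"
NOT_HALLUCINATION = "Not Hallucination"


def aggregate_annotations(labels: list[str]) -> dict:
    """Collapse per-annotator labels into a majority label and a probability."""
    if not labels:
        raise ValueError("at least one annotator label is required")
    counts = Counter(labels)
    # Fraction of annotators who flagged the output as a hallucination.
    p_hallucination = counts[HALLUCINATION] / len(labels)
    # Strict majority; a tie (only possible with an even count) falls back to "Not Hallucination".
    majority = HALLUCINATION if p_hallucination > 0.5 else NOT_HALLUCINATION
    return {"label": majority, "p(Hallucination)": p_hallucination}


if __name__ == "__main__":
    # Hypothetical record with five annotator judgments.
    record = {
        "labels": [HALLUCINATION, NOT_HALLUCINATION, HALLUCINATION,
                   HALLUCINATION, NOT_HALLUCINATION],
    }
    print(aggregate_annotations(record["labels"]))
```

With five annotators a tie cannot occur, so the `> 0.5` threshold only matters if the sketch is reused with an even number of annotators.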
# Shared Task Information: Quick Overview
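The abstract quoted in the citation below reports that teams submitted over 300 prediction sets and were compared against a baseline system and against random handling of the harder items. A hedged sketch of scoring one prediction set, assuming plain accuracy against the majority label (this card does not name the official metric) and a seeded coin-flip baseline as the "random" reference point. The functions `accuracy` and `random_baseline` and the label strings are hypothetical.

```python
import random

# Hypothetical label strings; this card does not document the actual field values.
HALLUCINATION = "Hallucination"
NOT_HALLUCINATION = "Not Hallucination"


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of items whose predicted label equals the reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty prediction set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def random_baseline(n_items: int, seed: int = 0) -> list[str]:
    """Coin-flip predictions; a fixed seed makes the baseline reproducible."""
    rng = random.Random(seed)
    return [rng.choice([HALLUCINATION, NOT_HALLUCINATION]) for _ in range(n_items)]


if __name__ == "__main__":
    references = [HALLUCINATION, NOT_HALLUCINATION, HALLUCINATION, NOT_HALLUCINATION]
    system = [HALLUCINATION, NOT_HALLUCINATION, NOT_HALLUCINATION, NOT_HALLUCINATION]
    print("system accuracy:", accuracy(system, references))  # 3 of 4 correct
    print("random accuracy:", accuracy(random_baseline(len(references)), references))
```

On balanced binary labels a coin-flip baseline scores around 0.5 in expectation, which gives a rough yardstick for "random handling" of an item subset.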
# Citation
If you use this dataset, please cite the SemEval-2024 task proceedings:
```bib
@inproceedings{mickus-etal-2024-semeval,
title = "{S}em{E}val-2024 Task 6: {SHROOM}, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes",
author = {Mickus, Timothee and
Zosa, Elaine and
Vazquez, Raul and
Vahtola, Teemu and
Tiedemann, J{\"o}rg and
Segonne, Vincent and
Raganato, Alessandro and
Apidianaki, Marianna},
editor = {Ojha, Atul Kr. and
Do{\u{g}}ru{\"o}z, A. Seza and
Tayyar Madabushi, Harish and
Da San Martino, Giovanni and
Rosenthal, Sara and
Ros{\'a}, Aiala},
booktitle = "Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.semeval-1.273/",
doi = "10.18653/v1/2024.semeval-1.273",
pages = "1979--1993",
abstract = "This paper presents the results of the SHROOM, a shared task focused on detecting hallucinations: outputs from natural language generation (NLG) systems that are fluent, yet inaccurate. Such cases of overgeneration put in jeopardy many NLG applications, where correctness is often mission-critical. The shared task was conducted with a newly constructed dataset of 4000 model outputs labeled by 5 annotators each, spanning 3 NLP tasks: machine translation, paraphrase generation and definition modeling. The shared task was tackled by a total of 58 different users grouped in 42 teams, out of which 26 elected to write a system description paper; collectively, they submitted over 300 prediction sets on both tracks of the shared task. We observe a number of key trends in how this approach was tackled{---}many participants rely on a handful of models, and often rely either on synthetic data for fine-tuning or zero-shot prompting strategies. While a majority of the teams did outperform our proposed baseline system, the performances of top-scoring systems are still consistent with a random handling of the more challenging items."
}
```
## Contact
For questions about the dataset, please contact the organizers:
- Raúl Vázquez (University of Helsinki)
- Timothee Mickus (University of Helsinki)
## 👥🙌🌐 Join the SHROOM Community
Whether you're interested in joining the next round, learning from past editions, or just staying informed about hallucination detection in NLG, we'd love to have you in the community.
- Check out the [**\*SHROOM** shared task series](https://helsinki-nlp.github.io/shroom/)
- Join the conversation on [Slack](https://join.slack.com/t/shroom-shared-task/shared_invite/zt-2mmn4i8h2-HvRBdK5f4550YHydj5lpnA)
- Check out the Google Groups of past editions:
  - [Mu-SHROOM 2025](https://groups.google.com/g/semeval-2025-task-3-mu-shroom)
  - [SHROOM 2024](https://groups.google.com/g/semeval-2024-task-6-shroom)
Provided by: maas
Created: 2025-08-16



