SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Black-Box Machine-Generated Text Detection

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/11002157


Resource description:

Large language models (LLMs) are becoming mainstream and easily accessible, ushering in an explosion of machine-generated content across various channels, such as news, social media, question-answering forums, and educational and even academic contexts. Recent LLMs, such as ChatGPT and GPT-4, generate remarkably fluent responses to a wide variety of user queries. The articulate nature of such generated texts makes LLMs attractive for replacing human labor in many scenarios. However, this has also raised concerns about their potential misuse, such as spreading misinformation and disrupting the education system. Since humans perform only slightly better than chance when classifying machine-generated vs. human-written text, there is a need to develop automatic systems that identify machine-generated text, with the goal of mitigating its potential misuse. We offer three subtasks over two paradigms of text generation: (1) full text, where a text is either entirely written by a human or entirely generated by a machine; and (2) mixed text, where a machine-generated text is refined by a human or a human-written text is paraphrased by a machine.

Created:

2024-05-13
