saiteja33/BMAS
Hugging Face | Updated 2025-09-26 | Indexed 2025-10-25
Download link:
https://hf-mirror.com/datasets/saiteja33/BMAS
Description:
E-BMAS is an English-language dataset that supports four tasks: binary classification, which distinguishes human-written from machine-generated text; multiclass classification, which identifies machine-generated text and also attempts to determine its generator; adversarial attack, which studies common manipulations that reduce the detectability of machine-generated text; and sentence-level segmentation, which predicts the boundaries between human-written and machine-generated text. The dataset includes texts from domains such as Reddit, news, Wikipedia, arXiv, and Q&A, along with texts generated by Deepseek, OpenAI, Anthropic, and Llama models. It also contains adversarially attacked data intended to improve the robustness of detection models.
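As a rough illustration of the sentence-level segmentation task described above, the sketch below finds the positions where authorship switches in a sequence of per-sentence labels. The label values (`human`, `machine`) and the helper `find_boundaries` are assumptions made for illustration; they are not taken from the dataset's actual schema.

```python
from typing import List


def find_boundaries(labels: List[str]) -> List[int]:
    """Return each index i where sentence i has a different label than sentence i-1."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Hypothetical per-sentence labels for a five-sentence document.
example_labels = ["human", "human", "machine", "machine", "human"]
boundaries = find_boundaries(example_labels)
print(boundaries)  # [2, 4]
```

Here a boundary index marks the first sentence of a new segment, so a document written entirely by one source yields an empty list.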
Provided by:
saiteja33



