ifmain/multilingual-moderation-90K
Source: Hugging Face | Updated: 2024-10-13 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/ifmain/multilingual-moderation-90K
Description:
This dataset is derived from a Kaggle project. It is a version of [@ifmain/text-moderation-410K](https://huggingface.co/datasets/ifmain/text-moderation-410K) from which semantically similar entries have been removed, with negative and neutral entries balanced to a 50/50 ratio. It contains about 1.5M entries (91K per language × 17 languages) covering English, German, French, Spanish, Italian, Swedish, Finnish, Polish, Czech, Latvian, Chinese, Japanese, Korean, Russian, Ukrainian, Belarusian, and Kazakh. Augmenting the data before use is recommended, for example with the [@ifmain/StringAugmentor](https://github.com/ifmain/StringAugmentor) tool.
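The 50/50 class balance and the recommended augmentation step can be illustrated with a minimal Python sketch. It assumes each entry is a dict with hypothetical `text` and `label` fields, where `label` is `"negative"` or `"neutral"`. The dataset's real column names may differ. `balance_50_50` downsamples the larger class. This is one common way to reach a 50/50 ratio and not necessarily how this dataset was built. `char_noise` is a hypothetical toy augmentation that swaps adjacent characters. It is not the StringAugmentor API.

```python
import random
from collections import Counter, defaultdict

# Hypothetical record layout (assumption, not the dataset's documented schema):
#   {"text": "...", "label": "negative" | "neutral"}


def balance_50_50(records, label_key="label", seed=0):
    """Downsample the larger of two classes so both end up with equal counts."""
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)
    if len(by_label) != 2:
        raise ValueError(f"expected exactly two labels, got {sorted(by_label)}")
    rng = random.Random(seed)
    n = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        # Draw n entries without replacement from each class.
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


def char_noise(text, p=0.05, seed=None):
    """Toy augmentation: swap adjacent characters with probability p.

    Hypothetical stand-in for illustration only; NOT the StringAugmentor API.
    """
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < p:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    # Synthetic, made-up records: 30 negative vs. 70 neutral.
    records = (
        [{"text": f"negative example {i}", "label": "negative"} for i in range(30)]
        + [{"text": f"neutral example {i}", "label": "neutral"} for i in range(70)]
    )
    balanced = balance_50_50(records)
    print(dict(sorted(Counter(r["label"] for r in balanced).items())))
    print(char_noise("this is a sample sentence", p=0.2, seed=42))
```

Downsampling discards some majority-class entries. Oversampling or augmenting the minority class instead keeps all of the data, at the cost of repeated or synthetic examples.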
Provided by:
ifmain



