sofia-uni/toxic-data-bg
Source: Hugging Face
Updated: 2025-04-04
Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/sofia-uni/toxic-data-bg
Description:
The toxic-data-bg dataset is a text classification dataset containing 4,384 manually annotated Bulgarian sentences across four categories: toxic language, medical terminology, non-toxic language, and terms related to minority communities. This dataset is an extension of the Bulgarian Hate speech detection dataset, sourced from various Bulgarian forums.
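The download link above points at a mirror of the Hugging Face repository `sofia-uni/toxic-data-bg`. The following is a minimal sketch and rests on three assumptions that this page does not confirm: the Hugging Face `datasets` library is installed, the Hub copy has a `train` split, and the category for each sentence is stored in a column named `label`. Under those assumptions, you could load the dataset and count how many sentences fall into each of the four categories like this:

```python
from collections import Counter
from typing import Iterable, Mapping


def load_toxic_bg(split: str = "train"):
    """Load toxic-data-bg from the Hugging Face Hub.

    Requires `pip install datasets` and network access. The repo id comes from
    the download link; the split name "train" is an assumption.
    """
    # Imported lazily so label_distribution() below works without `datasets`.
    from datasets import load_dataset

    return load_dataset("sofia-uni/toxic-data-bg", split=split)


def label_distribution(rows: Iterable[Mapping], label_key: str = "label") -> Counter:
    """Count examples per category.

    `label_key` is a hypothetical column name; check the dataset card for the real one.
    """
    return Counter(row[label_key] for row in rows)
```

For example, `label_distribution(load_toxic_bg())` would return the per-category counts. Iterating a `datasets.Dataset` yields one dict per row, so the counting helper also works on a plain list of records without network access.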
Provider:
sofia-uni



