Finnish-NLP/Reddit_Finnish_fineweb_edu_predicted
Source: Hugging Face | Updated: 2025-01-09 | Indexed: 2025-07-05
Download link:
https://hf-mirror.com/datasets/Finnish-NLP/Reddit_Finnish_fineweb_edu_predicted
Description:
This is a text classification dataset. Each example contains a text together with multiple labels indicating whether the text exhibits identity attack, insult, obscenity, severe toxicity, threat, or toxicity. Each example also includes the text's perplexity, an embedding representation, and prediction probabilities. The dataset consists of a single training split, and its size is documented both in bytes and in number of examples.
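A common use of per-example toxicity labels and perplexity is filtering text before training. The sketch below shows that idea on in-memory records. The field names (`identity_attack`, `insult`, `obscene`, `severe_toxicity`, `threat`, `toxicity`, `perplexity`, `text`) are hypothetical, chosen to match the label types listed above; check the dataset's actual schema before relying on them. The 0.5 threshold is likewise an illustrative assumption.

```python
from typing import Any, List, Mapping

# Hypothetical column names: the catalog lists these six label types,
# but the exact field names in the released files are assumed here.
TOXICITY_LABELS = [
    "identity_attack",
    "insult",
    "obscene",
    "severe_toxicity",
    "threat",
    "toxicity",
]


def is_clean(record: Mapping[str, Any], threshold: float = 0.5) -> bool:
    """Return True if every toxicity score in the record is below the threshold.

    A label missing from the record is treated as 0.0 (an assumption).
    """
    return all(record.get(label, 0.0) < threshold for label in TOXICITY_LABELS)


def filter_records(
    records: List[Mapping[str, Any]],
    threshold: float = 0.5,
    max_perplexity: float = float("inf"),
) -> List[Mapping[str, Any]]:
    """Keep records that pass the toxicity check and whose perplexity
    ("perplexity" field, assumed name) does not exceed max_perplexity.
    A missing perplexity is treated as 0.0, so it always passes.
    """
    return [
        r
        for r in records
        if is_clean(r, threshold) and r.get("perplexity", 0.0) <= max_perplexity
    ]


if __name__ == "__main__":
    sample = [
        {"text": "a", "toxicity": 0.02, "insult": 0.01, "perplexity": 45.0},
        {"text": "b", "toxicity": 0.91, "insult": 0.80, "perplexity": 60.0},
        {"text": "c", "toxicity": 0.05, "perplexity": 900.0},
    ]
    print([r["text"] for r in filter_records(sample, max_perplexity=500.0)])
```

Keeping the thresholds as parameters makes it easy to trade recall for cleanliness. A very high perplexity often signals noisy or non-Finnish text, so the perplexity cutoff acts as a rough quality filter alongside the toxicity check.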
Provided by:
Finnish-NLP



