aisingapore/nlu-toxicity_detection
Source: Hugging Face. Updated 2024-12-20; indexed 2024-12-21.
Download link:
https://hf-mirror.com/datasets/aisingapore/nlu-toxicity_detection
Resource description:
The SEA Toxicity Detection dataset evaluates a model's ability to identify toxic content in text, such as hate speech and abusive language. It covers Indonesian, Thai, and Vietnamese, sampled from MLHSD (Indonesian), TTD (Thai), and ViHSD (Vietnamese). The dataset supports text generation and text classification tasks, is intended primarily for evaluating chat or instruction-tuned large language models (LLMs), and forms part of the SEA-HELM leaderboard. It is split by language and includes additional few-shot example splits. The accompanying documentation also reports the number of examples per language, token counts for different models, and the source and license information.
Provider:
aisingapore
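
Since the description mentions few-shot example splits for evaluating instruction-tuned LLMs, the following is a minimal sketch of how such examples might be assembled into a classification prompt and how a model's reply might be mapped back to a label. The field names (`text`, `label`), the label set, and the prompt wording are all assumptions made for illustration; they are not taken from the dataset or from SEA-HELM, so check the dataset card for the actual schema and template.

```python
from typing import Dict, List

# Hypothetical label set; the real dataset's labels may differ.
LABELS = ("clean", "toxic")


def build_fewshot_prompt(examples: List[Dict[str, str]], query: str) -> str:
    """Assemble a few-shot classification prompt from labelled examples."""
    lines = ["Classify the text as one of: " + ", ".join(LABELS) + ".", ""]
    for ex in examples:
        # "text" and "label" are assumed field names, not confirmed by the source.
        lines.append("Text: " + ex["text"])
        lines.append("Label: " + ex["label"])
        lines.append("")
    # The query is left unlabelled so the model completes the label.
    lines.append("Text: " + query)
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(completion: str) -> str:
    """Map a raw model completion to a known label, or 'unknown'."""
    words = completion.strip().lower().split()
    first = words[0].strip(".,!") if words else ""
    return first if first in LABELS else "unknown"
```

In this sketch, replies that do not begin with a known label are mapped to `unknown` rather than guessed, so that malformed completions can be counted separately when scoring.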



