GPT-3 Curie generated synthetic datasets based on the datasets: Founta, Stormfront, HatEval 2019, Davidson, GermEval 2021, SemEval 2022 Task 4

Indexed by NIAID Data Ecosystem on 2026-05-01

Download link:

https://zenodo.org/record/10022787

Description:

This dataset is a composition of six synthetic datasets of toxic or hateful text, based on the datasets published in:

"Large scale crowdsourcing and characterization of twitter abusive behavior" (Founta)
"Hate Speech Dataset from a White Supremacy Forum" (Stormfront)
"Automated hate speech detection and the problem of offensive language" (Davidson)
"Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter" (HatEval 2019)
"Overview of the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments" (GermEval 2021)
"Don't patronize me! An annotated dataset with patronizing and condescending language towards vulnerable communities" (SemEval 2022 Task 4)

All data was generated by separate GPT-3 Curie models, each fine-tuned on one label of the corresponding source dataset. The data is unfiltered and will likely need processing before it is useful.
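
Since the description notes that the data needs processing, here is a minimal sketch of one possible first pass in Python. It is an illustration only, not part of the record: the function name `clean_generated_texts`, the `min_chars` threshold, and the choice of steps (whitespace normalization, length filter, case-insensitive deduplication) are all assumptions, and the record does not specify a file format, so the sketch works on a plain list of strings.

```python
def clean_generated_texts(texts, min_chars=10):
    """Drop empty, very short, and duplicate generations (hypothetical helper)."""
    seen = set()
    cleaned = []
    for text in texts:
        # Collapse runs of whitespace, including newlines, into single spaces.
        normalized = " ".join(text.split())
        # min_chars is an assumed threshold; tune it for the data at hand.
        if len(normalized) < min_chars:
            continue
        # Compare case-insensitively so trivially repeated samples are dropped.
        key = normalized.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned
```

A pass like this only removes obvious noise. Whether each generated sample actually matches the label its model was fine-tuned on would still need to be checked separately.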

Created:

2023-10-19