GPT-3 Curie generated synthetic datasets based on the datasets: Founta, Stormfront, HatEval 2019, Davidson, GermEval 2021, SemEval 2022 Task 4
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10022787
Description:
This dataset combines six synthetic toxic or hateful datasets, each based on the dataset published in one of the following papers:
"Large scale crowdsourcing and characterization of twitter abusive behavior"
"Hate Speech Dataset from a White Supremacy Forum"
"Automated hate speech detection and the problem of offensive language"
"Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter"
"Overview of the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments"
"Don't patronize me! An annotated dataset with patronizing and condescending language towards vulnerable communities"
Each synthetic dataset was generated by a separate GPT-3 Curie model fine-tuned on a single label of the corresponding source dataset. The data is unfiltered and will likely need to be cleaned before it is useful.
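Because the data is unfiltered, a first cleaning pass might look like the sketch below. It assumes the generated samples have already been loaded into a list of strings; the file layout of the Zenodo record is not described here, so loading is left out. The function name `clean_generations` and the `min_chars` threshold are illustrative assumptions, not part of the dataset or its documentation.

```python
def clean_generations(texts, min_chars=10):
    """Hypothetical first-pass filter for unfiltered generated text.

    Collapses whitespace, drops samples shorter than `min_chars`
    characters, and removes case-insensitive exact duplicates while
    keeping the first occurrence and the original order.
    """
    seen = set()
    cleaned = []
    for text in texts:
        # Collapse runs of whitespace (including newlines) to single spaces.
        normalized = " ".join(text.split())
        if len(normalized) < min_chars:
            continue  # empty or too short to be a useful sample
        key = normalized.casefold()
        if key in seen:
            continue  # duplicate of an earlier sample
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


if __name__ == "__main__":
    samples = ["  An example   generated line ", "an example generated line", "", "ok"]
    print(clean_generations(samples))
```

This only removes empty, very short, and duplicated samples. Checking whether a generated sample actually matches the label its model was fine-tuned on would need a separate step, such as a classifier or manual review.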
Created:
2023-10-19



