GPT-3 Curie generated synthetic datasets based on the datasets: Founta, Stormfront, HatEval 2019, Davidson, GermEval 2021, SemEval 2022 Task 4

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10022787
Description:
This dataset combines six synthetic toxic or hateful datasets, each based on the dataset published in one of the following papers:
"Large scale crowdsourcing and characterization of twitter abusive behavior"
"Hate Speech Dataset from a White Supremacy Forum"
"Automated hate speech detection and the problem of offensive language"
"Semeval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter"
"Overview of the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments"
"Don't patronize me! An annotated dataset with patronizing and condescending language towards vulnerable communities"
All data was generated by a separate GPT-3 Curie model fine-tuned on one label of the dataset. The data is unfiltered and will likely need processing before it is useful.
Created:
2023-10-19