RefutES
Source: SSH Open MarketPlace. Updated: 2026-03-20. Indexed: 2026-03-21.
Download link:
https://marketplace.sshopencloud.eu/dataset/cgZDdl
Description:
The RefutES dataset was created for the research and evaluation of systems capable of generating counter-narratives to hate speech. The dataset is based on the CONAN-MT-SP corpus, which contains pairs of hate speech (HS) and counter-narrative (CN) messages targeting eight specific groups: people with disabilities, Jews, the LGBT+ community, migrants, Muslims, people of color, women, and other groups.
To construct it, the original English CONAN-MT corpus was used as a starting point; its hate speech messages were translated into Spanish using the DeepL API and subsequently reviewed and corrected by human annotators. The associated counter-narratives were generated using language models (GPT-4) via a few-shot learning strategy and evaluated by human experts across multiple dimensions: level of offensiveness, stance toward the message, degree of informativeness, veracity, need for editing, and comparison between human and automated responses.
Based on this process, RefutES selects only those counter-narratives considered “perfect”—that is, non-offensive, clearly at odds with the hate speech, informative, truthful, and requiring no editing. The corpus is divided into three subsets, each related to a different part of the competition:
- Train split: contains 2496 HS-CN pairs.
- Dev split: contains 279 HS-CN pairs.
- Test split: contains 156 HS-CN pairs, of which 78 were generated by GPT-4 and manually annotated by humans and the other 78 were written by humans.
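
The selection rule and split sizes described above can be sketched in Python as follows. This is only an illustration: the `Annotation` class and its field names are hypothetical, since the description does not give the dataset's actual schema, and the boolean fields simplify what the description calls levels and degrees. The split counts are taken directly from the list above.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    # Hypothetical field names: the real RefutES/CONAN-MT-SP schema is not
    # given in this description. Booleans simplify the graded dimensions.
    offensive: bool
    opposes_hate_speech: bool
    informative: bool
    truthful: bool
    needs_editing: bool


def is_perfect(a: Annotation) -> bool:
    """Return True when a counter-narrative meets all five stated criteria."""
    return (
        not a.offensive
        and a.opposes_hate_speech
        and a.informative
        and a.truthful
        and not a.needs_editing
    )


# Split sizes as stated in the description (number of HS-CN pairs).
SPLIT_SIZES = {"train": 2496, "dev": 279, "test": 156}

# Composition of the test split as stated in the description.
TEST_COMPOSITION = {"gpt4_human_annotated": 78, "human_written": 78}

total_pairs = sum(SPLIT_SIZES.values())
```

Under these assumptions, a counter-narrative is kept only if every criterion holds, so a single failed dimension (for example, a response that still needs editing) excludes the pair.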
Created:
2026-03-20



