HayatoHongo/soda_jp
Source: Hugging Face | Updated: 2025-12-17 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/HayatoHongo/soda_jp
Description:
SODA is a publicly available, million-scale, high-quality dialogue dataset covering a wide range of social interactions. Dialogues are distilled from a PLM (InstructGPT; Ouyang et al., 2022) by contextualizing social commonsense knowledge from a knowledge graph (Atomic10x; West et al., 2022). Human evaluation shows that dialogues in SODA are more consistent, specific, and (surprisingly) natural than those in prior human-authored datasets such as DailyDialog (Li et al., 2017) and BlendedSkillTalk (Smith et al., 2020). Also, because social commonsense knowledge encompasses emotional reactions (i.e., the xReact relation), SODA includes 385K conversations labeled with 1.7K unique emotions, along with information about the experiencer and the cause, i.e., PersonX and the head event in the symbolic commonsense knowledge triple.
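A minimal sketch of loading this dataset with the Hugging Face `datasets` library through the mirror given in the download link, and of tallying the emotion labels that come from the xReact relation. The column names `relation` and `tail` are an assumption borrowed from the original English SODA release; the schema of soda_jp is not documented on this page and may differ. The helper names `load_soda_jp` and `emotion_counts` are hypothetical.

```python
import os
from collections import Counter
from typing import Iterable, Mapping

# Assumption: use the hf-mirror.com endpoint from the download link.
# HF_ENDPOINT must be set before huggingface_hub / datasets is imported.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")


def load_soda_jp(split: str = "train"):
    """Download one split via the `datasets` library (requires network access)."""
    # Imported lazily so that emotion_counts() below works offline.
    from datasets import load_dataset
    return load_dataset("HayatoHongo/soda_jp", split=split)


def emotion_counts(rows: Iterable[Mapping]) -> Counter:
    """Count emotion labels on rows derived from the xReact relation.

    ASSUMPTION: each row has `relation` and `tail` fields, as in the
    original English SODA release; soda_jp may use a different schema.
    """
    return Counter(r["tail"] for r in rows if r.get("relation") == "xReact")
```

Usage, if the assumed schema holds: `emotion_counts(load_soda_jp())`. Iterating a `datasets.Dataset` yields one dict per row, so the same helper works on downloaded data and on plain lists of dicts.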
Provided by:
HayatoHongo