WikiConv
Source: arXiv | Updated: 2018-10-31 | Indexed: 2024-06-21
Download link:
https://github.com/conversationai/wikidetox/tree/master/wikiconv
Description:
WikiConv is a large-scale dataset created jointly by Cornell University, the Wikimedia Foundation, and other institutions. It contains the complete conversation history among Wikipedia contributors: approximately 91 million conversations, comprising 212 million conversational actions across 24 million discussion pages. Beyond comments and replies, the dataset also records intermediate states such as modifications, deletions, and restorations, providing a level of detail not previously available for studying large-scale online collaboration. To build it, the researchers designed a method for identifying and structuring these actions, overcoming inconsistent record formats and the sheer volume of data. WikiConv is intended to support a deeper understanding of online conversation dynamics, for example how an individual's conversational behavior depends on the discussion venue, and how communities efficiently moderate undesirable behavior.
Provided by:
Cornell University
Created:
2018-10-31



