Wikipedia Talk Corpus
Source: DataCite Commons | Updated 2025-06-01 | Indexed 2024-07-25
Download link:
https://figshare.com/articles/dataset/Wikipedia_Talk_Corpus/4264973/3
Resource description:
We provide a corpus of discussion comments from English Wikipedia talk pages. Comments are grouped into different files by year. Comments are generated by computing diffs over the full revision history and extracting the content added for each revision. See our wiki for documentation of the schema and our research paper for documentation on the data collection and processing methodology.
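The description says comments are produced by diffing consecutive revisions and keeping the content each revision added. The following is a minimal sketch of that idea, assuming a simple line-level diff via Python's standard `difflib`; the helper names `added_content` and `comments_from_history` are hypothetical, and the authors' actual procedure (documented in their paper) may handle moved text, reverts, and signatures differently.

```python
import difflib


def added_content(old_text: str, new_text: str) -> str:
    """Return the lines that appear in new_text but not in old_text.

    Hypothetical helper: a line-level diff is an assumption, not the
    corpus authors' documented method.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'insert' and 'replace' opcodes carry text that exists only in the new revision
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return "\n".join(added)


def comments_from_history(revisions: list[str]) -> list[str]:
    """Walk an ordered revision history and collect the text each revision added."""
    comments = []
    previous = ""
    for current in revisions:
        diff = added_content(previous, current)
        if diff:  # skip revisions that only deleted or left text unchanged
            comments.append(diff)
        previous = current
    return comments


if __name__ == "__main__":
    history = [
        "== Topic ==",
        "== Topic ==\nI think this section needs a source. --Alice",
        "== Topic ==\nI think this section needs a source. --Alice\n:Agreed, I'll look for one. --Bob",
    ]
    for comment in comments_from_history(history):
        print(comment)
```

Diffing by whole lines keeps the sketch short; a character- or token-level diff would be needed to isolate edits made inside an existing line.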
Provider:
figshare
Created:
2017-01-17
Compiled Summary
Dataset Introduction

Background and Challenges
Background Overview
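The Wikipedia Talk Corpus is a large-scale corpus of comments extracted from English Wikipedia talk pages. The data are grouped by year (2001-2010) and total about 16.7 GB, making the corpus suitable for natural language processing research. The dataset is generated by computing diffs over the revision history, focuses on online discussion content, and has a time-series structure. It is released under the CC0 license, which makes it convenient for academic use.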
The above content was collected and summarized by 遇见数据集.
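Because the corpus is split into per-year files, a common first step is bucketing comments by the year of their timestamp. The sketch below assumes each comment is a `(timestamp, text)` pair with an ISO-8601 timestamp; these field names and formats are hypothetical, and the real schema is documented on the authors' wiki.

```python
from collections import defaultdict
from datetime import datetime


def group_by_year(comments):
    """Bucket (timestamp, text) pairs by calendar year.

    Assumes ISO-8601 timestamps without a trailing 'Z'; the corpus's
    actual field layout may differ.
    """
    by_year = defaultdict(list)
    for timestamp, text in comments:
        year = datetime.fromisoformat(timestamp).year
        by_year[year].append(text)  # preserve the original order within each year
    return dict(by_year)


if __name__ == "__main__":
    sample = [
        ("2004-03-15T12:00:00", "first comment"),
        ("2004-07-01T08:30:00", "second comment"),
        ("2005-01-02T00:00:00", "third comment"),
    ]
    print(group_by_year(sample))
```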



