Wikivoyage Review Dataset
Source: arXiv. Indexed 2025-09-30.
Download link:
https://en.wikivoyage.org/w/api.php
Description:
This dataset contains edit comments from Wikivoyage, covering edits made by 70,260 editors to 3,369 distinct articles between January 1, 2004 and December 31, 2019. It comprises 285,698 samples, and the associated task is binary classification: decide whether a given comment was rolled back. The classes are severely imbalanced: only 8,305 comments were rolled back, about 2.9% of all samples (a fraction of roughly 0.03). To address this imbalance, synthetic data generation was used to balance the classes.
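The imbalance figures in the description can be checked with a little arithmetic. The Python sketch below assumes that 285,698 is the sample count before balancing, and that balancing means raising the rolled-back class to the size of the majority class. The source specifies neither, and the function name `class_balance` is hypothetical.

```python
# Counts taken from the dataset description.
TOTAL_SAMPLES = 285_698
ROLLED_BACK = 8_305


def class_balance(total: int, positives: int) -> dict:
    """Summarize class imbalance for a binary labeling.

    Assumption: 'balanced' means both classes end up the same size,
    reached by adding synthetic samples to the minority class only.
    """
    negatives = total - positives
    return {
        "negatives": negatives,
        # Fraction of samples in the positive (rolled-back) class.
        "positive_share": positives / total,
        # How many negatives exist per positive sample.
        "imbalance_ratio": negatives / positives,
        # Synthetic positives needed to match the majority class.
        "synthetic_needed": max(negatives - positives, 0),
    }


if __name__ == "__main__":
    stats = class_balance(TOTAL_SAMPLES, ROLLED_BACK)
    print(f"rolled-back share: {stats['positive_share']:.2%}")
    print(f"negatives per positive: {stats['imbalance_ratio']:.1f}")
    print(f"synthetic samples needed: {stats['synthetic_needed']:,}")
```

Under these assumptions the rolled-back class makes up about 2.9% of the samples, and roughly 269,000 synthetic rolled-back samples would be needed to reach parity.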
Provider:
Wikivoyage



