Dataset of discussion threads from Meneame
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/2536217
Description:
Dataset from our ICWSM 2017 paper. If you use this resource, please cite:
Aragón P., Gómez V., Kaltenbrunner A. (2017) To Thread or Not to Thread: The Impact of Conversation Threading on Online Discussion, ICWSM-17 - 11th International AAAI Conference on Web and Social Media, Montreal, Canada.
@inproceedings{aragon2017ICWSM,
author = {Arag\'on, Pablo and G\'omez, Vicen\c{c} and Kaltenbrunner, Andreas},
title = {To Thread or Not to Thread: The Impact of Conversation Threading on Online Discussion},
booktitle = {ICWSM-17 - 11th International AAAI Conference on Web and Social Media},
publisher = {The AAAI Press},
location = {Montreal, Canada},
year = 2017
}
More information about this dataset can also be found in:
Aragón P., Gómez V., Kaltenbrunner A. (2017) Detecting Platform Effects in Online Discussions, Policy & Internet, 9(4), 420-443.
@article{aragon2017PI,
author = {Arag\'on, Pablo and G\'omez, Vicen\c{c} and Kaltenbrunner, Andreas},
title = {Detecting Platform Effects in Online Discussions},
journal = {Policy \& Internet},
volume = {9},
number = {4},
pages = {420--443},
doi = {10.1002/poi3.158},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/poi3.158},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/poi3.158},
year = {2017}
}
Crawling process
We built a crawler that collected all stories that appeared on the front page of Meneame from 2011 to 2015 (both years included). A second crawl then collected every comment in the discussion thread of each story. Together, the two crawls yielded 72,005 stories and 5,385,324 comments.
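The totals above can be recomputed from a local copy of the release. The sketch below assumes the folder layout described under Fields (one yyyy-mm-dd folder per day, one JSON array per story thread) and that `depth` is "0" for the story and >= 1 for comments. The `.json` file extension and the function name `count_stories_and_comments` are illustrative assumptions, not part of the dataset.

```python
import json
from pathlib import Path


def count_stories_and_comments(root, pattern="*/*.json"):
    """Count stories and comments across all thread files under `root`.

    Assumes one folder per day (yyyy-mm-dd) holding one JSON array per
    story thread. The ".json" extension in `pattern` is an assumption;
    adjust it if the files are named differently.
    """
    stories = 0
    comments = 0
    for path in sorted(Path(root).glob(pattern)):
        with path.open(encoding="utf-8") as fh:
            elements = json.load(fh)
        for element in elements:
            # `depth` is stored as a string: "0" marks the story itself,
            # any value >= 1 marks a comment.
            if int(element["depth"]) == 0:
                stories += 1
            else:
                comments += 1
    return stories, comments
```

Calling it on the extracted dataset folder returns a `(stories, comments)` tuple that can be compared with the figures reported for the crawl.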
Two issues were taken into account when designing the crawler. First, the machine-readable robots.txt file on Meneame does not disallow this crawling. Second, the footer of Meneame states the licenses of the website's code, graphics and content; the content license is Attribution 3.0 Spain (CC BY 3.0 ES), which allows us to release this dataset.
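A robots.txt check of this kind can be done with Python's standard `urllib.robotparser`. The robots.txt that applied during 2011-2015 is not part of the dataset, so this sketch parses whatever robots.txt text it is given; the helper name `is_crawl_allowed` and the sample rules in the usage note are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    # For a live check, set_url(".../robots.txt") followed by read() fetches the file.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, with a made-up file containing `User-agent: *` and `Disallow: /backend/`, a story URL is allowed while any path under `/backend/` is not.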
Fields
Each discussion thread is stored in a JSON file named after the URL slug of the corresponding story on Meneame, inside a folder named yyyy-mm-dd. Each JSON file contains an array of elements with the following fields:
id (string): ID of the story/comment
sent (timestamp): Date and time of the story/comment, formatted as yyyy-MM-ddThh:mm:ssZ.
message (string): Text of the story/comment
user (string): Username of the author of the story/comment
karma (number): Karma score of the comment at the time of crawling
comments_count (number): Number of comments in reply to the story/post
votes (number): Number of votes received by the story/comment
thread (string): URL of the thread
thread_id (string): Sequential order of arrival in the thread (0 if story, >=1 if comment)
depth (string): Depth within the thread (0 if story, >=1 if comment)
url (string): URL of the specific story/comment
title (string): Title, only available for stories.
published (string): Date when published on the front page, only available for stories.
tags (string): Tags, only available for stories.
clics (string): Number of clicks, only available for stories.
users (string): Number of user votes, only available for stories.
anonymous (string): Number of anonymous votes, only available for stories.
negatives (string): Number of negative votes, only available for stories.
in_reply_to_id (string): ID of the parent story/comment, only available for comments.
in_reply_to_user (string): Username of the author of the parent story/comment, only available for comments.
in_reply_to_thread_id (string): Sequential order of arrival in the thread of the parent story/comment, only available for comments.
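These fields are enough to rebuild each conversation tree. A minimal sketch, assuming that every comment's `in_reply_to_id` holds the `id` of either the story or another comment in the same file, that ids are unique within a file, and that the trailing `Z` in `sent` denotes UTC. The helper names (`load_thread`, `parse_sent`, `build_reply_tree`, `walk`) are hypothetical and not part of the dataset.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def load_thread(path):
    """Read one thread file (a JSON array of story/comment elements)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def parse_sent(value):
    """Parse a `sent` value such as "2015-03-01T12:34:56Z" as a UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def build_reply_tree(elements):
    """Split a thread into its story and a parent-id -> replies mapping.

    Replies under each parent are ordered by `thread_id`, i.e. by their
    order of arrival in the thread.
    """
    story = None
    children = defaultdict(list)
    for element in elements:
        if int(element["depth"]) == 0:
            story = element
        else:
            # Top-level comments point at the story's id; nested ones at a comment's id.
            children[element["in_reply_to_id"]].append(element)
    for replies in children.values():
        replies.sort(key=lambda e: int(e["thread_id"]))
    return story, dict(children)


def walk(parent_id, children, depth=1):
    """Yield (depth, comment) pairs depth-first, starting below `parent_id`."""
    for reply in children.get(parent_id, []):
        yield depth, reply
        yield from walk(reply["id"], children, depth + 1)
```

Because `walk` recomputes each comment's depth from `in_reply_to_id`, comparing its output with the stored `depth` field is a quick consistency check on a thread file.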
Acknowledgment
This work is supported by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Programme (MDM-2015-0502).
Created: 2020-01-24



