
Dataset of discussion threads from Meneame

Indexed from NIAID Data Ecosystem on 2026-03-11
Download link:
https://zenodo.org/record/2536217
Description:
Dataset from our ICWSM 2017 paper. When using this resource, please use the following citation:

Aragón P., Gómez V., Kaltenbrunner A. (2017) To Thread or Not to Thread: The Impact of Conversation Threading on Online Discussion, ICWSM-17 - 11th International AAAI Conference on Web and Social Media, Montreal, Canada.

@inproceedings{aragon2017ICWSM,
  author = {Arag\'on, Pablo and G\'omez, Vicen\c{c} and Kaltenbrunner, Andreas},
  title = {To Thread or Not to Thread: The Impact of Conversation Threading on Online Discussion},
  booktitle = {ICWSM-17 - 11th International AAAI Conference on Web and Social Media},
  publisher = {The AAAI Press},
  location = {Montreal, Canada},
  year = 2017
}

More information about this dataset can also be found in:

Aragón P., Gómez V., Kaltenbrunner A. (2017) Detecting Platform Effects in Online Discussions, Policy & Internet, 9, 2017.

@article{aragon2017PI,
  author = {Arag\'on, Pablo and G\'omez, Vicen\c{c} and Kaltenbrunner, Andreas},
  title = {Detecting Platform Effects in Online Discussions},
  journal = {Policy \& Internet},
  volume = {9},
  number = {4},
  pages = {420-443},
  doi = {10.1002/poi3.158},
  url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/poi3.158},
  eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/poi3.158},
  year = {2017}
}

Crawling process

We built a crawler that collects all the stories that appeared on the front page of Meneame from 2011 to 2015 (both years included). A second crawler then collected every comment from the discussion thread of each story. Together, the two crawls yielded 72,005 stories and 5,385,324 comments.

Two issues were taken into account when the crawler was designed. First, the machine-readable robots.txt file on Meneame does not disallow this process. Second, the footer of Meneame states the licenses of the code, graphics and content of the website. The license for content is Attribution 3.0 Spain (CC BY 3.0 ES), which allows us to release this dataset.

Fields

Every discussion thread is stored in a JSON file named with the URL slug of the corresponding story in Meneame, located in a yyyy-mm-dd folder. The JSON file is an array of elements with the following fields:

id (string): ID of the story/comment
sent (timestamp): Date of the story/comment as yyyy-MM-ddThh:mm:ssZ
message (string): Text of the story/comment
user (string): Username of the author of the story/comment
karma (number): Karma score of the comment at the time of crawling
comments_count (number): Number of comments in reply to the story/post
votes (number): Number of votes for the story/comment
thread (string): URL of the thread
thread_id (string): Sequential order of arrival in the thread (0 if story, >=1 if comment)
depth (string): Depth within the thread (0 if story, >=1 if comment)
url (string): URL of the specific story/comment

Fields only available for stories:

title (string): Title
published (string): Date when published on the front page
tags (string): Tags
clics (string): Number of clicks
users (string): Number of user votes
anonymous (string): Number of anonymous votes
negatives (string): Number of negative votes

Fields only available for comments:

in_reply_to_id (string): ID of the parent story/comment
in_reply_to_user (string): Author of the parent story/comment
in_reply_to_thread_id (string): Sequential order of arrival in the thread of the parent story/comment

Acknowledgment

This work is supported by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Programme (MDM-2015-0502).
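
To illustrate how the fields described above fit together, the following Python sketch loads one thread file and rebuilds the reply structure from in_reply_to_id. It is not the authors' code. It assumes that thread files carry a .json extension, that stories either omit in_reply_to_id or leave it empty, and that the string-typed thread_id and depth fields hold integer values. The function names are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def iter_thread_files(root):
    """Yield thread files laid out as root/yyyy-mm-dd/<slug>.json.

    The .json extension is an assumption; the description only says
    the files are named with the URL slug of the story.
    """
    yield from sorted(Path(root).glob("*/*.json"))


def load_thread(path):
    """Read one thread file: a JSON array of story/comment records."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def build_reply_tree(records):
    """Map each parent id to the ids of its direct replies.

    Records are visited in order of arrival (thread_id), so each
    list of replies is in chronological order. Records without a
    parent (assumed to be the story itself) are skipped.
    """
    children = defaultdict(list)
    for rec in sorted(records, key=lambda r: int(r["thread_id"])):
        parent = rec.get("in_reply_to_id")
        if parent:  # missing or empty for the story (assumption)
            children[parent].append(rec["id"])
    return dict(children)


def max_depth(records):
    """Deepest nesting level in the thread (0 means no comments)."""
    return max(int(r["depth"]) for r in records)
```

A typical use would iterate over iter_thread_files on the unpacked dataset directory, call load_thread on each path, and pass the resulting list to build_reply_tree or max_depth to compare thread shapes across dates.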
Created:
2020-01-24