A Dataset of Journalists' Interactions with Their Readership

Name: A Dataset of Journalists' Interactions with Their Readership
Creator: OpenDataLab
Published: 2026-05-24 07:30:18
License: No description available

OpenDataLab, updated 2026-05-24, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/A_Dataset_of_Journalists_etc

Resource overview:

This repository contains a Python script and files listing comment IDs. It accompanies our resource paper published at CIKM 2020, *A Dataset of Journalists' Interactions with Their Readership: When Should Article Authors Reply to Reader Comments?* To support reproducibility and encourage further research in this area, we provide a script that downloads a set of 38,000 comments. The script queries The Guardian's Web API for a predefined list of comments identified by their IDs. Half of these comments received a reply from a journalist and the other half did not. Because class labels are provided and the two classes are balanced, the comments can be used directly for supervised machine learning. We also provide the IDs of the journalists' reply comments.
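The download-and-label workflow described above could be sketched roughly as follows. This is not the repository's script: the endpoint pattern in `API_TEMPLATE`, the function names `fetch_comment` and `build_labeled_set`, and the idea that the IDs arrive as two separate lists are all assumptions made for illustration. The source only states that the script calls The Guardian's Web API for a predefined list of comment IDs.

```python
import json
import urllib.request
from typing import Callable, Iterable

# Assumed endpoint pattern (hypothetical); check the repository's script
# for the URL it actually uses.
API_TEMPLATE = "https://discussion.theguardian.com/discussion-api/comment/{comment_id}"


def fetch_comment(comment_id: str) -> dict:
    """Download a single comment as JSON. No retries or rate limiting."""
    url = API_TEMPLATE.format(comment_id=comment_id)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def build_labeled_set(
    replied_ids: Iterable[str],
    unreplied_ids: Iterable[str],
    fetch: Callable[[str], dict] = fetch_comment,
) -> list:
    """Pair each downloaded comment with a binary label.

    label 1: the comment received a journalist's reply
    label 0: the comment did not
    """
    rows = []
    for label, ids in ((1, replied_ids), (0, unreplied_ids)):
        for comment_id in ids:
            rows.append(
                {"id": comment_id, "label": label, "comment": fetch(comment_id)}
            )
    return rows
```

Passing the fetch function as a parameter keeps the labeling logic testable offline: a stub that returns canned dictionaries can stand in for the network call.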

Provider:

OpenDataLab

Created:

2022-06-28

Collection summary

Dataset introduction