
Media Bias Aware Simulation Dataset

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11168980
Description:
We used the Hyperpartisan News Detection Dataset, released with the SemEval-2019 Task 4 hyperpartisan news detection task, because of its extensive bias labels. To ensure accurate bias labels, we applied the Overlap-checking (1:1) model and retained only articles whose distant-supervision bias labels matched the model's predictions. This validation step left 409,757 articles, spanning 1960 to 2018, with sparse coverage in the earlier years.

We focused on articles from May 1, 2017, to December 31, 2017, a subset of 72,940 news articles with a consistent daily news flow. We preprocessed this subset by removing HTML tags and special characters and by generating news summaries with PEGASUS. We used Latent Dirichlet Allocation (LDA) to categorize the articles into 20 news themes, with the number of themes selected based on perplexity scores.

This dataset was then fed into the simulation framework, with a cut-off date of June 24, 2017, and is organized as follows.

News Recommendation Dataset: User-item interaction records from May 1 to June 24, providing users' reading histories and the news articles they interacted with.
Training Split: Data from May 1 to June 17, used to train news recommendation algorithms.
Evaluation Split: Data from June 17 to June 24, used to evaluate the trained recommendation algorithms.
Candidate News Dataset: News articles published from June 25 to December 31, presented to users during simulations.

For more information, please visit https://github.com/ruanqin0706/UserRecSimulation.
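The label validation and date-based partitioning described above can be illustrated with a minimal Python sketch. It is not the authors' code: the record field names (`source_label`, `predicted_label`) and both function names are hypothetical, and the year 2017 for the split boundaries is inferred from the stated subset range. The description lists June 17 in both the training and evaluation splits; this sketch assumes it belongs to the evaluation split.

```python
from datetime import date

# Boundaries from the description; the year 2017 is inferred from the subset range.
TRAIN_START = date(2017, 5, 1)
EVAL_START = date(2017, 6, 17)
CUTOFF = date(2017, 6, 24)
CANDIDATE_END = date(2017, 12, 31)


def keep_validated(articles):
    """Keep articles whose distant-supervision label matches the model prediction.

    The keys 'source_label' and 'predicted_label' are hypothetical names.
    """
    return [a for a in articles if a["source_label"] == a["predicted_label"]]


def assign_partition(day):
    """Map a date to 'train', 'eval', 'candidate', or None if out of range.

    Assumption: June 17 is placed in the evaluation split.
    """
    if TRAIN_START <= day < EVAL_START:
        return "train"
    if EVAL_START <= day <= CUTOFF:
        return "eval"
    if CUTOFF < day <= CANDIDATE_END:
        return "candidate"
    return None


if __name__ == "__main__":
    sample = [
        {"source_label": "left", "predicted_label": "left"},
        {"source_label": "left", "predicted_label": "right"},
    ]
    print(len(keep_validated(sample)), "of", len(sample), "articles kept")
    print(assign_partition(date(2017, 6, 25)))
```

In this sketch, 'train' and 'eval' apply to interaction records and 'candidate' to news articles. The actual file layout and loading code are documented in the linked repository.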
Created:
2024-07-02