Media Bias Aware Simulation Dataset
Source: NIAID Data Ecosystem. Indexed: 2026-05-02
Download link:
https://zenodo.org/record/11168980
Description:
We utilized the Hyperpartisan News Detection Dataset, released with the SemEval-2019 Task 4 Hyperpartisan detection task, due to its extensive bias labels. To ensure accurate bias labels, we used the Overlap-checking (1:1) model, retaining only articles where the distant supervision bias labels matched the model's predictions. This validation process resulted in 409,757 articles.
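The label-validation step described above can be sketched as a simple agreement filter. This is a minimal illustration, not the authors' implementation: the field names `distant_label` and `predicted_label` are hypothetical, and the predictions are assumed to have been produced beforehand by the Overlap-checking (1:1) model.

```python
from typing import Dict, List


def keep_agreeing(articles: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Retain only articles whose distant-supervision bias label
    matches the model's predicted label.

    The keys "distant_label" and "predicted_label" are hypothetical
    names chosen for this sketch.
    """
    return [a for a in articles if a["distant_label"] == a["predicted_label"]]


# Toy records, invented purely to show the filter's behaviour.
sample = [
    {"id": "a1", "distant_label": "left", "predicted_label": "left"},
    {"id": "a2", "distant_label": "right", "predicted_label": "left"},
    {"id": "a3", "distant_label": "center", "predicted_label": "center"},
]
validated = keep_agreeing(sample)
```

In this toy input, the article whose two labels disagree (`a2`) is dropped and the other two are kept.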
These articles span 1960 to 2018, with a sparse distribution in the earlier years. To ensure a consistent daily news flow, we focused on articles published from May 1, 2017, to December 31, 2017, which gave a subset of 72,940 news articles. We processed this subset by removing HTML tags and special characters and by generating news summaries with PEGASUS. We then used Latent Dirichlet Allocation (LDA) to categorize the articles into 20 news themes, with the number of themes chosen based on perplexity scores.
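The cleaning step (removing HTML tags and special characters) might look roughly like the sketch below. It uses only the Python standard library; the exact set of characters treated as "special" is an assumption made here, since the description does not specify it.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # anything that looks like an HTML tag
# Assumption: keep letters, digits, whitespace and basic punctuation only.
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s.,;:!?'\"()-]")
WS_RE = re.compile(r"\s+")


def clean_article(raw: str) -> str:
    """Strip HTML tags, decode entities, drop special characters,
    and collapse runs of whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)        # e.g. "&amp;" becomes "&"
    text = SPECIAL_RE.sub(" ", text)  # "&" is then removed as a special character
    return WS_RE.sub(" ", text).strip()


cleaned = clean_article("<p>Hello &amp; welcome!</p>")
```

Here `cleaned` is `"Hello welcome!"`. Summarization with PEGASUS and LDA topic modelling are not shown; they would run on the cleaned text.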
This dataset was then fed into the simulation framework, with a cut-off date of June 24, 2017.
News Recommendation Dataset: Includes user-item interaction records from May 1 to June 24, providing users' reading histories and interacted news articles.
Training Split: Data from May 1 to June 17, used to train news recommendation algorithms.
Evaluation Split: Data from June 17 to June 24, used to evaluate the trained recommendation algorithms.
Candidate News Dataset: News articles published from June 25 to December 31, presented to users during simulations.
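The date-based partition above can be sketched as a single function that maps a date to a split name. This is a simplification under stated assumptions: the description lists June 17 in both the training and evaluation ranges, so this sketch arbitrarily assigns June 17 to training; it also applies one date to every record, whereas the original splits interaction records for training and evaluation and uses publication dates for candidate news. The function name `assign_split` is hypothetical.

```python
from datetime import date

START = date(2017, 5, 1)       # first day of the selected subset
TRAIN_END = date(2017, 6, 17)  # assumption: June 17 belongs to training
CUTOFF = date(2017, 6, 24)     # simulation cut-off date
END = date(2017, 12, 31)       # last day of the selected subset


def assign_split(d: date) -> str:
    """Return the split a record dated `d` would fall into."""
    if d < START or d > END:
        return "out_of_range"
    if d <= TRAIN_END:
        return "train"
    if d <= CUTOFF:
        return "eval"
    return "candidate"


splits = [assign_split(d) for d in (
    date(2017, 5, 1), date(2017, 6, 20), date(2017, 6, 25), date(2018, 1, 1),
)]
```

With these boundaries, `splits` is `["train", "eval", "candidate", "out_of_range"]`.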
For more information, please visit https://github.com/ruanqin0706/UserRecSimulation.
Created:
2024-07-02



