
Replication Data for: Beyond the Link: Assessing LLMs’ Ability to Classify Political Content Across Global Media

Source: DataCite Commons (updated 2026-02-04; indexed 2026-02-08)
Download link:
https://dataverse.bsc.es/citation?persistentId=doi:10.82201/8UPPY6
Description:
This dataset and replication package accompany the paper “Beyond the Link: Assessing LLMs’ Ability to Classify Political Content Across Global Media” by Alejandro De La Fuente-Cuesta, Alberto Martínez-Serra, R. Nienke Visscher, and Ana S. Cardenal (2025). The materials include the data and code needed to reproduce all analyses presented in the main text and the Supplementary Information.

The study investigates whether large language models (LLMs) can accurately classify news content as political or non-political based solely on its URL, compared with full-text analysis, across five countries (France, Germany, Spain, the UK, and the US). Using web-tracking data and manually coded ground-truth labels, we benchmark several state-of-the-art LLMs (Gemma-3-27B, Mistral-3.1-24B, Qwen-32B, Llama-3.1-8B, and DeepSeek-R1-Distill-Qwen-7B) to assess their performance, their precision–recall trade-offs, and the sources of bias in URL-only classification.

The dataset is derived from web-tracking records of news consumption in five democratic countries. A subset of 1,140 URLs was manually coded by human annotators as either Political (POL) or Non-political (NON) content to serve as the gold standard. LLM predictions were then compared against these human labels to compute accuracy, F1, precision, recall, and Cohen’s Kappa. All personal data were anonymized before analysis, and all procedures complied with the GDPR and institutional ethical guidelines.

Replication Instructions
1. Open the .Rmd files in RStudio (R ≥ 4.2).
2. Install the packages listed in the setup section.
3. Knit the documents to reproduce the corresponding .html outputs.
All analyses use open-source R packages and can be fully reproduced on any standard machine.

If you use these data or materials, please cite:
Martínez-Serra, A., De la Fuente-Cuesta, A., Visscher, R. N., & Cardenal, A. S. (2025). Beyond the Link: Assessing LLMs’ Ability to Classify Political Content Across Global Media.
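The evaluation described above scores LLM predictions against the human-coded POL/NON labels with accuracy, precision, recall, F1, and Cohen’s Kappa. The replication package does this in R; purely as an illustration, here is a minimal Python sketch of how these metrics can be computed for a two-class setup, assuming POL is treated as the positive class. The function name `binary_metrics` and the example labels are hypothetical and are not taken from the replication files.

```python
from collections import Counter


def binary_metrics(gold, pred, positive="POL"):
    """Score predicted labels against gold-standard labels.

    `positive` is the label treated as the positive class (assumed to be POL).
    Returns accuracy, precision, recall, F1, and Cohen's kappa.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    pairs = list(zip(gold, pred))

    # Observed agreement (accuracy): share of items where both labels match
    accuracy = sum(g == p for g, p in pairs) / n

    # Confusion-matrix cells for the positive class
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e is the agreement
    # expected by chance from each rater's marginal label frequencies
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    p_e = sum(gold_counts[lab] * pred_counts[lab] for lab in gold_counts) / n ** 2
    # Kappa is undefined when chance agreement is 1 (both raters use one label)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else float("nan")

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical toy labels, not drawn from the dataset
    gold = ["POL", "POL", "NON", "NON", "POL", "NON"]
    pred = ["POL", "NON", "NON", "POL", "POL", "NON"]
    print(binary_metrics(gold, pred))
```

With POL as the positive class, precision answers how many URLs the model flagged as political really are political, while recall answers how many of the truly political URLs it caught. Kappa discounts the agreement that two labelers would reach by chance given how often each uses POL and NON, which matters when one class dominates.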
Provided by: BSC Dataverse
Created: 2025-10-22