How Local Online Media Frame the Community Literacy Development Index in Cilegon, Indonesia: A Computational Text and Framing Analysis

Indexed by NIAID Data Ecosystem on 2026-05-10

Download link:

https://data.mendeley.com/datasets/dfd8ybfprf

Resource description:

The workflow began with the manual collection of 114 news articles (2022-2025), which were saved in PDF format. The articles were converted into plain-text (.txt) files with the ocr_pdftotxt.r script, run in RStudio, which uses the tesseract package for optical character recognition (OCR). The buat_corpus_baru_txt.r script then built a quanteda corpus object from the .txt files in RStudio and performed initial pre-processing (tokenisation, stopword removal, etc.). A word frequency table was generated with buat_tabel_rds.r and refined by normalising synonyms (ganti_sinonim_dgn_list.r) and removing irrelevant words (hapus_kata_dgn_list.r). From this cleaned table, 98 keywords were selected and manually classified into Entman's four framing dimensions (problem definition, causal interpretation, moral evaluation, and treatment recommendation); the keywords for each dimension were saved in a separate .csv file.

To reproduce the final network visualisations in RStudio, run the corresponding vis_fcm_*.r script (e.g., vis_fcm_problem_definition.r). Each script filters the original corpus using the keyword list for its dimension and builds a feature co-occurrence matrix (FCM) with the quanteda package.
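
The pre-processing and co-occurrence steps described above can be sketched in Python as follows. This is not the authors' R/quanteda code but a minimal standard-library analogue written under stated assumptions: the stopword, synonym, irrelevant-word, and keyword lists and the toy texts are hypothetical, and synonyms are assumed to be normalised before keyword filtering. The co-occurrence count used here (the number of documents in which both keywords appear) is one simple definition; quanteda's fcm() offers several counting options, and which one the vis_fcm_*.r scripts use is not stated.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stand-ins for the word lists used by the R scripts.
STOPWORDS = {"di", "dan", "yang", "karena", "untuk"}
SYNONYMS = {"pustaka": "perpustakaan", "membaca": "baca"}
IRRELEVANT = {"warga"}


def tokenise(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def refine_table(freq):
    """Merge synonym counts into one canonical form and drop irrelevant words."""
    refined = Counter()
    for word, n in freq.items():
        canonical = SYNONYMS.get(word, word)
        if canonical not in IRRELEVANT:
            refined[canonical] += n
    return refined


def document_fcm(docs, keywords):
    """For each unordered keyword pair, count the documents containing both."""
    fcm = Counter()
    for doc in docs:
        # Assumption: synonyms are normalised before filtering to the keyword list.
        present = sorted({SYNONYMS.get(t, t) for t in doc} & keywords)
        for pair in combinations(present, 2):
            fcm[pair] += 1
    return fcm


# Toy input standing in for the OCR'd article texts (hypothetical).
texts = [
    "Minat baca rendah di Cilegon dan perpustakaan sepi",
    "Pustaka desa mendukung literasi warga",
    "Literasi meningkat karena minat membaca warga",
]
docs = [tokenise(t) for t in texts]
freq = refine_table(Counter(t for d in docs for t in d))
keywords = {"literasi", "minat", "baca", "perpustakaan"}  # hypothetical dimension list
fcm = document_fcm(docs, keywords)
```

With this toy input, fcm.most_common(1) returns [(('baca', 'minat'), 2)], because "minat" and "baca" (including the normalised "membaca") appear together in two of the three texts.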

Created: 2025-10-13