
How Local Online Media Frame the Community Literacy Development Index in Cilegon, Indonesia: A Computational Text and Framing Analysis

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/dfd8ybfprf
Description:
The workflow began with the manual collection of 114 news articles (2022-2025), saved in PDF format. The ocr_pdftotxt.r script, run in RStudio, converted these PDFs into raw text (.txt) files using the tesseract package for Optical Character Recognition (OCR). The buat_corpus_baru_txt.r script then built a quanteda corpus object from the .txt files and performed initial pre-processing (tokenisation, stopword removal, etc.). A word frequency table was generated with buat_tabel_rds.r and refined by normalising synonyms (ganti_sinonim_dgn_list.r) and removing irrelevant words (hapus_kata_dgn_list.r). From this cleaned table, 98 keywords were selected and manually classified into Entman's four framing dimensions (problem definition, causal interpretation, moral evaluation, and treatment recommendation), with the results saved in a separate .csv file for each dimension. To reproduce the final network visualisations, run the corresponding vis_fcm_*.r script in RStudio (e.g., vis_fcm_problem_definition.r); it filters the original corpus by the keyword list for that dimension and generates a Feature Co-occurrence Matrix (FCM) using the quanteda package.
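
The core of the pipeline described above (tokenisation, stopword removal, synonym normalisation, keyword filtering, and a co-occurrence matrix) can be illustrated with a minimal Python sketch that uses only the standard library. It is not the dataset's R code and does not reproduce quanteda's fcm() behaviour or options. The stopword list, synonym map, keyword set, and sample documents are hypothetical toy values, and this simplified version counts each keyword pair at most once per document.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy inputs; the dataset's real lists are not reproduced here.
STOPWORDS = {"the", "of", "in", "is", "and", "a"}
SYNONYMS = {"reading": "literacy"}          # variant -> canonical form
KEYWORDS = {"literacy", "library", "budget"}  # stand-in for one framing dimension


def tokenise(text):
    """Lowercase, keep alphabetic tokens, drop stopwords, normalise synonyms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS]


def word_frequencies(docs):
    """Corpus-wide token counts after pre-processing."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenise(doc))
    return counts


def cooccurrence(docs, keywords):
    """For each unordered keyword pair, count the documents containing both."""
    pairs = Counter()
    for doc in docs:
        present = sorted(set(tokenise(doc)) & keywords)
        pairs.update(combinations(present, 2))
    return pairs


docs = [
    "The literacy index in Cilegon is low and the library budget is small",
    "Reading habits and the library",
    "A budget of the city",
]

freq = word_frequencies(docs)
fcm = cooccurrence(docs, KEYWORDS)

for (w1, w2), n in sorted(fcm.items()):
    print(f"{w1} - {w2}: {n}")
```

Each pair in `fcm` corresponds to an edge in a keyword network, with the count as its weight. Sorting the keywords before pairing stores every pair under a single key regardless of word order in the text.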
Created:
2025-10-13