Supporting data for: LLM-Assisted Keymorph Analysis of Grammatical Case in RT's Israeli–Palestinian Conflict Coverage
Source: DataCite Commons. Updated: 2026-05-15. Indexed: 2026-05-17.
Download link:
https://dataverse.no/citation?persistentId=doi:10.18710/YTSGDM
Resource description:
<b>Dataset description:</b>
<p>This dataset supports a Keymorph Analysis of grammatical case in Russian-language news headlines about the 2023-2025 Israeli-Palestinian conflict, collected from RT's official news website.</p>
<p>The dataset comprises four main components:</p>
<ol>
<li>Raw Headlines and Filtered Corpus: This component includes the initial collection of Russian-language headlines from RT (2023-10-07 to 2025-01-19) and the subsequently filtered corpus of 8,757 distinct headlines containing specified keywords related to the conflict (e.g., 'Israel', 'Palestine', 'Gaza', 'Hamas').</li>
<li>Reference Corpus: The reference corpus was constructed from the National Media Subcorpus of the Russian National Corpus (RNC).</li>
<li>Annotated Corpus of Grammatical Cases: This core component contains the grammatical case annotations for 11 identified target keywords across the corpus. The annotations were generated using an LLM (ChatGPT-5 mini API). A 20% sample was reviewed and corrected by humans, and the corrected sample is integrated into the final dataset as a quality check.</li>
<li>Derived Analytical Data and Visualizations: This component includes statistical summaries of keyword frequencies and grammatical case distributions, the standardized Pearson residuals and log-likelihood (LL) ratio values used to identify keymorphs, and visualizations such as word frequency charts and residual heatmaps. All of these are derived from the annotated corpus.</li>
</ol>
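The two statistics named in the fourth component can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the description does not say which residual variant or LL formula was used, so the sketch assumes the adjusted (standardized) Pearson residual for a case-by-corpus contingency table and the two-cell log-likelihood commonly used in corpus keyness studies. The function names, the case labels, and all counts are hypothetical and are not taken from the dataset.

```python
import math

def standardized_residuals(table):
    """Adjusted Pearson residuals for a case-by-corpus table.

    table maps a case label to (study_count, reference_count).
    Assumes every row and column total is non-zero and that
    no single row or column holds all of the observations.
    """
    col_totals = [sum(row[j] for row in table.values()) for j in range(2)]
    grand = sum(col_totals)
    residuals = {}
    for case, row in table.items():
        row_total = sum(row)
        cells = []
        for j in range(2):
            expected = row_total * col_totals[j] / grand
            # Variance term of the adjusted residual.
            variance = (expected
                        * (1 - row_total / grand)
                        * (1 - col_totals[j] / grand))
            cells.append((row[j] - expected) / math.sqrt(variance))
        residuals[case] = tuple(cells)
    return residuals

def log_likelihood(a, b, c, d):
    """Two-cell log-likelihood for one form.

    a, b: frequency of the form in the study and reference corpus.
    c, d: total number of keyword tokens in each corpus.
    """
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Hypothetical counts for one keyword: (study corpus, reference corpus).
toy_counts = {"NOM": (30, 50), "GEN": (50, 30), "ACC": (20, 20)}
toy_residuals = standardized_residuals(toy_counts)
toy_ll = log_likelihood(30, 50, 100, 100)
```

In this reading, a large positive residual in the study-corpus column marks a case that is overrepresented for that keyword relative to the reference corpus, and the LL value indicates how strongly the two frequencies diverge.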
Provider:
DataverseNO
Created:
2026-04-21



