PMC Discussion Phrase-Frame Corpus, v1.0
Source: DataCite Commons. Updated 2026-05-02; indexed 2026-05-04.
Download link:
https://osf.io/8rvhk/
Description:
What is deposited
Move annotations for 5,445 sentences across 165 Discussion sections (85 biomedical, 80 behavioural psychology) from PubMed Central's Open Access subset
73-rule lexicogrammatical pre-annotation pattern set (Supplementary Material S1)
Phrase-frame extraction outputs: 312 distinct 4-gram p-frame types with 414 classified tokens
Move-by-frame contingency tables and statistical outputs (chi-square, correspondence analysis, log-likelihood keyness)
PMC ID lists and the Python extraction script for deterministic corpus reconstruction
All analytical scripts (frame extraction, keyness analysis, correspondence analysis, figure generation)
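For orientation, the log-likelihood keyness statistic named in the list above can be sketched as follows. This is a minimal illustration using the common two-term form of the statistic; it is not the deposited script, and the function and parameter names are hypothetical.

```python
import math


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Two-term log-likelihood (G2) for one item across two corpora.

    freq_a, freq_b: observed frequency of the item in corpus A and B.
    size_a, size_b: total token counts of corpus A and B.
    All names are illustrative assumptions, not taken from the deposit.
    """
    total = freq_a + freq_b
    # Expected frequencies if the item were spread in proportion to corpus size.
    exp_a = size_a * total / (size_a + size_b)
    exp_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for obs, exp in ((freq_a, exp_a), (freq_b, exp_b)):
        if obs > 0:  # a zero observed count contributes nothing to the sum
            g2 += obs * math.log(obs / exp)
    return 2 * g2
```

A value of 0 means the item is distributed exactly in proportion to the two corpus sizes; larger values indicate stronger over- or under-use in one corpus.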
Key methodological details
Move annotation: adapted Yang & Allison (2003) 8-move taxonomy, semi-automated workflow, Cohen's κ = .92
P-frame extraction: 4-grams with exactly one variable slot (span 1–4 words), frequency >= 5 per 100k words, range >= 10 texts
Statistical framework: chi-square test of independence, correspondence analysis, log-likelihood keyness, Mann-Whitney U tests
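The p-frame criteria above (4-grams, one variable slot, frequency and range thresholds) can be illustrated with a short sketch. It assumes pre-tokenised texts, allows the slot at any of the four positions, and normalises frequency against the total token count; these choices and all names are assumptions for illustration and may differ from the deposited extraction script.

```python
from collections import Counter, defaultdict


def extract_pframes(texts, n=4, min_per_100k=5.0, min_range=10):
    """Return p-frames meeting frequency and range thresholds.

    texts: list of token lists, one list per document (assumed input shape).
    Result maps frame tuple -> (raw count, count per 100k words, text range).
    """
    freq = Counter()
    doc_range = defaultdict(set)
    total_tokens = sum(len(tokens) for tokens in texts)

    for doc_id, tokens in enumerate(texts):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Replace exactly one position with the wildcard "*".
            for slot in range(n):
                frame = tuple("*" if j == slot else w
                              for j, w in enumerate(gram))
                freq[frame] += 1
                doc_range[frame].add(doc_id)

    kept = {}
    for frame, count in freq.items():
        per_100k = count * 100_000 / total_tokens
        n_texts = len(doc_range[frame])
        if per_100k >= min_per_100k and n_texts >= min_range:
            kept[frame] = (count, per_100k, n_texts)
    return kept
```

With the default thresholds the function mirrors the stated cut-offs (at least 5 per 100k words, in at least 10 texts); lowering them is useful only for toy inputs.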
Licences
Derived data (annotations, tables, statistics): CC-BY 4.0
Code and scripts: MIT
Raw PMC text is not redistributed due to variable per-publisher licensing; the corpus can be fully reconstructed by running the deposited extraction script against the provided PMC ID lists.
Provider:
OSF
Created:
2026-05-02



