
PMC Discussion Phrase-Frame Corpus, v1.0

Source: DataCite Commons. Updated 2026-05-02; indexed 2026-05-04.
Download link:
https://osf.io/8rvhk/
Description:
What is deposited

- Move annotations for 5,445 sentences across 165 Discussion sections (85 biomedical, 80 behavioural psychology) from PubMed Central's Open Access subset
- 73-rule lexicogrammatical pre-annotation pattern set (Supplementary Material S1)
- Phrase-frame extraction outputs: 312 distinct 4-gram p-frame types with 414 classified tokens
- Move-by-frame contingency tables and statistical outputs (chi-square, correspondence analysis, log-likelihood keyness)
- PMC ID lists and the Python extraction script for deterministic corpus reconstruction
- All analytical scripts (frame extraction, keyness analysis, correspondence analysis, figure generation)

Key methodological details

- Move annotation: adapted Yang & Allison (2003) 8-move taxonomy, semi-automated workflow, Cohen's κ = .92
- P-frame extraction: 4-grams with exactly one variable slot (span 1–4 words), frequency >= 5 per 100k words, range >= 10 texts
- Statistical framework: chi-square test of independence, correspondence analysis, log-likelihood keyness, Mann-Whitney U tests

Licences

- Derived data (annotations, tables, statistics): CC-BY 4.0
- Code and scripts: MIT

Raw PMC text is not redistributed due to variable per-publisher licensing; the corpus can be fully reconstructed by running the deposited extraction script against the provided PMC ID lists.
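
To illustrate the p-frame extraction criteria stated above (4-grams with one variable slot, a frequency floor per 100k words, and a minimum range across texts), here is a minimal sketch. It is not the deposited extraction script. The function name `extract_pframes`, the `*` slot marker, and the pre-tokenised input format are assumptions made for illustration. The sketch treats the slot as a single-word wildcard and does not model the "span 1–4 words" detail.

```python
from collections import defaultdict

SLOT = "*"  # hypothetical marker for the variable slot


def extract_pframes(texts, n=4, min_per_100k=5.0, min_range=10):
    """Return {frame: (count, per_100k, n_texts)} for frames passing both cut-offs.

    texts: list of token lists, one list per text (tokenisation is assumed
    to have been done upstream).
    """
    counts = defaultdict(int)      # frame -> number of occurrences
    text_ids = defaultdict(set)    # frame -> set of texts it occurs in
    total_tokens = 0

    for text_id, tokens in enumerate(texts):
        total_tokens += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Open each of the n positions in turn as the single variable slot.
            for slot in range(n):
                frame = tuple(SLOT if j == slot else w for j, w in enumerate(gram))
                counts[frame] += 1
                text_ids[frame].add(text_id)

    result = {}
    for frame, count in counts.items():
        per_100k = count * 100_000 / total_tokens if total_tokens else 0.0
        n_texts = len(text_ids[frame])
        # Keep frames that meet the normalised frequency and range thresholds.
        if per_100k >= min_per_100k and n_texts >= min_range:
            result[frame] = (count, per_100k, n_texts)
    return result
```

The default thresholds mirror the values given in the description (5 per 100k words, 10 texts). On a toy input they would need to be lowered, for example `extract_pframes(texts, min_per_100k=0, min_range=2)`.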
Provider:
OSF
Created:
2026-05-02