Background data for: Some obstacles to replication in corpus linguistics
Source: doi.org. Updated: 2024-11-25. Indexed: 2025-03-23.
Download link:
https://doi.org/10.18710/7LNWJX
Description:
This dataset contains tabular files recording the occurrences and frequencies of modal verbs in the Brown family of corpora. It covers nine modal verbs (can, could, may, might, must, shall, should, will, would) and six corpora (Brown, LOB, Frown, FLOB, BE06, AmE06). Tokens were retrieved using the CQPweb interface provided by the University of Lancaster, and the tables include several text-level variables (text length, broad genre, text category, corpus, time period, variety). The data are provided in two formats: (i) case form, where each token (77,872 in total) is listed separately together with its context of occurrence (10 words to the left and 10 to the right); and (ii) frequency form, which aggregates the occurrences by recording how often each modal verb appears in each text, giving one row per text-modal combination (27,000 in total: 6 corpora × 500 texts × 9 modals).
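The relationship between the two formats can be illustrated with a minimal Python sketch. It assumes hypothetical column names (`text_id`, `modal`, `n_tokens`) and made-up text identifiers, since the record does not state the actual file headers. It aggregates case-form rows into frequency form, keeps zero counts so that every text-modal combination gets a row, and checks the row total quoted in the description.

```python
from collections import Counter
from itertools import product

# The nine modals and six corpora named in the dataset description.
MODALS = ["can", "could", "may", "might", "must", "shall", "should", "will", "would"]
CORPORA = ["Brown", "LOB", "Frown", "FLOB", "BE06", "AmE06"]
TEXTS_PER_CORPUS = 500


def to_frequency_form(case_rows, text_ids):
    """Aggregate case-form rows into one row per text-modal combination.

    The keys "text_id", "modal" and "n_tokens" are hypothetical names
    chosen for this sketch, not the dataset's documented headers.
    """
    counts = Counter((row["text_id"], row["modal"]) for row in case_rows)
    # A Counter returns 0 for unseen keys, so combinations with no
    # tokens still produce a row with a zero count.
    return [
        {"text_id": text_id, "modal": modal, "n_tokens": counts[(text_id, modal)]}
        for text_id, modal in product(text_ids, MODALS)
    ]


# Toy case-form data: one row per token (invented for illustration).
case_rows = [
    {"text_id": "Brown_A01", "modal": "will"},
    {"text_id": "Brown_A01", "modal": "will"},
    {"text_id": "Brown_A01", "modal": "can"},
    {"text_id": "LOB_A01", "modal": "must"},
]

freq_rows = to_frequency_form(case_rows, ["Brown_A01", "LOB_A01"])

# Row count implied by the description: 6 corpora x 500 texts x 9 modals.
expected_rows = len(CORPORA) * TEXTS_PER_CORPUS * len(MODALS)

print(len(freq_rows))
print(expected_rows)
```

With the two toy texts, the frequency form has 2 × 9 rows, most of them zero counts. Applied to all 3,000 texts, the same layout yields the 27,000 rows reported above.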
Provider:
doi.org



