
Replication Data for: Syntax from and for discourse II: More on complex sentences as meso-constructions

DataverseNO · Updated 2025-01-01 · Indexed 2026-04-13
Download link:
https://dataverse.no/citation?persistentId=doi:10.18710/SIPOUV
Resource description:
<p><b>Dataset abstract:</b> The corpus files employed are a subset of 812 files containing spoken language from the British National Corpus (World edition, Oct. 2000), capturing British English in the late 20th century. For a description of the corpus, see <a href="http://www.natcorp.ox.ac.uk/archive/worldURG/index.xml">http://www.natcorp.ox.ac.uk/archive/worldURG/index.xml</a>. A total of 740 files were chosen because their metadata marked them as belonging to one of the following genres: broadcast discussion; classroom; consultation; conversation; demonstration; interview; interview or oral history; meeting; parliament; public debate; tutorial; spoken unclassified. To these, we added 72 files with the genre descriptions: courtroom; speech unscripted; sports live. </p><p>From these files with spoken British English, all occurrences of adverbial clauses introduced by one of the four subordinating conjunctions ‘before’, ‘after’, ‘once’, and ‘until’ were extracted. For the final analysis, 8 samples of equal size (together comprising 560 tokens) were created from this output in three steps: narrowing the corpus output down to sentence configurations with adverbial clauses with these conjunctions in either initial or final position; retaining only complex sentence configurations showing both the adverbial clause and a matrix clause; and finally selecting only 1 token per file using a randomizer. The size of each of the subsets (70 tokens) was dictated by the frequency of the least frequent configuration (initial until-clauses). </p>
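The sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' script: the token layout (`file_id`, `conjunction`, `position`, `text`), the function name `sample_configurations`, and the reading that "1 token per file" applies within each configuration are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_configurations(tokens, size, seed=0):
    """Draw one equally sized sample per (conjunction, position) configuration.

    tokens: iterable of (file_id, conjunction, position, text) tuples
            (hypothetical layout, not the dataset's actual file format).
    size:   target sample size per configuration (70 in the abstract).
    """
    rng = random.Random(seed)

    # Group tokens by configuration, then by source file.
    by_config = defaultdict(lambda: defaultdict(list))
    for file_id, conjunction, position, text in tokens:
        by_config[(conjunction, position)][file_id].append(text)

    samples = {}
    for config, files in sorted(by_config.items()):
        # Keep one randomly chosen token per file (assumed to apply per configuration).
        one_per_file = [(fid, rng.choice(texts)) for fid, texts in sorted(files.items())]
        # Draw `size` tokens, or all of them if fewer are available.
        k = min(size, len(one_per_file))
        samples[config] = rng.sample(one_per_file, k)
    return samples
```

With four conjunctions and two positions this yields 8 samples; with `size=70` and enough files per configuration, the total is 8 × 70 = 560 tokens, matching the figures in the abstract.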
Provided by:
University of Erfurt; University of California, Santa Barbara
Created:
2025-01-01