Multi-Dimensional Analysis of Czech

DataONE · Updated 2018-10-30 · Indexed 2024-06-08

Download link:

https://search.dataone.org/view/sha256:10319b574871085b5594f87078cb4165a0db70b4c79683c333b7b4bd601b9dfd


Description:

Original data for a general-purpose multi-dimensional analysis (MDA) model of register variation in Czech. This post contains a CSV data set of 137 linguistic features measured on 3428 Czech text chunks, and an R script which performs a factor analysis on this data set. The results of this factor analysis were used as the basis for an 8-dimensional model of register variation in Czech (see Related Publications), following the methodology introduced by Douglas Biber (see e.g. his seminal 1988 work Variation Across Speech and Writing for details on the methodology, or his 2014 article “Using multi-dimensional analysis to explore cross-linguistic universals of register variation” for a review of MDA results across a variety of languages). The data is derived from the Koditex corpus, which aims to be as diversified as possible, covering various forms of spoken and written (both print and on-line) Czech. The corpus was compiled to provide a solid empirical basis for a comprehensive general-purpose model of register variation in Czech. Apart from this data set and related publications, additional resources pertaining to the project are available via the czcorpus/mda GitHub repository.
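To illustrate how a factor solution of this kind is typically turned into per-text dimension scores in Biber-style MDA, here is a minimal Python sketch. It is not the R script shipped with the data set. The feature names, values, loadings, and the 0.35 salience cutoff are all made-up assumptions for illustration. Whether the Czech model uses the same cutoff or scoring rule is not stated in this description.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize measurements to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def dimension_scores(table, loadings, cutoff=0.35):
    """Score each text chunk on one dimension.

    table:    {feature_name: [value for each text chunk]}
    loadings: {feature_name: loading of that feature on the factor}
    cutoff:   hypothetical salience threshold; features with a smaller
              absolute loading are ignored.

    The score is the sum of z-scores of salient positively loading
    features minus the sum for salient negatively loading features.
    """
    n_chunks = len(next(iter(table.values())))
    standardized = {name: zscores(vals) for name, vals in table.items()}
    scores = [0.0] * n_chunks
    for name, loading in loadings.items():
        if abs(loading) < cutoff:
            continue  # not salient on this dimension
        sign = 1.0 if loading > 0 else -1.0
        for i in range(n_chunks):
            scores[i] += sign * standardized[name][i]
    return scores


# Toy data: three hypothetical features measured on four text chunks.
table = {
    "feat_a": [1.0, 2.0, 3.0, 4.0],
    "feat_b": [4.0, 3.0, 2.0, 1.0],
    "feat_c": [2.0, 2.5, 1.0, 3.0],
}
# Hypothetical loadings on one factor; feat_c falls below the cutoff.
loadings = {"feat_a": 0.62, "feat_b": -0.48, "feat_c": 0.10}

scores = dimension_scores(table, loadings)
print([round(s, 2) for s in scores])
```

In the actual data set the table would have 137 feature columns and 3428 rows, and the loadings would come from the factor analysis rather than being set by hand.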

Created:

2024-01-05
