Multi-Dimensional Analysis of Czech

DataONE | Updated 2018-10-30 | Indexed 2024-06-08
Download link:
https://search.dataone.org/view/sha256:10319b574871085b5594f87078cb4165a0db70b4c79683c333b7b4bd601b9dfd
Resource description:
Original data for a general-purpose multi-dimensional analysis (MDA) model of register variation in Czech. This post contains a CSV data set of 137 linguistic features measured on 3428 Czech text chunks, and an R script which performs a factor analysis on this data set. The results of this factor analysis were used as the basis for an 8-dimensional model of register variation in Czech (see Related Publications), following the methodology introduced by Douglas Biber (see e.g. his seminal 1988 work Variation Across Speech and Writing for details on the methodology, or his 2014 article “Using multi-dimensional analysis to explore cross-linguistic universals of register variation” for a review of MDA results across a variety of languages). The data is derived from the Koditex corpus, which aims to be as diversified as possible, covering various forms of spoken and written (both print and on-line) Czech. The corpus was compiled to provide a solid empirical basis for a comprehensive general-purpose model of register variation in Czech. Apart from this data set and related publications, additional resources pertaining to the project are available via the czcorpus/mda GitHub repository.
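As an illustration of the kind of preparation that usually precedes a factor analysis of a chunk-by-feature table like the one described above, the following is a minimal Python sketch using only the standard library. It is not the R script distributed with the data set. The column name `chunk_id`, the feature names, and the toy values are all hypothetical; the actual CSV layout should be checked before adapting it. The sketch reads the table, converts each feature to z-scores, and computes the feature correlation matrix, which is the typical input to a factor analysis.

```python
import csv
import io
import statistics


def read_feature_table(handle, id_column="chunk_id"):
    """Read a CSV of per-chunk feature values.

    `id_column` is a hypothetical name for the column identifying each chunk;
    every other column is treated as a numeric feature.
    """
    reader = csv.DictReader(handle)
    feature_names = [name for name in reader.fieldnames if name != id_column]
    ids, rows = [], []
    for record in reader:
        ids.append(record[id_column])
        rows.append([float(record[name]) for name in feature_names])
    return ids, feature_names, rows


def standardize(rows):
    """Convert each column to z-scores (mean 0, sample standard deviation 1).

    Assumes no column is constant; a constant column has a standard
    deviation of 0 and would need to be dropped first.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


def correlation_matrix(z_rows):
    """Pearson correlations between the columns of already standardized data."""
    n = len(z_rows)
    columns = list(zip(*z_rows))
    return [
        [sum(a * b for a, b in zip(col_i, col_j)) / (n - 1) for col_j in columns]
        for col_i in columns
    ]


# Toy stand-in for the real file (hypothetical features and values).
# With the real data, pass an open file handle instead of io.StringIO.
TOY_CSV = """chunk_id,nouns,verbs,pronouns
a,10,2,1
b,8,4,2
c,6,6,3
d,4,8,5
"""

ids, feature_names, rows = read_feature_table(io.StringIO(TOY_CSV))
z_rows = standardize(rows)
corr = correlation_matrix(z_rows)

for name, row in zip(feature_names, corr):
    print(name, [round(value, 3) for value in row])
```

With the real data set, the table would be expected to have 3428 rows and 137 feature columns, so a quick check of `len(rows)` and `len(feature_names)` after loading is a cheap way to confirm the file was parsed as intended.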
Created:
2024-01-05