Replication Data for: Predicting Stress in Russian using Modern Machine-Learning Tools
Source: DataONE. Updated: 2025-03-04. Indexed: 2025-09-20.
Download link:
https://search.dataone.org/view/sha256:c70371bb49e814ba869cea05c089917e19859e02527d2efff7837ca2262a12d5
Description:
This dataset consists of a TSV file with five columns of data originating in Zaliznyak's Grammar and Dictionary (1977). The data was programmatically scraped from Giella project data (Moshagen et al., 2013) by Spektor (2021), where it served as one of four sources for the RusLex application. After the data was scraped from there, the only cleaning step was the removal of symbols. The Russian word data is preserved in Cyrillic, as in the original.

The last column contains abbreviated morphological features in English (e.g. "V" for verb, "N" for noun, "Fem" for feminine, "Cmpr" for comparative, "Impf" for imperfective). A word often has many features, which are separated by semicolons.

A stress code representing stress placement was derived for each word by counting syllables from the end of the word. If the stressed vowel was in the final syllable, the word was assigned 0 (oxytone stress). If the stress fell on the penultimate syllable, the word was assigned 1 (paroxytone stress). If it fell on the antepenultimate syllable, the word was assigned 2 (proparoxytone stress). The script continued in this way until a stress code was assigned, with one exception: words without explicit stress markers were assigned -1.

The columns in the resulting TSV are: the word without stress markers, the word with stress markers, the derived stress code, the lemma, and all morphological features. The dataset contains over 300,000 words from Zaliznyak (1977); many words are repeated, each time with a unique set of morphological features. Please see the README or the paper for a full description of the dataset: https://academicworks.cuny.edu/gc_etds/4974.

References:

Moshagen, S. N., Pirinen, T., and Trosterud, T. (2013). Building an open-source development infrastructure for language technology projects. In Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013) (pp. 343–352).

Spektor, Y. (2021). Detection and morphological analysis of novel Russian loanwords (Master’s thesis, CUNY Graduate Center, New York, NY). Retrieved from https://academicworks.cuny.edu/gc_etds/4572/

Zaliznyak, A. A. (1977). Grammatičeskij slovar’ russkogo jazyka. Slovoizmenenie [A grammatical dictionary of Russian: Inflection]. Moscow: Russkij jazyk.
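
The stress-code scheme and the five-column layout described above can be illustrated with a short Python sketch. This is not the authors' script. It assumes that stress is marked with a combining acute accent (U+0301) placed directly after the stressed vowel, that each row has exactly five tab-separated fields in the order listed in the description, and that the stress code equals the number of vowels following the stressed one. The names `stress_code`, `parse_row`, and `Entry`, as well as the sample row, are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple

# Russian vowel letters; each vowel is counted as one syllable nucleus.
VOWELS = set("аеёиоуыэюя")

# Assumption: stress is marked by a combining acute accent placed
# directly after the stressed vowel. The real file may use another marker.
STRESS_MARK = "\u0301"


def stress_code(marked_word: str, mark: str = STRESS_MARK) -> int:
    """Count syllables from the end of the word to the stressed one.

    0 = final syllable (oxytone), 1 = penultimate (paroxytone),
    2 = antepenultimate (proparoxytone), and so on.
    Returns -1 when the word carries no explicit stress marker.
    If several markers are present, only the first is considered.
    """
    pos = marked_word.find(mark)
    if pos == -1:
        return -1
    tail = marked_word[pos + len(mark):]
    return sum(1 for ch in tail.lower() if ch in VOWELS)


class Entry(NamedTuple):
    word: str            # word without stress markers
    stressed: str        # word with stress markers
    code: int            # derived stress code
    lemma: str
    features: List[str]  # morphological features, split on semicolons


def parse_row(line: str) -> Entry:
    """Parse one TSV row, assuming the column order given in the description."""
    word, stressed, code, lemma, feats = line.rstrip("\n").split("\t")
    features = [f.strip() for f in feats.split(";") if f.strip()]
    return Entry(word, stressed, int(code), lemma, features)


if __name__ == "__main__":
    # Hypothetical examples, not taken from the dataset.
    for w in ["молоко\u0301", "соба\u0301ка", "я\u0301блоко", "молоко"]:
        print(w, stress_code(w))
    print(parse_row("собака\tсоба\u0301ка\t1\tсобака\tN;Fem\n"))
```

Counting the vowels that follow the marker avoids explicit syllabification, which is sufficient here because each Russian syllable contains exactly one vowel. If the file turns out to include a header row or a different stress marker, the parsing and the `mark` argument would need to be adjusted accordingly.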
Created:
2025-09-16



