Ukrainian 14-syllable verse in Belarusian poetry: the rhythm of translations and imitations (dataset)
Zenodo · updated 2025-08-11 · indexed 2026-05-25
Download link:
https://zenodo.org/doi/10.5281/zenodo.7119988
Description:
Data and source code accompanying the talk:
У. В. Парыцкі. Украінскі 14-складовы верш у беларускай паэзіі: рытміка перакладаў і імітацый // X Міжнародны Кангрэс даследчыкаў Беларусі, Коўна, 01.10.2022 [Vladislav Poritski. Ukrainian 14-syllable verse in Belarusian poetry: the rhythm of translations and imitations // Presented at the 10th International Congress of Belarusian Studies, Kaunas, 01.10.2022]
The empirical study of 14-syllable verse presented in the talk is based on a sample of Ukrainian texts by Taras Shevchenko, their Belarusian translations, and original Belarusian poetry by Yanka Kupala, Yakub Kolas, and Piatruś Brouka. The dataset is structured as follows:
./0_plain – plain texts;
./1_accentuated – accentuated texts;
metadata_shevchenko.tsv, metadata_be_authors.tsv – metadata files describing the texts;
make_reports.py – a Python script that generates statistical reports from the accentuated texts;
./2_reports – programmatically generated reports;
slides.tex – LaTeX source code of the talk's slides, where the reports are embedded as diagrams and tables;
slides.pdf – PDF version of the slides.
The directories ./0_plain, ./1_accentuated, ./2_reports are provided in .zip archives.
Belarusian translations of Taras Shevchenko's poetry have been taken from the book:
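Т. Р. Шаўчэнка. Вершы. Паэмы. Мінск: Мастацкая літаратура, 1989. [T. R. Shauchenka (Taras Shevchenko). Poems. Narrative Poems. Minsk: Mastatskaya litaratura, 1989.]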
(scan copy available at https://files.knihi.com/Knihi/scanned/Saucenka.Viersy_paemy.djvu)
Each poem is stored in a separate .txt file whose name is the number of the poem's first page in the scanned book, e.g. 021.txt. The same names are used for the corresponding Ukrainian texts. In each pair of files, e.g. ./0_plain/uk/021.txt and ./0_plain/be/021.txt, the texts are aligned line by line. Poem titles in both languages, translator names, and the URLs of the Ukrainian source texts are provided in metadata_shevchenko.tsv.
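As an illustration of the line-by-line alignment, the sketch below loads one Ukrainian/Belarusian pair into tuples. It is not part of the dataset. It assumes that 0_plain.zip has been unpacked so that ./0_plain/uk and ./0_plain/be exist under a root directory and that the files are UTF-8 encoded. The helper name load_aligned_pair is hypothetical.

```python
from pathlib import Path


def load_aligned_pair(root, name):
    """Return (Ukrainian line, Belarusian line) tuples for one poem.

    Hypothetical helper: assumes ./0_plain has been unzipped under `root`
    and that the .txt files are UTF-8 encoded.
    """
    uk = Path(root, "0_plain", "uk", name).read_text(encoding="utf-8").splitlines()
    be = Path(root, "0_plain", "be", name).read_text(encoding="utf-8").splitlines()
    # Line-by-line alignment implies both files have the same number of lines.
    if len(uk) != len(be):
        raise ValueError(f"{name}: {len(uk)} Ukrainian vs {len(be)} Belarusian lines")
    return list(zip(uk, be))
```

For example, load_aligned_pair(".", "021.txt") would pair ./0_plain/uk/021.txt with ./0_plain/be/021.txt and raise an error if the line counts differ.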
Original Belarusian poetry, kept in ./0_plain/be, requires no alignment and follows a different naming scheme. Poem titles, author names, and the URLs of the Belarusian source texts are provided in metadata_be_authors.tsv.
In all Ukrainian and Belarusian texts, metrically irrelevant lines are discarded: only 14-syllable verse lines are stored, each split graphically into two lines of 8 and 6 syllables. Occasional minor violations (one or two syllables more or fewer) are allowed in the texts but ignored in the statistical reports. From polymetric poems, no span shorter than a pair of rhyming 14-syllable lines (graphically, a quatrain of 8+6+8+6 syllables) was sampled.
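Because syllable counts drive both the 8+6 split and the violation filter, a rough check can simply count vowel letters, since in Ukrainian and Belarusian each vowel letter normally forms one syllable. The sketch below is an approximation, not the procedure used to build the dataset. It ignores syllabic consonants (cf. the рэест[а]р convention described for accentuated texts), and both the function names and the tolerance parameter are hypothetical.

```python
# Vowel letters of Ukrainian and Belarusian (both cases). Counting them gives a
# rough syllable count; syllabic consonants such as in "рэестр" are not counted.
VOWELS = set("аеёєиіїоуыэюяАЕЁЄИІЇОУЫЭЮЯ")


def count_syllables(line: str) -> int:
    """Approximate syllable count of a plain-text line (one per vowel letter)."""
    return sum(ch in VOWELS for ch in line)


def is_regular_8_6(first: str, second: str, tolerance: int = 0) -> bool:
    """Check whether two graphic lines form an 8+6 pair.

    tolerance is hypothetical: 0 accepts only exact 8+6 pairs, while 1 or 2
    also admits the minor violations that are kept in the texts.
    """
    return (abs(count_syllables(first) - 8) <= tolerance
            and abs(count_syllables(second) - 6) <= tolerance)
```

Note that the Belarusian letter ў is a consonant here, so a word such as ўсё counts as one syllable.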
These special characters are used:
"/" to represent a line break in the source edition;
"//" for a section break (next stanza, another character's words);
trailing "#" for the inverse of a line break: to recover the original 14-syllable line as printed in the source edition, remove the newline;
leading "#" for misaligned lines, e.g. lines missing from the Belarusian translation and added hypothetically to restore the alignment.
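The two "#" conventions can be applied mechanically. A minimal sketch, assuming each marker is the first or last non-space character of its line and that a single space should separate the joined halves once a trailing "#" and its newline are removed; "/" and "//" are passed through unchanged. The function name is hypothetical.

```python
def restore_source_lines(lines):
    """Undo the graphic 8+6 split wherever a trailing "#" marks it.

    Returns (text, is_hypothetical) tuples. A trailing "#" joins a line with
    the next one (a single space is inserted, which is an assumption); a
    leading "#" flags a line added hypothetically to keep the alignment.
    """
    restored = []
    buffer, hypothetical = "", False
    for raw in lines:
        line = raw.strip()
        if line.startswith("#"):
            hypothetical = True
            line = line[1:].lstrip()
        if line.endswith("#"):
            # Drop the marker and keep collecting until an unmarked line.
            buffer += line[:-1].rstrip() + " "
            continue
        restored.append((buffer + line, hypothetical))
        buffer, hypothetical = "", False
    if buffer:  # input ended right after a trailing "#"
        restored.append((buffer.rstrip(), hypothetical))
    return restored
```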
Ukrainian and Belarusian texts were accentuated semi-automatically, using an opportunistic database of word accents crawled from online lexicographic resources: https://slounik.org for Belarusian, https://uk.wiktionary.org and https://slovnyk.ua/nagolos.php for Ukrainian. Neither the database nor the accentuator script is part of this dataset. Although considerable manual supervision went into making most accents accurate, some errors probably remain, especially in the Ukrainian data, so please use the accents with caution.
Accentuated texts in ./1_accentuated/uk and ./1_accentuated/be are lowercased, with all punctuation stripped. As is usual in quantitative studies of East Slavic verse (see e.g. https://doi.org/10.12697/smp.2019.6.2.02 for a recent overview), we distinguish two kinds of stress: pronouns and certain other function words bear "light" stress, while content words bear "heavy" stress. The following designations are used:
"`" for light stress, to the left of the stressed vowel;
"'" for heavy stress, to the right of the stressed vowel (note that after consonants, "'" is an apostrophe);
"*" for variant heavy stress, as in Ukrainian ба*йду*же;
"_" to group clitics together with stressed words, as in Ukrainian і_не_привіта'ла.
In rare cases, the meter requires a syllabic consonant to be pronounced, as in Belarusian рэестр. To match the pronunciation, we add a vowel in square brackets: рэест[а]р.
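To turn an accentuated line into a per-syllable stress pattern, one can scan it character by character. The sketch below is a hypothetical reader of the notation above, not the dataset's own code. The numeric codes (0 unstressed, 1 light, 2 heavy, 3 variant heavy) are an assumption, "'" counts as a stress only when it directly follows a vowel (otherwise it is an apostrophe), and a bracketed vowel such as [а] counts as a syllable.

```python
# Accentuated texts are lowercased, so only lowercase vowels are needed.
VOWELS = set("аеёєиіїоуыэюя")
# Hypothetical stress codes, one per syllable.
UNSTRESSED, LIGHT, HEAVY, VARIANT = 0, 1, 2, 3


def stress_vector(line: str) -> list[int]:
    """Return one stress code per syllable (vowel) of an accentuated line."""
    codes: list[int] = []
    light_pending = False  # a "`" was seen; it applies to the next vowel
    after_vowel = False    # tells a stress "'" apart from an apostrophe
    for ch in line:
        if ch == "`":
            light_pending = True
        elif ch in VOWELS:
            codes.append(LIGHT if light_pending else UNSTRESSED)
            light_pending = False
        elif ch == "'" and after_vowel:
            codes[-1] = HEAVY
        elif ch == "*" and after_vowel:
            codes[-1] = VARIANT
        # Consonants, spaces, "_", "[", "]" and apostrophes add no syllable.
        after_vowel = ch in VOWELS
    return codes
```

For example, stress_vector("і_не_привіта'ла") yields six codes with a heavy stress on the fifth syllable.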
The reports summarize certain statistical properties of the dataset:
translators.csv – a breakdown of Shevchenko's Belarusian translations by the number of lines contributed by each translator. The 8- and 6-syllable parts are counted as separate lines. Syllable count violations are ignored: a pair of aligned Ukrainian / Belarusian lines is not counted towards the translator's total if its syllable counts are irregular (e.g. 9 and 9) or mismatched (e.g. 8 and 6).
be_authors.csv – line counts by author in the original Belarusian poetry. The same counting rules apply, except that no alignment is involved.
rhythm.csv – percentages of accents on each of the 14 syllables in various samples, grouped by author and/or translator. Rows are syllables, columns are samples. Accents in each sample are counted in two ways: "min" – heavy stresses only, "max" – all stresses.
total_accentuation.csv – average number of accents per line in Shevchenko's Ukrainian texts and Belarusian translations, computed separately for the 8- and 6-syllable parts and separately for heavy stresses and all stresses.
word_boundary.csv – statistics of word boundary positions in 8-syllable lines with two heavy-stressed words, in Shevchenko's Ukrainian texts and Belarusian translations.
trochaicity.csv – share of stresses that match the trochaic metrical template, computed separately for the 8- and 6-syllable parts, heavy stresses only. Rows are samples: Shevchenko's Ukrainian texts, their Belarusian translations, and original poetry by the three Belarusian authors.
For implementation details, see the source code of make_reports.py.
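For orientation only, some of the counting rules stated above can be sketched as follows. This is not make_reports.py. The input formats are hypothetical (syllable-count tuples per aligned line; stress vectors of length 14 coded 0 = unstressed, 1 = light, 2 = heavy), rhythm_profile assumes the conventional stress profile (share of lines stressed at each position), and trochaicity assumes that the strong positions of the trochaic template are the odd syllables of each part.

```python
from collections import Counter

# Hypothetical stress codes: 0 = unstressed, 1 = light, 2 = heavy.
LIGHT, HEAVY = 1, 2


def translator_line_counts(pairs):
    """Count aligned lines per translator (the rule described for translators.csv).

    pairs: (translator, uk_syllables, be_syllables) tuples, one per graphic
    line, so the 8- and 6-syllable parts count separately. Irregular
    (e.g. 9 and 9) and mismatched (e.g. 8 and 6) pairs are skipped.
    """
    totals = Counter()
    for translator, uk_syl, be_syl in pairs:
        if uk_syl == be_syl and uk_syl in (8, 6):
            totals[translator] += 1
    return totals


def rhythm_profile(lines, mode="max"):
    """Percentage of 14-syllable lines stressed at each syllable position.

    mode "min" counts heavy stresses only; "max" counts all stresses.
    Lines that are not exactly 14 syllables long are ignored.
    """
    lines = [v for v in lines if len(v) == 14]
    if not lines:
        return [0.0] * 14
    threshold = HEAVY if mode == "min" else LIGHT
    return [100.0 * sum(v[i] >= threshold for v in lines) / len(lines)
            for i in range(14)]


def trochaicity(lines):
    """Share of heavy stresses on trochaic strong positions, per part.

    Strong positions are assumed to be syllables 1, 3, 5, ... counted
    from the start of the 8-syllable part and of the 6-syllable part.
    """
    lines = [v for v in lines if len(v) == 14]
    result = {}
    for part, (start, end) in {"8": (0, 8), "6": (8, 14)}.items():
        strong = total = 0
        for v in lines:
            for pos in range(start, end):
                if v[pos] == HEAVY:
                    total += 1
                    if (pos - start) % 2 == 0:  # 0-based even = 1-based odd
                        strong += 1
        result[part] = strong / total if total else float("nan")
    return result
```

Whether make_reports.py normalizes percentages or treats variant stresses the same way is not stated here; consult the script itself before comparing numbers.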
To regenerate the reports, you will need Python 3. Unzip the archive 1_accentuated.zip and run:
python3 make_reports.py
To rebuild the slides, you will need LaTeX:
xelatex -synctex=1 -interaction=nonstopmode -shell-escape slides.tex
If the bibliographic references are not rendered properly, run the command once more.
Provided by:
Zenodo
Created:
2022-09-29



