Latin-transliterated Ottoman Turkish Corpus (LOTUC)
Source: Zenodo; updated 2025-08-22; indexed 2026-05-26
Download link:
https://zenodo.org/doi/10.5281/zenodo.15307154
Description:
This corpus contains 36 Ottoman Turkish poetry collections (Dîvâns) written between the 15th and 19th centuries. The Dîvâns were transliterated into Latin script by domain experts and shared publicly on the Internet. For the corpus, they were structured automatically with a rule-based approach and then checked manually.
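Corpus statistics by century:

Century  Author Count  Poem Count  Line Count  Token Count  Type Count
15th     3             659         9,265       59,713       19,675
16th     15            11,276      139,086     886,657      122,672
17th     7             2,191       29,188      179,777      46,432
18th     4             847         28,925      172,728      52,759
19th     7             2,867       44,605      271,587      55,334
Total    36            17,840      251,069     1,570,462    214,853

Type Count is the number of distinct word forms. The Total type count (214,853) is not the sum of the century rows (296,872), since a word form that occurs in several centuries is presumably counted only once.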
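A minimal sketch of how per-century token and type counts, and a corpus-wide type count, can be derived. It assumes plain whitespace tokenization and a hypothetical mapping from century to lines; the tokenizer actually used for the LOTUC figures is not stated, so counts produced this way need not match the table. Types are kept in sets, so the corpus-wide type count is the size of their union rather than the sum of the per-century counts.

```python
from typing import Dict, Iterable, Set, Tuple


def count_tokens_and_types(
    lines_by_century: Dict[int, Iterable[str]]
) -> Tuple[Dict[int, Tuple[int, int]], Tuple[int, int]]:
    """Return {century: (tokens, types)} and the corpus-wide (tokens, types)."""
    per_century: Dict[int, Tuple[int, int]] = {}
    all_types: Set[str] = set()
    total_tokens = 0
    for century, lines in lines_by_century.items():
        # ASSUMPTION: tokens are whitespace-separated strings.
        tokens = [tok for line in lines for tok in line.split()]
        types = set(tokens)
        per_century[century] = (len(tokens), len(types))
        all_types |= types  # a word form shared by several centuries counts once
        total_tokens += len(tokens)
    return per_century, (total_tokens, len(all_types))


# Toy lines (not corpus data): "ey" and "gönül" occur in both centuries.
sample = {15: ["gel ey gönül", "gönül yine"], 16: ["ey dil gönül"]}
per_century, total = count_tokens_and_types(sample)
print(per_century)  # {15: (5, 4), 16: (3, 3)}
print(total)        # (8, 5): 4 + 3 = 7 per-century types, but only 5 distinct
```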
Each Dîvân and each of its poems is accompanied by the following fields:
Work-level (LOTUC_metadata.csv)
file_name
work_name (title of the Dîvân)
pen_name (author’s mahlas)
real_name
century
gender
rank (e.g. “Sultan,” “Judiciary & Religious Office,” “High Bureaucracy/Military,” “Scholars & Sufi Orders,” “Civil Bureaucracy,” and “Lay/Non-official”)
Poem-level (LOTUC.json)
poem_id
title (if available)
meter (in aruz notation)
text (line-by-line Latin transliteration)
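A minimal sketch for reading both files with the Python standard library. The CSV columns are the work-level fields listed above. The JSON layout is not documented on this page, so load_poems assumes (hypothetically) that LOTUC.json is a top-level list of poem objects and that text is either a list of lines or one newline-separated string; the Poem class and both function names are illustrative, not part of the dataset.

```python
import csv
import json
from dataclasses import dataclass
from typing import IO, Dict, List, Optional


@dataclass
class Poem:
    # Hypothetical container; field names follow the poem-level fields above.
    poem_id: str
    title: Optional[str]
    meter: Optional[str]
    lines: List[str]


def load_metadata(csv_file: IO[str]) -> Dict[str, dict]:
    """Index LOTUC_metadata.csv rows by their file_name column."""
    return {row["file_name"]: row for row in csv.DictReader(csv_file)}


def load_poems(json_file: IO[str]) -> List[Poem]:
    """Parse LOTUC.json, ASSUMING a top-level list of poem objects."""
    poems = []
    for obj in json.load(json_file):
        text = obj.get("text", [])
        # ASSUMPTION: "text" is a list of lines or a single newline-separated string.
        lines = text.splitlines() if isinstance(text, str) else list(text)
        poems.append(
            Poem(
                poem_id=str(obj["poem_id"]),
                title=obj.get("title"),  # title is optional per the field list
                meter=obj.get("meter"),
                lines=lines,
            )
        )
    return poems
```

To use it, open LOTUC_metadata.csv with encoding="utf-8" and newline="" (as the csv module recommends), open LOTUC.json with encoding="utf-8", and pass the file objects in. If the real JSON is keyed differently, only the loop in load_poems needs to change.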
The corpus can be used for diachronic studies: for example, Yılandiloğlu (forthcoming) shows that poets adhered more closely to the aruz meter over the centuries, as reflected in rising conformity rates. The metadata can also be used to focus on a particular rank, such as Sultan, or on a particular gender.
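A minimal sketch of this kind of metadata-based selection, using only the documented CSV columns (file_name, century, gender, rank). The function names are hypothetical, and the exact strings stored in the gender column are not documented here, so check the file before filtering on them.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def read_metadata(path: str) -> List[dict]:
    """Read LOTUC_metadata.csv into a list of row dicts."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


def select_works(
    rows: Iterable[dict], rank: Optional[str] = None, gender: Optional[str] = None
) -> List[dict]:
    """Keep works whose rank and/or gender match exactly (None means no filter)."""
    return [
        row
        for row in rows
        if (rank is None or row["rank"] == rank)
        and (gender is None or row["gender"] == gender)
    ]


def works_by_century(rows: Iterable[dict]) -> Dict[str, List[str]]:
    """Group file_name values by century so subsets can be compared over time."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        groups[row["century"]].append(row["file_name"])
    return dict(groups)
```

For example, works_by_century(select_works(read_metadata("LOTUC_metadata.csv"), rank="Sultan")) would list the sultans' Dîvâns per century, assuming the rank column stores the labels exactly as written above.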
This version still contains transliteration inconsistencies; current work focuses on standardizing the corpus according to the IJMES transliteration system and on increasing its size.
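Until a standardized release is available, one rough way to compare word forms across works is to map transliteration variants to a shared key. The sketch below is a hypothetical helper, not part of LOTUC and not IJMES standardization: it removes only the circumflex, macron, and dot-below marks (an assumption about where variants typically occur) while keeping the marks that distinguish Turkish letters (ç, ş, ğ, ö, ü). It is lossy by design, since forms such as â and a are merged.

```python
import unicodedata

# ASSUMPTION: these combining marks are treated as optional transliteration
# detail: circumflex (U+0302), macron (U+0304), dot below (U+0323).
# Cedilla, breve and diaeresis are kept because they mark distinct letters.
OPTIONAL_MARKS = {"\u0302", "\u0304", "\u0323"}


def fold_transliteration(word: str) -> str:
    """Return a rough matching key, e.g. "Dîvân" and "Divan" -> "Divan"."""
    # NFD splits precomposed letters such as "â" into base letter + mark.
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in OPTIONAL_MARKS)
    # NFC recomposes the kept marks, so ş, ç, ğ, ö, ü are single code points again.
    return unicodedata.normalize("NFC", kept)
```

Casing is left untouched on purpose: Python's str.lower is not Turkish-aware (it maps I to i rather than ı), so any case folding should be handled separately.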
Provided by:
Zenodo
Created:
2025-04-30



