Latin-transliterated Ottoman Turkish Corpus (LOTUC)

Source: Zenodo. Updated: 2025-08-22. Indexed: 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.15307154
Description:
This corpus contains 36 Ottoman Turkish poetry books (Dîvâns) written between the 15th and 19th centuries. The books were transliterated by domain experts and shared publicly on the Internet. They were then structured automatically with a rule-based approach and checked manually.

Century  Authors  Poems   Lines    Tokens     Types
15       3        659     9,265    59,713     19,675
16       15       11,276  139,086  886,657    122,672
17       7        2,191   29,188   179,777    46,432
18       4        847     28,925   172,728    52,759
19       7        2,867   44,605   271,587    55,334
Total    36       17,840  251,069  1,570,462  214,853

The Types total is smaller than the sum of the per-century values because the same word type can occur in more than one century.

Each Dîvân and each poem comes with the following fields.

Work level (LOTUC_metadata.csv):
- file_name
- work_name: title of the Dîvân
- pen_name: the author's mahlas
- real_name
- century
- gender
- rank: for example "Sultan", "Judiciary & Religious Office", "High Bureaucracy/Military", "Scholars & Sufi Orders", "Civil Bureaucracy", and "Lay/Non-official"

Poem level (LOTUC.json):
- poem_id
- title (if available)
- meter (in aruz notation)
- text (line-by-line Latin transliteration)

The corpus can be used for diachronic studies: Yılandiloğlu (forthcoming) shows that poets adhered more closely to the aruz meter over the centuries, as reflected in rising conformity rates. The metadata can also be used to focus on a specific rank (e.g., Sultan) or on gender. This version contains some transliteration inconsistencies; ongoing work aims to standardize the corpus according to the IJMES transliteration system and to increase its size.
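The work-level and poem-level fields described above suggest a simple way to combine the two files, for example to restrict counts to one rank or gender. The following is a minimal sketch, not part of the corpus distribution. It assumes that LOTUC.json maps each file_name from LOTUC_metadata.csv to a list of poem objects, and that a poem's `text` is either a list of lines or one newline-separated string; both are assumptions about the file layout, so adjust the loader to the actual structure. The helper names (load_metadata, iter_poems, poem_lines, century_stats) are hypothetical. Tokens are lowercased whitespace-separated strings, so the counts will not necessarily reproduce the figures in the table.

```python
import csv
import io
import json
from collections import defaultdict


def load_metadata(fileobj):
    """Read LOTUC_metadata.csv rows into {file_name: row}."""
    return {row["file_name"]: row for row in csv.DictReader(fileobj)}


def iter_poems(corpus, metadata, **filters):
    """Yield (work, poem) pairs whose work-level fields equal every filter.

    ASSUMPTION: corpus is the parsed LOTUC.json shaped as
    {file_name: [poem, ...]}; the released file may be organised differently.
    """
    for file_name, poems in corpus.items():
        work = metadata.get(file_name)
        if work is None:
            continue  # skip works without a metadata row
        if all(work.get(key) == value for key, value in filters.items()):
            for poem in poems:
                yield work, poem


def poem_lines(poem):
    """Return a poem's lines, whether `text` is a list or one string."""
    text = poem.get("text", [])
    return text.splitlines() if isinstance(text, str) else list(text)


def century_stats(corpus, metadata, **filters):
    """Count poems, lines, tokens and types per century.

    Tokens are lowercased whitespace-separated strings (an assumption),
    which may differ from the tokenisation behind the published counts.
    """
    acc = defaultdict(lambda: {"poems": 0, "lines": 0, "tokens": 0, "types": set()})
    for work, poem in iter_poems(corpus, metadata, **filters):
        bucket = acc[work["century"]]
        bucket["poems"] += 1
        for line in poem_lines(poem):
            words = line.lower().split()
            bucket["lines"] += 1
            bucket["tokens"] += len(words)
            bucket["types"].update(words)
    # Replace each set of distinct tokens with its size.
    return {c: {**b, "types": len(b["types"])} for c, b in acc.items()}


# Toy stand-ins for the real files: made-up values, NOT corpus data.
toy_csv = (
    "file_name,work_name,pen_name,real_name,century,gender,rank\n"
    "a.txt,Divan A,Poet A,Name A,16,male,Sultan\n"
    "b.txt,Divan B,Poet B,Name B,17,female,Lay/Non-official\n"
)
toy_json = json.dumps({
    "a.txt": [{"poem_id": "a-1", "title": "", "meter": "",
               "text": ["gül ü bülbül", "gül ve bülbül"]}],
    "b.txt": [{"poem_id": "b-1", "title": "", "meter": "",
               "text": "aşk derdi\naşk"}],
})

meta = load_metadata(io.StringIO(toy_csv))
corpus = json.loads(toy_json)
print(century_stats(corpus, meta))
print(century_stats(corpus, meta, rank="Sultan"))
```

To run it on the released files, pass an open handle of LOTUC_metadata.csv (opened with newline="") to load_metadata and the result of json.load on LOTUC.json as corpus. A call such as century_stats(corpus, meta, rank="Sultan") then limits the counts to poets whose rank is "Sultan". Keyword filters compare exact string values, so they must match the spelling used in the CSV.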
Provider: Zenodo
Created: 2025-04-30