
leitro/Copiale_Lines

Hugging Face | Updated 2026-04-08 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/leitro/Copiale_Lines
Resource description:
---
pretty_name: Copiale Lines
task_categories:
- image-to-text
language:
- de
size_categories:
- 1K<n<10K
viewer: true
---

# Copiale Lines

Copiale Lines is a line-level image-to-text dataset for historical cipher decipherment. It contains cropped line images from the Copiale manuscript paired with plaintext ground truth.

This dataset is used in the paper *Learning to Decipher from Pixels: A Case Study of Copiale* (HistoCrypt 2026).

## Dataset Structure

The dataset is split into:

- train: 1,269 samples
- valid: 175 samples
- test: 370 samples

Each split contains:

- `images/*.png`: cropped line images
- `metadata.csv`: filename and plaintext transcription

The corresponding source split files are `train.gt`, `valid.gt`, and `test.gt`, where each line is:

```text
image_id<TAB>groundtruth
```

## Example

```text
1-2.png,gesetz buchs
```

corresponds to the image:

```text
train/images/1-2.png
```

## Intended Use

This dataset is intended for research on handwritten cipher recognition, image-to-text modeling, and transcription-free decipherment.

## Citation

```bibtex
@inproceedings{kang2026learning,
  title = {Learning to Decipher from Pixels: A Case Study of Copiale},
  author = {Kang, Lei and De Gregorio, Giuseppe and Heil, Raphaela and Fornés, Alicia and Megyesi, Beáta},
  booktitle = {International Conference on Historical Cryptology (HistoCrypt)},
  year = {2026}
}
```

## Acknowledgements

This dataset is derived in part from materials related to *Decipherment of Historical Manuscripts*, a historical manuscript studied within the project ["The Copiale Cipher"](https://www.su.se/english/research/research-catalogue/research-projects/d/decipherment-of-historical-manuscripts/the-copiale-cipher) at Stockholm University. We acknowledge and thank the original project for making these resources available.

We also gratefully acknowledge financial support from **Riksbankens Jubileumsfond** under grant **M24-0028**, *"Echoes of History: Analysis and Decipherment of Historical Writings (DESCRYPT)"*, which supported the development of this dataset.
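
As a rough illustration of the two file layouts described under Dataset Structure, the sketch below parses `.gt` content (`image_id<TAB>groundtruth`) and `metadata.csv` content (filename, transcription) from strings. The function names `parse_gt` and `parse_metadata_csv` are hypothetical and not part of the dataset. Whether `metadata.csv` carries a header row is not stated in the card, so the sketch simply assumes that any row whose first field does not end in `.png` can be skipped.

```python
import csv
import io


def parse_gt(text):
    """Parse .gt content: one 'image_id<TAB>groundtruth' pair per line."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first tab only, so the ground truth is kept whole.
        image_id, _, groundtruth = line.partition("\t")
        pairs.append((image_id, groundtruth))
    return pairs


def parse_metadata_csv(text, split):
    """Parse metadata.csv content and attach '<split>/images/<filename>' paths.

    Assumption: rows whose first field is not a .png filename (for example a
    possible header row) are skipped.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) < 2 or not record[0].endswith(".png"):
            continue
        filename, transcription = record[0], record[1]
        rows.append((f"{split}/images/{filename}", transcription))
    return rows


if __name__ == "__main__":
    # The CSV row below is the example given in the dataset card.
    print(parse_metadata_csv("1-2.png,gesetz buchs\n", "train"))
```

In practice the strings would come from reading the downloaded files, for example `open("train/metadata.csv", encoding="utf-8").read()`, assuming the files are stored as UTF-8.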
Provider:
leitro