
Teklia/DAI-CReTDHI-IndexCards-KIE

Hugging Face, updated 2026-02-12, indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/Teklia/DAI-CReTDHI-IndexCards-KIE
Resource description:
---
dataset_info:
  features:
  - name: split
    dtype: string
  - name: source
    dtype: string
  - name: record_id
    dtype: string
  - name: record_url
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 315073
    num_examples: 436
  - name: val
    num_bytes: 39281
    num_examples: 59
  - name: test
    num_bytes: 40939
    num_examples: 58
  download_size: 123738
  dataset_size: 395293
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: val
    path: data/val-*
  - split: test
    path: data/test-*
size_categories: n<1K
language:
- fr
tags:
- atr
- htr
- ocr
- modern
- handwritten
- printed
annotations_creators:
- expert-generated
license: mit
task_categories:
- image-to-text
---

# DAI-cards-ATR - Page level

## Dataset Description

- **Homepage:** [DAI-CReTDHI](https://dai-cretdhi.univ-lr.fr/)
- **Source:** [Archives Municipales de Tours](https://www.tours.fr/page-portail-ma-mairie/services-pratiques/offre-culturelle/patrimoine-histoire-archives/archives-municipales/)
- **Point of Contact:** [TEKLIA](https://teklia.com)

This dataset comprises structured index cards created by archivists at the Municipal Archives of Tours through systematic examination and transcription of parish and civil registers. These cards serve as archival research tools, documenting baptisms, marriages, and burials with detailed personal, familial, and socio-professional information.

The cards follow a color-based classification system:

- Marriages: pink, mauve, purple
- Baptisms/Births: yellow, orange, white
- Deaths/Burials: grey, brown, blue
- Reformed Church records: red

## Dataset Summary

The **DAI-cards-ATR** dataset includes 553 index cards, handwritten or typewritten in French in the 20th century. These cards have been annotated by experts as part of the [DAI-CReTDHI](https://dai-cretdhi.univ-lr.fr/) research project, using Teklia's open-source annotation interface [Callico](https://doc.callico.eu/).
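The split sizes declared in the metadata (436 + 59 + 58) add up to the 553 cards stated in the summary. The following is a minimal sketch of loading the dataset and checking those sizes, assuming the Hugging Face `datasets` library is installed and that the repository ID matches the one in the download link. The helper names `load_cards` and `split_mismatches` are hypothetical and not part of the dataset card.

```python
# Split sizes as declared in the card metadata (num_examples per split).
EXPECTED_SPLITS = {"train": 436, "val": 59, "test": 58}


def load_cards(repo_id: str = "Teklia/DAI-CReTDHI-IndexCards-KIE"):
    """Download every split of the dataset.

    Assumption: requires `pip install datasets` and network access.
    The import is done lazily so the rest of this module works without it.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


def split_mismatches(actual: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {split: (expected, actual)} for each split whose size differs."""
    return {
        name: (expected, actual.get(name, 0))
        for name, expected in EXPECTED_SPLITS.items()
        if actual.get(name, 0) != expected
    }
```

To run the check against the downloaded data, one could call `cards = load_cards()` and then `split_mismatches({name: ds.num_rows for name, ds in cards.items()})`; an empty dictionary means the sizes agree with the card.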
### Split

| set   | images |
| ----- | ------:|
| train |    436 |
| val   |     59 |
| test  |     58 |

### Languages

All the documents in the dataset are written in French.

## Dataset Structure

### Data Instances

Each instance represents a single index card with its image and structured transcription in XML format:

```json
{
  "split": "train",
  "source": "Tours | Index cards",
  "record_id": "8ba2085e-dad3-47fe-b633-bd312c699056",
  "record_url": "https://europe.iiif.teklia.com/iiif/2/dai-cretdhi%2FTours%2FAMT-LOTS_EC_NMD%2FEC_LOT_0434%2FFRAC037261_EC_LOT_0434_0146.JPG/0,0,981,594/full/0/default.jpg",
  "text": "<root><Décès><Défunt><Nom>Hénault</Nom><Prénom>Joseph</Prénom><Sexe>H</Sexe><Âge>36 ans</Âge><LieuDeNaissance>Lerné (Indre-et-Loire)</LieuDeNaissance><Profession>couvreur</Profession><Statut>marié(e)</Statut></Défunt><Conjoint><Nom>Lavy</Nom><Prénom>Elise</Prénom></Conjoint><Père><Nom>Hénault</Nom><Prénom>Auguste</Prénom><Statut>décédé</Statut></Père><Mère><Nom>Aubineau</Nom><Prénom>Auguste</Prénom><Statut>décédée</Statut></Mère></Décès><Année>1874</Année><Mois>septembre</Mois><Jour>14</Jour></root>"
}
```

### Data Fields

- `split` (string): Dataset split identifier (train, val, or test)
- `source` (string): Source collection ("Tours | Index cards")
- `record_id` (string): Unique UUID identifier for the index card in [Arkindex](https://arkindex.teklia.com/)
- `record_url` (string): IIIF URL to the image
- `text` (string): Expert-annotated transcription in XML format containing structured information about the vital record
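Because the `text` field is an XML string, it can be turned into key/value pairs with Python's standard library. The sketch below flattens the example instance shown above into path-like keys. The function name `flatten_card` and the `/`-separated key format are illustrative assumptions, not part of the dataset; the sketch also assumes sibling elements under one parent have distinct tag names, as they do in this example.

```python
import xml.etree.ElementTree as ET

# The `text` value of the example instance from the dataset card.
SAMPLE_TEXT = (
    "<root><Décès><Défunt><Nom>Hénault</Nom><Prénom>Joseph</Prénom>"
    "<Sexe>H</Sexe><Âge>36 ans</Âge>"
    "<LieuDeNaissance>Lerné (Indre-et-Loire)</LieuDeNaissance>"
    "<Profession>couvreur</Profession><Statut>marié(e)</Statut></Défunt>"
    "<Conjoint><Nom>Lavy</Nom><Prénom>Elise</Prénom></Conjoint>"
    "<Père><Nom>Hénault</Nom><Prénom>Auguste</Prénom>"
    "<Statut>décédé</Statut></Père>"
    "<Mère><Nom>Aubineau</Nom><Prénom>Auguste</Prénom>"
    "<Statut>décédée</Statut></Mère></Décès>"
    "<Année>1874</Année><Mois>septembre</Mois><Jour>14</Jour></root>"
)


def flatten_card(xml_text: str) -> dict[str, str]:
    """Flatten a card's XML into {"Parent/Child": "value"} pairs.

    Hypothetical helper: if two siblings share a tag name, the later
    one overwrites the earlier one.
    """
    root = ET.fromstring(xml_text)
    fields: dict[str, str] = {}

    def walk(node: ET.Element, path: str) -> None:
        children = list(node)
        if not children:
            # Leaf element: store its text under the accumulated path.
            fields[path] = (node.text or "").strip()
            return
        for child in children:
            walk(child, f"{path}/{child.tag}" if path else child.tag)

    walk(root, "")
    return fields


card = flatten_card(SAMPLE_TEXT)
print(card["Décès/Défunt/Nom"], card["Année"], card["Mois"], card["Jour"])
```

Path-like keys keep the role of each person (deceased, spouse, father, mother) attached to the field, which a flat tag-name lookup would lose because `Nom` and `Prénom` repeat under several parents.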
Provider:
Teklia