Teklia/DAI-CReTDHI-IndexCards-KIE
Source: Hugging Face. Updated 2026-02-12; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/Teklia/DAI-CReTDHI-IndexCards-KIE
Resource description:
---
dataset_info:
  features:
  - name: split
    dtype: string
  - name: source
    dtype: string
  - name: record_id
    dtype: string
  - name: record_url
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 315073
    num_examples: 436
  - name: val
    num_bytes: 39281
    num_examples: 59
  - name: test
    num_bytes: 40939
    num_examples: 58
  download_size: 123738
  dataset_size: 395293
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: val
    path: data/val-*
  - split: test
    path: data/test-*
size_categories: n<1K
language:
- fr
tags:
- atr
- htr
- ocr
- modern
- handwritten
- printed
annotations_creators:
- expert-generated
license: mit
task_categories:
- image-to-text
---
# DAI-cards-ATR - Page level
## Dataset Description
- **Homepage:** [DAI-CReTDHI](https://dai-cretdhi.univ-lr.fr/)
- **Source:** [Archives Municipales de Tours](https://www.tours.fr/page-portail-ma-mairie/services-pratiques/offre-culturelle/patrimoine-histoire-archives/archives-municipales/)
- **Point of Contact:** [TEKLIA](https://teklia.com)
This dataset comprises structured index cards created by archivists at the Municipal Archives of Tours through systematic examination and transcription of parish and civil registers. These cards serve as archival research tools, documenting baptisms, marriages, and burials with detailed personal, familial, and socio-professional information.
The cards follow a color-based classification system:
- Marriages: pink, mauve, purple
- Baptisms/Births: yellow, orange, white
- Deaths/Burials: grey, brown, blue
- Reformed Church records: red
## Dataset Summary
The **DAI-cards-ATR** dataset includes 553 index cards, handwritten or typewritten in French in the 20th century.
These cards have been annotated by experts as part of the [DAI-CReTDHI](https://dai-cretdhi.univ-lr.fr/) research project, using Teklia's open-source annotation interface [Callico](https://doc.callico.eu/).
### Split
| set | images |
| ----- | ------:|
| train | 436 |
| val | 59 |
| test | 58 |
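As a quick consistency check, the split sizes in the table can be summed and turned into percentages. This is only a sketch: the counts are copied from the table above, and the variable names are illustrative.

```python
# Split sizes copied from the table above.
split_counts = {"train": 436, "val": 59, "test": 58}

# The total should match the 553 cards quoted in the dataset summary.
total = sum(split_counts.values())

# Share of each split, in percent, rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in split_counts.items()}

print(total)
print(shares)
```

The three splits add up to the 553 cards stated in the summary, with roughly four fifths of the cards in `train` and the remainder divided almost evenly between `val` and `test`.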
### Languages
All the documents in the dataset are written in French.
## Dataset Structure
### Data Instances
Each instance represents a single index card, with an IIIF URL pointing to its image and a structured transcription in XML format:
```json
{
"split": "train",
"source": "Tours | Index cards",
"record_id": "8ba2085e-dad3-47fe-b633-bd312c699056",
"record_url": "https://europe.iiif.teklia.com/iiif/2/dai-cretdhi%2FTours%2FAMT-LOTS_EC_NMD%2FEC_LOT_0434%2FFRAC037261_EC_LOT_0434_0146.JPG/0,0,981,594/full/0/default.jpg",
"text": "<root><Décès><Défunt><Nom>Hénault</Nom><Prénom>Joseph</Prénom><Sexe>H</Sexe><Âge>36 ans</Âge><LieuDeNaissance>Lerné (Indre-et-Loire)</LieuDeNaissance><Profession>couvreur</Profession><Statut>marié(e)</Statut></Défunt><Conjoint><Nom>Lavy</Nom><Prénom>Elise</Prénom></Conjoint><Père><Nom>Hénault</Nom><Prénom>Auguste</Prénom><Statut>décédé</Statut></Père><Mère><Nom>Aubineau</Nom><Prénom>Auguste</Prénom><Statut>décédée</Statut></Mère></Décès><Année>1874</Année><Mois>septembre</Mois><Jour>14</Jour></root>"
}
```
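Because the `text` field is a single XML string, it can be unpacked with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`. The helper name `element_to_dict` is hypothetical, the sample string is abbreviated from the instance above, and the code assumes that tag names are unique among siblings, which holds for this sample but is not guaranteed by the dataset card.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an XML element into nested dicts.

    Leaf elements become their stripped text content.
    Assumption: sibling tags are unique; a repeated tag would overwrite
    the earlier value.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return {child.tag: element_to_dict(child) for child in children}


# Abbreviated version of the `text` value from the sample instance.
sample_text = (
    "<root><Décès><Défunt><Nom>Hénault</Nom><Prénom>Joseph</Prénom>"
    "<Âge>36 ans</Âge><Profession>couvreur</Profession></Défunt></Décès>"
    "<Année>1874</Année><Mois>septembre</Mois><Jour>14</Jour></root>"
)

record = element_to_dict(ET.fromstring(sample_text))

deceased = record["Décès"]["Défunt"]
print(deceased["Nom"], deceased["Prénom"])                 # Hénault Joseph
print(record["Année"], record["Mois"], record["Jour"])     # 1874 septembre 14
```

In this sample the event type (`Décès`) is the first child of `<root>` and the date elements are its siblings, so the event type can be read from the keys of `record`. Other cards may use different element names.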
### Data Fields
- `split` (string): Dataset split identifier (train, val, or test)
- `source` (string): Source collection ("Tours | Index cards")
- `record_id` (string): UUID of the index card in [Arkindex](https://arkindex.teklia.com/)
- `record_url` (string): IIIF URL to the image
- `text` (string): Expert-annotated transcription in XML format containing structured information about the vital record
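The `record_url` in the sample instance appears to follow the IIIF Image API pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. Under that assumption, the sketch below splits a URL into its parts and builds a variant with a different size parameter. The helper names `parse_iiif_url` and `with_size` are hypothetical, and whether the server accepts a given size value is not confirmed by the dataset card.

```python
from urllib.parse import unquote


def parse_iiif_url(url):
    """Split an IIIF Image API URL into its components.

    Assumption: the URL ends with /{region}/{size}/{rotation}/{quality}.{format}.
    """
    base, region, size, rotation, last = url.rsplit("/", 4)
    quality, fmt = last.rsplit(".", 1)
    # The identifier is the last path segment of the base, percent-encoded.
    identifier = unquote(base.rsplit("/", 1)[1])
    return {
        "identifier": identifier,
        "region": region,
        "size": size,
        "rotation": rotation,
        "quality": quality,
        "format": fmt,
    }


def with_size(url, size):
    """Return the same URL with the IIIF size parameter replaced."""
    base, region, _old_size, rotation, last = url.rsplit("/", 4)
    return "/".join([base, region, size, rotation, last])


# record_url copied from the sample instance.
record_url = (
    "https://europe.iiif.teklia.com/iiif/2/"
    "dai-cretdhi%2FTours%2FAMT-LOTS_EC_NMD%2FEC_LOT_0434%2F"
    "FRAC037261_EC_LOT_0434_0146.JPG/0,0,981,594/full/0/default.jpg"
)

parts = parse_iiif_url(record_url)
print(parts["identifier"])
print(parts["region"])

# Request the same region scaled to a width of 500 pixels ("500," in IIIF syntax).
print(with_size(record_url, "500,"))
```

Here the region `0,0,981,594` gives the x and y offsets followed by the width and height of the crop in pixels, and `full` requests that region at its original size.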
Provider:
Teklia



