
mukesh1eww/RIMES-2011-line

Source: Hugging Face | Updated: 2026-04-20 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/mukesh1eww/RIMES-2011-line
Resource description:
---
license: mit
language:
- fr
task_categories:
- image-to-text
pretty_name: RIMES-2011-line
dataset_info:
  features:
  - name: image
    dtype: image
  - name: text
    dtype: string
  splits:
  - name: train
    num_examples: 10188
  - name: validation
    num_examples: 1138
  - name: test
    num_examples: 778
  dataset_size: 12104
tags:
- atr
- htr
- ocr
- modern
- handwritten
---

# RIMES-2011 - line level

## Table of Contents
- [RIMES-2011 - line level](#rimes-2011-line-level)
  - [Table of Contents](#table-of-contents)
  - [Dataset Description](#dataset-description)
    - [Languages](#languages)
  - [Dataset Structure](#dataset-structure)
    - [Data Instances](#data-instances)
    - [Data Fields](#data-fields)

## Dataset Description

- **Homepage:** [Zenodo](https://zenodo.org/records/10812725)
- **PapersWithCode:** [Papers using the RIMES dataset](https://paperswithcode.com/dataset/rimes)
- **Point of Contact:** [TEKLIA](https://teklia.com)

## Dataset Summary

The RIMES-2011 database (Recognition and Indexation of handwritten documents and faxes) was created to evaluate automatic recognition and indexing systems for handwritten letters.

The database was collected by asking volunteers to write handwritten letters in exchange for gift certificates. Volunteers were given a fictitious identity (same gender as the real one) and up to 5 scenarios. Each scenario was chosen from among 9 realistic topics: change of personal data (address, bank account), request for information, opening and closing (customer account), change of contract or order, complaint (poor quality of service...), payment difficulties (request for delay, tax exemption...), reminder, complaint with other circumstances and a target (administrations or service providers: telephone, electricity, bank, insurance). The volunteers wrote a letter with this information in their own words. The layout was free and the only request was to use white paper and write legibly in black ink.

The campaign was a success, with more than 1,300 people contributing to the RIMES database by writing up to 5 letters. The resulting RIMES database contains 12,723 pages, corresponding to 5,605 letters of two to three pages each.

Note that all images are resized to a fixed height of 128 pixels.

### Languages

All the documents in the dataset are written in French.

## Dataset Structure

### Data Instances

```
{
  'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=2560x128 at 0x1A800E8E190>,
  'text': "Comme indiqué dans les conditions particulières de mon contrat d'assurance"
}
```

### Data Fields

- `image`: a `PIL.Image.Image` object containing the image. Note that when accessing the image column (using `dataset[0]["image"]`), the image file is automatically decoded. Decoding a large number of image files can take a significant amount of time, so it is important to query the sample index before the `"image"` column: `dataset[0]["image"]` should always be preferred over `dataset["image"][0]`.
- `text`: the label transcription of the image.
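The access pattern described under Data Fields can be sketched as follows. This is a minimal illustration, not part of the original card: it assumes the third-party `datasets` package is installed and that the repository id `mukesh1eww/RIMES-2011-line` (taken from the page title) resolves on the Hub. The helper names `split_fractions` and `first_sample` are hypothetical. The split sizes are copied from the card metadata.

```python
from typing import Any

# Split sizes as listed in the dataset card metadata.
SPLIT_SIZES = {"train": 10188, "validation": 1138, "test": 778}


def split_fractions(sizes: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of line images."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def first_sample(split: str = "train") -> dict[str, Any]:
    """Load one split and return its first row (hypothetical helper).

    Assumes the `datasets` package and network access are available.
    """
    # Imported lazily so split_fractions() works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset("mukesh1eww/RIMES-2011-line", split=split)
    # Index the row first, then the column: only this row's image is decoded.
    # dataset["image"][0] would decode the whole image column before indexing.
    return dataset[0]
```

Calling `first_sample("validation")` should return a dictionary with an `image` and a `text` key, as in the data instance shown above. The three split sizes sum to 12,104, which matches the `dataset_size` value in the metadata.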
Provided by:
mukesh1eww