
RIMES Dataset | Handwriting Recognition Dataset | Document Indexing Dataset

Source: paperswithcode.com, indexed 2025-03-22
Handwriting recognition
Document indexing
Download link:
https://paperswithcode.com/dataset/rimes
Description:
The RIMES database (Reconnaissance et Indexation de données Manuscrites et de fac similÉS / Recognition and Indexing of handwritten documents and faxes) was created to evaluate automatic systems for recognizing and indexing handwritten letters. Of particular interest are letters sent by individuals to companies or administrations by postal mail or fax. The database was collected by asking volunteers to write handwritten letters in exchange for gift vouchers. Each volunteer was given a fictional identity (of the same sex as their real one) and up to 5 scenarios. Each scenario was chosen from among 9 realistic themes: change of personal information (address, bank account); information request; opening and closing (customer account); modification of contract or order; complaint (bad service quality, etc.); payment difficulties (asking for a delay, tax exemption, etc.); reminder letter; and damage declaration, with further circumstances and a destination (administrations or service providers such as telephone, power, bank, and insurance companies). The volunteers composed a letter from these pieces of information in their own words. The layout was free; the only requirements were to use white paper and to write legibly in black ink. The collection was a success: more than 1,300 people took part in creating the RIMES database, each writing up to 5 letters. The resulting database contains 12,723 pages, corresponding to 5,605 letters of two to three pages each.

Provider:
Papers with Code