CHIP2022: Medical List and Invoice OCR Element Extraction Task
Source: Alibaba Cloud Tianchi. Updated: 2026-06-09. Indexed: 2024-03-07.
Download link:
https://tianchi.aliyun.com/dataset/131815
Resource description:
Medical record documents used in hospitals are still predominantly paper-based. They contain patient information, diagnostic information, medication information, billing information, and related content. In the healthcare and insurance industries this information has high commercial and research value, but it is difficult to extract, and most extraction work still relies on manual entry.
As artificial intelligence technologies such as OCR and NLP become more widely used, applying them in place of traditional manual entry can improve work efficiency and reduce the cost of training staff. Using OCR and NLP to digitize and structure the information on these paper documents has become an active topic in the industry.
The dataset for this task covers four types of medical record documents: outpatient invoices, inpatient invoices, medication purchase tax receipts, and discharge summaries. The core requirement is to extract data from images taken in real-life scenarios and generate structured electronic data.
Provider:
Alibaba Cloud Tianchi
Created:
2022-06-06
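Collected summary
Dataset introduction

Background and challenges
Background overview
The CHIP2022 Medical List and Invoice OCR Element Extraction Task dataset contains four types of medical record documents (outpatient invoices, inpatient invoices, medication purchase tax receipts, and discharge summaries). Its goal is to extract key information with OCR and NLP techniques and produce structured data. The dataset provides detailed field descriptions, annotation examples, and evaluation criteria, and is suited to technical research and applications in the healthcare and insurance industries.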
The above content was collected and summarized by 遇见数据集.
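As a rough illustration of the "OCR text in, structured record out" step the description mentions, the sketch below maps simple `label: value` OCR lines onto a record tagged with one of the four document types. It is only a minimal sketch under stated assumptions: the names `DocType`, `StructuredRecord`, `FIELD_LABELS`, and `extract_fields`, as well as the two example labels, are hypothetical and are not taken from the dataset's field specification, which should be consulted on the Tianchi page. The OCR step itself is assumed to have already produced text lines.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class DocType(Enum):
    # The four document types named in the dataset description.
    OUTPATIENT_INVOICE = "outpatient_invoice"
    INPATIENT_INVOICE = "inpatient_invoice"
    MEDICATION_TAX_RECEIPT = "medication_purchase_tax_receipt"
    DISCHARGE_SUMMARY = "discharge_summary"


@dataclass
class StructuredRecord:
    doc_type: DocType
    fields: dict = field(default_factory=dict)


# Hypothetical mapping from printed labels to output keys.
# This is NOT the dataset's real schema, only an illustration.
FIELD_LABELS = {
    "姓名": "patient_name",   # "name"
    "合计": "total_amount",   # "total"
}

# Matches "label: value" with either an ASCII or a full-width colon.
LINE_PATTERN = re.compile(r"\s*(.+?)\s*[::]\s*(.+?)\s*$")


def extract_fields(ocr_lines, doc_type):
    """Collect known 'label: value' pairs from OCR lines into a record."""
    record = StructuredRecord(doc_type=doc_type)
    for line in ocr_lines:
        match = LINE_PATTERN.match(line)
        # Lines without a colon, or with an unknown label, are skipped.
        if match and match.group(1) in FIELD_LABELS:
            record.fields[FIELD_LABELS[match.group(1)]] = match.group(2)
    return record
```

A call such as `extract_fields(["姓名:张三", "合计: 128.50"], DocType.OUTPATIENT_INVOICE)` would fill `patient_name` and `total_amount`. Real invoices are unlikely to be this regular (values often sit in table cells or on separate lines), so a practical system would probably need layout information and a learned extraction model rather than a single regular expression.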



