
Clinical deidentification for PDF

Source: Snowflake. Updated: 2024-12-15. Indexed: 2024-12-17.
Download link:
https://app.snowflake.com/marketplace/listing/GZTYZ4386LJAS
Resource description:
The Clinical De-Identification model is designed to recognize and anonymize PHI in English-language clinical notes. It employs state-of-the-art natural language processing techniques to detect sensitive information such as patient names, addresses, medical record numbers, and other identifiers. Once identified, the PHI is masked or obfuscated, rendering the text safe for broader use while maintaining its informational integrity.

Up to 1.9M chars/hour; 0.0018 USD/100 processed chars.

**Key Features:**
- The model is finely tuned to identify a wide range of PHI elements in medical texts, ensuring comprehensive de-identification.
- The de-identification process aligns with HIPAA and other healthcare privacy regulations, aiding in legal compliance and data protection.
- Ideal for research, analytics, and training purposes, this model enables the safe utilization of medical texts without compromising patient privacy.

Covered entities: AGE, BIOID, CITY, COUNTRY, DATE, DEVICE, DOCTOR, EMAIL, FAX, HEALTHPLAN, HOSPITAL, IDNUM, LOCATION, MEDICALRECORD, ORGANIZATION, PATIENT, PHONE, PROFESSION, STATE, STREET, URL, USERNAME, ZIP, ACCOUNT, LICENSE, VIN, SSN, DLN, PLATE, and IPADDR.
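The quoted throughput (up to 1.9M chars/hour) and price (0.0018 USD per 100 processed chars) can be turned into a rough budget estimate. The sketch below is only an illustration: it assumes billing is strictly linear in character count, with no rounding, minimum charge, or volume discount, none of which the listing confirms. The function names are hypothetical and are not part of any John Snow Labs or Snowflake API.

```python
# Figures quoted in the listing. The throughput is stated as an upper bound
# ("up to"), so the time estimate below is a best case, not a guarantee.
MAX_CHARS_PER_HOUR = 1_900_000
USD_PER_100_CHARS = 0.0018


def estimate_cost_usd(num_chars: int) -> float:
    """Estimated charge in USD, ASSUMING simple linear per-100-character billing."""
    return num_chars / 100 * USD_PER_100_CHARS


def estimate_min_hours(num_chars: int) -> float:
    """Lower bound on processing time in hours at the quoted maximum throughput."""
    return num_chars / MAX_CHARS_PER_HOUR


if __name__ == "__main__":
    corpus_chars = 1_000_000  # hypothetical corpus size
    print(f"cost:  {estimate_cost_usd(corpus_chars):.2f} USD")
    print(f"hours: {estimate_min_hours(corpus_chars):.2f} (best case)")
```

Under these assumptions, one million characters would cost about 18 USD, and an hour at the maximum quoted throughput would cost about 34.20 USD.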
Provider:
John Snow Labs
Created:
2024-10-29
Collected summary
Dataset introduction
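Background and challenges
Background overview
This listing provides a de-identification model for clinical PDF documents. It recognizes and anonymizes protected health information (PHI) in English-language clinical notes, covering entities such as age, address, and medical record number. The model aligns with HIPAA privacy regulations and supports research and similar uses. Throughput is up to 1.9 million characters per hour, at a cost of 0.0018 USD per 100 processed characters.
The content above was collected and summarized by 遇见数据集.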