Unique Ingredient Identifier
Source: Databricks · Indexed 2024-05-09
Download link:
https://marketplace.databricks.com/details/003d9aad-aae3-4ccf-a0dd-a4a19f22e4b9/John-Snow-Labs_Unique-Ingredient-Identifier
Resource description:
**Overview**
This data package contains the details of substances in drugs, biologics, foods, and devices registered with a Unique Ingredient Identifier (UNII) through the joint FDA/USP Substance Registration System (SRS). It also contains a list of the names used for each UNII and the changes made to UNII descriptions up to the latest update.
**Description**
The Unique Ingredient Identifier (UNII) is a non-proprietary, free, unique, unambiguous, non-semantic, alphanumeric identifier based on a substance's molecular structure and/or descriptive information. The UNII is:
- One of the core components of the United States Federal Medication Terminology.
- Used in the FDA's Structured Product Labeling.
- Used to assist in the generation of the National Library of Medicine's (NLM's) RxNorm.
- A US government standard for drug ingredient and food allergen identifiers.
- A component of the Environmental Protection Agency's Substance Registry System (future).
The overall purpose of the joint FDA/USP Substance Registration System (SRS) is to support health information technology initiatives by generating unique ingredient identifiers (UNIIs) for substances in drugs, biologics, foods, and devices.
The procedures and management of the SRS are provided by the SRS Board, which includes experts from both FDA and USP. The SRS operating procedures defined by the SRS Board are detailed in the SRS Manual. The UNII is also useful for understanding data contained in NLM's Unified Medical Language System, the National Cancer Institute Enterprise Vocabulary Service, the FDA Data Standards Council website, the VA National Drug File Reference Terminology, the FDA Inactive Ingredient Query Application and, in the near future, the USP Dictionary of USAN and International Drug Names.
**Benefits**
- The overall purpose of the joint FDA/USP Substance Registration System (SRS) is to support health information technology initiatives by generating unique ingredient identifiers (UNIIs) for substances in drugs, biologics, foods, and devices.
**License Information**
John Snow Labs datasets are free to use for personal and research purposes. For commercial use, please subscribe to the [Data Library](https://www.johnsnowlabs.com/marketplace/) on the John Snow Labs website. The subscription allows you to use all John Snow Labs datasets and data packages for commercial purposes.
**Included Datasets**
- [Unique Ingredient Identifier Changes](https://www.johnsnowlabs.com/marketplace/unique-ingredient-identifier-changes)
  - This dataset lists the changes made to UNII descriptions up to the latest update (2019). Its content is related to the Unique Ingredient Identifier Records dataset.
- [Unique Ingredient Identifier Names](https://www.johnsnowlabs.com/marketplace/unique-ingredient-identifier-names)
  - This dataset contains a list of the names used for each UNII (Unique Ingredient Identifier). Its content is related to the Unique Ingredient Identifier Records dataset.
- [Unique Ingredient Identifier Records](https://www.johnsnowlabs.com/marketplace/unique-ingredient-identifier-records)
  - This dataset contains the details of substances in drugs, biologics, foods, and devices registered with a Unique Ingredient Identifier (UNII) through the joint FDA/USP Substance Registration System (SRS).
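Because the Names dataset is described as related to the Records dataset, a natural first step is to join the two on the UNII code. The following is a minimal sketch of that join, not the publisher's documented schema: the column names (`UNII`, `Display_Name`, `Name`) and every sample value are hypothetical placeholders, so check the real field names in the dataset metadata before adapting it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-ins for the Records and Names CSV files.
# Column names and values are illustrative placeholders, not real data.
RECORDS_CSV = """UNII,Display_Name
PLACEHLD01,substance one
PLACEHLD02,substance two
"""
NAMES_CSV = """UNII,Name
PLACEHLD01,synonym a
PLACEHLD01,synonym b
PLACEHLD02,synonym c
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def names_by_unii(name_rows):
    """Group every name under the UNII it belongs to."""
    grouped = defaultdict(list)
    for row in name_rows:
        grouped[row["UNII"]].append(row["Name"])
    return grouped


def attach_names(record_rows, name_rows):
    """Left join: each record gains a list of all names for its UNII."""
    grouped = names_by_unii(name_rows)
    return [
        {**record, "Names": grouped.get(record["UNII"], [])}
        for record in record_rows
    ]


joined = attach_names(load_rows(RECORDS_CSV), load_rows(NAMES_CSV))
for row in joined:
    print(row["UNII"], row["Display_Name"], row["Names"])
```

A left join is used so that records without any listed name still appear, with an empty `Names` list.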
**Data Engineering Overview**
**We deliver high-quality data**
- Each dataset goes through 3 levels of quality review
  - 2 manual reviews are done by domain experts
  - Then, an automated set of 60+ validations ensures that every datum matches the metadata & defined constraints
- Data is normalized into one unified type system
  - All dates, units, codes, and currencies look the same
  - All null values are normalized to the same value
  - All dataset and field names are SQL and Hive compliant
- Data and Metadata
  - Data is available in both CSV and Apache Parquet format, optimized for high read performance on distributed Hadoop, Spark & MPP clusters
  - Metadata is provided in the open Frictionless Data standard, and every field is normalized & validated
- Data Updates
  - Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted
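One way to read "replace-on-update" is that a new release replaces the current rows, while keys that no longer appear in the source are kept and flagged rather than dropped, so downstream references still resolve. The sketch below illustrates that reading only; the `deprecated` field, the function name, and the sample keys are hypothetical and do not describe the actual John Snow Labs pipeline.

```python
def replace_on_update(current, incoming):
    """Merge a new release into the current table, keyed by identifier.

    current, incoming: dicts mapping key -> row dict.
    Keys missing from `incoming` are kept and flagged as deprecated
    instead of being deleted (hypothetical `deprecated` field).
    """
    merged = {}
    for key, row in incoming.items():
        # New and updated rows replace whatever was there before.
        merged[key] = {**row, "deprecated": False}
    for key, row in current.items():
        if key not in incoming:
            # Outdated key: keep the last known row, mark it deprecated.
            merged[key] = {**row, "deprecated": True}
    return merged


# Placeholder rows for illustration only.
old = {"K1": {"name": "first"}, "K2": {"name": "second"}}
new = {"K1": {"name": "first (revised)"}, "K3": {"name": "third"}}
result = replace_on_update(old, new)
```

Here `K2` is absent from the new release, so it stays in `result` with `deprecated` set to `True`, and any table that still references `K2` continues to join cleanly.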
**Our data is curated and enriched by domain experts**
Each dataset is manually curated by our team of doctors, pharmacists, public health & medical billing experts:
- Field names, descriptions, and normalized values are chosen by people who actually understand their meaning
- Healthcare & life science experts add categories, search keywords, descriptions and more to each dataset
- Both manual and automated data enrichment is supported for clinical codes, providers, drugs, and geo-locations
- The data is always kept up to date – even when the source requires manual effort to get updates
- Support for data subscribers is provided directly by the domain experts who curated the data sets
- Every data source’s license is manually verified to allow for royalty-free commercial use and redistribution.
**Need Help?**
If you have questions about our products, contact us at [info@johnsnowlabs.com](mailto:info@johnsnowlabs.com).
Provider:
John Snow Labs
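AI-collected summary
Dataset introduction

Background and challenges
Background overview
This dataset provides details of Unique Ingredient Identifiers (UNIIs) registered through the FDA/USP Substance Registration System (SRS), covering substances in drugs, biologics, foods, and devices. The UNII is a non-proprietary, unique identifier based on molecular structure or descriptive information, used to support health information technology initiatives such as drug labeling and RxNorm generation. The dataset comprises three subsets (UNII Records, UNII Names, and UNII Changes) and has undergone expert review and quality validation to ensure accuracy and consistency.
The above content was collected and summarized by AI.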



