
Unique Ingredient Identifier | Drug Ingredient Identifier Dataset | Food Safety Dataset

Source: Databricks (indexed 2024-05-09)
Drug ingredient identification
Food safety
Download link:
https://marketplace.databricks.com/details/003d9aad-aae3-4ccf-a0dd-a4a19f22e4b9/John-Snow-Labs_Unique-Ingredient-Identifier
Resource description:
**Overview**

This data package contains the details of substances in drugs, biologics, foods and devices registered with a Unique Ingredient Identifier (UNII) through the joint FDA/USP Substance Registration System (SRS). It also lists the names used for each UNII and the changes made to UNII descriptions as of the latest update.

**Description**

The Unique Ingredient Identifier (UNII) is a non-proprietary, free, unique, unambiguous, non-semantic, alphanumeric identifier based on a substance's molecular structure and/or descriptive information. The UNII is:

- One of the core components of the United States Federal Medication Terminology
- Used in the FDA's Structured Product Labeling
- Used to assist in the generation of the National Library of Medicine's (NLM's) RxNorm
- A US government standard for drug ingredient and food allergen identifiers
- A component of the Environmental Protection Agency's Substance Registry System (future)

The overall purpose of the joint FDA/USP Substance Registration System (SRS) is to support health information technology initiatives by generating UNIIs for substances in drugs, biologics, foods, and devices. The procedures and management of the SRS are provided by the SRS Board, which includes experts from both FDA and USP. The SRS operating procedures defined by the SRS Board are detailed in the SRS Manual.

The UNII is a core component of the US Federal Medication Terminology. It is used for product labeling, to assist in the generation of RxNorm, and as an identifier for drug ingredients and allergens; in the future it will also be a component of the Environmental Protection Agency's Substance Registry System.
The UNII is useful for understanding data contained in NLM's Unified Medical Language System, National Cancer Institute Enterprise Vocabulary Service, FDA Data Standards Council website, VA National Drug File Reference Terminology, FDA Inactive Ingredient Query Application and, proximately, USP Dictionary of USAN and International Drug Names.

**Benefits**

- The overall purpose of the joint FDA/USP Substance Registration System (SRS) is to support health information technology initiatives by generating Unique Ingredient Identifiers (UNIIs) for substances in drugs, biologics, foods, and devices.

**License Information**

The use of John Snow Labs datasets is free for personal and research purposes. For commercial use, please subscribe to the [Data Library](https://www.johnsnowlabs.com/marketplace/) on the John Snow Labs website. The subscription will allow you to use all John Snow Labs datasets and data packages for commercial purposes.

**Included Datasets**

- [Unique Ingredient Identifier Changes](https://www.johnsnowlabs.com/marketplace/unique-ingredient-identifier-changes) - This dataset displays the changes made to UNII descriptions as of the latest update (2019). Content in this dataset is related to the Unique Ingredient Identifier Records dataset.
- [Unique Ingredient Identifier Names](https://www.johnsnowlabs.com/marketplace/unique-ingredient-identifier-names) - This dataset contains a list of the names used for each UNII. Content in this dataset is related to the Unique Ingredient Identifier Records dataset.
- [Unique Ingredient Identifier Records](https://www.johnsnowlabs.com/marketplace/unique-ingredient-identifier-records) - This dataset contains the details of substances in drugs, biologics, foods and devices registered with a UNII through the joint FDA/USP Substance Registration System (SRS).
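Since the Names and Changes datasets are described as related to the Records dataset, a typical first step is to group names by UNII and attach them to each record. The following is only a minimal sketch of that idea using the Python standard library. The column headers (`unii`, `preferred_term`, `name`) and all sample rows are hypothetical placeholders, not the actual schema or contents of the John Snow Labs files; check the delivered metadata for the real field names.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows. Identifiers, names, and column headers are placeholders
# (assumptions), not values taken from the actual datasets.
RECORDS_CSV = """unii,preferred_term
AAAAAAAAA1,SUBSTANCE ONE
BBBBBBBBB2,SUBSTANCE TWO
"""

NAMES_CSV = """unii,name
AAAAAAAAA1,Substance One
AAAAAAAAA1,Substance 1 (synonym)
BBBBBBBBB2,Substance Two
CCCCCCCCC3,Orphan Name
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def names_by_unii(name_rows):
    """Group every name under its UNII, preserving input order."""
    grouped = defaultdict(list)
    for row in name_rows:
        grouped[row["unii"]].append(row["name"])
    return dict(grouped)


def attach_names(record_rows, name_rows):
    """Left-join: each record gains a 'names' list (empty if none match)."""
    grouped = names_by_unii(name_rows)
    return [{**rec, "names": grouped.get(rec["unii"], [])} for rec in record_rows]


def unmatched_uniis(record_rows, name_rows):
    """UNIIs that appear in the names rows but not in the records rows."""
    known = {rec["unii"] for rec in record_rows}
    return sorted({row["unii"] for row in name_rows} - known)


records = load_rows(RECORDS_CSV)
names = load_rows(NAMES_CSV)
joined = attach_names(records, names)
```

With real files, `load_rows` could be swapped for `csv.DictReader` over an opened file. Reporting unmatched identifiers separately is a design choice here, so that names whose UNII is missing from the records are surfaced rather than silently dropped.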
**Data Engineering Overview**

**We deliver high-quality data**

- Each dataset goes through 3 levels of quality review
  - 2 manual reviews are done by domain experts
  - Then, an automated set of 60+ validations enforces that every datum matches metadata & defined constraints
- Data is normalized into one unified type system
  - All dates, units, codes, currencies look the same
  - All null values are normalized to the same value
  - All dataset and field names are SQL and Hive compliant
- Data and Metadata
  - Data is available in both CSV and Apache Parquet format, optimized for high read performance on distributed Hadoop, Spark & MPP clusters
  - Metadata is provided in the open Frictionless Data standard, and every field in it is normalized & validated
- Data Updates
  - Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted

**Our data is curated and enriched by domain experts**

Each dataset is manually curated by our team of doctors, pharmacists, public health & medical billing experts:

- Field names, descriptions, and normalized values are chosen by people who actually understand their meaning
- Healthcare & life science experts add categories, search keywords, descriptions and more to each dataset
- Both manual and automated data enrichment are supported for clinical codes, providers, drugs, and geo-locations
- The data is always kept up to date, even when the source requires manual effort to get updates
- Support for data subscribers is provided directly by the domain experts who curated the datasets
- Every data source's license is manually verified to allow for royalty-free commercial use and redistribution

**Need Help?**

If you have questions about our products, contact us at [info@johnsnowlabs.com](mailto:info@johnsnowlabs.com).
Provider:
John Snow Labs