
DrugBank Vocabulary

Source: Databricks (indexed 2024-05-09)
Download link:
https://marketplace.databricks.com/details/360b05d3-726a-4da4-9e44-31edc78c75d1/John-Snow-Labs_DrugBank-Vocabulary
Description:
**Overview**

DrugBank Vocabulary contains DrugBank identifiers, names, and synonyms so that DrugBank records can be linked and integrated into any type of project. DrugBank is a richly annotated resource that combines detailed drug data with comprehensive drug target and drug action information. It is widely used to facilitate in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction, and general pharmaceutical education.

**Description**

The DrugBank database is a bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological, and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains 9591 drug entries, including 2037 FDA-approved small molecule drugs, 241 FDA-approved biotech (protein/peptide) drugs, 96 nutraceuticals, and over 6000 experimental drugs. Additionally, 4661 non-redundant protein (i.e. drug target/enzyme/transporter/carrier) sequences are linked to these drug entries.

**Benefits**

- DrugBank is widely used by the drug industry, medicinal chemists, pharmacists, physicians, students, and the general public.
- It provides extensive drug and drug-target data.
- It enables the discovery and repurposing of existing drugs to treat rare and newly identified illnesses.

**License Information**

The use of John Snow Labs datasets is free for personal and research purposes. For commercial use, please subscribe to the [Data Library](https://www.johnsnowlabs.com/marketplace/) on the John Snow Labs website. The subscription allows you to use all John Snow Labs datasets and data packages for commercial purposes.

**Included Datasets**

- [DrugBank Vocabulary](https://www.johnsnowlabs.com/marketplace/drugbank-vocabulary) - DrugBank is a richly annotated resource that combines drug data with drug target and drug action information. Released in 2006, DrugBank has been widely used to facilitate in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction, and general pharmaceutical education. "DrugBank Vocabulary" contains information on DrugBank identifiers, names, and synonyms to permit easy linking and integration into any type of project.

**Data Engineering Overview**

**We deliver high-quality data**

- Each dataset goes through 3 levels of quality review
  - 2 manual reviews are done by domain experts
  - Then, an automated set of 60+ validations checks that every datum matches the metadata and defined constraints
- Data is normalized into one unified type system
  - All dates, units, codes, and currencies look the same
  - All null values are normalized to the same value
  - All dataset and field names are SQL and Hive compliant
- Data and Metadata
  - Data is available in both CSV and Apache Parquet format, optimized for high read performance on distributed Hadoop, Spark & MPP clusters
  - Metadata is provided in the open Frictionless Data standard, and every field in it is normalized & validated
- Data Updates
  - Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted

**Our data is curated and enriched by domain experts**

Each dataset is manually curated by our team of doctors, pharmacists, public health & medical billing experts:

- Field names, descriptions, and normalized values are chosen by people who actually understand their meaning
- Healthcare & life science experts add categories, search keywords, descriptions, and more to each dataset
- Both manual and automated data enrichment are supported for clinical codes, providers, drugs, and geo-locations
- The data is always kept up to date, even when the source requires manual effort to get updates
- Support for data subscribers is provided directly by the domain experts who curated the datasets
- Every data source's license is manually verified to allow for royalty-free commercial use and redistribution

**Need Help?**

If you have questions about our products, contact us at [info@johnsnowlabs.com](mailto:info@johnsnowlabs.com).

Provider:
John Snow Labs
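Background overview
DrugBank Vocabulary provides standardized drug identifiers, names, and synonyms. The DrugBank database it draws on contains 9591 drug entries and 4661 linked protein sequences, supporting drug development and target discovery. The dataset passes three levels of quality review and is free for personal and research use; commercial use requires a subscription.
Summary collected and generated by 遇见数据集.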
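
The description states that the vocabulary exists to link DrugBank identifiers, names, and synonyms into other projects, and that it ships as CSV. A minimal sketch of such a lookup is shown here, under several assumptions that are not confirmed by the listing: the column names `DrugBank_ID`, `Common_Name`, and `Synonyms` are hypothetical, the `|` synonym separator is assumed, and the two sample rows are illustrative stand-ins for the real file.

```python
import csv
import io

# Illustrative rows only. The column names and the "|" synonym separator
# are assumptions; check the dataset's Frictionless metadata for the real schema.
SAMPLE_CSV = """DrugBank_ID,Common_Name,Synonyms
DB00316,Acetaminophen,Paracetamol | APAP
DB00945,Acetylsalicylic acid,Aspirin | ASA
"""


def build_lookup(csv_file, separator="|"):
    """Map every lower-cased name or synonym to its DrugBank identifier."""
    lookup = {}
    for row in csv.DictReader(csv_file):
        names = [row["Common_Name"]]
        if row["Synonyms"]:
            names.extend(row["Synonyms"].split(separator))
        for name in names:
            key = name.strip().lower()
            if key:
                # setdefault keeps the first identifier seen for a given name,
                # so an ambiguous synonym is not silently overwritten.
                lookup.setdefault(key, row["DrugBank_ID"])
    return lookup


lookup = build_lookup(io.StringIO(SAMPLE_CSV))
print(lookup.get("paracetamol"))
print(lookup.get("aspirin"))
```

To use this against the downloaded file, pass an open file handle instead of the `io.StringIO` sample and adjust the column names to match the actual header. Lower-casing the keys is a design choice for case-insensitive matching; stricter normalization may be needed for real free-text drug mentions.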