Comparative Toxicogenomics Database
Source: Databricks, indexed 2024-05-09
Download link:
https://marketplace.databricks.com/details/dc6033f7-de14-4d54-a5e4-df017af674b2/John-Snow-Labs_Comparative-Toxicogenomics-Database
Resource description:
**Overview**
This data package contains Comparative Toxicogenomics Database (CTD) datasets, providing details regarding relationships between genes, chemicals and diseases and the significance of these inferences.
**Description**
The purpose of the Comparative Toxicogenomics Database is to help generate new hypotheses about the mechanisms by which chemicals contribute to the development of diseases. It does so by collecting curated data reported in the scientific literature on chemicals, genes and diseases, and by inferring relationships among these three elements.
**Benefits**
- Helps curate scientific data describing relationships between chemicals, genes/proteins, diseases, taxa, phenotypes, GO annotations, pathways, and interaction modules
- The Comparative Toxicogenomics Database is normalized according to John Snow Labs (JSL) standards and split by category into 14 datasets
**License Information**
The use of John Snow Labs datasets is free for personal and research purposes. For commercial use, please subscribe to the [Data Library](https://www.johnsnowlabs.com/marketplace/) on the John Snow Labs website. The subscription allows you to use all John Snow Labs datasets and data packages for commercial purposes.
**Included Datasets**
- [Chemical Disease Associations](https://www.johnsnowlabs.com/marketplace/chemical-disease-associations)
- This dataset contains the relationships between chemicals and diseases. These relationships were inferred because the chemical and the disease each have independent relationships with the same gene or group of genes; the inference was made through curation of research publications, the building of diagrams and statistical analysis.
- [Chemical Gene Interaction Types](https://www.johnsnowlabs.com/marketplace/chemical-gene-interaction-types)
- This dataset contains the terms of the vocabulary used in the Comparative Toxicogenomics Database (CTD) to describe the type of relationship between the chemical and the gene. These terms define different effects of the chemical on the gene or its products at the molecular level. The vocabulary is organized hierarchically: many terms have a parent term and may also have descendant terms.
- [Chemical Gene Interactions](https://www.johnsnowlabs.com/marketplace/chemical-gene-interactions)
- This dataset is a list of research publications, curated in the Comparative Toxicogenomics Database (CTD), that show evidence of interaction between a chemical and a gene. The data record the type of interaction, its degree, and which element of the gene is affected. The listed studies were performed in humans and in other species.
- [Chemical Gene Ontology Enriched Associations](https://www.johnsnowlabs.com/marketplace/chemical-gene-ontology-enriched-associations)
- This dataset contains the results of Gene Ontology (GO) enrichment analyses performed for groups of genes that are in some way affected by a chemical. The analysis was done with the GO-TermFinder tool and yields the GO terms shared between the genes. This information supports inferences about the biological processes, molecular functions or cellular components that might be involved in the effect of the chemical on the genes and/or in the mechanism of disease.
- [Chemical Pathway Enriched Associations](https://www.johnsnowlabs.com/marketplace/chemical-pathway-enriched-associations)
- This dataset contains the results of pathway enrichment analyses performed for gene groups that are in some way affected by a chemical. Enrichment analysis finds the pathways common to a set of genes. Each analysis is based on the list of genes related to the specified chemical and outputs the pathways shared between those genes, providing information about the biological processes that might be involved in the effect of the chemical on the genes or in the disease mechanism.
- [Chemical Pathway Vocabulary](https://www.johnsnowlabs.com/marketplace/chemical-pathway-vocabulary)
- This dataset contains the terms of the vocabulary used in the Comparative Toxicogenomics Database (CTD) to describe the biological or pathological process or reaction (pathway) involved in the interaction between a chemical and a disease or gene. These pathways are identified by their codes in the KEGG and REACTOME databases. This dataset is part of the CTD and is best understood and used together with the other CTD datasets.
- [Chemical Vocabulary](https://www.johnsnowlabs.com/marketplace/chemical-vocabulary)
- This dataset contains the terms of the vocabulary, organized hierarchically, used in the Comparative Toxicogenomics Database (CTD) to describe the chemicals inferred to interact with a gene or disease. The dataset contains several types of standardized identifiers for each chemical, providing cross-platform compatibility so that the chemical can be identified in major scientific databases.
- [Disease Pathway Associations](https://www.johnsnowlabs.com/marketplace/disease-pathway-associations)
- This dataset contains the relationships between biological pathways and diseases. These relationships were inferred because the pathway and the disease each have independent relationships with the same gene or group of genes; the inference was made through curation of research publications, the building of diagrams and statistical analysis.
- [Exposure Event Associations](https://www.johnsnowlabs.com/marketplace/exposure-event-associations)
- This dataset is from the exposure science module added to the Comparative Toxicogenomics Database (CTD) in 2016. The Exposure Events Database collects the effects of environmental chemicals on human biology and the measurable events resulting from exposure to the chemical. The data are a list of research studies with details on the exposure and its outcome.
- [Gene Disease Associations](https://www.johnsnowlabs.com/marketplace/gene-disease-associations)
- This dataset contains the relationships between genes and diseases. These relationships were inferred because the gene and the disease each have independent relationships with the same chemical; the inference was made through curation of research publications, the building of diagrams and statistical analysis.
- [Gene Expression Vocabulary](https://www.johnsnowlabs.com/marketplace/gene-expression-vocabulary)
- This dataset contains the terms of the vocabulary used in the Comparative Toxicogenomics Database (CTD) to describe the activity of genes inferred to interact with a chemical or disease. The dataset contains several types of standardized identifiers for each gene, providing cross-platform compatibility so that the gene and its characteristics can be identified in major scientific databases.
- [Gene Ontology Disease Gene Inference Networks](https://www.johnsnowlabs.com/marketplace/gene-ontology-disease-gene-inference-networks)
- This dataset from the Comparative Toxicogenomics Database (CTD) contains the relationships between gene ontology terms and diseases. These relationships were inferred because the gene ontology term and the disease each have independent relationships with the same gene or group of genes; the inference was made through curation of research publications, the building of diagrams and statistical analysis.
- [Gene Pathways](https://www.johnsnowlabs.com/marketplace/gene-pathways)
- This dataset contains the biological pathways that the analyzed genes are part of. These pathways are useful for inferring relationships between chemicals, genes and diseases when the gene and the disease or chemical each have independent relationships with the same pathway.
- [Health Diseases Vocabulary](https://www.johnsnowlabs.com/marketplace/health-diseases-vocabulary)
- This dataset contains the terms of the vocabulary, organized hierarchically, used in the Comparative Toxicogenomics Database (CTD) to describe the diseases inferred to have a relationship with a gene or chemical. The dataset contains several types of standardized identifiers for each disease, providing cross-platform compatibility so that the disease can be identified in major scientific databases.
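Several of the dataset descriptions above rest on the same idea: two entities are linked because each has an independent relationship with the same gene. The following is a minimal sketch of that idea for chemicals and diseases, not CTD's actual procedure (which, per the descriptions, also involves curation and statistical analysis). The function name `infer_chemical_disease` and the placeholder identifiers (`chemA`, `gene1`, `diseaseX`, and so on) are assumptions made for illustration and do not come from the datasets.

```python
from collections import defaultdict


def infer_chemical_disease(chem_gene, gene_disease):
    """Link a chemical to a disease whenever both relate to the same gene.

    chem_gene:    iterable of (chemical, gene) pairs
    gene_disease: iterable of (gene, disease) pairs
    Returns {(chemical, disease): sorted list of shared genes}.
    """
    # Index diseases by gene so each chemical-gene pair is a single lookup.
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    # Collect the genes that support each inferred chemical-disease pair.
    shared = defaultdict(set)
    for chemical, gene in chem_gene:
        for disease in diseases_by_gene.get(gene, ()):
            shared[(chemical, disease)].add(gene)

    return {pair: sorted(genes) for pair, genes in shared.items()}


# Placeholder records (hypothetical, not real CTD data).
chem_gene = [("chemA", "gene1"), ("chemA", "gene2"), ("chemB", "gene3")]
gene_disease = [("gene1", "diseaseX"), ("gene2", "diseaseX"), ("gene3", "diseaseY")]

inferred = infer_chemical_disease(chem_gene, gene_disease)
for (chemical, disease), genes in sorted(inferred.items()):
    print(chemical, disease, genes)
```

The number of shared genes per pair is kept because it is a natural input to any later scoring step; the sketch itself does no statistical testing.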
**Data Engineering Overview**
**We deliver high-quality data**
- Each dataset goes through 3 levels of quality review
- 2 manual reviews are done by domain experts
- Then, an automated set of 60+ validations checks that every datum matches its metadata and defined constraints
- Data is normalized into one unified type system
- All dates, units, codes, currencies look the same
- All null values are normalized to the same value
- All dataset and field names are SQL and Hive compliant
- Data and Metadata
- Data is available in both CSV and Apache Parquet format, optimized for high read performance on distributed Hadoop, Spark & MPP clusters
- Metadata is provided in the open Frictionless Data standard, and every metadata field is normalized & validated
- Data Updates
- Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted
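To illustrate what checking a datum against metadata and defined constraints can look like, here is a minimal sketch that validates one row against a descriptor shaped like a Frictionless Table Schema (`fields` with `name`, `type` and `constraints`). It is not John Snow Labs' validation suite, and it covers only the `required`, `minimum` and `pattern` constraints. The field names `chemical_id` and `inference_score` and the sample values are hypothetical.

```python
import re

# Hypothetical descriptor in the shape of a Frictionless Table Schema.
schema = {
    "fields": [
        {
            "name": "chemical_id",
            "type": "string",
            "constraints": {"required": True, "pattern": "[A-Z][0-9]+"},
        },
        {
            "name": "inference_score",
            "type": "number",
            "constraints": {"minimum": 0},
        },
    ]
}


def validate_row(row, schema):
    """Return a list of human-readable constraint violations for one row."""
    errors = []
    for field in schema["fields"]:
        name = field["name"]
        constraints = field.get("constraints", {})
        value = row.get(name)

        # Treat None and the empty string as missing values.
        if value is None or value == "":
            if constraints.get("required"):
                errors.append(f"{name}: required value is missing")
            continue

        if field.get("type") == "number":
            try:
                number = float(value)
            except (TypeError, ValueError):
                errors.append(f"{name}: not a number")
                continue
            if "minimum" in constraints and number < constraints["minimum"]:
                errors.append(f"{name}: below minimum {constraints['minimum']}")

        if field.get("type") == "string" and "pattern" in constraints:
            # The pattern must match the whole value, not just a prefix.
            if not re.fullmatch(constraints["pattern"], str(value)):
                errors.append(f"{name}: does not match pattern")

    return errors


good = validate_row({"chemical_id": "D000082", "inference_score": "4.5"}, schema)
bad = validate_row({"chemical_id": "", "inference_score": "-1"}, schema)
print(good)
print(bad)
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in a row at once.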
**Our data is curated and enriched by domain experts**
Each dataset is manually curated by our team of doctors, pharmacists, public health & medical billing experts:
- Field names, descriptions, and normalized values are chosen by people who actually understand their meaning
- Healthcare & life science experts add categories, search keywords, descriptions and more to each dataset
- Both manual and automated data enrichment are supported for clinical codes, providers, drugs, and geo-locations
- The data is always kept up to date – even when the source requires manual effort to get updates
- Support for data subscribers is provided directly by the domain experts who curated the data sets
- Every data source’s license is manually verified to allow for royalty-free commercial use and redistribution.
**Need Help?**
If you have questions about our products, contact us at [info@johnsnowlabs.com](mailto:info@johnsnowlabs.com).
Provider:
John Snow Labs
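Collected summary
Dataset introduction

Background and challenges
Background overview
This database integrates association data on chemicals, genes and diseases, revealing the relationships among the three through curation of the scientific literature and statistical inference. It comprises 14 standardized sub-datasets covering research areas such as interaction types and pathway analysis. The data undergo manual expert review and more than 60 automated validations to ensure high-quality, standardized output.
The above content was collected and summarized by 遇见数据集.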



