Human Gene Expression Database
Source: Databricks. Indexed 2024-05-09.
Download link:
https://marketplace.databricks.com/details/860d0a18-9426-477b-8f6a-af15c0d7ecec/John-Snow-Labs_Human-Gene-Expression-Database
Resource description:
**Overview**
This data package contains expression profiles for proteins in normal and cancer tissues. It also contains data on sequencing-based RNA levels in human tissues and cell lines.
**Description**
The Expression Profiles for Proteins in Normal Human Tissues dataset contains quantitative data and images describing the expression and distribution of human proteins across tissues and organs, at both the mRNA and protein level. The protein expression data is derived from annotation of immunohistochemical staining of cell populations in all major human tissues and organs, including the brain, liver, kidney, lymphoid tissues, heart, lung, skin, gastrointestinal tract, pancreas, endocrine tissues and the reproductive organs. In total, 44 human tissues are included, with annotation data for 76 cell types. The antibody-based protein profiles are qualitative and describe the spatial distribution, cell type specificity and rough relative abundance of proteins in these tissues, whereas the mRNA data provide quantitative data on the average gene expression within an entire tissue. For each gene, the immunohistochemical staining profile, based on one or more antibodies, is matched with mRNA data and gene/protein characterization data to yield an "annotated protein expression" profile.

The datasets in this data package cover expression profiles for proteins in normal and tumor human tissues, based on immunohistochemistry using tissue microarrays. Another dataset covers RNA (ribonucleic acid) levels in 56 cell lines and 37 tissues, based on RNA sequencing (RNA-seq). The data is based on The Human Protein Atlas version 16 and Ensembl version 83.38.
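The matching step described above, where a gene's staining profile is paired with its tissue-level mRNA data, might look roughly like the sketch below. It is only an illustration: the column names (`gene`, `tissue`, `cell_type`, `level`, `rna_value`), the gene labels and the values are hypothetical and are not taken from the actual datasets, whose schemas should be checked before adapting it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature extracts; real column names and values may differ.
PROTEIN_CSV = """gene,tissue,cell_type,level
GENE_A,liver,hepatocytes,High
GENE_A,kidney,cells in tubules,Low
GENE_B,liver,hepatocytes,Not detected
"""

RNA_CSV = """gene,tissue,rna_value
GENE_A,liver,120.5
GENE_A,kidney,8.1
GENE_B,liver,0.4
"""


def annotate_expression(protein_csv, rna_csv):
    """Pair each staining record with the RNA value for the same gene and tissue."""
    # Index the RNA table by (gene, tissue) for constant-time lookups.
    rna = {}
    for row in csv.DictReader(io.StringIO(rna_csv)):
        rna[(row["gene"], row["tissue"])] = float(row["rna_value"])

    # Group the staining records by gene and attach the matching RNA value.
    profiles = defaultdict(list)
    for row in csv.DictReader(io.StringIO(protein_csv)):
        key = (row["gene"], row["tissue"])
        profiles[row["gene"]].append({
            "tissue": row["tissue"],
            "cell_type": row["cell_type"],
            "protein_level": row["level"],
            "rna_value": rna.get(key),  # None when no RNA record exists
        })
    return dict(profiles)


profiles = annotate_expression(PROTEIN_CSV, RNA_CSV)
for gene, records in profiles.items():
    print(gene, records)
```

Keeping the qualitative staining level and the quantitative RNA value side by side, rather than merging them into one score, mirrors the distinction the description draws between the two data types.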
**Benefits**
- The Sequence Based RNA Levels in Human Tissue and Cell Line dataset helps to analyze the continually changing cellular transcriptome.
- It makes it possible to examine alternatively spliced gene transcripts, post-transcriptional modifications, gene fusions, mutations/SNPs and changes in gene expression.
- The data package also supports analysis of different RNA populations, including total RNA, small RNA such as miRNA and tRNA, and ribosomal profiling.
- The expression profile datasets help to identify a potential protein signature for each type of cancer and provide a starting point for further analyses of cancer type-specific proteins.
- Because the cancer atlas contains a large number of cancer samples, the available protein profiles provide an excellent starting point for identifying new potential cancer biomarkers.
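As a rough illustration of the biomarker screening mentioned in the list above, the sketch below ranks genes by the fraction of samples of one cancer type that show high or medium staining. The layout (one row per gene and cancer type, with sample counts per staining level), the column names, the 0.5 threshold and all values are assumptions made for the example, not the dataset's documented schema or a recommended method.

```python
import csv
import io

# Hypothetical extract: sample counts per staining level, per gene and cancer type.
CANCER_CSV = """gene,cancer,high,medium,low,not_detected
GENE_A,breast cancer,6,3,1,2
GENE_A,liver cancer,0,1,2,9
GENE_B,breast cancer,0,0,3,9
"""

LEVELS = ("high", "medium", "low", "not_detected")


def detection_fraction(row):
    """Fraction of samples with high or medium staining (0.0 if no samples)."""
    counts = {level: int(row[level]) for level in LEVELS}
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["high"] + counts["medium"]) / total


def candidate_genes(cancer_csv, cancer, threshold=0.5):
    """Return (gene, fraction) pairs at or above the threshold, highest first."""
    hits = []
    for row in csv.DictReader(io.StringIO(cancer_csv)):
        if row["cancer"] != cancer:
            continue
        fraction = detection_fraction(row)
        if fraction >= threshold:
            hits.append((row["gene"], fraction))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


breast_hits = candidate_genes(CANCER_CSV, "breast cancer")
print(breast_hits)
```

A screen like this only yields candidates; comparing the same genes against the normal-tissue profiles would be a natural next step before treating any of them as cancer type-specific.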
**License Information**
The use of John Snow Labs datasets is free for personal and research purposes. For commercial use, please subscribe to the [Data Library](https://www.johnsnowlabs.com/marketplace/) on the John Snow Labs website. The subscription allows you to use all John Snow Labs datasets and data packages for commercial purposes.
**Included Datasets**
- [Expression Profiles For Proteins in Human Cancer Tissues](https://www.johnsnowlabs.com/marketplace/expression-profiles-for-proteins-in-human-cancer-tissues)
  - This dataset covers the staining profiles for proteins in human tumor tissue based on immunohistochemistry using tissue microarrays. The data is based on The Human Protein Atlas version 21.0 and Ensembl version 103.38.
- [Expression Profiles For Proteins in Normal Human Tissues](https://www.johnsnowlabs.com/marketplace/expression-profiles-for-proteins-in-normal-human-tissues)
  - This dataset covers the expression profiles for proteins in human tissues based on immunohistochemistry using tissue microarrays. The data is based on The Human Protein Atlas version 21.1 and Ensembl version 103.38.
- [Sequence Based RNA Levels in Human Tissue and Cell Line](https://www.johnsnowlabs.com/marketplace/sequence-based-rna-levels-in-human-tissue-and-cell-line)
  - This dataset covers RNA (ribonucleic acid) levels in 56 cell lines and 37 tissues based on RNA sequencing (RNA-seq). The data is based on The Human Protein Atlas version 21.0 and Ensembl version 103.38.
**Data Engineering Overview**
**We deliver high-quality data**
- Each dataset goes through 3 levels of quality review
  - 2 manual reviews are done by domain experts
  - Then, an automated set of 60+ validations checks that every datum matches the metadata and the defined constraints
- Data is normalized into one unified type system
  - All dates, units, codes and currencies look the same
  - All null values are normalized to the same value
  - All dataset and field names are SQL and Hive compliant
- Data and Metadata
  - Data is available in both CSV and Apache Parquet format, optimized for high read performance on distributed Hadoop, Spark & MPP clusters
  - Metadata is provided in the open Frictionless Data standard, and every field is normalized & validated
- Data Updates
  - Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted
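The checks listed above (naming compliance, normalized null values, constraint validation against the metadata) can be pictured with a small hand-rolled sketch. It borrows the `fields`, `constraints` and `missingValues` keys from the Frictionless Table Schema format, but it is not the Frictionless library and not John Snow Labs' validation suite. The field names, the constraints and the lowercase snake_case rule used as a stand-in for "SQL and Hive compliant" are all assumptions for illustration.

```python
import re

# Hypothetical Table Schema-style descriptor for an RNA levels table.
SCHEMA = {
    "fields": [
        {"name": "gene", "type": "string", "constraints": {"required": True}},
        {"name": "tissue", "type": "string", "constraints": {"required": True}},
        {"name": "rna_value", "type": "number", "constraints": {"minimum": 0}},
    ],
    "missingValues": [""],  # the single normalized representation of null
}

# Assumed naming rule: lowercase letters, digits and underscores only.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


def check_field_names(schema):
    """Return the field names that break the assumed naming rule."""
    return [f["name"] for f in schema["fields"] if not NAME_PATTERN.match(f["name"])]


def check_row(row, schema):
    """Return a list of constraint violations for one row of string values."""
    errors = []
    missing = set(schema.get("missingValues", [""]))
    for field in schema["fields"]:
        name = field["name"]
        constraints = field.get("constraints", {})
        raw = row.get(name, "")
        if raw in missing:
            if constraints.get("required"):
                errors.append(f"{name}: required value is missing")
            continue
        if field["type"] == "number":
            try:
                value = float(raw)
            except ValueError:
                errors.append(f"{name}: {raw!r} is not a number")
                continue
            minimum = constraints.get("minimum")
            if minimum is not None and value < minimum:
                errors.append(f"{name}: {value} is below minimum {minimum}")
    return errors


good_row = {"gene": "GENE_A", "tissue": "liver", "rna_value": "120.5"}
bad_row = {"gene": "", "tissue": "liver", "rna_value": "-1"}
print(check_field_names(SCHEMA))
print(check_row(good_row, SCHEMA))
print(check_row(bad_row, SCHEMA))
```

In practice the published Frictionless metadata would be read from the package's descriptor file rather than written inline, and a maintained validator would cover far more types and constraints than this subset.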
**Our data is curated and enriched by domain experts**
Each dataset is manually curated by our team of doctors, pharmacists, public health & medical billing experts:
- Field names, descriptions, and normalized values are chosen by people who actually understand their meaning
- Healthcare & life science experts add categories, search keywords, descriptions and more to each dataset
- Both manual and automated data enrichment are supported for clinical codes, providers, drugs, and geo-locations
- The data is always kept up to date, even when the source requires manual effort to get updates
- Support for data subscribers is provided directly by the domain experts who curated the datasets
- Every data source’s license is manually verified to allow for royalty-free commercial use and redistribution
**Need Help?**
If you have questions about our products, contact us at [info@johnsnowlabs.com](mailto:info@johnsnowlabs.com).
Provider:
John Snow Labs
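Collected summary
Dataset introduction

Background and challenges
Background overview
This dataset integrates protein expression profiles and RNA sequencing data from normal and cancerous human tissues, covering 44 tissues and 76 cell types, and is derived from the Human Protein Atlas and the Ensembl database. It includes immunohistochemical staining results and quantitative mRNA information, and is suited to cancer biomarker discovery and transcriptome analysis research.
The content above was collected and summarized by 遇见数据集.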



