Gene and miRNA Annotations
Source: Databricks. Indexed 2024-05-09.
Download link:
https://marketplace.databricks.com/details/1a6d9b73-1e7e-4854-b359-fd65c16ce240/John-Snow-Labs_Gene-and-miRNA-Annotations
Resource description:
**Overview**
This data package contains two datasets: one on microRNA sequences and families with their annotations, and one on human genes and their miRNA annotations.
**Description**
This data package contains datasets on gene and miRNA annotations. The Gene and miRNA Family Annotations dataset describes 9991 microRNA sequences and families with annotations for Seed+m8, Species ID, miRBase ID, Mature Sequence, Family Conservation, and miRBase Accession. The Human Gene Information and miRNA Annotations dataset describes 28,353 human genes and their miRNA annotations, together with their Transcript ID, Gene ID, Gene symbol, Gene description, Species ID, Number of 3P-seq tags + 5, and Representative transcripts.
**Benefits**
- This data package can be useful for further gene research and genetic studies.
**License Information**
The use of John Snow Labs datasets is free for personal and research purposes. For commercial use please subscribe to the [Data Library](https://www.johnsnowlabs.com/marketplace/) on John Snow Labs website. The subscription will allow you to use all John Snow Labs datasets and data packages for commercial purposes.
**Included Datasets**
- [Gene and miRNA Family Annotations](https://www.johnsnowlabs.com/marketplace/gene-and-mirna-family-annotations)
  - This dataset describes 9991 microRNA sequences and families with annotations for Seed+m8, Species ID, miRBase ID, Mature Sequence, Family Conservation, and miRBase Accession.
- [Human Gene Information and miRNA Annotations](https://www.johnsnowlabs.com/marketplace/human-gene-information-and-mirna-annotations)
  - This dataset describes 28,353 human genes and their miRNA annotations, together with their Transcript ID, Gene ID, Gene symbol, Gene description, Species ID, Number of 3P-seq tags + 5, and Representative transcripts.
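As a rough illustration of how the Human Gene Information and miRNA Annotations fields might be used once the CSV is downloaded, the sketch below groups transcripts by gene symbol and picks out representative transcripts. The column headers, the `1`/`0` encoding of the representative flag, and every row value are assumptions made for the example (the real file's headers and encodings may differ); only the field list itself comes from the description above. The species value 9606 is the NCBI Taxonomy ID for human, used here on the assumption that Species ID follows that convention.

```python
import csv
import io
from collections import defaultdict

# Hypothetical headers derived from the field list above; the actual file may
# name them differently. All rows are invented placeholders, not real records.
SAMPLE_CSV = """Transcript_ID,Gene_ID,Gene_Symbol,Gene_Description,Species_ID,Representative_Transcript
TX_PLACEHOLDER_1,GENE_PLACEHOLDER_A,SYMBOL_A,placeholder description,9606,1
TX_PLACEHOLDER_2,GENE_PLACEHOLDER_A,SYMBOL_A,placeholder description,9606,0
TX_PLACEHOLDER_3,GENE_PLACEHOLDER_B,SYMBOL_B,placeholder description,9606,1
"""


def transcripts_by_gene(csv_text):
    """Group transcript IDs by gene symbol, preserving file order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["Gene_Symbol"]].append(row["Transcript_ID"])
    return dict(groups)


def representative_transcripts(csv_text):
    """Return transcript IDs flagged as representative (assumed encoded as "1")."""
    return [
        row["Transcript_ID"]
        for row in csv.DictReader(io.StringIO(csv_text))
        if row["Representative_Transcript"] == "1"
    ]


if __name__ == "__main__":
    print(transcripts_by_gene(SAMPLE_CSV))
    print(representative_transcripts(SAMPLE_CSV))
```

To run this against the real file, replace `io.StringIO(csv_text)` with an opened file handle and adjust the column names to match the delivered headers.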
**Data Engineering Overview**
**We deliver high-quality data**
- Each dataset goes through 3 levels of quality review
  - 2 manual reviews are done by domain experts
  - Then, an automated set of 60+ validations enforces that every datum matches metadata & defined constraints
- Data is normalized into one unified type system
  - All dates, units, codes, currencies look the same
  - All null values are normalized to the same value
  - All dataset and field names are SQL and Hive compliant
- Data and Metadata
  - Data is available in both CSV and Apache Parquet format, optimized for high read performance on distributed Hadoop, Spark & MPP clusters
  - Metadata is provided in the open Frictionless Data standard, and its every field is normalized & validated
- Data Updates
  - Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted
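To make the idea of validating data against metadata concrete, here is a minimal hand-rolled sketch that checks one row against a descriptor shaped like a Frictionless Table Schema (`fields`, `type`, `constraints.required`, `missingValues`). It is not the vendor's validation suite and does not use the `frictionless` library; the schema contents and field names are hypothetical, and only two checks (required values and integer typing) are shown.

```python
# Hypothetical descriptor in the shape of a Frictionless Table Schema.
# Field names and constraints are illustrative, not taken from the real metadata.
SCHEMA = {
    "fields": [
        {"name": "Gene_ID", "type": "string", "constraints": {"required": True}},
        {"name": "Species_ID", "type": "integer"},
    ],
    # Strings that should be treated as null, as in Table Schema's missingValues.
    "missingValues": [""],
}


def validate_row(row, schema):
    """Return a list of error messages for one row (a dict of raw strings)."""
    errors = []
    missing = set(schema.get("missingValues", [""]))
    for field in schema["fields"]:
        name = field["name"]
        value = row.get(name)
        # Treat absent keys and declared missing values as null.
        if value is None or value in missing:
            if field.get("constraints", {}).get("required"):
                errors.append(f"{name}: required value is missing")
            continue
        # Only the integer type is checked in this sketch.
        if field.get("type") == "integer":
            try:
                int(value)
            except ValueError:
                errors.append(f"{name}: {value!r} is not an integer")
    return errors


if __name__ == "__main__":
    print(validate_row({"Gene_ID": "G1", "Species_ID": "9606"}, SCHEMA))
    print(validate_row({"Gene_ID": "", "Species_ID": "abc"}, SCHEMA))
```

Because nulls are normalized to a single value, a check like this only needs one entry in `missingValues`; a real pipeline would more likely load the published descriptor and run a full validator over it.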
**Our data is curated and enriched by domain experts**
Each dataset is manually curated by our team of doctors, pharmacists, public health & medical billing experts:
- Field names, descriptions, and normalized values are chosen by people who actually understand their meaning
- Healthcare & life science experts add categories, search keywords, descriptions and more to each dataset
- Both manual and automated data enrichment are supported for clinical codes, providers, drugs, and geo-locations
- The data is always kept up to date – even when the source requires manual effort to get updates
- Support for data subscribers is provided directly by the domain experts who curated the data sets
- Every data source’s license is manually verified to allow for royalty-free commercial use and redistribution.
**Need Help?**
If you have questions about our products, contact us at [info@johnsnowlabs.com](mailto:info@johnsnowlabs.com).
Provider:
John Snow Labs
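Collected summary
Dataset introduction

Background and challenges
Background overview
This data package contains two subsets: detailed annotations for 9991 microRNA sequences and families (including seed sequence, species ID, and more), and miRNA annotation information for 28,353 human genes (including gene symbol, description, and more). The data has passed rigorous quality review and is suitable for gene research; commercial use requires a subscription license.
The above content was collected and summarized by 遇见数据集.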



