
GeneRAIN

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10408774
Description:
This dataset contains the files needed to understand and use the GeneRAIN models described in the accompanying manuscript. GeneRAIN models use bulk RNA-seq data and a 'Binning-By-Gene' normalization method. They aim to improve on existing methods for capturing biological information and include a vector representation of genes called GeneRAIN-vec. In testing, the models were effective at predicting a wide range of biological characteristics, including for long non-coding RNAs, which indicates their potential in bioinformatics and computational biology.

The provided dataset includes:

Gene Embedding Files: 200-dimensional and 32-dimensional vector representations of genes.
Checkpoint Files: Checkpoints of various GeneRAIN models.
JSON Mapping Files: Gene-to-index mappings used for tokenization. Note that some genes with low mean expression values might not be present in the model input dataset.
ARCHS4 Human Bulk RNA-seq Data: The 'human_gene_v2.2.h5' file and corresponding metadata are available from the official ARCHS4 website.
Normalized ARCHS4 Dataset: Processed with the 'Binning-By-Gene' method and divided into five sample-based segments, ready for model training.
Mean Expression and Flag Files: Mean expression values of genes and boolean flags to help filter duplicate gene symbols.
Normalization and Binning Files: Used with the 'normalize_expr_mat.ipynb' notebook to determine binning boundaries in new expression data.
Example Input/Output: Provided for 'anal_dataset.ipynb' to demonstrate model application.
Prediction Results: The 'genes_clf_pred_results.parquet' file contains the predicted results for coding and lncRNA genes.
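The description mentions per-gene binning and reusing binning boundaries on new expression data. The following is a minimal sketch of that general idea, not the GeneRAIN implementation: it assumes that "Binning-By-Gene" means each gene's values are cut into equal-frequency bins across samples. The function names, the bin count of 10, and the synthetic data are all hypothetical; the actual procedure is in the 'normalize_expr_mat.ipynb' notebook.

```python
import numpy as np

def fit_gene_bins(expr, n_bins=10):
    """Per-gene bin boundaries from a reference matrix (samples x genes).

    Assumption: equal-frequency (quantile) bins computed separately for
    each gene; the real GeneRAIN boundaries may be defined differently.
    """
    # Interior quantiles only: n_bins - 1 cut points per gene.
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    return np.quantile(expr, qs, axis=0)  # shape: (n_bins - 1, n_genes)

def apply_gene_bins(expr, boundaries):
    """Replace each value with its bin index under its own gene's boundaries."""
    binned = np.empty(expr.shape, dtype=np.int64)
    for g in range(expr.shape[1]):
        binned[:, g] = np.searchsorted(boundaries[:, g], expr[:, g], side="right")
    return binned

# Synthetic stand-in for an expression matrix: three genes on very
# different scales (hypothetical data, not from the dataset).
rng = np.random.default_rng(0)
reference = rng.lognormal(size=(1000, 3)) * np.array([1.0, 50.0, 1000.0])

boundaries = fit_gene_bins(reference, n_bins=10)
binned = apply_gene_bins(reference, boundaries)
```

Under this assumption, genes with very different expression scales end up on the same 0 to 9 bin scale, and the stored `boundaries` array can be passed to `apply_gene_bins` together with a new samples-by-genes matrix that has the same gene order.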
Created:
2024-04-12