GeneRAIN
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10408774
Description:
This dataset contains the files needed to understand and use the GeneRAIN models described in the accompanying manuscript. The GeneRAIN models use bulk RNA-seq data normalized with a 'Binning-By-Gene' method. They aim to improve on existing methods for capturing biological information, and they include a vector representation of genes called GeneRAIN-vec. According to the authors' testing, the models are effective at predicting a wide range of biological characteristics, including those of long non-coding RNAs, which indicates their potential in bioinformatics and computational biology. The dataset includes:
Gene Embedding Files: These files offer 200-dimensional and 32-dimensional vector representations of genes.
Checkpoint Files: Checkpoints of various GeneRAIN models.
JSON Mapping Files: Used for gene-to-index mapping and tokenization. Note that some genes with low mean expression values might not be present in the model input dataset.
ARCHS4 Human Bulk RNA-seq Data: Access the 'human_gene_v2.2.h5' file and corresponding metadata from the official ARCHS4 website.
Normalized ARCHS4 Dataset: Processed via the 'Binning-By-Gene' method, this dataset is divided into five sample-based segments, ready for model training.
Mean Expression and Flag Files: Contain mean expression values of genes and boolean flags to help filter duplicate gene symbols.
Normalization and Binning Files: Use these with the 'normalize_expr_mat.ipynb' notebook to determine binning boundaries in new expression data.
Example Input/Output: Provided for 'anal_dataset.ipynb' to demonstrate model application.
Prediction Results: The 'genes_clf_pred_results.parquet' file contains the prediction results for coding and lncRNA genes.
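
The 'Binning-By-Gene' normalization and the reuse of binning boundaries on new expression data, both mentioned in the list above, can be illustrated with a minimal sketch. It assumes that the method assigns each gene's expression values to equal-frequency (quantile) bins computed across samples; the listing itself does not spell out the procedure, so this is an illustration of the idea and not the authors' implementation in 'normalize_expr_mat.ipynb'. The function names `bin_by_gene` and `apply_boundaries` and the `n_bins` parameter are hypothetical.

```python
import numpy as np


def bin_by_gene(expr, n_bins=100):
    """Assign each value to a per-gene quantile bin (assumed procedure).

    expr: array-like of shape (n_samples, n_genes).
    Returns (binned, boundaries): binned holds integer bin indices in
    [0, n_bins - 1]; boundaries has shape (n_bins - 1, n_genes).
    """
    expr = np.asarray(expr, dtype=float)
    # Interior quantile cut points, computed separately for each gene (column).
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    boundaries = np.quantile(expr, qs, axis=0)
    return apply_boundaries(expr, boundaries), boundaries


def apply_boundaries(new_expr, boundaries):
    """Bin new samples using boundaries derived from a reference dataset."""
    new_expr = np.asarray(new_expr, dtype=float)
    binned = np.empty(new_expr.shape, dtype=np.int64)
    for g in range(new_expr.shape[1]):
        # Count how many cut points lie at or below each value of gene g.
        binned[:, g] = np.searchsorted(
            boundaries[:, g], new_expr[:, g], side="right"
        )
    return binned


# Toy example: two genes on very different scales, eight samples.
reference = np.column_stack([np.arange(8.0), 1000.0 * np.arange(8.0)])
binned, boundaries = bin_by_gene(reference, n_bins=4)
new_binned = apply_boundaries([[1.0, 6000.0]], boundaries)
```

Because the boundaries are computed per gene, the two toy genes receive identical bin assignments despite a 1000-fold difference in scale, which is the property that distinguishes this scheme from binning within each sample.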
Created:
2024-04-12



