
GeneRAIN Human-Mouse

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10866875
Description:
This repository presents our study of gene expression in humans and mice, which applies deep learning to 410K human and 366K mouse bulk RNA-seq samples to investigate gene function and disease associations across species. The work uses our Transformer-based GeneRAIN models (https://www.biorxiv.org/content/10.1101/2024.03.07.583777v1) and cross-species gene embedding alignment to analyze RNA-level similarities between human and mouse genes, offering new insights into their evolutionary and functional relationships. The project improves our understanding of mouse genes in biomedical research and provides a novel methodology for cross-species omics analysis.

The dataset comprises the following files:

Supervised Aligned Gene Embeddings (Protein-Coding Genes-Only GPT Model): Human_mouse_coding_gene_embeddings_supervised_aligned.txt. Gene embeddings derived using the protein-coding-genes-only model.

Supervised Aligned Gene Embeddings (Coding Genes, lncRNAs + Pseudogenes Model): Human_mouse_gene_embeddings_from_coding_lncRNA_pseudogene_model.txt. Gene embeddings derived from the model that includes coding genes, long non-coding RNAs (lncRNAs), and pseudogenes.

Pairwise Similarity Matrix: similarity_matrix_chunk*.npy and similarity_matrix.npy_ids.txt (the gene IDs). This 37,926 x 37,926 matrix averages the similarities from two approaches: supervised alignment, and shared embeddings + supervised alignment. The matrix can be loaded with the script load_similarity_matrix.py.

Top Ten Closest Mouse Genes for Each Human Gene: closest_10.*.txt. For each gene class, these files list the ten nearest mouse genes for each human gene, as determined by embedding similarity.

Data for Reproducing Study Results: data.tar.gz. This compressed archive provides the data needed to replicate the study's results using the GeneRAIN_HM code (https://github.com/suzheng/GeneRAIN_HM).
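To illustrate what "nearest mouse genes as determined by embedding similarity" means in practice, here is a minimal sketch that ranks mouse genes against one human gene. It assumes cosine similarity as the measure, which the description does not specify, and it uses made-up three-dimensional vectors with hypothetical gene names rather than anything read from the dataset files. The function names `cosine` and `closest_mouse_genes` are illustrative and are not part of GeneRAIN_HM.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_mouse_genes(human_vec, mouse_embeddings, k=10):
    """Return the k mouse gene IDs most similar to human_vec, best first."""
    scored = (
        (cosine(human_vec, vec), gene) for gene, vec in mouse_embeddings.items()
    )
    return [gene for _, gene in heapq.nlargest(k, scored)]


# Toy, made-up embeddings (assumption: NOT values from the dataset).
human = {"HUMAN_GENE_A": [1.0, 0.0, 0.0]}
mouse = {
    "mouse_gene_1": [0.9, 0.1, 0.0],
    "mouse_gene_2": [0.0, 1.0, 0.0],
    "mouse_gene_3": [0.5, 0.5, 0.0],
}

# Ask for the two nearest mouse genes to the single human gene.
top2 = closest_mouse_genes(human["HUMAN_GENE_A"], mouse, k=2)
print(top2)
```

With real data the same ranking would be applied per human gene with k=10. The format of the embedding text files is not documented here, so parsing them is left out.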
Created:
2024-04-12