Generative prediction of causal gene sets responsible for complex traits
Source: DataONE. Updated 2025-04-15; indexed 2025-04-26.
Download link:
https://search.dataone.org/view/sha256:ec824e45dc96f9201236ef312ee753901ace2854ebbb2961e4da8419e05c11ed
Resource description:
The relationship between genotype and phenotype remains an outstanding question for organism-level traits because these traits are generally complex. The challenge arises from complex traits being determined by a combination of multiple genes (or loci), which leads to an explosion of possible genotype-phenotype mappings. The primary techniques to resolve these mappings are genome/transcriptome-wide association studies, which are limited by their lack of causal inference and statistical power. Here, we develop an approach that leverages transcriptional data endowed with causal information and a generative machine learning model to strengthen statistical power. Our implementation of the approach, dubbed TWAVE, includes a variational autoencoder trained on human transcriptional data, which is incorporated into an optimization framework. TWAVE generates trait expression profiles, which we dimensionally reduce by identifying independently varying generalized pathways (eigengene...

Data were collected from GEO.

# Generative prediction of causal gene sets responsible for complex traits
[https://doi.org/10.5061/dryad.s4mw6m9hf](https://doi.org/10.5061/dryad.s4mw6m9hf)
## Description of the data and file structure
This is the data repository for the project 'Generative prediction of causal gene sets responsible for complex traits'.
This repository contains data to run Jupyter notebooks and a Python script in the associated Zenodo code repository (doi: 10.5281/zenodo.12955283).
Data: 1) single-cell RNAseq data on the human complex disease traits featured in the manuscript (labeled by GEO series; see Table 1 in the main text). In the files below, the traits are labeled by their abbreviations and GEO series:
* tep = Non-small cell lung cancer (GSE89843)
* t1d = Type-1 diabetes (GSE182870)
* MODY3 = Maturity-onset diabetes of the young type 3 (GSE129653)
* ib = Inflammatory bowel disease (GSE193677)
* cancermeta = Cancer metastasis (GSE202695)
* asthma = Allergic asthma (GSE96783)
* allergy = Food allergy ...,
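
The abbreviation-to-series labels above can be kept in a small lookup table when working with the files. The following is a minimal sketch, not part of the associated code repository: the names `TRAITS`, `GEO_ACCESSION_URL`, and `geo_url` are hypothetical, and the `allergy` entry is left out because its GEO series is truncated in this listing. The URL pattern is the standard GEO accession page.

```python
# Trait abbreviation -> (trait name, GEO series), copied from the list above.
# Hypothetical helper names; "allergy" is omitted because its GEO series
# is truncated in the source listing.
TRAITS = {
    "tep": ("Non-small cell lung cancer", "GSE89843"),
    "t1d": ("Type-1 diabetes", "GSE182870"),
    "MODY3": ("Maturity-onset diabetes of the young type 3", "GSE129653"),
    "ib": ("Inflammatory bowel disease", "GSE193677"),
    "cancermeta": ("Cancer metastasis", "GSE202695"),
    "asthma": ("Allergic asthma", "GSE96783"),
}

# Standard GEO accession page pattern.
GEO_ACCESSION_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={series}"


def geo_url(abbreviation: str) -> str:
    """Return the GEO accession page URL for a trait abbreviation."""
    _, series = TRAITS[abbreviation]
    return GEO_ACCESSION_URL.format(series=series)


if __name__ == "__main__":
    # Print one tab-separated row per trait.
    for abbr, (name, series) in TRAITS.items():
        print(f"{abbr}\t{series}\t{name}\t{geo_url(abbr)}")
```

Looking up an abbreviation that is not in the table (for example `allergy`) raises `KeyError`, which makes the missing entry explicit rather than silently guessing a series.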
Created:
2025-04-16



