Long read proteogenomics to connect disease-associated sQTLs to the protein isoform effectors in disease

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP421129
Resource description:
Genome-wide association studies (GWASs) have revealed thousands of associations in many complex traits and diseases. Previous studies suggest that a subset of associations are due to alterations in splicing; however, interpreting the effects of splicing on protein isoforms is hindered by limitations in defining full-length transcript isoforms using short-read RNA-seq data. Long-read RNA-seq represents a powerful approach to define and quantify transcript isoforms. In this study, we developed a novel approach that integrates information from GWAS, splicing QTL (sQTL), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the ultimate protein isoform products they encode. Such information enables identification of genes potentially responsible for GWAS associations. As a proof of concept, we generated deep-coverage (N = ~22 million full-length reads) PacBio long-read RNA-seq data on human fetal osteoblasts (hFOBs), a cell line of relevance to the regulation of bone mineral density (BMD). We identified 68,326 protein-coding isoforms, of which 17,375 (25%) were novel. Next, we used Bayesian colocalization to identify 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes which colocalized with BMD associations (H4PP > 0.75). A total of 836 junctions with colocalizing sQTLs in 459 (of the 732) genes were expressed in hFOB long-read RNA-seq data. With these data, we formulated hypotheses regarding the potential mechanism of action of each sQTL. For example, we identified 7 junctions with colocalizing sQTLs (maximum H4PP = 0.98-0.99) in TPM2 for splice junctions between two nearly mutually exclusive exons and two different transcript termination sites, making them impossible to interpret without long-read RNA-seq data. siRNA-mediated knockdown in hFOBs showed two TPM2 isoforms with opposing effects on mineralization.
Our results suggest that splicing is a major mechanism underlying GWAS associations, and that long-read proteogenomics data are critical to precisely define the protein isoforms that are produced from splicing alterations. Overall design: Long-read proteogenomics coupled with sQTL colocalization and experimental validation
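The filtering logic described in the abstract (keep sQTL junctions whose colocalization posterior H4PP exceeds 0.75, then keep those also observed in the long-read data) can be illustrated with a minimal sketch. Only the 0.75 threshold comes from the description above; the record layout, the function names, and the toy gene and junction identifiers are hypothetical and are not taken from the study's actual pipeline or data.

```python
from typing import Iterable, List, NamedTuple, Set

# Threshold stated in the abstract (H4PP > 0.75).
H4PP_THRESHOLD = 0.75


class ColocResult(NamedTuple):
    """Hypothetical record for one sQTL junction tested for colocalization."""
    gene: str
    junction: str  # illustrative junction ID, not a real coordinate
    h4pp: float    # posterior probability of a shared causal variant


def colocalizing_expressed(
    results: Iterable[ColocResult],
    expressed_junctions: Set[str],
    threshold: float = H4PP_THRESHOLD,
) -> List[ColocResult]:
    """Keep junctions that colocalize (H4PP > threshold) and are also
    present in the set of junctions observed in long-read data."""
    return [
        r for r in results
        if r.h4pp > threshold and r.junction in expressed_junctions
    ]


def genes_of(results: Iterable[ColocResult]) -> List[str]:
    """Sorted list of distinct genes among the given results."""
    return sorted({r.gene for r in results})


# Toy inputs, invented purely for illustration.
toy_results = [
    ColocResult("GENE_A", "chr1:100-200", 0.98),
    ColocResult("GENE_A", "chr1:100-300", 0.60),  # fails the H4PP threshold
    ColocResult("GENE_B", "chr2:500-900", 0.80),  # not seen in long reads
    ColocResult("GENE_C", "chr3:10-50", 0.99),
]
toy_expressed = {"chr1:100-200", "chr1:100-300", "chr3:10-50"}

kept = colocalizing_expressed(toy_results, toy_expressed)
print(len(kept), "junctions in", len(genes_of(kept)), "genes")
```

Counting junctions and genes separately mirrors how the abstract reports both figures (836 junctions in 459 genes), since one gene can carry several colocalizing junctions.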
Created:
2025-04-08