Population-scale skeletal muscle single-nucleus multi-omic profiling reveals extensive context specific genetic regulation

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/10084013

Resource description:

Data accompanying the manuscript "Single-nucleus chromatin and gene expression profiling across hundreds of skeletal muscle samples reveals context-specific regulation".

Filename: Description

1. cluster-info-qc.tsv: nucleus-sample-cluster map with other QC info. Columns:
   # index: nucleus identifier, syntax ..NM.<10X channel>
   # UMAP_1, UMAP_2: UMAP coordinates for visualization
   # modality: rna or atac
   # batch: processing batch identifier
   # hqaa_umi: high-quality autosomal alignments (HQAA) for atac nuclei, unique molecular identifiers (UMI) for rna nuclei
   # fraction_mitochondrial: fraction of reads mapping to the mitochondrial genome
   # cohort: sample cohort
   # tss_enrichment: TSS enrichment for atac nuclei
   # coarse_cluster_name: cluster name
   # donor: sample donor identifier

2. consensus-summits.bed: consensus summits along with the cell type in which each summit was highest.
   # narrow peaks in clusters
   # consensus summit features (summit +- 150bp) identified in each cluster; these were used in GWAS enrichments

3. snrna-cell-type-specific-genes.tsv: normalized expression scores for genes in each cell-type cluster.

4. eqtl_significant.tar.gz: significant eQTLs in each cell-type cluster. Columns:
   # gene_name: gene name
   # start_pheno: phenotype start (gene TSS)
   # end_pheno: phenotype end (gene TSS)
   # strand: gene strand
   # n_variants_tested: number of variants tested for the gene
   # distance_var_pheno: distance between the variant and the gene TSS
   # snp: snp ID
   # snp_chrom: snp chromosome
   # snp_start: snp start pos
   # snp_end: snp end pos
   # n_effective_tests: number of effective tests
   # p_nominal: nominal p value
   # slope: slope/beta of the linear regression, keyed on the alt allele
   # se: standard error of the slope
   # p_beta: beta distribution adjusted p value
   # qvalue: qvalue (Storey)

5. caqtl_significant.tar.gz: significant caQTLs in each cell-type cluster. Columns:
   # gene_name: peak feature coordinates
   # start_pheno: phenotype start (ATAC summit, i.e. peak feature midpoint)
   # end_pheno: phenotype end (ATAC summit, i.e. peak feature midpoint)
   # strand: strand ("." for ATAC)
   # n_variants_tested: number of variants tested for the peak feature
   # distance_var_pheno: distance between the variant and the peak feature
   # snp: snp ID
   # snp_chrom: snp chromosome
   # snp_start: snp start pos
   # snp_end: snp end pos
   # n_effective_tests: number of effective tests
   # p_nominal: nominal p value
   # slope: slope/beta of the linear regression, keyed on the alt allele
   # se: standard error of the slope
   # p_beta: beta distribution adjusted p value
   # qvalue: qvalue (Storey)

6. eqtl_full_scan.tar.gz: full eQTL cis scan, sorted by variant chrom:pos and tabix indexed. Columns:
   # snp_chrom: snp chromosome
   # snp_start: snp start
   # snp_end: snp end
   # snp: snp id
   # gene_name: gene name
   # chrom: chromosome
   # start_pheno: phenotype start (gene TSS)
   # end_pheno: phenotype end (gene TSS)
   # strand: gene strand
   # n_variants_tested: number of variants tested
   # distance_var_pheno: distance between the variant and the gene TSS
   # p_nominal: nominal p value
   # r2: the r squared of the linear regression
   # slope: slope/beta of the linear regression, keyed on the alt allele
   # se: standard error of the slope
   # best_hit: whether this variant was the best hit for this phenotype

7. caqtl_full_scan.tar.gz: full caQTL cis scan, sorted by variant chrom:pos and tabix indexed. Columns:
   # snp_chrom: snp chromosome
   # snp_start: snp start
   # snp_end: snp end
   # snp: snp id
   # gene_name: gene name
   # chrom: chromosome
   # start_pheno: phenotype start (ATAC summit, i.e. peak feature midpoint)
   # end_pheno: phenotype end (ATAC summit, i.e. peak feature midpoint)
   # strand: gene strand
   # n_variants_tested: number of variants tested
   # distance_var_pheno: distance between the variant and the peak feature
   # p_nominal: nominal p value
   # r2: the r squared of the linear regression
   # slope: slope/beta of the linear regression, keyed on the alt allele
   # se: standard error of the slope
   # best_hit: whether this variant was the best hit for this phenotype

8. eqtl_credible_sets.tar.gz: eQTL credible sets. The file name denotes the egene and the signal hit id. Bed file columns:
   # 1: snp chromosome
   # 2: snp start
   # 3: snp end
   # 4: snp chrom_pos_ref_alt
   # 5: Bayes Factor
   # 6: PIP
   # 7: SNP rsid

9. caqtl_credible_sets.tar.gz: caQTL credible sets. The file name denotes the egene and the signal hit id. Bed file columns:
   # 1: snp chromosome
   # 2: snp start
   # 3: snp end
   # 4: snp chrom_pos_ref_alt
   # 5: Bayes Factor
   # 6: PIP
   # 7: SNP rsid

10. cicero_all.tar.gz: Cicero coaccessibility results. Columns:
   # Peak 1: Macs2 narrowpeak coordinate for peak 1
   # Peak 2: Macs2 narrowpeak coordinate for peak 2
   # coaccess: Cicero coaccessibility score

11. cicero_gene_tss.tar.gz: Cicero coaccessibility results between peaks and genes. Macs2 narrow peaks in the TSS+1kb upstream region are assigned that gene name. Peak 1 is the narrowpeak in the TSS region; Peak 2 is the distal peak. Columns:
   # Peak 1: Macs2 narrowpeak coordinate for peak 1
   # gene_name: assigned gene
   # Peak 2: Macs2 narrowpeak coordinate for peak 2
   # coaccess: Cicero coaccessibility score

12. mash.tar.gz: Mashr results for e/caQTL: lfsr, posterior means and posterior SD for each tested eSNP-eGene and caSNP-caPeak pair.

13. cellregmap.tar.gz: cellRegMap results for eQTL and caQTL. Columns:
   # p_nominal: P value from cellRegMap
   # kind: kind of test, linear G association or GxC interaction
   # beta_g: for a linear G association, beta_g (the regression coefficient)
   # snp: lead QTL SNP tested chrom:pos(hg38): rsid
   # gene_name: gene name for eQTL or peak coordinates for caQTL

14. coloc-eqtl-caqtl.tsv: summary of eQTL-caQTL coloc in each cluster. Columns:
   # nsnps: number of SNPs in the region
   # eqtl_hit: SNP with the highest Bayes factor in the SuSiE eQTL credible set
   # caqtl_hit: SNP with the highest Bayes factor in the SuSiE caQTL credible set
   # PP.H0.abf: coloc posterior probability for no signal
   # PP.H1.abf: coloc posterior probability for signal in dataset 1
   # PP.H2.abf: coloc posterior probability for signal in dataset 2
   # PP.H3.abf: coloc posterior probability for different signals in datasets 1 and 2
   # PP.H4.abf: coloc posterior probability for a shared signal in datasets 1 and 2
   # idx1: index of the SuSiE credible set for dataset 1
   # idx2: index of the SuSiE credible set for dataset 2
   # cluster: cluster name
   # egene: eGene name
   # capeak: caPeak coordinates

15. cit-mrs-summary.tsv: summary from CIT and MR Steiger directionality tests. Columns:
   # cluster: cluster name
   # egene: eGene name
   # capeak: caPeak coordinates
   # eqhit: SNP with the highest Bayes factor in the SuSiE eQTL credible set
   # cahit: SNP with the highest Bayes factor in the SuSiE caQTL credible set
   # p.cit_c_c-e: P value for CIT causal cahit-ca-to-e model
   # q.cit_c_c-e: q value for CIT causal cahit-ca-to-e model
   # p.cit_rc_c-e: P value for CIT reverse-causal eqhit-ca-to-e model
   # q.cit_rc_c-e: q value for CIT reverse-causal eqhit-ca-to-e model
   # p.cit_c_e-c: P value for CIT causal eqhit-e-to-ca model
   # q.cit_c_e-c: q value for CIT causal eqhit-e-to-ca model
   # p.cit_rc_e-c: P value for CIT reverse-causal cahit-e-to-ca model
   # q.cit_rc_e-c: q value for CIT reverse-causal cahit-e-to-ca model
   # cit_direction: direction inferred from CIT
   # correct_causal_direction--ca-to-e: MR Steiger directionality test, is the ca-to-e direction correct?
   # correct_causal_direction--e-to-ca: MR Steiger directionality test, is the e-to-ca direction correct?
   # sensitivity_ratio--ca-to-e: MR Steiger sensitivity ratio for ca-to-e model
   # sensitivity_ratio--e-to-ca: MR Steiger sensitivity ratio for e-to-ca model
   # steiger_test--ca-to-e: MR Steiger directionality test P value for ca-to-e model
   # steiger_test--e-to-ca: MR Steiger directionality test P value for e-to-ca model
   # steiger_q--ca-to-e: MR Steiger directionality test q value for ca-to-e model
   # steiger_q--e-to-ca: MR Steiger directionality test q value for e-to-ca model
   # mrs_direction: direction inferred from MR Steiger
   # direction: direction inferred requiring consistent results between the CIT and MR Steiger directionality tests

16. coloc-gwas-eqtl.tsv and
17. coloc-gwas-caqt.tsv: summary of e/caQTL coloc with GWAS in each cluster. Columns:
   # nsnps: number of SNPs in the region
   # gwas_hit: SNP with the highest Bayes factor in the SuSiE GWAS credible set
   # eqtl_hit: SNP with the highest Bayes factor in the SuSiE eQTL credible set
   # caqtl_hit: SNP with the highest Bayes factor in the SuSiE caQTL credible set
   # PP.H0.abf: coloc posterior probability for no signal
   # PP.H1.abf: coloc posterior probability for signal in dataset 1
   # PP.H2.abf: coloc posterior probability for signal in dataset 2
   # PP.H3.abf: coloc posterior probability for different signals in datasets 1 and 2
   # PP.H4.abf: coloc posterior probability for a shared signal in datasets 1 and 2
   # idx1: index of the SuSiE credible set for dataset 1
   # idx2: index of the SuSiE credible set for dataset 2
   # cluster: cluster name
   # egene: eGene name
   # capeak: caPeak coordinates
   # p12min: minimum prior p12 at which PP.H4 > 0.5; the lower this value, the more robust the colocalization
   # trait: GWAS trait name
   # gwas_locus: GWAS locus name for the coloc test; a 250kb flanking genomic window on each side of this SNP was considered for testing coloc between all pairs of GWAS/QTL signals identified in this region
   # traitname: expanded GWAS trait name
   # variable_type: GWAS type
   # source: source of GWAS, either UKBB or other study

18. supplementary_tables.xlsx: supplementary tables from the manuscript. Information included in sheets:
   1. "n_nuclei": number of nuclei by modality, sample and cluster
   2. "snrna_GO_enrichment": GO term enrichment, matrix of cluster vs top 2 GO terms
   3. "qtl_scan_info": e/caQTL scan info. Columns:
      # cluster: cluster
      # ntested_eqtl: N genes tested for eQTL
      # nsig_eqtl: N significant (5% FDR) eGenes
      # n_pheno_pcs_eqtl: N phenotype PCs considered for eQTL
      # ratio_eqtl: ratio of N eGenes/N genes tested
      # nsig_caqtl: N significant (5% FDR) caPeaks
      # ntested_caqtl: N peaks tested for caQTL
      # n_pheno_pcs_caqtl: N phenotype PCs considered for caQTL
      # ratio_caqtl: ratio of N caPeaks/N peaks tested
      # nsamples_eqtl: N samples for eQTL
      # nsamples_caqtl: N samples for caQTL
   4. "gwas_trait_list": GWAS trait info. Columns:
      # trait: GWAS trait ID
      # traitname: GWAS trait description
      # variable_type: GWAS type; case/control (cc), continuous_irnt = continuous inverse-normal transformed
      # source: GWAS source
      # doi: GWAS study DOI
   5. "traits_in_ldsc_baseline": list of annotations included in the baseline model for LDSC
   6. "gwas_enrichment_in_peaks": GWAS enrichment in cluster peaks (S-LDSC)
   7. "gwas_enrichment_in_qtl_peaks": GWAS enrichment in QTL peaks (fGWAS); fGWAS results comparing GWAS enrichment in type 1 annotations. Columns:
      # CI_lower_ln, estimate_ln, CI_upper_ln: natural log of the lower confidence interval, estimate, and upper confidence interval
      # trait: trait id
      # traitname: trait name
      # annotation: annotation
      # sig: 1 if CIs don't overlap 0, otherwise 0
   8. "t2d_gwas_caqtl_coloc" and
   9. "t2d_gwas_eqtl_coloc": summary of e/caQTL coloc with T2D GWAS in each cluster, along with target gene nominations. Columns:
      # nsnps: number of SNPs in the region
      # gwas_hit: SNP with the highest Bayes factor in the SuSiE GWAS credible set
      # eqtl_hit: SNP with the highest Bayes factor in the SuSiE eQTL credible set
      # caqtl_hit: SNP with the highest Bayes factor in the SuSiE caQTL credible set
      # PP.H0.abf: coloc posterior probability for no signal
      # PP.H1.abf: coloc posterior probability for signal in dataset 1
      # PP.H2.abf: coloc posterior probability for signal in dataset 2
      # PP.H3.abf: coloc posterior probability for different signals in datasets 1 and 2
      # PP.H4.abf: coloc posterior probability for a shared signal in datasets 1 and 2
      # idx1: index of the SuSiE credible set for dataset 1
      # idx2: index of the SuSiE credible set for dataset 2
      # cluster: cluster name
      # egene: eGene name
      # capeak: caPeak coordinates
      # p12min: minimum prior p12 at which PP.H4 > 0.5; the lower this value, the more robust the colocalization
      # trait: GWAS trait id
      # diamante_gwas_locus: GWAS signal from the DIAMANTE 2018 study; some signals identified by our SuSiE runs were not present in the original study, in which case this column is NA
      # traitname: expanded GWAS trait name
      # capeak_in_tss: caPeak in the TSS + 1kb upstream region of a gene
      # gene_target_standard_cicero: caPeak coaccessible with the TSS peak of a gene, considering nuclei from all samples for co-accessibility
      # gene_target_allelic_cicero: caPeak coaccessible with the TSS peak of a gene, considering nuclei from samples homozygous for the caSNP allele associated with increased accessibility
      # gwashit_nominal_egene: gwas_hit nominally associated with the genes nominated in the columns capeak_in_tss, gene_target_standard_cicero, and gene_target_allelic_cicero
   10. MPRA results for the C2CD4A locus

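The seven-column layout described for the credible set BED files (chromosome, start, end, chrom_pos_ref_alt, Bayes factor, PIP, rsid) can be read with a few lines of Python. This is a minimal sketch that assumes tab-delimited files without a header line. The names `CredibleSetVariant`, `parse_credible_set` and `lead_variant`, the underscore-separated variant IDs, and the toy records are assumptions made for illustration.

```python
from typing import List, NamedTuple


class CredibleSetVariant(NamedTuple):
    """One line of a credible set BED file, using the column order listed above."""
    chrom: str
    start: int
    end: int
    variant_id: str      # column 4: chrom_pos_ref_alt
    bayes_factor: float  # column 5
    pip: float           # column 6: posterior inclusion probability
    rsid: str            # column 7


def parse_credible_set(bed_text: str) -> List[CredibleSetVariant]:
    """Parse tab-delimited credible set lines, skipping blank and comment lines."""
    variants = []
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        variants.append(CredibleSetVariant(
            f[0], int(f[1]), int(f[2]), f[3], float(f[4]), float(f[5]), f[6]
        ))
    return variants


def lead_variant(variants: List[CredibleSetVariant]) -> CredibleSetVariant:
    """Return the variant with the highest PIP in the credible set."""
    return max(variants, key=lambda v: v.pip)


# Toy records invented for illustration only; they are NOT values from the dataset.
TOY_BED = (
    "chr1\t999\t1000\tchr1_1000_A_G\t12.5\t0.70\trs0000001\n"
    "chr1\t1499\t1500\tchr1_1500_C_T\t8.0\t0.25\trs0000002\n"
)

credible_set = parse_credible_set(TOY_BED)
print(lead_variant(credible_set).rsid)  # rs0000001
```

Keeping each record as a named tuple makes the column meaning explicit, which helps when the same parsing is applied to both the eQTL and caQTL archives.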
Created:

2024-09-13