Gene Scores - Unadjusted - Regular
Source: Figshare. Updated: 2022-02-15. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Gene_Scores_-_Unadjusted_-_Regular/14981151
Description:
Gene scores for selected combinations of phenotypes and SNV-to-gene mappings, calculated from genuine summary statistics with MAGMA's (v1.08) SNP-Wise Mean algorithm. Refer to MAGMA's original publication and to the MAGMA website (https://ctg.cncr.nl/software/magma) for details on the SNP-Wise Mean algorithm and on the significance of MAGMA's v1.08 update in this regard.

NOTE: We refer to the gene scores in this directory as "unadjusted" because they have not yet been adjusted (following truncation of outlying gene scores) for residual effects of SNV count, within-gene LD, inverse mean minor-allele count, and sample size. Refer to the MAGMA user manual (downloadable from the MAGMA website) for details. In our analyses, unadjusted gene scores served merely as input to MAGMA's gene-set analysis in order to obtain (i) adjusted gene scores and (ii) gene-set scores (based on enrichment testing with adjusted gene scores). Refer to our publication for details.

We supply these files so that others can run gene-set analyses with precomputed gene scores for diverse combinations of phenotypes and mappings. File names are descriptive.
For example: GENEBODY_U2D2_EPM-GENEHANCER_DEFAULT-SUMSTATS_G1000EUR_MAF1

(1) GENEBODY - we mapped SNVs to the genes they overlap (irrespective of whether the overlap was with an intron or an exon).
(2) U2D2 - flanks for mapping proximal SNVs to genes: U (flank upstream of the transcription start site) and D (flank downstream of the transcription end site), where the integers specify the corresponding flank's size in kb.
(3) EPM-GENEHANCER - name of the dataset of regulatory interactions (RIs) used to augment the SNV-to-gene mapping, following the naming system used in our publication (NONE means no augmentation with RIs).
(4) DEFAULT-SUMSTATS - specifies that these scores were obtained with genuine summary statistics and not with any permutations (that is, not with the EPVP permutation control discussed in our publication).
(5) G1000EUR - specifies that we used the 1000 Genomes Project European reference population files (that is, those downloadable from the MAGMA website) for mapping SNVs to genes (see next point as well).
(6) MAF1 - specifically, we used SNVs with a minor allele frequency (MAF) of 1% from the relevant MAGMA binary files to build our mappings. One can use software such as PLINK (https://zzz.bwh.harvard.edu/plink/) to filter the binary files according to properties such as MAF.

We provide both the "genes.out" and "genes.raw" files. These are equivalent in terms of gene scores, but the "genes.raw" format is less reader-friendly and serves as input to gene-set analysis (that is, this file contains additional information on gene-gene correlations based on linkage disequilibrium).
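The naming scheme described above can be unpacked programmatically. The following Python sketch is illustrative only: the names GeneScoreName and parse_name are hypothetical, and it assumes that every file name has exactly six underscore-separated fields in the order shown in the example, that the flank sizes and MAF value are integers, and that files carry a ".genes.out" or ".genes.raw" suffix. Files in the directory that deviate from these assumptions would need a different parser.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneScoreName:
    """Fields of a gene-score file name (hypothetical helper type)."""
    base_mapping: str                  # e.g. "GENEBODY"
    upstream_kb: int                   # U flank size in kb
    downstream_kb: int                 # D flank size in kb
    regulatory_dataset: Optional[str]  # None when the field is "NONE"
    sumstats: str                      # e.g. "DEFAULT-SUMSTATS"
    reference_panel: str               # e.g. "G1000EUR"
    maf_percent: int                   # e.g. 1 for "MAF1"


def parse_name(name: str) -> GeneScoreName:
    """Split a file name into its six fields (assumed layout, see lead-in)."""
    # Assumption: the MAGMA output suffix, if present, is appended directly.
    for suffix in (".genes.out", ".genes.raw"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break

    fields = name.split("_")
    if len(fields) != 6:
        raise ValueError(f"expected 6 underscore-separated fields, got {len(fields)}")
    mapping, flanks, ri, sumstats, panel, maf = fields

    flank_match = re.fullmatch(r"U(\d+)D(\d+)", flanks)
    if flank_match is None:
        raise ValueError(f"unrecognised flank field: {flanks!r}")
    maf_match = re.fullmatch(r"MAF(\d+)", maf)
    if maf_match is None:
        raise ValueError(f"unrecognised MAF field: {maf!r}")

    return GeneScoreName(
        base_mapping=mapping,
        upstream_kb=int(flank_match.group(1)),
        downstream_kb=int(flank_match.group(2)),
        regulatory_dataset=None if ri == "NONE" else ri,
        sumstats=sumstats,
        reference_panel=panel,
        maf_percent=int(maf_match.group(1)),
    )


parsed = parse_name("GENEBODY_U2D2_EPM-GENEHANCER_DEFAULT-SUMSTATS_G1000EUR_MAF1")
print(parsed.upstream_kb, parsed.downstream_kb, parsed.maf_percent)  # prints: 2 2 1
```

Representing "NONE" as None keeps the absence of regulatory-interaction augmentation distinct from a dataset that might happen to be named "NONE" elsewhere; this is a design choice of the sketch, not something the dataset prescribes.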
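Since the "genes.raw" files are intended as input to MAGMA's gene-set analysis, the sketch below shows one way to assemble such a command in Python. It only builds the argument list and does not run MAGMA. The flags --gene-results, --set-annot and --out are taken from the MAGMA user manual and should be checked against the manual for the installed version. The function name gene_set_command, the gene-set file "gene_sets.txt", the output prefix, and the assumption that the suffix ".genes.raw" is appended directly to the descriptive file name are all hypothetical.

```python
from typing import List


def gene_set_command(
    genes_raw: str,
    set_annot: str,
    out_prefix: str,
    magma: str = "magma",
) -> List[str]:
    """Build (but do not run) a MAGMA gene-set analysis command line."""
    # Gene-set analysis needs the .genes.raw file, which carries the
    # gene-gene correlation information; .genes.out is not sufficient.
    if not genes_raw.endswith(".genes.raw"):
        raise ValueError("gene-set analysis expects a .genes.raw file")
    return [
        magma,
        "--gene-results", genes_raw,   # precomputed unadjusted gene scores
        "--set-annot", set_annot,      # user-supplied gene-set definitions
        "--out", out_prefix,           # prefix for MAGMA's output files
    ]


# Hypothetical file names, for illustration only.
cmd = gene_set_command(
    "GENEBODY_U2D2_EPM-GENEHANCER_DEFAULT-SUMSTATS_G1000EUR_MAF1.genes.raw",
    "gene_sets.txt",
    "my_gene_set_results",
)
print(" ".join(cmd))
```

To execute the command, the list could be passed to subprocess.run(cmd, check=True), provided a MAGMA binary is available on the PATH.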
Created:
2022-02-15



