
False and true positives in arthropod thermal adaptation candidate gene lists

DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-05-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.m0cfxpp3r
Description:
Genome-wide studies are prone to false positives due to inherently low priors and low statistical power. One approach to ameliorate this problem is to seek validation of reported candidate genes across independent studies: genes with repeatedly discovered effects are less likely to be false positives. Conversely, genes reported only as many times as expected by chance alone, while possibly representing novel discoveries, are also more likely to be false positives. We show that, across over 30 genome-wide studies that reported Drosophila and Daphnia genes with possible roles in thermal adaptation, the combined lists of candidate genes and orthologous groups are rapidly approaching the total number of genes and orthologous groups in the genome, respectively, consistent with the expectation of a high frequency of false positives. The majority of these spurious candidates have been identified by one or a few studies, as expected by chance alone. In contrast, a noticeable minority of genes have been identified by numerous studies, with the probabilities of such discoveries occurring by chance alone being exceedingly small. For this subset of genes, different studies agree with each other despite differences in ecological settings, genomic tools and methodology, and reporting thresholds. We provide a reference set of presumed true positives among Drosophila candidate genes and orthologous groups involved in the response to changes in temperature, suitable for cross-validation purposes. Although this approach is prone to false negatives, the list of presumed true positives includes several hundred genes, consistent with the "omnigenic" concept of the genetic architecture of complex traits.
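
The "expected by chance alone" reasoning in the description can be illustrated with a simple null model. The following is a minimal sketch, not the authors' analysis: it assumes each study draws its candidate list uniformly at random from the genome, independently of the other studies, so that the number of studies reporting a given gene follows a Poisson-binomial distribution. The function names, list sizes, and genome size are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def chance_count_distribution(list_sizes: Sequence[int], genome_size: int) -> List[float]:
    """Return P(a gene is reported by exactly k studies) for k = 0..n.

    Assumed null model: study i reports list_sizes[i] genes drawn uniformly
    at random from genome_size genes, independently of the other studies.
    """
    dist = [1.0]  # with zero studies, a gene is reported 0 times with certainty
    for size in list_sizes:
        p = size / genome_size  # chance that this study reports a given gene
        updated = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            updated[k] += mass * (1.0 - p)  # gene missed by this study
            updated[k + 1] += mass * p      # gene reported by this study
        dist = updated
    return dist


def tail_probability(list_sizes: Sequence[int], genome_size: int, k: int) -> float:
    """P(a gene is reported by at least k studies) under the same null model."""
    return sum(chance_count_distribution(list_sizes, genome_size)[k:])


def expected_union_fraction(list_sizes: Sequence[int], genome_size: int) -> float:
    """Expected fraction of the genome reported by at least one study."""
    return 1.0 - chance_count_distribution(list_sizes, genome_size)[0]


if __name__ == "__main__":
    # Hypothetical inputs: 30 studies of 700 candidates each, 14,000 genes.
    sizes = [700] * 30
    genome = 14_000
    print("Expected fraction of genome covered:", expected_union_fraction(sizes, genome))
    print("P(gene reported by >= 8 studies):", tail_probability(sizes, genome, 8))
```

Under these assumptions, multiplying `tail_probability(...)` by the genome size gives the expected number of genes reported at least `k` times by chance; genes observed well beyond that count are the kind of repeatedly discovered candidates the description treats as presumed true positives.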

Provider:
Dryad
Created:
2022-03-22