False and true positives in arthropod thermal adaptation candidate gene lists
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-05-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.m0cfxpp3r
Description:
Genome-wide studies are prone to false positives due to inherently low
priors and statistical power. One approach to ameliorate this problem is
to seek validation of reported candidate genes across independent studies:
genes with repeatedly discovered effects are less likely to be
false positives. Conversely, genes reported only as many times as expected
by chance alone, while possibly representing novel discoveries, are
more likely to be false positives. We show that, across more than 30
genome-wide studies that reported Drosophila and Daphnia genes with
possible roles in thermal adaptation, the combined lists of candidate
genes and orthologous groups are rapidly approaching the total number of
genes and orthologous groups in the genome, respectively, consistent with
the expectation of a high frequency of false positives. The majority of
these spurious candidates have been identified by one or a few studies, as
expected by chance alone. In contrast, a noticeable minority of genes have
been identified by numerous studies, and the probability of such repeated
discoveries occurring by chance alone is exceedingly small. For this
subset of genes, different studies are in agreement with each other
despite differences in the ecological settings, genomic tools and
methodology, and reporting thresholds. We provide a reference
set of presumed true positives among Drosophila candidate genes and
orthologous groups involved in response to changes in temperature,
suitable for cross-validation purposes. Despite this approach being prone
to false negatives, this list of presumed true positives includes several
hundred genes, consistent with the "omnigenic" concept of
genetic architecture of complex traits.
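
The chance expectation referred to in the description can be illustrated with a minimal sketch. It is not the authors' method: it assumes a simple null model in which each study draws its candidate list uniformly at random from the genome, so the number of studies reporting a given gene follows a Poisson binomial distribution. The function names, list sizes, and genome size below are hypothetical.

```python
from typing import List, Sequence


def report_count_distribution(list_sizes: Sequence[int], genome_size: int) -> List[float]:
    """Null distribution of how many studies report one given gene.

    Assumption (not from the dataset): study i picks list_sizes[i] genes
    uniformly at random out of genome_size, independently of other studies.
    Returns dist, where dist[k] = P(gene is reported by exactly k studies).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    dist = [1.0]  # with zero studies, the gene is reported 0 times
    for m in list_sizes:
        if not 0 <= m <= genome_size:
            raise ValueError("each list size must lie in [0, genome_size]")
        p = m / genome_size  # chance this study reports the gene under the null
        updated = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            updated[k] += prob * (1.0 - p)  # study does not report the gene
            updated[k + 1] += prob * p      # study reports the gene
        dist = updated
    return dist


def tail_probability(list_sizes: Sequence[int], genome_size: int, k: int) -> float:
    """P(a given gene is reported by at least k studies) under the null."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return sum(report_count_distribution(list_sizes, genome_size)[k:])


if __name__ == "__main__":
    # Hypothetical inputs: 30 studies, 500 candidates each, 14,000 genes.
    sizes = [500] * 30
    genome = 14_000
    for k in (1, 3, 6):
        p = tail_probability(sizes, genome, k)
        # Expected number of genes reaching k reports by chance alone.
        print(k, p, p * genome)
```

Under this null model, multiplying the tail probability by the genome size gives the number of genes expected to reach a given report count by chance. Genes observed well above that count are the kind of candidates the description treats as presumed true positives.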
Provider:
Dryad
Created:
2022-03-22



