Data and source code for: ClinVar and HGMD genomic variant classification accuracy has improved over time, as measured by implied disease burden
Source: DataONE. Updated: 2023-08-17. Indexed: 2025-08-09.
Download link:
https://search.dataone.org/view/sha256:b2e1fbd9ade89dee1ab24d6f28ed8271c6891a227a07c6bb60719c341b2cf65d
Description:
Curated databases of genetic variants assist clinicians and researchers in interpreting genetic testing results. Yet these databases contain variants annotated as pathogenic that do not result in pathogenic phenotypes. Using archives of ClinVar and HGMD, we investigated how variant misclassification has changed over six years across different ancestry groups. We considered inborn errors of metabolism (IEMs) screened in newborns as a model system, because these disorders are often highly penetrant with neonatal phenotypes. We used samples from the 1000 Genomes Project (1KGP) to identify individuals with genotypes that were annotated by the databases as pathogenic. Due to the rarity of IEMs, nearly all such annotated pathogenic genotypes indicate likely variant misclassification in ClinVar or HGMD. While the false positive rates of both ClinVar and HGMD have improved over time, HGMD variants currently would imply two orders of magnitude more affected individuals in 1KGP than ClinVar varia...

This analysis was performed with Jupyter notebooks, so all code is in ipynb files. We recommend running these files using Jupyter, which can easily be installed using conda. The notebooks should function in a Python 3.8 environment. Note that the visualizations in the three Floweaver*.ipynb files will work only in a Jupyter Notebook environment and not in a JupyterLab environment.

If you have any questions about running these files, please contact asharo@ucsc.edu and brenner@compbio.berkeley.edu
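As an illustration of the counting idea in the description (individuals whose genotypes are annotated as pathogenic), here is a minimal sketch on toy data. It is not the authors' notebook code. It assumes a simple recessive model in which an individual is "implied affected" when they carry at least two annotated-pathogenic alleles in the same gene, and it ignores phasing. The function name, sample IDs, variant IDs, and gene names are all hypothetical.

```python
from collections import defaultdict


def implied_affected(genotypes, pathogenic, gene_of):
    """Return the samples that carry >= 2 annotated-pathogenic alleles in one gene.

    genotypes:  {sample_id: {variant_id: alt_allele_count}}, counts are 0, 1 or 2
    pathogenic: set of variant_ids that a database snapshot annotates as pathogenic
    gene_of:    {variant_id: gene_name}

    Assumption (not from the source): recessive inheritance, phase ignored, so
    two heterozygous pathogenic variants in the same gene count as affected.
    """
    affected = set()
    for sample, calls in genotypes.items():
        alleles_per_gene = defaultdict(int)
        for variant, alt_count in calls.items():
            if variant in pathogenic:
                alleles_per_gene[gene_of[variant]] += alt_count
        if any(count >= 2 for count in alleles_per_gene.values()):
            affected.add(sample)
    return affected


# Toy inputs; every identifier below is made up for illustration.
toy_genotypes = {
    "S1": {"v1": 2},           # homozygous for a pathogenic variant
    "S2": {"v1": 1},           # carrier only
    "S3": {"v1": 1, "v2": 1},  # two pathogenic heterozygous calls, same gene
    "S4": {"v3": 2},           # homozygous, but v3 is not annotated pathogenic
    "S5": {"v1": 1, "v4": 1},  # pathogenic heterozygous calls in different genes
}
toy_pathogenic = {"v1", "v2", "v4"}
toy_gene_of = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_A", "v4": "GENE_B"}

toy_result = implied_affected(toy_genotypes, toy_pathogenic, toy_gene_of)
print(sorted(toy_result))
```

Running the same function against the pathogenic sets from two database snapshots would give two counts to compare, which is the kind of over-time comparison the description refers to.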
The following Python packages are required to run these notebooks:
pandas
cyvcf2
numpy
matplotlib
pickle
joblib
floweaver
ipysankeywidget
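Before opening the notebooks, it may help to confirm that the packages listed above can be imported in the active environment. The following is a minimal sketch, assuming each package's import name matches the name in the list (with pandas in lowercase). The helper name `missing_packages` is hypothetical and is not part of the dataset.

```python
import importlib.util

# Import names taken from the package list above (assumed to match exactly).
REQUIRED = [
    "pandas",
    "cyvcf2",
    "numpy",
    "matplotlib",
    "pickle",  # part of the Python standard library
    "joblib",
    "floweaver",
    "ipysankeywidget",
]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only locates each package and does not import it, so it cannot detect version incompatibilities.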
To reproduce the analysis in full, and to understand the logical flow, you must run the notebooks in the order below. However, if you are interested in a specific analysis, all intermediate files have also been provided, so in practice you may run notebooks out of order. Due to restrictions on HGMD data sharing, primary and intermediate HGMD files are not ...
Created:
2025-07-22



