
ClinvArbitration data release - May 2026

Source: DataCite Commons. Updated: 2026-05-05. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.15671820
Description:
This file is a tarball containing the ClinvArbitration re-summary of ClinVar's raw submissions. The ClinvArbitration project provides an alternative aggregation of the individual submissions: when submissions do not all agree, it prefers to break ties instead of defaulting to a rating of "conflicting interpretations of pathogenicity". This leads to more variants being presented as either B/LB or P/LP, and a reduced grey area between them.

This data release contains four items: the results of the re-interpretation of ClinVar, presented as a Hail Table and as a TSV; and all Pathogenic Missense variants in the ClinVar re-interpretation, indexed by Ensembl Transcript and Codon number.

The second part approximates the PM5 category of the ACMG criteria, which is not easily applied by existing tools. It is produced as follows. Each Pathogenic SNV in ClinVar is annotated using BCFtools CSQ. Each Pathogenic SNV which is also a Missense variant is then reorganised to be indexed on Transcript and Codon number. This index can then be inverted to annotate genetic variation: if a variant is a Missense, and a ClinVar Pathogenic Missense variant exists affecting the same Codon, we annotate the Missense with the co-located known pathogenic ClinVar entries, in case this contributes to the interpretation of the variant under investigation.

I would like to acknowledge that since this side project started, a substantial curation effort has been made in ClinVar, so the gap between the standard and re-interpreted ClinVar results has narrowed considerably. The exact data format presented here is required by Talos, a whole-Exome/Genome variant prioritisation tool, so despite the increasing consistency between the two result sets, this exact data format should continue to be distributed.

This release builds on the previous release by including Mitochondrial variants. These Mito submissions are not part of the PM5/missense matched dataset, but are present as individual decisions.
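
The Transcript/Codon indexing and its inverted lookup described above can be sketched roughly as follows. This is only an illustration of the idea, not the ClinvArbitration or Talos implementation: the record fields, the transcript IDs and the allele IDs are all made up for the example, and the real files in the release may be laid out differently.

```python
from collections import defaultdict
from typing import NamedTuple


class PathogenicMissense(NamedTuple):
    """One pathogenic missense entry (all field names are hypothetical)."""
    allele_id: str   # made-up identifier standing in for a ClinVar entry
    transcript: str  # Ensembl transcript ID
    codon: int       # codon number within that transcript


def build_codon_index(entries):
    """Group pathogenic missense entries by (transcript, codon)."""
    index = defaultdict(list)
    for entry in entries:
        index[(entry.transcript, entry.codon)].append(entry)
    return index


def colocated_pathogenic(index, transcript, codon):
    """Return known pathogenic entries affecting the same codon, if any."""
    return index.get((transcript, codon), [])


# Toy data: these IDs are invented and do not refer to real records.
entries = [
    PathogenicMissense("A1", "ENST00000000001", 175),
    PathogenicMissense("A2", "ENST00000000001", 175),
    PathogenicMissense("A3", "ENST00000000002", 42),
]
index = build_codon_index(entries)

# A missense variant under investigation at codon 175 of the first transcript
hits = colocated_pathogenic(index, "ENST00000000001", 175)
print([h.allele_id for h in hits])  # ['A1', 'A2']
```

A variant at a codon with no known pathogenic missense entry simply gets an empty list, so the annotation is additive and never removes information from the variant being interpreted.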

Provider:
Zenodo
Created:
2025-06-16