Dataset: Combined Annotation Dependent Depletion (CADD) scores for turkey and chicken

Source: 4TU.ResearchData
Updated: 2025-11-06
Indexed: 2026-04-23
Download link:
https://data.4tu.nl/datasets/f2ff2a38-0766-48f0-99f1-65d875ba81d4/1
Description:
This dataset contains genome-wide CADD (Combined Annotation Dependent Depletion) scores for chicken and turkey, generated as part of research aimed at predicting the deleteriousness of genetic variants in non-model species. The objective of the study was to develop and apply a generic, species-agnostic pipeline that computes CADD scores using only a high-quality reference genome, corresponding gene annotation, and a multi-species alignment (MSA) to infer ancestral sequences. The research involved computational methods rather than experimental sample collection; genomic reference assemblies, available functional annotations, and an evolutionary MSA were used as input features to train a machine learning model that assigns PHRED-like CADD scores to all possible single nucleotide variants across the genome. The resulting data consist of chromosome-wise tab-delimited files containing CADD scores for chicken (<code>chr{chr}.tsv.gz</code>) and turkey (<code>Turkey_chr{chr}.tsv.gz</code>), which can be used for comparative genomics, evolutionary analyses, and prioritization of candidate variants in genomic and breeding studies. The work is described in the publication <em>“A generic pipeline for CADD Score generation: chickenCADD and turkeyCADD”</em>, accepted in <em>G3</em>.
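As a rough illustration of working with the chromosome-wise files, the Python sketch below builds a file name from the two naming patterns given above and streams rows from a gzipped, tab-delimited file, keeping those at or above a score threshold. The column layout is not stated in this description, so the score column index, the treatment of lines starting with `#` as headers, and the helper names `cadd_filename` and `iter_high_scoring` are all assumptions made for illustration; check the actual files before relying on them.

```python
import csv
import gzip
from pathlib import Path
from typing import Iterator, List


def cadd_filename(chrom: str, species: str = "chicken") -> str:
    """Build a file name following the naming patterns in the description.

    Chicken files are named chr{chr}.tsv.gz, turkey files Turkey_chr{chr}.tsv.gz.
    """
    prefix = "Turkey_" if species == "turkey" else ""
    return f"{prefix}chr{chrom}.tsv.gz"


def iter_high_scoring(
    path: Path, score_col: int, min_score: float
) -> Iterator[List[str]]:
    """Yield rows whose value in column `score_col` is >= `min_score`.

    Assumptions (not confirmed by the dataset description):
    - lines starting with '#' are header or comment lines;
    - the caller knows which column holds the PHRED-like score.
    """
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue
            try:
                score = float(row[score_col])
            except (ValueError, IndexError):
                # Skip a non-numeric header row or a short line.
                continue
            if score >= min_score:
                yield row
```

For example, `iter_high_scoring(Path(cadd_filename("1")), score_col=-1, min_score=20.0)` would stream the rows of the chicken chromosome 1 file whose last column is at least 20, provided the score is in fact stored in the last column. Streaming row by row avoids loading a whole chromosome into memory, which matters because the files cover every possible single nucleotide variant.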

Provided by:
de Ridder, Dick
Created:
2025-11-06