Tree Shrew Population Genomic Variation and Genetic Diversity Dataset
Source: National Resource Center for Non-Human Primates (国家非人灵长类实验动物资源库) · Updated 2025-12-12 · Indexed 2026-02-07
Download link:
https://nhp.kiz.ac.cn/95/db
Resource description:
Variant detection and population genetic analyses were performed on whole-genome sequencing data from 83 tree shrews, characterizing the genetic diversity of the population and how its genetic structure changes during inbreeding. PCA and ADMIXTURE results indicate low genetic differentiation among wild tree shrew families and a homogeneous population structure, suggesting a largely uniform genetic base that favors the subsequent construction of inbred lines. Nucleotide diversity (π) is approximately 0.0010 to 0.0013 and homozygosity (HOM) lies between 0.81 and 0.89, indicating a moderate level of genetic variation and a stable genetic background for maintaining a healthy population and carrying out systematic inbreeding.

As inbreeding proceeds, the total length of runs of homozygosity (ROH) in the genome increases markedly, from a low level at F0 to more than 1.5 million kb (about 1.5 Gb) at F5, indicating a continued rise in genomic homozygosity. Over the same generations, the inbreeding coefficient (F) rises from approximately 0 at F0 to more than 0.4 at F5, reaching the level of genetic uniformity of established inbred rat lines. This trend shows that after about five consecutive generations of inbreeding the genetic background has largely stabilized: heterozygous loci are substantially reduced and alleles are progressively fixed by genetic drift, marking the initial establishment of tree shrew inbred lines.

For breeding strategy, F2 to F3 is the critical stage of inbreeding purification: homozygosity rises rapidly while the population still retains some diversity, so health screening and breeding control should be strengthened at this stage to prevent inbreeding depression. From F5 onward, individuals are genetically highly uniform and can serve as standardized inbred lines for disease modeling and gene function studies.

Annotation of tree shrew variants against human genetic databases (such as ClinVar, OMIM, and 1000 Genomes) also identified multiple functional variant sites associated with human diseases, for example a validated BRCA1 frameshift mutation, providing a clear genetic basis for the targeted creation of disease model lines.

Overall, the project systematically evaluated the genetic diversity of the tree shrew population and the change in inbreeding coefficient across generations, and established quantitative criteria for creating and genetically evaluating tree shrew inbred lines. The results show that combining continuous inbreeding with genotype monitoring can effectively yield inbred families with a stable genetic background and high homozygosity that can be used repeatedly. The resource provides a foundation for the use of tree shrews in evolutionary biology, genetic resource conservation, and human disease model research.
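The two headline statistics in the description, nucleotide diversity (π) and an ROH-based inbreeding coefficient, can be illustrated with a minimal Python sketch. The description does not state which estimator was used for F, so the ROH fraction below (total ROH length divided by genome length) is only one common definition, and the function names, toy allele counts, and genome length are hypothetical placeholders rather than values from the dataset.

```python
from typing import Sequence


def f_roh(roh_lengths_kb: Sequence[float], genome_length_kb: float) -> float:
    """Fraction of the genome covered by runs of homozygosity.

    Assumption: this ROH-fraction estimator is used for illustration only;
    the dataset description does not say how its F value was calculated.
    """
    if genome_length_kb <= 0:
        raise ValueError("genome_length_kb must be positive")
    return sum(roh_lengths_kb) / genome_length_kb


def nucleotide_diversity(alt_counts: Sequence[int],
                         n_chromosomes: int,
                         n_sites: int) -> float:
    """Mean number of pairwise differences per site for biallelic SNPs.

    alt_counts    -- alternate-allele count at each variable site
    n_chromosomes -- sampled chromosomes (2 x number of diploid individuals)
    n_sites       -- all callable sites, including invariant ones
    """
    if n_chromosomes < 2 or n_sites <= 0:
        raise ValueError("need at least two chromosomes and one site")
    n = n_chromosomes
    pairs = n * (n - 1) / 2
    # At a site with k alternate alleles, k * (n - k) of the pairs differ.
    differences = sum(k * (n - k) for k in alt_counts) / pairs
    return differences / n_sites


if __name__ == "__main__":
    # Hypothetical 3,000,000 kb genome, chosen only to keep the arithmetic simple.
    print(f_roh([500_000, 1_000_000], 3_000_000))
    # One heterozygous site in a single diploid individual across 1000 sites.
    print(nucleotide_diversity([1], n_chromosomes=2, n_sites=1000))
```

Dividing by all callable sites, not just the variable ones, is what makes π comparable to the per-site values of roughly 0.001 quoted in the description.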
Data Quality: Adult Chinese tree shrews were bred at the Laboratory Animal Center of the Kunming Institute of Zoology, Chinese Academy of Sciences. All animal experiments were approved by the Animal Ethics Committee of the Kunming Institute of Zoology, Chinese Academy of Sciences. Animals were anesthetized with sodium pentobarbital and perfused through the heart with phosphate-buffered saline (PBS), after which multiple tissues were collected for extraction, including brain, muscle, liver, intestine, and testis. RNA extraction, library construction, and sequencing were carried out by Annoroad Gene Technology (China). Sequencing was performed on an Illumina high-throughput platform at an average depth of approximately 30×, and the proportion of Q30 bases exceeded 90% in every sample. Raw data were filtered with Trimmomatic 2 to remove adapter sequences, low-quality reads, and reads with a high proportion of N bases; on average, more than 95% of the data were retained after filtering.
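As a hedged illustration of the two quality figures quoted above, the sketch below computes a Q30 base fraction from FASTQ quality strings and a mean depth from total bases. It assumes Phred+33 quality encoding, and the function names and toy inputs are hypothetical; it is not the provider's QC code.

```python
from typing import Iterable


def q30_fraction(quality_strings: Iterable[str], phred_offset: int = 33) -> float:
    """Share of bases whose Phred quality score is at least 30.

    Assumption: qualities are Phred+33 encoded ASCII characters.
    """
    total = 0
    q30 = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - phred_offset >= 30:
                q30 += 1
    if total == 0:
        raise ValueError("no bases supplied")
    return q30 / total


def mean_depth(total_bases: int, genome_length_bp: int) -> float:
    """Average sequencing depth: sequenced bases divided by genome length."""
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    return total_bases / genome_length_bp


if __name__ == "__main__":
    # 'I' is Q40 and '?' is Q30; '5' is Q20 and '#' is Q2 under Phred+33.
    print(q30_fraction(["II??", "55##"]))
    # Toy numbers only: 300 sequenced bases over a 10 bp "genome".
    print(mean_depth(300, 10))
```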
Variant detection followed the standard GATK v4.1 workflow. Potential low-confidence variants were removed if they met any of the following conditions: DP<8, QD<5.0, HRun>5, SB>0.00, QUAL<50, FS>60.0, MQ<40.0, HaplotypeScore>13.0. The final variant set underwent multiple rounds of cross-validation and consistency evaluation to ensure accurate variant identification and representativeness at the population level.
Data Sources:
a. Sample Collection and Sequencing:
Tissue samples were collected from 83 tree shrews and high-quality genomic DNA was extracted. Short-insert libraries (average insert size of approximately 350 bp) were constructed and sequenced paired-end on an Illumina platform.
b. Reference Genome:
All reads were aligned to the tree shrew reference genome.
c. Quality Control (QC):
Low-quality reads were filtered using Trimmomatic 2.
d. Alignment:
Cleaned reads were mapped to the reference genome using BWA software.
e. Duplicate Marking:
PCR duplicates were identified and marked using MarkDuplicates.
f. Variant Calling:
Genome Analysis Toolkit (GATK) v4.1 was used to identify SNVs and InDels.
g. Variant Filtering:
High-confidence variant screening was performed using the GATK VariantFiltration module, with parameter settings: DP<8 || QD<5.0 || HRun>5 || SB>0.00 || QUAL<50 || FS>60.0 || MQ<40.0 || HaplotypeScore>13.0.
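The filter expression in step g is a logical OR: a variant is flagged as soon as any one condition holds. A minimal Python sketch of that logic follows. The thresholds are copied from the expression above, while the function names and the choice to skip annotations that are absent from a record are assumptions made for illustration, not documented behavior of the original pipeline.

```python
import operator
from typing import Dict, List, Optional

# (annotation, comparison, threshold) triples taken from the expression in step g.
FILTER_RULES = [
    ("DP", operator.lt, 8),
    ("QD", operator.lt, 5.0),
    ("HRun", operator.gt, 5),
    ("SB", operator.gt, 0.00),
    ("QUAL", operator.lt, 50),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("HaplotypeScore", operator.gt, 13.0),
]


def failed_rules(annotations: Dict[str, Optional[float]]) -> List[str]:
    """Return the names of all rules a variant trips.

    Assumption: an annotation missing from the record is skipped
    instead of being treated as a failure.
    """
    failed = []
    for name, compare, threshold in FILTER_RULES:
        value = annotations.get(name)
        if value is not None and compare(value, threshold):
            failed.append(name)
    return failed


def passes(annotations: Dict[str, Optional[float]]) -> bool:
    """A variant is kept only if no rule fires (the rules are OR-ed)."""
    return not failed_rules(annotations)


if __name__ == "__main__":
    # Hypothetical annotation values for a single variant record.
    variant = {"DP": 30, "QD": 20.0, "HRun": 1, "SB": -10.0,
               "QUAL": 500, "FS": 2.0, "MQ": 60.0, "HaplotypeScore": 1.0}
    print(passes(variant))
    print(failed_rules({**variant, "DP": 5, "FS": 75.0}))
```

Returning the list of tripped rules, and not just a boolean, makes it easy to tabulate which criterion removes the most sites.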
Data Production Methods: Raw FASTQ data underwent strict quality control to remove sequencing adapters, low-quality reads, and reads with more than 10% N bases, and the cleaned data were used for all subsequent analyses. High-quality reads were aligned to the TS_3.0 reference genome, and the alignments were sorted, duplicate-marked, and indexed with SAMtools and Picard. Variants were first called per sample with GATK HaplotypeCaller and then jointly genotyped with GenotypeGVCFs. The raw variant calls were subjected to hard filtering and quality control (QC) to remove false-positive sites.

The filtered high-confidence variants were functionally annotated with tools such as ANNOVAR and SnpEff, covering gene regions, exonic regions, synonymous and non-synonymous mutations, and splice-site variants. From the resulting high-quality variant set, polymorphism parameters, genetic distance, and linkage disequilibrium (LD) were calculated for each sample and for the population, together with principal component analysis (PCA) and population structure analysis (STRUCTURE/ADMIXTURE). The results provide baseline data for research on the genetic diversity of tree shrew populations.
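The alignment and calling steps described above can be sketched as a small Python helper that only assembles command lines and does not run them. Several details are assumptions not stated in the description: the use of `bwa mem` specifically, the read-group string, invoking Picard's MarkDuplicates through the `gatk` wrapper, the `CombineGVCFs` step before joint genotyping, and every file name.

```python
import shlex
from typing import List


def per_sample_commands(sample: str, ref: str, fq1: str, fq2: str,
                        threads: int = 4) -> List[str]:
    """Shell commands for one sample: align, sort, mark duplicates, index, call."""
    q = shlex.quote
    # Read group is required by GATK; ID/SM/PL values here are placeholders.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    gvcf = f"{sample}.g.vcf.gz"
    return [
        f"bwa mem -t {threads} -R {q(read_group)} {q(ref)} {q(fq1)} {q(fq2)}"
        f" | samtools sort -o {q(sorted_bam)} -",
        f"gatk MarkDuplicates -I {q(sorted_bam)} -O {q(dedup_bam)} -M {q(metrics)}",
        f"samtools index {q(dedup_bam)}",
        # Per-sample calling in GVCF mode so samples can be genotyped jointly.
        f"gatk HaplotypeCaller -R {q(ref)} -I {q(dedup_bam)} -O {q(gvcf)} -ERC GVCF",
    ]


def joint_commands(samples: List[str], ref: str,
                   out_vcf: str = "cohort.vcf.gz") -> List[str]:
    """Shell commands that merge per-sample GVCFs and genotype them jointly."""
    q = shlex.quote
    combined = "cohort.g.vcf.gz"
    inputs = " ".join(f"-V {q(s + '.g.vcf.gz')}" for s in samples)
    return [
        f"gatk CombineGVCFs -R {q(ref)} {inputs} -O {q(combined)}",
        f"gatk GenotypeGVCFs -R {q(ref)} -V {q(combined)} -O {q(out_vcf)}",
    ]


if __name__ == "__main__":
    # Hypothetical sample names and paths, for illustration only.
    for cmd in per_sample_commands("s1", "TS_3.0.fa", "s1_1.fq.gz", "s1_2.fq.gz"):
        print(cmd)
    for cmd in joint_commands(["s1", "s2"], "TS_3.0.fa"):
        print(cmd)
```

Building the commands as strings keeps the sketch inspectable before anything is executed; the printed lines can be reviewed or written to a script file.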
Time Range: April 2022
Provider:
Kunming Institute of Zoology, Chinese Academy of Sciences
Created:
2025-12-12



