Anas platyrhynchos (Pekin duck) Genome sequencing and assembly
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP265555
Description:
Pekin duck (Z2 strain) has an estimated haploid genome size of 1.41 Gb (Nakamura et al. 1990; Tiersch and Wachtel 1991). Its karyotype has 9 pairs of macrochromosomes (chr1 to chr8, plus chrZ/chrW) and 31 pairs of microchromosomes (chr9 to chr39) (Takagi and Makino 1966). To assemble the new genome de novo, we generated two datasets from a female individual: 143-fold coverage of PacBio long reads (read N50 14.3 kb) and 142-fold coverage of 10x linked reads. From a male individual we generated a 56-fold BioNano optical map and 82-fold coverage of Hi-C reads. To identify the female-specific chrW sequences, we also generated 72-fold Illumina reads from a male individual and 92-fold Illumina reads from a female individual. The primary assembly of the PacBio long reads assembled the genome into 1,645 gapless contigs. To extend the contigs, we first corrected their sequence errors with the 92-fold female Illumina reads. We then oriented and connected the contigs into 942 scaffolds using the linked reads, BioNano optical maps and Hi-C reads. Hi-C data provide linkage but not orientation information. For this reason, the final chromosome-anchoring step incorporated a radiation hybrid (RH) linkage map, which reduced the scaffold number further to 789. The new genome assigns 95.6% (1.13 Gb) of its sequence to 31 autosomes and a pair of sex chromosomes. Only 4.4% (62.1 Mb) of the genome remains unanchored, owing to repetitive sequence composition or a lack of linkage markers.
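The description gives data volumes as fold coverage and summarizes read lengths by their N50. The sketch below makes both quantities concrete. It is illustrative only and assumes that "fold" means total sequenced bases divided by the 1.41 Gb haploid genome estimate; the authors may have used a different reference size. The read lengths in the N50 example are hypothetical toy values and do not come from this dataset. The function names `n50` and `bases_for_coverage` are also hypothetical and were chosen only for this illustration.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together contain at least half of all bases."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def bases_for_coverage(fold, genome_size_bp):
    """Approximate sequenced bases implied by a fold coverage
    (assumption: fold = total bases / genome size)."""
    return fold * genome_size_bp


# Haploid genome size estimate quoted in the description.
GENOME_SIZE_BP = 1.41e9

if __name__ == "__main__":
    # Hypothetical toy read lengths, not taken from SRP265555.
    toy_reads = [2_000, 5_000, 8_000, 10_000, 15_000, 20_000]
    print("toy N50:", n50(toy_reads))  # prints 15000

    # Fold coverages stated in the description (sequencing reads only).
    datasets = [
        ("PacBio long reads (female)", 143),
        ("10x linked reads (female)", 142),
        ("Hi-C reads (male)", 82),
        ("Illumina reads (male)", 72),
        ("Illumina reads (female)", 92),
    ]
    for name, fold in datasets:
        gb = bases_for_coverage(fold, GENOME_SIZE_BP) / 1e9
        print(f"{name}: {fold}x -> ~{gb:.1f} Gb")
```

Under this assumption, the 143-fold PacBio dataset corresponds to roughly 201.6 Gb of raw sequence. The actual run sizes deposited in the SRA may differ.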
Created:
2020-11-25