Comprehensive annotations of genes, transcripts, and proteins of three pea aphid genome assemblies
Source: DataCite Commons. Updated: 2026-01-29. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.s1rn8pknd
Description:
Accurate genome assembly and annotation are crucial for analyses of
duplication and gene family evolution. Short-read genome assemblies can
mis-assemble newly duplicated genes, and gene prediction programs can
break up, merge, or miss genes, obscuring the true gene content. Here, we
leverage transcriptomic data from various life stages, morphs, and sexes
of the pea aphid Acyrthosiphon pisum to produce more
comprehensive gene annotations for two long-read genome assemblies, as
well as a modified version of the reference assembly, corrected at a
critical morph-determination locus called api. We integrated three
RNA-seq-based transcript assembly methods (Trinity de novo, Trinity
genome-guided, and StringTie) and the ab initio method AUGUSTUS
to produce gene models for all three assemblies using PASA. Proteins
produced by these gene models were clustered with the pea aphid RefSeq
proteins, as well as those from twenty other eukaryotic species, using
OrthoFinder. This dataset contains files for all PASA gene models (GTF
format), transcripts, proteins, and the assemblies themselves (FASTA
format). Additionally, the Orthogroup clustering information for all
proteins from all methods for all assemblies is provided (TSV format).
When these genome annotations are viewed in IGV, clicking on each
transcript provides information on the closest orthologs from each species
for each protein predicted to be coded by that transcript. The transcript
and protein files can be used to search for pea aphid orthologs of proteins
of interest. These data properly assemble previously mis-assembled genes
and reveal a larger-than-expected amount of gene duplication, providing a
valuable resource for studying gene family evolution in pea aphids.
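
As an illustration of how the orthogroup table might be used to look up orthologs of a protein of interest, the following is a minimal Python sketch. It assumes the TSV follows the usual OrthoFinder Orthogroups.tsv layout (an "Orthogroup" column followed by one column per protein set, each cell holding comma-separated protein IDs); the layout of the file in this dataset may differ. The column names and protein IDs in the toy table are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Toy table in the assumed OrthoFinder-style layout. The column names and
# protein IDs here are hypothetical placeholders, not values from the dataset.
TOY_TSV = (
    "Orthogroup\tpea_aphid_PASA\tpea_aphid_RefSeq\tother_species\n"
    "OG0000000\tpasa_prot_1\trefseq_prot_1, refseq_prot_2\t\n"
    "OG0000001\tpasa_prot_2, pasa_prot_3\t\tother_prot_1\n"
)


def load_orthogroups(handle):
    """Read an orthogroup table.

    Returns (index, members):
      index   maps protein ID -> orthogroup ID
      members maps orthogroup ID -> {column name: [protein IDs]}
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[1:]  # first column is assumed to be the orthogroup ID
    index = {}
    members = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        orthogroup = row[0]
        members[orthogroup] = {}
        for name, cell in zip(columns, row[1:]):
            ids = [p.strip() for p in cell.split(",") if p.strip()]
            members[orthogroup][name] = ids
            for protein_id in ids:
                index[protein_id] = orthogroup
    return index, members


def orthologs_of(protein_id, index, members):
    """Return (orthogroup ID, non-empty columns) for a protein, or (None, {})."""
    orthogroup = index.get(protein_id)
    if orthogroup is None:
        return None, {}
    hits = {name: ids for name, ids in members[orthogroup].items() if ids}
    return orthogroup, hits


if __name__ == "__main__":
    index, members = load_orthogroups(io.StringIO(TOY_TSV))
    print(orthologs_of("pasa_prot_2", index, members))
```

To run this against a real file, replace the in-memory toy table with an open file handle, for example `open(path, newline="")`, where `path` points to the orthogroup TSV downloaded from the dataset.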
Provider:
Dryad
Created:
2026-01-21



