Deep-learning-based annotation of 230 superasterid genomes reveals a harmonized dataset of 91,366 NLRs
Source: DataCite Commons; updated 2026-01-28; indexed 2025-04-09
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.sxksn03d6
Description:
Plant nucleotide-binding leucine-rich repeat receptors (NLRs) are
intracellular immune receptors crucial for pathogen recognition and immune
responses. Despite their importance, NLRs are often challenging to
annotate and frequently overlooked by standard annotation pipelines. To
address the variability in NLR annotation accuracy across pipelines, we
performed a harmonized de novo annotation of 230 high-quality superasterid
genomes using the deep learning-based software Helixer (Holst et
al. 2023), resulting in the annotation of 10,124,265 protein sequences.
Additionally, we employed NLRtracker, which leverages
InterProScan for domain identification, to detect NLR and NLR-associated
sequences (Kourelis et al. 2021, Blum et al. 2025). Using the NLR
definition from the RefPlantNLR dataset, we identified 91,366 NLRs, with
counts ranging from 12 and 19 in the parasitic plants Cuscuta campestris
and Orobanche coerulescens, respectively, to 2,804 in Solanum tuberosum (potato). Beyond
NLR annotation, we provide genome annotations, including proteomes, coding
nucleotide sequences (CDS), and GFF files generated by Helixer. This
dataset offers a valuable resource for standardized comparative genomics
and evolutionary studies across superasterids.
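
As a rough illustration of how the GFF files in the dataset might be summarized, the sketch below counts records per feature type in GFF3-formatted text. It assumes the standard nine-column, tab-separated GFF3 layout. The function name `count_gff3_features` and the example records (including the `Helixer` source column and the IDs) are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from typing import Iterable


def count_gff3_features(lines: Iterable[str]) -> Counter:
    """Count GFF3 records by feature type (third column)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and directives such as ##gff-version
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GFF3 record has exactly nine tab-separated columns;
        # anything else (e.g. embedded FASTA lines) is ignored
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records for illustration only (not from the dataset)
example = [
    "##gff-version 3",
    "chr1\tHelixer\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tHelixer\tmRNA\t100\t900\t.\t+\t.\tID=gene1.1;Parent=gene1",
    "chr1\tHelixer\tCDS\t150\t800\t.\t+\t0\tID=cds1;Parent=gene1.1",
    "chr1\tHelixer\tgene\t2000\t3000\t.\t-\t.\tID=gene2",
]
feature_counts = count_gff3_features(example)
print(feature_counts["gene"])
```

Because the function accepts any iterable of lines, an open file handle for a downloaded GFF file can be passed in place of the example list.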
Provider:
Dryad
Created:
2025-03-07



