Functional annotation of 180 RefSeq reference plant proteomes reveals a dataset of 113,684 NLR proteins
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13627394
Resource description:
Abstract
Nucleotide-binding leucine-rich repeat receptors (NLRs) are critical components of plant immune systems, responsible for detecting pathogens and initiating defence responses. As part of our exploration of NLR protein diversity across a broad spectrum of plant species, we created a comprehensive NLRome dataset by analyzing 180 reference plant genomes from the NCBI RefSeq database (Pruitt et al. 2007). This database includes high-quality genome annotations for species from a wide phylogenetic range, encompassing algae, gymnosperms, early flowering plants, monocots, and dicots (https://www.ncbi.nlm.nih.gov/refseq/). Using NLRtracker, a specialized bioinformatics tool that integrates InterProScan for domain identification, we extracted and catalogued NLR proteins across these diverse genomes. Based on the NLR definition of RefPlantNLR and NLRtracker (Kourelis et al. 2021), 169 of the 180 species had at least one predicted NLR. In total, we catalogued 113,686 NLRs, ranging from 33 in Cucurbita maxima to 4,155 in Quercus robur. In addition to NLR annotation, NLRtracker provided functional annotations for the entire proteome of each species, enabling comparative genomics and evolutionary studies.
NLRtracker output legend:
File extension               Description
*_NLRtracker.tsv             NLRtracker overview output with gene status.
*_NLR.lst                    Identifier list of NLRs.
*_NLR.gff3                   NLR annotation of motifs, domains, and regions in GFF3 format.
*_NLR.fasta                  NLR FASTA sequences.
*_NLR-associated.lst         Identifier list of NLR-associated genes.
*_NLR-associated.gff3        Annotation of motifs, domains, and regions of NLR-associated genes in GFF3 format.
*_NLR_associated.fasta       FASTA sequences of NLR-associated genes.
*_NBARC.fasta                NB-ARC domain FASTA sequences.
*_NBARC_deduplicated.fasta   Deduplicated NB-ARC domain FASTA sequences.
*_iTOL.txt                   Domain annotation file for iTOL.
*_iTOL_dedup.txt             Domain annotation file of the deduplicated sequences for iTOL.
*_Domains.tsv                Full-length and domain sequences and metadata for all NLRtracker output.
interpro_result.gff          InterProScan output of the query proteome.
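As a rough illustration of how the per-species files in the legend might be used, the following Python sketch counts the NLRs of one species from its identifier list and its FASTA file. It is only a sketch under stated assumptions: the `*` in each file name is taken to be a per-species prefix, the `.lst` file is assumed to hold one identifier per line, and all files are assumed to sit in one directory. The function names and the `demo` prefix are hypothetical and are not part of NLRtracker.

```python
import tempfile
from pathlib import Path


def read_id_list(path):
    """Read identifiers, assuming one per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_species(directory, prefix):
    """Summarize NLR counts for one species.

    `prefix` is the assumed per-species label that replaces '*' in the
    legend, e.g. <prefix>_NLR.lst and <prefix>_NLR.fasta.
    """
    directory = Path(directory)
    ids = read_id_list(directory / f"{prefix}_NLR.lst")
    n_sequences = count_fasta_records(directory / f"{prefix}_NLR.fasta")
    return {"species": prefix, "nlr_ids": len(ids), "nlr_sequences": n_sequences}


# Small self-contained demonstration with made-up identifiers and sequences.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "demo_NLR.lst").write_text("prot1\nprot2\n", encoding="utf-8")
    (tmp_dir / "demo_NLR.fasta").write_text(
        ">prot1\nMAAA\n>prot2\nMKKK\nLLLL\n", encoding="utf-8"
    )
    print(summarize_species(tmp_dir, "demo"))
```

If the two counts disagree for a real species, that would suggest the layout assumptions above do not hold for that file and the inputs should be inspected by hand.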
Supplementary Data
Data S1. RefSeq species list and metadata.
Data S2. Per-genome table of sequence counts for proteomes, total NLRs, and putative NLR types determined by NLRtracker.
Created:
2024-09-23



