
Functional annotation of 180 RefSeq reference plant proteomes reveals a dataset of 113,684 NLR proteins

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/13627394
Resource description:
Abstract

Nucleotide-binding leucine-rich repeat receptors (NLRs) are critical components of plant immune systems, responsible for detecting pathogens and initiating defence responses. As part of our exploration of NLR protein diversity across a broad spectrum of plant species, we created a comprehensive NLRome dataset by analyzing 180 reference plant genomes from the NCBI RefSeq database (Pruitt et al. 2007). This database includes high-quality genome annotations for species from a wide phylogenetic range, encompassing algae, gymnosperms, early flowering plants, monocots, and dicots (https://www.ncbi.nlm.nih.gov/refseq/). Using NLRtracker, a specialized bioinformatics tool that integrates InterProScan for domain identification, we extracted and catalogued NLR proteins across these diverse genomes. Based on the NLR definition of RefPlantNLR and NLRtracker (Kourelis et al. 2021), 169 of the 180 species had at least one predicted NLR. In total, we catalogued 113,686 NLRs, ranging from 33 in Cucurbita maxima to 4,155 in Quercus robur. In addition to NLR annotation, NLRtracker provided functional annotations for the entire proteome of each species, enabling comparative genomics and evolutionary studies.

NLRtracker output legend:

File extension                Description
*_NLRtracker.tsv              NLRtracker overview output with gene status.
*_NLR.lst                     Identifier list of NLRs.
*_NLR.gff3                    NLR annotation of motifs, domains, and regions in GFF3 format.
*_NLR.fasta                   NLR FASTA sequences.
*_NLR-associated.lst          Identifier list of NLR-associated genes.
*_NLR-associated.gff3         NLR-associated gene annotation of motifs, domains, and regions in GFF3 format.
*_NLR_associated.fasta        NLR-associated gene FASTA sequences.
*_NBARC.fasta                 NB-ARC domain FASTA sequences.
*_NBARC_deduplictated.fasta   Deduplicated NB-ARC domain FASTA sequences.
*_iTOL.txt                    Domain annotation file for iTOL.
*_iTOL_dedup.txt              Domain annotation file of the deduplicated sequences for iTOL.
*_Domains.tsv                 Full-length and domain sequences and metadata for all NLRtracker output.
interpro_result.gff           InterProScan output of the query proteome.

Supplementary Data

Data S1. RefSeq species list and metadata.
Data S2. Per-genome sequence number statistics table for proteomes, total NLRs, and putative NLR types determined by NLRtracker.
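
As an illustration of how the files in the output legend might be used, the following minimal Python sketch counts NLR identifiers per proteome from the *_NLR.lst files. It assumes that each *_NLR.lst file holds one identifier per line and that all files sit in a single directory; neither assumption is confirmed by the record, and the function names read_identifiers and count_nlrs are hypothetical helpers, not part of NLRtracker.

```python
from pathlib import Path
from typing import Dict, Iterable, List

SUFFIX = "_NLR.lst"  # file name ending taken from the output legend


def read_identifiers(lines: Iterable[str]) -> List[str]:
    """Return identifiers from an iterable of lines.

    Assumption: one identifier per line; blank lines and lines
    starting with '#' are skipped.
    """
    identifiers = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            identifiers.append(line)
    return identifiers


def count_nlrs(directory: str) -> Dict[str, int]:
    """Map each file prefix (text before '_NLR.lst') to its number of unique identifiers."""
    counts = {}
    # The pattern matches '*_NLR.lst' only; '*_NLR-associated.lst' ends differently.
    for path in sorted(Path(directory).glob("*" + SUFFIX)):
        prefix = path.name[: -len(SUFFIX)]
        with path.open() as handle:
            counts[prefix] = len(set(read_identifiers(handle)))
    return counts
```

Calling count_nlrs on a directory of downloaded output (for example a hypothetical folder named nlrtracker_output) returns a dictionary keyed by file prefix, which could then be compared against the per-genome totals reported in Data S2.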
Created:
2024-09-23