A New Comprehensive Annotation of Leucine-Rich Repeat-Containing Receptors in Rice
Source: NIAID Data Ecosystem. Indexed 2026-03-12.
Download link:
https://zenodo.org/record/5110014
Description:
Datasets for the preprint (https://doi.org/10.1101/2021.01.29.428842) entitled "A new comprehensive Annotation of Leucine-Rich Repeat-Containing Receptors in Rice".
This paper describes an in-depth manual curation of LRR-CR annotations. It includes genes containing nonsense mutations, which are tagged as 'non-canonical', as opposed to 'canonical' genes, which have the expected gene models.
Contains 7 files for each rice cultivar:
- domain annotation for each LRR-CR protein (xxx_LRR_domains_filtered.csv)
- gff file containing LRR-CR gene model annotations
- fasta file (1): the complete genes 'gene' (nucleotide sequence, exons and introns)
- fasta file (2): the coding sequences 'CDS' (nucleotide sequence, exons, translatable). Note that for genes with a frameshift, the one or two bases that cause the frameshift are omitted so that the nucleotide sequence can be translated. Terminal and in-frame stop codons are encoded as '*'.
- fasta file (3): protein sequences 'PEP' (amino acid sequence, exons, corresponding to the translation of the CDS in (2))
- fasta file (4): 'cDNA' (nucleotide sequence, exons). Note that for genes with a frameshift, the translated protein sequence will not correspond to the one in file (3). For all other genes, the "CDS" and "cDNA" files contain the same nucleotide sequences.
- fasta file (5): 'cDNA_wFrameshit' (nucleotide sequence, exons): same sequences as in (4), except for genes with a frameshift, whose sequences are completed with one or two "!" characters at the position of the frameshift in order to preserve the correct reading frame (a convention also used by V. Ranwez et al. for the MACSE programs, https://dx.doi.org/10.1093/molbev/msy159).
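The relationship between files (2), (4) and (5) can be illustrated with a minimal Python sketch. It is not part of the dataset or of MACSE: the function names and the toy record are hypothetical, and the sketch assumes that each "!" pads a frameshifted codon that is in frame from the first base of the sequence. Under that assumption, removing the "!" characters should give back the plain cDNA sequence of file (4), while dropping every codon that contains "!" approximates the translatable CDS of file (2).

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {sequence id: sequence}."""
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def strip_frameshift_marks(seq: str) -> str:
    """Remove the '!' padding, leaving only the real nucleotides."""
    return seq.replace("!", "")


def drop_frameshift_codons(seq: str) -> str:
    """Drop every codon containing '!'.

    Assumption: the sequence is in frame from position 0, so each '!'
    sits inside the codon that carries the frameshifting base(s).
    """
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(codon for codon in codons if "!" not in codon)


# Toy record for illustration only; it is not taken from the dataset.
example = [">toy_gene", "ATGG!!TTT", "TAA"]
records = parse_fasta(example)
padded = records["toy_gene"]

print(strip_frameshift_marks(padded))  # cDNA-like sequence, frame broken
print(drop_frameshift_codons(padded))  # CDS-like sequence, translatable
```

In the toy record, the single extra base "G" is padded with two "!" characters so that the downstream codons stay in frame. Whether the padding precedes or follows the extra base in the real files should be checked against the data.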
Created:
2021-07-17



