SnoBIRD: A tool to identify C/D box snoRNAs and refine their annotation across all eukaryotes
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE290579
Description:
Small nucleolar RNAs (snoRNAs), a group of noncoding RNAs present in all eukaryotes, are known for their roles in regulating ribosome biogenesis and splicing. Despite these central cellular roles, current snoRNA annotations remain incomplete. Several eukaryote annotations contain few or no snoRNAs, and none distinguishes expressed snoRNAs from their pseudogenes, a recently characterized snoRNA subclass with distinct features and expression levels. To address this, we developed SnoBIRD, a BERT-based C/D box snoRNA predictor trained on snoRNAs spanning all eukaryote kingdoms. We show that SnoBIRD outperforms existing tools on a test set and is the only predictor capable of identifying snoRNA pseudogenes using biologically relevant signal. Applying SnoBIRD to the fission yeast and human genomes, we demonstrate that it is the only tool whose runtime scales well with genome size, and we identify and experimentally validate several new SnoBIRD-predicted C/D box snoRNAs. By running SnoBIRD on multiple eukaryote genomes, we identify hundreds of novel C/D box snoRNA candidates and highlight SnoBIRD's usefulness for determining the evolutionary paths of snoRNAs that share a common host locus but are distributed across different species. Overall, SnoBIRD is a user-friendly and efficient tool for reliably predicting C/D box snoRNAs and their pseudogenes in any eukaryote kingdom.
This dataset comprises TGIRT-Seq performed on ribodepleted samples from three wild-type (WT) S. cerevisiae replicates and three mouse brain samples, as well as TGIRT-Seq performed on size-selected RNAs (<300 nt) from two G. gallus universal reference samples and two M. mulatta universal reference samples, and on size-selected RNAs (<150 nt) from three WT S. pombe replicates.
Created:
2025-07-30



