Regulatory genome annotation for 33 insect species

Source: DataONE. Updated: 2024-12-02. Indexed: 2025-04-26.
Download link:
https://search.dataone.org/view/sha256:163c19664d022598eb6b193a2134917219bbb0e35546f3d419dc3c2744422a3d
Description:
Annotation of newly sequenced genomes frequently includes genes, but rarely covers important non-coding genomic features such as the cis-regulatory modules (e.g., enhancers and silencers) that regulate gene expression. Here, we begin to remedy this situation by developing a workflow for rapid initial annotation of insect regulatory sequences, and we provide a searchable database resource with enhancer predictions for 33 genomes. Using our previously developed SCRMshaw computational enhancer prediction method, we predict over 2.8 million regulatory sequences, along with the tissues in which they are expected to be active, in a set of insect species spanning over 360 million years of evolution. Extensive analysis and validation of the data provide several lines of evidence suggesting that we achieve a high true-positive rate for enhancer prediction. First, we show that our predictions target specific loci rather than random genomic locations. Second, we predict enhancers in orthologous loci across a ...

Methods: Computational analysis of genome sequence and annotation.

README: Regulatory genome annotation of 33 insect species (https://doi.org/10.5061/dryad.3j9kd51t0)

These data are the results of regulatory sequence prediction on 33 insect genomes, produced using the SCRMshaw pipeline as described in the associated publication. Five sets of files are provided:

(1) The post-processed SCRMshaw output for each genome. These files all begin with "scrmshawOutput" or "SO_scrmshawOutput"; files beginning with "SO" have undergone the orthology-assignment step. This output has not been further processed to merge overlapping predictions or to merge duplicate predictions generated using different training data. These files have the extension ".bed" and are tab-delimited text files that can be opened in any standard text editor.
(2) The prediction data from each genome, with overlapping and/or duplicate predictions reconciled as described in the protocol and converted to GFF format. These files begin with a species design...
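The README distinguishes the two families of raw output by file-name prefix and notes that overlapping predictions are reconciled only in the later GFF files. The Python sketch below illustrates both points under stated assumptions. It treats only the first three columns as standard BED coordinates (chromosome, 0-based start, exclusive end) and keeps any remaining columns as raw strings, because their meanings are not given in this summary. The helper names `file_kind`, `read_bed` and `merge_overlaps` are hypothetical. The merge is a generic overlap merge in the spirit of `bedtools merge`, not the authors' reconciliation protocol, which also handles duplicate predictions made with different training data.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List, TextIO


@dataclass
class Region:
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    extra: tuple = ()  # columns 4+ kept as raw strings; meanings assumed unknown


def file_kind(filename: str) -> str:
    """Classify a file by the name prefixes described in the dataset README."""
    if filename.startswith("SO_scrmshawOutput"):
        return "orthology-assigned"
    if filename.startswith("scrmshawOutput"):
        return "raw"
    return "other"


def read_bed(handle: TextIO) -> List[Region]:
    """Parse tab-delimited BED-like lines (assumption: columns 1-3 are chrom/start/end)."""
    regions = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and UCSC header lines
        chrom, start, end, *rest = row
        regions.append(Region(chrom, int(start), int(end), tuple(rest)))
    return regions


def merge_overlaps(regions: Iterable[Region]) -> List[Region]:
    """Merge overlapping or abutting intervals on the same chromosome.

    Generic interval merge for illustration; NOT the authors' protocol.
    """
    merged: List[Region] = []
    for r in sorted(regions, key=lambda x: (x.chrom, x.start, x.end)):
        last = merged[-1] if merged else None
        if last and last.chrom == r.chrom and r.start <= last.end:
            last.end = max(last.end, r.end)  # extend the current merged block
        else:
            merged.append(Region(r.chrom, r.start, r.end))  # start a new block
    return merged


if __name__ == "__main__":
    # Toy rows for illustration only; not taken from the dataset.
    sample = "chr2L\t100\t600\tx\nchr2L\t550\t900\ty\nchr3R\t10\t50\tx\n"
    print(file_kind("SO_scrmshawOutput_example.bed"))  # hypothetical file name
    for region in merge_overlaps(read_bed(io.StringIO(sample))):
        print(region.chrom, region.start, region.end)
```

Run as a script, this prints `orthology-assigned`, then `chr2L 100 900` and `chr3R 10 50`: the two overlapping toy intervals on chr2L collapse into one. Merging abutting intervals (end equal to the next start) mirrors the default behavior of `bedtools merge`. A stricter `<` comparison would keep them separate.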
Created: 2024-12-06