Regulatory genome annotation for 33 insect species
Source: DataONE. Updated: 2024-12-02. Indexed: 2025-04-26.
Download link:
https://search.dataone.org/view/sha256:163c19664d022598eb6b193a2134917219bbb0e35546f3d419dc3c2744422a3d
Description:
Annotation of newly-sequenced genomes frequently includes genes, but rarely covers important non-coding genomic features such as the cis-regulatory modules (e.g., enhancers and silencers) that regulate gene expression. Here, we begin to remedy this situation by developing a workflow for rapid initial annotation of insect regulatory sequences, and provide a searchable database resource with enhancer predictions for 33 genomes. Using our previously-developed SCRMshaw computational enhancer prediction method, we predict over 2.8 million regulatory sequences along with the tissues where they are expected to be active, in a set of insect species ranging over 360 million years of evolution. Extensive analysis and validation of the data provides several lines of evidence suggesting that we achieve a high true-positive rate for enhancer prediction. One, we show that our predictions target specific loci, rather than random genomic locations. Two, we predict enhancers in orthologous loci across a ...

Computational analysis of genome sequence and annotation.

# Regulatory genome annotation of 33 insect species
[https://doi.org/10.5061/dryad.3j9kd51t0](https://doi.org/10.5061/dryad.3j9kd51t0)
These data are the results of regulatory sequence prediction on 33 insect genomes, produced using the SCRMshaw pipeline as described in the associated publication. Five sets of files are provided:
(1) The post-processed SCRMshaw output for each genome. These files all begin with "scrmshawOutput" or "SO_scrmshawOutput"; files beginning with "SO" have undergone the orthology-assignment step. This output has not been further processed to merge overlapping predictions or to merge duplicate predictions generated using different training data. These files have the extension ".bed" and are tab-delimited text files that can be opened using any standard text editor.
(2) The prediction data from each genome, with overlapping and/or duplicate predictions reconciled as described in the protocol and converted to GFF format. These files begin with a species design...
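
Because the ".bed" files in item (1) are plain tab-delimited text, they can also be read programmatically. The following is a minimal Python sketch, not part of the dataset's own tooling. It assumes only the standard BED convention that the first three columns are sequence name, start, and end; any further columns are kept as uninterpreted strings, since their meaning is not stated here. The names `BedRecord`, `read_bed`, and `has_orthology_step` are hypothetical, and the sample rows are synthetic values for illustration only.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class BedRecord(NamedTuple):
    chrom: str    # sequence (chromosome/scaffold) name
    start: int    # start coordinate (column 2)
    end: int      # end coordinate (column 3)
    extra: tuple  # remaining columns, left uninterpreted (assumption)


def has_orthology_step(filename: str) -> bool:
    """Files beginning with "SO" have undergone orthology assignment."""
    return filename.startswith("SO")


def read_bed(handle: TextIO) -> Iterator[BedRecord]:
    """Yield one record per non-empty, non-comment line of a BED file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        yield BedRecord(row[0], int(row[1]), int(row[2]), tuple(row[3:]))


# Synthetic example rows (not taken from the dataset).
sample = io.StringIO("scaffold_1\t100\t600\texample\n\nscaffold_2\t5\t505\n")
records = list(read_bed(sample))
for rec in records:
    print(rec.chrom, rec.end - rec.start)
```

To read a real file, pass an open file handle instead of the in-memory sample, for example `with open(path, newline="") as fh: records = list(read_bed(fh))`.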
Created:
2024-12-06



