
START: A system for flexible analysis of hundreds of genomic signal tracks in few lines of SQL-like queries. Homo sapiens

Indexed by NIAID Data Ecosystem on 2026-03-09
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA319281
Resource description:
Abstract

Background: A genomic signal track is a set of genomic intervals associated with values of various types, such as measurements from high-throughput experiments. Analysis of signal tracks requires complex computational methods, which often make analysts focus too much on detailed computational steps rather than on their biological questions.

Results: Here we propose Signal Track Query Language (STQL) for easy analysis of signal tracks. It is a Structured Query Language (SQL)-like declarative language, which means one only specifies what computations need to be done, not how they are to be carried out. STQL provides a rich set of constructs for manipulating genomic intervals and their values. To run STQL queries, we have developed the Signal Track Analytical Research Tool (START, http://yiplab.cse.cuhk.edu.hk/start/), a system that includes a Web-based user interface and a back-end execution system. The user interface helps users select data from our database of around 10,000 commonly used public signal tracks, manage their own tracks, and construct, store and share STQL queries. The back-end system automatically translates STQL queries into optimized low-level programs and runs them in parallel on a computer cluster. We use STQL to perform 14 representative analytical tasks. By repeating these analyses using bedtools, Galaxy and custom Python scripts, we show that the STQL solution is usually the simplest, and that parallel execution achieves significant speed-up with large data files. Finally, we describe how a biologist with minimal formal training in computer programming self-learned STQL to analyze DNA methylation data we produced from 60 pairs of hepatocellular carcinoma (HCC) samples.
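Because STQL is declarative, the abstract does not say how an interval query is actually evaluated. As a rough illustration of the kind of low-level program a single query stands for, here is a minimal Python sketch, not STQL syntax and not START's implementation. It represents a signal track as a list of intervals with values and computes, for each region in one track, the mean value of the overlapping intervals in another track, similar in spirit to bedtools' map operation. The names `Interval`, `group_by_chrom` and `mean_overlapping_value` are hypothetical, and BED-style 0-based, half-open coordinates are an assumption.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Interval(NamedTuple):
    """One entry of a signal track: a genomic interval and its value.

    Assumption: BED-style coordinates, 0-based and half-open [start, end).
    """
    chrom: str
    start: int
    end: int
    value: float


def group_by_chrom(track: List[Interval]) -> Dict[str, List[Interval]]:
    """Split a track into per-chromosome lists sorted by (start, end)."""
    groups: Dict[str, List[Interval]] = defaultdict(list)
    for iv in track:
        groups[iv.chrom].append(iv)
    for ivs in groups.values():
        ivs.sort(key=lambda iv: (iv.start, iv.end))
    return groups


def mean_overlapping_value(
    regions: List[Interval], signal: List[Interval]
) -> List[Tuple[Interval, Optional[float]]]:
    """Pair each region with the mean value of signal intervals overlapping it.

    Regions with no overlapping signal interval are paired with None.
    Output is grouped by chromosome and sorted by start within each one.
    """
    signal_by_chrom = group_by_chrom(signal)
    results: List[Tuple[Interval, Optional[float]]] = []
    for chrom, regs in group_by_chrom(regions).items():
        sig = signal_by_chrom.get(chrom, [])
        first = 0  # every signal interval before this index ends at or before the current region's start
        for reg in regs:
            # Regions are sorted by start, so a signal interval ending at or
            # before this region's start cannot overlap any later region either.
            while first < len(sig) and sig[first].end <= reg.start:
                first += 1
            values = []
            j = first
            # Signal is sorted by start; stop once intervals begin at or past the region's end.
            while j < len(sig) and sig[j].start < reg.end:
                if sig[j].end > reg.start:  # half-open overlap test
                    values.append(sig[j].value)
                j += 1
            mean = sum(values) / len(values) if values else None
            results.append((reg, mean))
    return results


# Usage with small made-up tracks (region values are unused, so set to 0.0).
regions = [Interval("chr1", 100, 200, 0.0), Interval("chr1", 300, 400, 0.0),
           Interval("chr2", 0, 50, 0.0)]
signal = [Interval("chr1", 150, 160, 2.0), Interval("chr1", 180, 320, 4.0),
          Interval("chr1", 500, 600, 9.0)]
for reg, mean in mean_overlapping_value(regions, signal):
    print(reg.chrom, reg.start, reg.end, mean)
# chr1 100 200 3.0
# chr1 300 400 4.0
# chr2 0 50 None
```

The work is done independently per chromosome, so chromosomes are one plausible unit for splitting a query across cluster nodes; whether START partitions work this way is not stated here and is only an assumption of this sketch.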
Created: 2016-04-21