Supporting data for "ARA: A flexible pipeline for automated exploration of NCBI SRA datasets"
Source: DataCite Commons | Updated: 2025-05-26 | Indexed: 2025-04-15
Download link:
http://gigadb.org/dataset/102428
Description:
One of the most effective and useful ways to explore the content of biological databases is to search them using nucleotide or protein sequences as queries. However, for nucleic acids in particular, this approach is often unavailable because of the large volume of data generated by next-generation sequencing (NGS) technologies. The hierarchical organization of NGS records is designed primarily for browsing or for text-based searches of metadata keywords, which limits the efficiency of database exploration.

We developed an automated pipeline that incorporates well-established NGS data processing tools and procedures to allow easy and effective sampling of NCBI Sequence Read Archive (SRA) records. Given a file of query nucleotide sequences, the tool estimates the matching content of SRA accessions by probing only a user-defined fraction of each record's sequences. Depending on the selected parameters, it can then perform a full mapping experiment on the records that meet the required criteria.

The pipeline is designed to be easy to operate: it offers a fully automatic setup procedure and relies on a fixed set of tested supporting tools. Its modular design and implemented usage modes allow users to scale analyses up to complex computational infrastructures. We present an easy-to-operate, automated tool that expands the ways users can access and explore the information contained in records deposited in the NCBI SRA database.
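To illustrate the probe-then-select idea described above, here is a minimal sketch assuming the reads of an accession are already in memory as plain strings. ARA itself delegates these steps to established NGS tools; the shared k-mer test used here is a swapped-in, simplified stand-in for alignment, and the function names (`estimate_match_fraction`, `select_for_full_mapping`), the default `k=21`, and the threshold logic are hypothetical choices, not the pipeline's actual method or parameters.

```python
import random

# Complement table for reverse-complementing DNA (N maps to N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def estimate_match_fraction(reads, queries, fraction, k=21, seed=0):
    """Probe roughly `fraction` of the reads; return (hit_fraction, n_probed).

    Hypothetical stand-in for the alignment step: a read counts as a hit
    if it shares at least one k-mer with any query on either strand.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    query_kmers = set()
    for q in queries:
        query_kmers |= kmers(q, k) | kmers(revcomp(q), k)
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    probed = [r for r in reads if rng.random() < fraction]
    if not probed:
        return 0.0, 0
    hits = sum(1 for r in probed if not kmers(r, k).isdisjoint(query_kmers))
    return hits / len(probed), len(probed)


def select_for_full_mapping(estimates, min_fraction):
    """Hypothetical filter: accessions whose estimate reaches min_fraction."""
    return sorted(acc for acc, frac in estimates.items() if frac >= min_fraction)
```

For example, `select_for_full_mapping({"SRR_A": 0.2, "SRR_B": 0.0}, 0.05)` returns `["SRR_A"]`; the accession labels are placeholders. Probing only a fraction of reads trades precision of the estimate for speed, which is what makes screening many accessions before a full mapping run practical.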
Provider:
GigaScience Database
Created:
2023-07-27



