
long-read-tools.org: an interactive catalogue of analysis methods for long-read sequencing data

DataCite Commons. Updated 2025-05-26; indexed 2025-04-15.
Download link:
http://gigadb.org/dataset/100853
Resource summary:
The data produced by long-read third-generation sequencers have unique characteristics compared to short-read sequencing data, often requiring tailored analysis tools for tasks ranging from quality control to downstream processing. The rapidly growing body of software that addresses these challenges for different genomics applications is difficult to keep track of, which makes it hard for users to choose the most appropriate tool for their analysis goal, and for developers to identify areas of need and existing solutions to benchmark against.

We describe the implementation of long-read-tools.org, an open-source database that organises the rapidly expanding collection of long-read data analysis tools and allows its exploration through interactive browsing and filtering. The current database release contains 478 tools across 32 categories. Most tools are developed in Python, and the most frequent analysis tasks include base-calling, de novo assembly, error-correction, quality checking/filtering, and isoform detection, while long-read single-cell data analysis and transcriptomics are the areas with the fewest tools available.

Continued growth in the application of long-read sequencing in genomics research positions the long-read-tools.org database as an essential resource that allows researchers to keep abreast of both established and emerging software and helps guide the selection of the most relevant tool for their analysis needs.
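The interactive filtering described above amounts to selecting tool records that match a chosen category and implementation language, and the per-task statistics amount to counting tools per category. The following is a minimal Python sketch of that idea, assuming a hypothetical record schema (`Tool` with `name`, `language`, `categories`) and placeholder entries (`tool_a` to `tool_d`). It is not the actual long-read-tools.org schema, data, or interface, which may differ.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tool:
    """One catalogue entry (hypothetical schema, not the real database's)."""
    name: str
    language: str
    categories: frozenset[str] = field(default_factory=frozenset)


def filter_tools(tools, category=None, language=None):
    """Return tools matching every filter that is set (None means 'any')."""
    return [
        t for t in tools
        if (category is None or category in t.categories)
        and (language is None or t.language == language)
    ]


def category_counts(tools):
    """Count tools per category; a tool in several categories counts once in each."""
    return Counter(c for t in tools for c in t.categories)


# Hypothetical placeholder entries, not real tools from the catalogue.
catalogue = [
    Tool("tool_a", "Python", frozenset({"base-calling"})),
    Tool("tool_b", "C++", frozenset({"de novo assembly", "error-correction"})),
    Tool("tool_c", "Python", frozenset({"isoform detection", "transcriptomics"})),
    Tool("tool_d", "Python", frozenset({"error-correction"})),
]

if __name__ == "__main__":
    hits = filter_tools(catalogue, category="error-correction", language="Python")
    print([t.name for t in hits])                     # ['tool_d']
    print(category_counts(catalogue).most_common(1))  # [('error-correction', 2)]
```

Because a tool can belong to several categories, category totals computed this way can sum to more than the number of tools.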

Provider: GigaScience Database
Created: 2021-01-05