
Table 1_Celline: a flexible tool for one-step retrieval and integrative analysis of public single-cell RNA sequencing data.xlsx

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Table_1_Celline_a_flexible_tool_for_one-step_retrieval_and_integrative_analysis_of_public_single-cell_RNA_sequencing_data_xlsx/30857321
Description:
Single-cell RNA sequencing (scRNA-seq) has generated a rapidly expanding collection of public datasets that provide insight into development, disease, and therapy. However, researchers lack an end-to-end solution for retrieving, preprocessing, integrating, and analyzing these data, because existing tools address only isolated steps and require manual curation of accessions and metadata, as well as manual handling of technical variability (batch effects). In this study, we developed Celline, a Python package that executes an entire workflow using a single-line command per step. Celline automatically gathers raw scRNA-seq data from multiple public repositories and extracts metadata using large language models. It then wraps established tools into one-line commands, enabling seamless integrative analyses: Scrublet for doublet removal, Seurat and Scanpy for quality control and cell-type annotation, Harmony and scVI for batch correction, and Slingshot for trajectory inference. To validate the quality of Celline-acquired data and the practical utility of the integrated framework, we applied it to two mouse brain cortex datasets from embryonic days 14.5 and 18. Technical validation demonstrated that Celline successfully retrieved data, standardized metadata, and enabled standard analyses that removed low-quality cells, annotated 11 major cell types, improved integration quality (scIB score +0.22), and completed trajectory analysis. Thus, Celline transforms scattered public scRNA-seq resources into unified, analysis-ready datasets with minimal effort. Its modular design allows pipeline extension, encourages community-driven advances, and accelerates discovery from single-cell data.

Created:
2025-12-11