CASSIA: a multi-agent large language model for reference free, interpretable, and automated cell annotation of single-cell RNA-sequencing data

Name: CASSIA: a multi-agent large language model for reference free, interpretable, and automated cell annotation of single-cell RNA-sequencing data
Creator: figshare
Published: 2026-01-13 08:02:58
License: No description available

Source: DataCite Commons. Updated: 2026-01-13. Indexed: 2026-02-09.

Download link:

https://springernature.figshare.com/articles/dataset/CASSIA_a_multi-agent_large_language_model_for_reference_free_interpretable_and_automated_cell_annotation_of_single-cell_RNA-sequencing_data/28552268

Resource description:

Cell type annotation is an essential step in single-cell RNA-sequencing analysis, and numerous annotation methods are available. Most require a combination of computational and domain-specific expertise, and they frequently yield inconsistent results that can be challenging to interpret. Large language models have the potential to expand accessibility while reducing manual input and improving accuracy, but existing approaches suffer from hyperconfidence, hallucinations, and lack of reasoning. To address these limitations, we developed CASSIA for automated, accurate, and interpretable cell annotation of single-cell RNA-sequencing data. As demonstrated in numerous case studies, CASSIA improves annotation accuracy in over 970 cell types including those from complex and rare cell populations, and also provides users with reasoning and quality assessment to ensure interpretability, guard against hallucinations, and calibrate confidence.

Provider: figshare

Created: 2025-03-07
