krkawzq/SynBioCoT

Source: Hugging Face
Updated: 2025-10-25
Indexed: 2025-11-15
Download link:
https://hf-mirror.com/datasets/krkawzq/SynBioCoT
Description:
SynBioCoT is a large-scale dataset for training language models to understand and reason over biological data (DNA, cell expression profiles, proteins) with explicit Chain-of-Thought reasoning. It teaches LLMs to interpret raw omics data through textification and multi-step reasoning traces. The dataset ships in three configurations (default, raw reasoning traces, and enhanced reasoning traces) and targets tasks such as cell annotation, differential expression, promoter identification, and protein function prediction.
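
The three configurations mentioned above could be loaded roughly as follows. This is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable. The helper names `load_all_configs` and `summarize` are hypothetical, and the configuration names are discovered at run time because this page does not list their exact identifiers.

```python
# Optional, and an assumption about your setup: to fetch through the mirror
# linked on this page, set the environment variable
# HF_ENDPOINT=https://hf-mirror.com before starting Python.

REPO_ID = "krkawzq/SynBioCoT"  # repository name as shown on this page


def load_all_configs(repo_id: str = REPO_ID) -> dict:
    """Return a mapping of configuration name -> loaded dataset splits.

    Configuration names are queried from the hub instead of being
    hard-coded, since their exact identifiers are not given here.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import get_dataset_config_names, load_dataset

    loaded = {}
    for config in get_dataset_config_names(repo_id):
        # With no `split` argument, load_dataset returns every split.
        loaded[config] = load_dataset(repo_id, config)
    return loaded


def summarize(loaded: dict) -> list[str]:
    """Build one line per configuration and split with its row count."""
    lines = []
    for config, splits in loaded.items():
        for split_name, split in splits.items():
            lines.append(f"{config}/{split_name}: {len(split)} rows")
    return lines
```

Calling `summarize(load_all_configs())` requires network access and downloads every configuration, so it may be preferable to load a single configuration first to inspect its columns.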
Provider:
krkawzq