krkawzq/SynBioCoT
Source: Hugging Face. Updated: 2025-10-25. Indexed: 2025-11-15.
Download link:
https://hf-mirror.com/datasets/krkawzq/SynBioCoT
Description:
SynBioCoT is a large-scale dataset for training language models to understand and reason over biological data (DNA, cell expression profiles, and proteins) with explicit Chain-of-Thought reasoning. It teaches LLMs to interpret raw omics data by converting the data to text (textification) and pairing it with multi-step reasoning traces. The dataset comes in three configurations: default, raw reasoning traces, and enhanced reasoning traces. It supports tasks such as cell annotation, differential expression, promoter identification, and protein function prediction.
Provided by:
krkawzq
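
The three configurations mentioned in the description can be discovered and loaded programmatically. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the mirror host in the download link accepts the standard `HF_ENDPOINT` override. The helper names `dataset_page_url` and `load_all_configs` are hypothetical, introduced only for illustration. The configuration identifiers are queried at run time because this listing does not state them.

```python
import os

REPO_ID = "krkawzq/SynBioCoT"
MIRROR = "https://hf-mirror.com"  # host taken from the download link in this listing


def dataset_page_url(repo_id: str, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL for a Hub-compatible endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_all_configs(repo_id: str = REPO_ID, endpoint: str = MIRROR) -> dict:
    """Discover the configuration names, then load each one (needs network access)."""
    # Assumption: HF_ENDPOINT has to be set before the Hugging Face libraries
    # are imported, so the import is deferred until after the override.
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    from datasets import get_dataset_config_names, load_dataset

    # The names are queried rather than hard-coded: the listing describes
    # three configurations but does not give their identifiers.
    names = get_dataset_config_names(repo_id)
    return {name: load_dataset(repo_id, name) for name in names}
```

Calling `load_all_configs()` returns a dictionary keyed by configuration name. Loading every configuration may download a large amount of data, so passing a single name to `load_dataset` is the lighter choice when only one variant is needed.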



