Tool-Calling Dataset for Drug Discovery (TCDD)
Source: arXiv (indexed 2025-09-30)
Download link:
https://drive.google.com/file/d/1JthOkIAzuuaajZhgH03e9TfM9KwBHmni/view?usp=sharing
Description:
TCDD simulates realistic conversations between users and AI systems in drug discovery. It covers single-turn dialogue, multi-turn dialogue, error correction, and memory pool updates, with a focus on tool selection, parameter extraction, and interaction with a parameterized memory pool. The dataset comprises 2,800 instruction samples: 2,500 for training and 300 for testing. It is intended for fine-tuning and evaluating large language models (LLMs) on drug discovery tool-calling tasks.



