agentlans/arcee-ai-The-Tome
Source: Hugging Face | Updated: 2025-12-15 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/agentlans/arcee-ai-The-Tome
Description:
This is an unofficial version of The Tome dataset. According to the original authors, The Tome is a curated dataset for training large language models, with a focus on instruction following, and it was used to train the Arcee-Nova/Spark models. This version has been deduplicated and shuffled, URLs, e-mail addresses, and phone numbers have been redacted, and language detection has been run on each row.
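The cleaning steps named in the description (redaction, deduplication, shuffling) can be illustrated with a minimal sketch. This is not the maintainer's actual pipeline: the regular expressions, the placeholder tokens, and the function names `redact` and `clean_rows` are all assumptions made for illustration, and language detection is left out because it would need a third-party library.

```python
import random
import re

# Hypothetical patterns for illustration only; the dataset's real
# redaction rules are not documented on this page.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Crude phone pattern: it can over-match other digit runs such as dates.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace URLs, e-mail addresses, and phone numbers with placeholders."""
    text = URL_RE.sub("<URL>", text)
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = PHONE_RE.sub("<PHONE>", text)
    return text


def clean_rows(rows, seed=0):
    """Redact each row, drop exact duplicates, then shuffle reproducibly."""
    seen = set()
    unique = []
    for row in rows:
        cleaned = redact(row)
        if cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    # A seeded generator keeps the shuffle repeatable across runs.
    random.Random(seed).shuffle(unique)
    return unique
```

Deduplicating after redaction means two rows that differ only in a redacted URL collapse into one; whether the published dataset does the same is not stated.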
Provider:
agentlans



