
silk-road/Haruhi-Zero

Source: Hugging Face · Updated 2024-02-21 · Added 2024-03-04
Download link:
https://hf-mirror.com/datasets/silk-road/Haruhi-Zero
Resource description:

---
license: cc-by-4.0
---
# Training Data for ChatHaruhi-Zero Extend

The total size of the dataset is not yet known. Once it is, the dataset will be renamed Haruhi-Zero-XXX K.

Only a sample from each data source is released for now. The full dataset will be published after the 1.0 model is released.

Main project: https://github.com/LC1332/Chat-Haruhi-Suzumiya

If you are interested in joining our training effort, please contact chengli.thu@gmail.com.

## Planned Data Sources

- [x] Chinese fiction data
- [x] Erotic fiction data
- [x] ChatHaruhi 52K (converted to message format)
- [x] Chinese 13.2K (converted to message format)
- [x] Waifu-extended 0.2K (convert to message format if convenient; otherwise use a simple user-AI format)
- [x] Claude-Baize 7.2K
- [x] PIPPA 1.68K
- [x] JanitorAI data
- [ ] PIPPA translated data
- [x] RoleLLM 1.6K (convert to message format if convenient; otherwise use a simple user-AI format)

## Version 0.2

Further removed data related to AI assistants.

## Version 0.3

Added identity-awareness data.

## Version 0.4

Added data extracted from novels.

## Version 0.5

Added the PIPPA translated data and added profile information to the novel data.

## Sponsorship

We are seeking donations of Claude API access and OpenAI Enterprise API access, and are looking for sponsors to provide computing resources.
Provider:
silk-road
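Summary of Original Information

Dataset Overview

Dataset Name

  • Current name: Haruhi-Zero
  • Planned name: Haruhi-Zero-XXX K (to be finalized once the dataset size is known)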