silk-road/Haruhi-Zero
收藏Hugging Face2024-02-21 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/silk-road/Haruhi-Zero
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cc-by-4.0
---
# 用于ChatHaruhi-Zero Extend的训练数据
目前还不知道数据规模 知道的话回头会更名为Haruhi-Zero-XXX K
目前只放出每个source的sample,完整的数据将在1.0 模型放出之后发布
主项目链接 https://github.com/LC1332/Chat-Haruhi-Suzumiya
如果有兴趣加入我们的训练请联系chengli.thu@gmail.com
计划加入的数据源
数据源
- [x] 中文小说数据
- [x] erotics小说数据
- [x] ChatHaruhi 52K, (转了message格式)
- [x] Chinese 13.2k, 转了message格式)
- [x] Waifu-extended 0.2K, 看看方不方便转成message格式,不行就简单的user-AI
- [x] Claude-Baize数据 7.2K
- [x] PIPPA数据 1.68K
- [x] JanitorAI数据
- [ ] PIPPA翻译数据
- [x] RoleLLM 1.6K, 看看方不方便转成message格式,不行就简单的user-AI
# 0.2
进一步去掉AI助理的相关数据
# 0.3
增加身份认知数据
# 0.4
增加小说抽取数据
# 0.5
增加PIPPA翻译,小说数据增加profile
## 赞助
求捐助Claude API 求捐助OpenAI企业API
求赞助资源计算资源中。。。
---
license: CC-BY-4.0
---
# Training Data for ChatHaruhi-Zero Extend
Currently, the total scale of this dataset has not been confirmed. Once confirmed, the dataset will be renamed to Haruhi-Zero-XXX K.
Currently, only samples for each data source are publicly released. The full dataset will be made available after the 1.0 version model is launched.
Main project link: https://github.com/LC1332/Chat-Haruhi-Suzumiya
If you are interested in joining our training effort, please contact chengli.thu@gmail.com.
### Planned Data Sources
- [x] Chinese fiction dataset
- [x] Erotic fiction dataset
- [x] ChatHaruhi 52K (converted to message format)
- [x] Chinese 13.2K (converted to message format)
- [x] Waifu-extended 0.2K: will convert to message format if feasible; otherwise adopt simple user-AI format
- [x] Claude-Baize dataset 7.2K
- [x] PIPPA dataset 1.68K
- [x] JanitorAI dataset
- [ ] PIPPA translated dataset
- [x] RoleLLM 1.6K: will convert to message format if feasible; otherwise adopt simple user-AI format
## Version 0.2
Further removed data related to AI assistants.
## Version 0.3
Added identity awareness training data.
## Version 0.4
Added novel extraction training data.
## Version 0.5
Added PIPPA translated dataset, and added profile information to the novel dataset.
## Sponsorship
We are seeking donations of Claude API and OpenAI Enterprise API.
We are currently seeking sponsorship for computing resources.
提供机构:
silk-road
原始信息汇总
数据集概述
数据集名称
- 当前名称:Haruhi-Zero-XXX K
- 待更新名称:待定(将根据数据规模更新)



