mpasila/Alpacazord-V1
Source: Hugging Face. Updated 2024-04-09. Indexed 2024-04-19.
Download link:
https://hf-mirror.com/datasets/mpasila/Alpacazord-V1
Description:
---
license: cc-by-4.0
language:
- en
tags:
- not-for-all-audiences
---
This dataset was built from the following datasets:
[WizardLM/WizardLM_evol_instruct_70k](https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_70k),
[yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned),
[Xilabs/PIPPA-alpaca](https://huggingface.co/datasets/Xilabs/PIPPA-alpaca),
[TokenBender/roleplay_raw_text](https://huggingface.co/datasets/TokenBender/roleplay_raw_text).
# Details on how I cleaned/filtered the dataset:
Removed "As an AI assistant" and other AI-related phrases, or sometimes entire examples. Otherwise I edited those messages so they still make sense without the AI phrase. (For example, when the instruction asks it to imagine something, the output would annoyingly use "you" instead of "I" when describing something the AI is experiencing or has experienced.)
Removed unnecessary refusals so that it doesn't refuse things it could potentially do (for example by using LangChain or similar software).
As mentioned above, removed instances where it describes itself as being programmed or as being an AI in any way. Also replaced "I helped an user" with something different like "I helped a friend" so it doesn't sound as robotic.
For some examples I used Mistral Large from their chat to replace an output (or in some cases the 7B Instruct V0.2 model). (I ended up using it fairly frequently, since a lot of the original answers talked about things from the perspective of an AI, which is not what I want.)
Removed some duplicate instructions.
Some outputs were cut off too early, so I had to fix them using Mistral.
At some point I got lazy and just removed any examples containing "as an AI" type phrases without replacing them with something else.
Some outputs were remade using ChatGPT 3.5 (because apparently there's a daily limit on Mistral).
The reason for removing any trace of AI-assistant phrasing is so that a model trained on this data does not mention being an AI unless you've clearly specified that somewhere in your prompt. It should make roleplaying a lot better when you don't have to hear "as an AI language model bla bla bla".
There may still be some refusals or AI-related phrases left, but I removed most of what I could find. I will likely update this dataset in the future to do more cleanup if needed.
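The filtering and deduplication steps described above could be approximated along these lines. This is only a rough sketch, not the script actually used for this dataset (much of the cleanup was done by hand). It assumes Alpaca-style records with `instruction` and `output` fields, and the phrase list and the `clean` helper are hypothetical.

```python
import re

# Hypothetical phrase list; the real cleanup was broader and partly manual.
AI_PHRASES = re.compile(
    r"\bas an ai\b|\bai language model\b|\bai assistant\b",
    re.IGNORECASE,
)

def clean(examples):
    """Drop examples whose output contains an AI phrase, then drop
    duplicate instructions (keeping the first occurrence)."""
    seen = set()
    kept = []
    for ex in examples:
        # Skip outputs that talk from the perspective of an AI.
        if AI_PHRASES.search(ex.get("output", "")):
            continue
        # Normalize case and whitespace so near-identical instructions match.
        key = " ".join(ex.get("instruction", "").lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(ex)
    return kept

# Made-up sample records, assuming Alpaca-style fields.
sample = [
    {"instruction": "Describe a sunset.",
     "output": "As an AI language model, I cannot see sunsets."},
    {"instruction": "Describe a sunset.",
     "output": "The sky burned orange over the hills."},
    {"instruction": "describe a  sunset.",
     "output": "Gold light faded into violet."},
    {"instruction": "Name a fruit.", "output": "An apple."},
]
cleaned = clean(sample)
```

A filter like this only covers the "remove without replacing" case. Rewriting an output so it still makes sense without the AI phrase, or regenerating it with another model, needs manual review of each example.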
Provided by:
mpasila
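Summary of original information
Dataset overview
Data sources
- WizardLM/WizardLM_evol_instruct_70k
- yahma/alpaca-cleaned
- Xilabs/PIPPA-alpaca
- TokenBender/roleplay_raw_text
Data cleaning and filtering methods
- Removed "As an AI assistant" and other AI-related phrases, or entire examples.
- Removed unnecessary refusals, to avoid refusing tasks that could actually be carried out.
- Removed instances where the output describes itself as programmed or as an AI, and replaced "I helped an user" with "I helped a friend" to sound less robotic.
- Replaced some outputs using Mistral Large or the 7B Instruct V0.2 model.
- Removed some duplicate instructions.
- Fixed some outputs that were cut off too early.
- Removed examples containing "as an AI" type phrases without replacing them.
- Regenerated some outputs using ChatGPT 3.5.
Dataset purpose
- Remove traces of the AI assistant to improve roleplay, avoiding phrases such as "as an AI language model".
Planned updates
- More cleanup may follow to remove remaining refusals or AI-related content.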



