
mpasila/Alpacazord-V1

Source: Hugging Face. Updated 2024-04-09. Indexed 2024-04-19.
Download link:
https://hf-mirror.com/datasets/mpasila/Alpacazord-V1
Resource description:
---
license: cc-by-4.0
language:
- en
tags:
- not-for-all-audiences
---

This is a dataset made with these datasets: [WizardLM/WizardLM_evol_instruct_70k](https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_70k), [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned), [Xilabs/PIPPA-alpaca](https://huggingface.co/datasets/Xilabs/PIPPA-alpaca), [TokenBender/roleplay_raw_text](https://huggingface.co/datasets/TokenBender/roleplay_raw_text).

# Details on how I cleaned/filtered the dataset:

Removed "As an AI assistant" and other AI-related phrases, or sometimes entire examples. Otherwise I edited those messages so they still make sense without the AI phrase. (For example, when the instruction tells it to imagine something, the output would annoyingly use "you" instead of "I" when describing something the AI is experiencing or has experienced.)

Removed any unnecessary refusals so that it doesn't refuse things that it could potentially do (by using LangChain or other similar software).

As mentioned before, removed instances where it describes itself as being programmed or being an AI in any way. Also replaced "I helped an user" with something different like "I helped a friend" so it doesn't sound as robotic.

For some examples I used Mistral Large from their chat to replace an output (or in some cases the 7B Instruct V0.2 model). I ended up using it fairly frequently, since a lot of the original answers talked about things from the perspective of an AI, which is not what I want.

Removed some duplicate instructions.

Some outputs were cut off too early, so I had to fix them using Mistral.

At some point I got lazy and just decided to remove any "as an AI" type examples without replacing them with something else.

Some outputs were remade using ChatGPT 3.5 (because apparently there's a daily limit on Mistral).

The reason for removing any traces of AI assistant stuff is so that it does not mention those unless you've clearly specified that in your prompt somewhere. It should make roleplaying a lot better when you don't have to hear "as an AI language model bla bla bla". There may still be some refusals or AI stuff left, but I removed most of what I could find. I will likely update this dataset in the future to do more cleanup if needed.
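The removal and deduplication steps described in the card could be sketched roughly as follows. This is only an illustration, not the author's actual script: the phrase list, the `clean` helper, and the Alpaca-style `instruction`/`output` field names are assumptions, since the card does not publish the exact patterns or tooling used.

```python
import re

# Hypothetical phrase list; the card does not list the exact patterns.
AI_PATTERNS = re.compile(
    r"\bas an ai\b|\bai language model\b|\bi am programmed\b",
    re.IGNORECASE,
)


def clean(examples):
    """Drop examples whose output matches an AI phrase, then drop repeated instructions."""
    seen = set()
    kept = []
    for ex in examples:
        # Assumed Alpaca-style fields: "instruction" and "output".
        if AI_PATTERNS.search(ex["output"]):
            continue
        key = ex["instruction"].strip().lower()
        if key in seen:
            continue  # keep only the first example for each instruction
        seen.add(key)
        kept.append(ex)
    return kept


examples = [
    {"instruction": "Describe a sunset.", "output": "The sky turns orange and pink."},
    {"instruction": "Describe a sunset.", "output": "Warm light fades over the hills."},
    {"instruction": "How do you feel?", "output": "As an AI language model, I have no feelings."},
]
cleaned = clean(examples)
print(len(cleaned))
```

A pure drop-on-match filter like this corresponds to the later, "lazy" pass the card mentions; the earlier passes rewrote or regenerated outputs instead of discarding them.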
Provider:
mpasila
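Summary of original information

Dataset overview

Data sources

Data cleaning and filtering methods

  • Removed "As an AI assistant" and other AI-related phrases, or entire examples.
  • Removed unnecessary refusals, to avoid refusing tasks that could in fact be carried out.
  • Removed instances where the model describes itself as programmed or as an AI, and replaced "I helped an user" with "I helped a friend" to sound less robotic.
  • Replaced some outputs using Mistral Large or the 7B Instruct V0.2 model.
  • Removed some duplicate instructions.
  • Fixed some outputs that were cut off too early.
  • Removed examples containing "as an AI" type phrasing without replacing them.
  • Regenerated some outputs using ChatGPT 3.5.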
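
As a rough illustration of the phrase-replacement step, a minimal sketch is shown here. The `REPLACEMENTS` table and `rewrite_output` helper are hypothetical names; the single pair in the table is the one quoted on this page, and any further pairs the author may have used are not documented.

```python
import re

# Hypothetical replacement table; the single pair below is the one quoted on this page.
REPLACEMENTS = [
    (re.compile(r"\bI helped an user\b"), "I helped a friend"),
]


def rewrite_output(text):
    """Apply each replacement in order and return the rewritten text."""
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return text


rewritten = rewrite_output("Last week I helped an user plan a trip.")
print(rewritten)
```

Text that contains none of the listed phrases passes through unchanged, so the helper can be applied to every output in a dataset.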

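Dataset purpose

  • Remove traces of the AI assistant to improve the roleplaying experience and avoid phrases such as "as an AI language model".

Future update plans

  • More cleanup may be done to remove remaining refusals or AI-related content.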