five

mpasila/LimaRP-augmented-8k-context

收藏
Hugging Face2024-05-10 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/mpasila/LimaRP-augmented-8k-context
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: unknown tags: - not-for-all-audiences size_categories: - n<1K viewer: false --- This [dataset](https://huggingface.co/datasets/grimulkan/LimaRP-augmented/) has been splitted into 8k chunks based on Llama 3 8B's tokenizer. Original dataset: [grimulkan/LimaRP-augmented](https://huggingface.co/datasets/grimulkan/LimaRP-augmented/) # Original dataset card An augmented and further modified version of [LimaRP](https://huggingface.co/datasets/lemonilia/LimaRP) in Fastchat format, modified in the following ways: - The first prompt is modified to add context and simple references to aspects of the conversation (OOC, use of emojis, content), include persona descriptions of the characters involved, scenario descriptions and content tags. - Certain irrelevant tags removed from first prompt (4K, grammarchecked, etc.) - Any placeholders replaced by randomly generated names from [Faker](https://pypi.org/project/Faker/), with proper introductions introduced in the first prompt. - All split conversations were joined to train long-context models (you may need to re-split them to fit in context length if you are not doing this). - The assistant never plays multiple characters and always plays only a single character consistently. The user may play multiple characters, and if this is the case, it is clearly explained in the first prompt.
提供机构:
mpasila
原始信息汇总

数据集概述

基本信息

  • 许可证: 未知
  • 标签: 不适用于所有观众
  • 大小类别: 小于1K
  • 查看器: 不可用

数据集描述

  • 原始数据集: grimulkan/LimaRP-augmented
  • 修改内容:
    • 首个提示增加了对话背景、简单引用、角色描述、场景描述和内容标签。
    • 移除了首个提示中的一些无关标签。
    • 所有占位符被随机生成的名字替换,这些名字来自Faker,并在首个提示中引入适当的介绍。
    • 所有分割的对话被合并以训练长上下文模型。
    • 助手始终只扮演一个角色,而用户可能扮演多个角色,并在首个提示中明确说明。
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作