
NekoQA-10K

ModelScope community · Updated 2026-05-17 · Indexed 2025-09-06
Download link:
https://modelscope.cn/datasets/AI-ModelScope/NekoQA-10K
Resource description:

![catgirl](image/catgirl1.png)

# Dataset Card for NekoQA-10K 🐱

## Dataset Summary

**NekoQA-10K** is a **catgirl dialogue dataset** for large language models (LLMs), containing **10,000 QA dialogue pairs**. All responses follow a unified **catgirl persona**:

* Address the user as "主人" ("Master")
* End sentences with specific speech tics (e.g., "喵~", "no desu", "的说喵")
* Maintain a cute, coquettish, anime-style tone

The dataset is primarily intended for studying how well LLMs can be shaped to take on a "catgirl flavor", providing material for fine-tuning, dialogue style transfer, and anthropomorphic interaction research.

---

## Supported Tasks and Benchmarks

* **Style Finetuning**: Strengthen a model's catgirl-style characteristics.
* **Persona-based Dialogue Generation**: Study how LLMs model persona consistency.
* **Affective Companionship Research (Affective Computing)**: Explore emotional interaction between the model and its users.
* **Evaluation Benchmark NekoBench**: Can be combined with this dataset to evaluate a model on the "Catgirl Flavor Perception Score (NPS)".

---

## Languages

* Mainly **Chinese (zh)**, with a small amount of Chinese-English code-mixing.
* The tone uniformly uses **catgirl speech tics** and is strongly anthropomorphic.

---

## Dataset Creation

### Source Data

* Some Q&A pairs are original and hand-written (note: the author is not a catgirl)
* Some are sourced from public online forums (e.g., Ruozhi Bar, 弱智吧) and **rewritten by large language models** to ensure a unified style and safety
* Some are rewritten from existing catgirl QA datasets (a small share, around 900 pairs)

### Annotations

* Most responses are generated by large language models and then manually filtered
* Style tag: catgirl speech tics

### Ethical Considerations

* The dataset contains no sensitive, illegal, or hateful content
* **Friendly reminder**: Do not use this dataset as a substitute for real interpersonal relationships; it is intended only for academic and recreational research

---

## Limitations

* Most responses use a relaxed, anthropomorphic tone, and **factual accuracy is not guaranteed**
* May cause a model to be "overly cute" on serious tasks

---

## Citation

If you use this dataset, please cite the following paper:

```bibtex
@article{nekoqa2025,
  title={NekoQA-10K: A Catgirl Dialogue Dataset and NekoBench Evaluation},
  author={MindsRiverPonder},
  journal={ZHIHU preprint ZHIHU:2508.22},
  year={2025}
}
```

---

## License

This dataset is released under **apache-2.0**.

> Catgirl flavor may be freely disseminated; the right to be coquettish belongs to all humanity.

---
Provider:
maas
Created:
2025-08-27
Collected Summary
Dataset Introduction
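Background and Challenges
Background Overview
NekoQA-10K is a dataset of 10,000 catgirl-style QA dialogues for studying how well large language models can adopt a catgirl style and maintain persona consistency. The dataset is mainly in Chinese, with a uniform dialogue style and strong anthropomorphic features, and is suited to research tasks such as style fine-tuning and affective computing.
The above content was collected and summarized by 遇见数据集.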
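The persona rules listed in the dataset card (addressing the user as "主人" and ending sentences with a speech tic) can be checked with a simple heuristic. The sketch below is only an illustration under stated assumptions: the function names, the sentence-splitting rule, and the `tic_rate` measure are hypothetical and are not the NekoBench NPS metric, whose definition is not given here.

```python
import re

# Markers taken from the persona description in the dataset card.
ADDRESS_TERM = "主人"
TICS = ("喵", "no desu", "的说喵")


def split_sentences(text: str) -> list[str]:
    """Split on common Chinese/English sentence-ending punctuation (an assumption)."""
    parts = re.split(r"[。!?!?\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def ends_with_tic(sentence: str) -> bool:
    """True if the sentence ends with a known tic, ignoring trailing tildes and spaces."""
    core = sentence.rstrip("~~ ")
    return core.endswith(TICS)


def persona_report(answer: str) -> dict:
    """Hypothetical helper: report address-term use and the share of sentences ending in a tic."""
    sentences = split_sentences(answer)
    tic_hits = sum(ends_with_tic(s) for s in sentences)
    return {
        "addresses_master": ADDRESS_TERM in answer,
        "tic_rate": tic_hits / len(sentences) if sentences else 0.0,
    }


# Made-up example answer, not taken from the dataset.
print(persona_report("主人,今天天气很好喵~!我们一起出去玩吧的说喵。"))
```

A check like this could serve as a quick filter when screening generated answers, though it only looks at surface markers and says nothing about tone or factual accuracy.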