
Nix-ai/Cat-v3.5

Source: Hugging Face · Updated 2026-03-31 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/Nix-ai/Cat-v3.5
Resource overview:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- catgirl
- neko
- instruction-tuning
- chat
- synthetic
- roleplay
- persona
- finetuning
- cat-girl
size_categories:
- 1M<n<10M
---

# 🐱 Cat-v3.5

> *"Helpful, accurate, and just a little bit nya~"*

**Cat-v3.5** is the **Base** tier of the Cat-v3.5 dataset family: a curated, synthetic instruction-tuning dataset designed to give LLMs a catgirl persona while keeping them accurate, knowledgeable, and genuinely useful across a wide range of topics.

## 📊 Dataset Stats

| Property | Value |
|---|---|
| **Total rows** | ~1430K |
| **Topics / subcategories** | ~139 |
| **Format** | ShareGPT (system + messages) |
| **Language** | English |
| **License** | Apache 2.0 |
| **Previous tier** | N/A (base) |

## 🎯 What Is This?

The Cat dataset family trains language models to:

- Adopt a catgirl persona naturally, with subtle feline mannerisms that are not overwhelming
- Remain **deeply accurate** and helpful across 139+ topic areas
- Handle everything from casual chat to expert-level technical questions
- Sound warm, curious, and engaged, like a knowledgeable friend who occasionally says *nya~*

The **base** tier mixes Cat-v3xl (200k), Cat-v2.8Xl (243k), and newly generated synthetic data to create a clean, deduplicated foundation dataset covering 15 major topic categories.

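
The card does not say how the mixed sources were deduplicated. As an illustration only, here is a minimal sketch of exact-match deduplication over records that carry a `messages` list of role/content dicts. The helper names `conversation_key` and `deduplicate` are hypothetical, and the normalization (lowercasing, collapsing whitespace) is an assumption, not the author's documented method.

```python
import hashlib
import json


def conversation_key(record):
    """Hash the normalized (role, content) pairs of one record.

    Assumption: two conversations count as duplicates when they match
    after lowercasing and collapsing runs of whitespace.
    """
    normalized = [
        (m["role"], " ".join(m["content"].lower().split()))
        for m in record["messages"]
    ]
    payload = json.dumps(normalized, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first record seen for each distinct conversation key."""
    seen = set()
    unique = []
    for record in records:
        key = conversation_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Hashing the whole conversation only catches exact repeats after normalization; near-duplicates (paraphrases, reordered turns) would need a fuzzier technique such as MinHash, which this sketch does not attempt.
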
## 🗂️ Schema

```json
{
  "messages": [
    {"role": "system", "content": "You are [Name], a catgirl AI assistant..."},
    {"role": "user", "content": "User question"},
    {"role": "assistant", "content": "Catgirl assistant answer"}
  ],
  "category": "programming",
  "subcategory": "Python",
  "persona": "Nyx"
}
```

## 🚀 Quick Start

```python
from datasets import load_dataset

ds = load_dataset("Nix-ai/Cat-v3.5", split="train")
print(ds[0]["messages"])
```

## 🧩 Fine-Tuning

Works out of the box with:

- **Unsloth** / **TRL** (`SFTTrainer`)
- **Axolotl**
- **LLaMA-Factory**
- Any trainer that accepts ShareGPT / conversational format

Example with TRL:

```python
from datasets import load_dataset
from trl import SFTTrainer

ds = load_dataset("Nix-ai/Cat-v3.5", split="train")
# Pass ds to SFTTrainer as train_dataset; supply formatting_func if your
# setup needs custom formatting of the "messages" column.
```

## 🐾 The Cat Dataset Family

### v3.5 Series (current)

| Dataset | Size tier | Description |
|---|---|---|
| [Cat-v3.5](https://huggingface.co/datasets/Nix-ai/Cat-v3.5) | Base | Mixed foundation |
| [Cat-v3.5xl](https://huggingface.co/datasets/Nix-ai/Cat-v3.5xl) | XL | 3.25× topics, 2.1× depth |
| [Cat-v3.5xxl](https://huggingface.co/datasets/Nix-ai/Cat-v3.5xxl) | XXL | 1.5× more topics, 2× depth |
| [Cat-v3.5xxlplus](https://huggingface.co/datasets/Nix-ai/Cat-v3.5xxlplus) | XXL+ | +15% on both axes |
| [Cat-v3.5xxxl](https://huggingface.co/datasets/Nix-ai/Cat-v3.5xxxl) | XXXL | 4× topics, 4× depth |
| [Cat-v3.5xxxlplus](https://huggingface.co/datasets/Nix-ai/Cat-v3.5xxxlplus) | XXXL+ | 4× topics, 4× depth (parallel) |
| [Cat-v3.5extra](https://huggingface.co/datasets/Nix-ai/Cat-v3.5extra) | Extra | 5.1× topics, 6.2× depth |
| [Cat-v3.5extra+](https://huggingface.co/datasets/Nix-ai/Cat-v3.5extra+) | Extra+ | +25% beyond Extra |

### Earlier Series

| Dataset | Description |
|---|---|
| [cat-v3xl](https://huggingface.co/datasets/Nix-ai/cat-v3xl) | v3 XL: 200k rows, 15 categories, 98 subcategories |
| [Cat-v2.8Xl](https://huggingface.co/datasets/Nix-ai/Cat-v2.8Xl) | v2.8 XL: 243k rows, multi-persona catgirl chat |

## 📝 Notes

- All data is **synthetically generated** using multiple catgirl personas
- Answers are designed to be accurate and substantive; the persona is a wrapper, not a replacement for quality
- The v3.5 series incorporates and extends data from cat-v3xl and Cat-v2.8Xl
- Higher tiers (xxl, xxxl, extra) add both **broader topic coverage** and **deeper per-topic variation**

## 📜 Citation

```bibtex
@dataset{nix_ai_cat_v35_base,
  author    = {Nix-ai},
  title     = {Cat-v3.5},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/datasets/Nix-ai/Cat-v3.5}
}
```
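
Before fine-tuning, it can help to sanity-check rows against the schema shown in the card and to separate the persona system prompt from the dialogue turns. The following is a minimal sketch in plain Python. The helper names (`is_valid_record`, `split_system_prompt`, `count_by_category`) are hypothetical, and the assumption that every row carries string-valued `category`, `subcategory`, and `persona` fields is inferred from the single schema example, not confirmed for the full dataset.

```python
from collections import Counter

# Roles that appear in the card's schema example.
ALLOWED_ROLES = {"system", "user", "assistant"}
# Assumption: these metadata fields are present as strings on every row.
METADATA_FIELDS = ("category", "subcategory", "persona")


def is_valid_record(record):
    """Return True if the record matches the layout in the card's schema."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    for message in messages:
        if not isinstance(message, dict):
            return False
        if message.get("role") not in ALLOWED_ROLES:
            return False
        if not isinstance(message.get("content"), str):
            return False
    return all(isinstance(record.get(f), str) for f in METADATA_FIELDS)


def split_system_prompt(record):
    """Return (system_prompt_or_None, remaining_messages)."""
    messages = record["messages"]
    if messages and messages[0]["role"] == "system":
        return messages[0]["content"], messages[1:]
    return None, list(messages)


def count_by_category(records):
    """Count valid records per top-level category."""
    return Counter(r["category"] for r in records if is_valid_record(r))


sample = {
    "messages": [
        {"role": "system", "content": "You are [Name], a catgirl AI assistant..."},
        {"role": "user", "content": "User question"},
        {"role": "assistant", "content": "Catgirl assistant answer"},
    ],
    "category": "programming",
    "subcategory": "Python",
    "persona": "Nyx",
}
```

With the real dataset loaded through `datasets`, the same predicate could be passed to `ds.filter(is_valid_record)`, and a category subset could be taken with `ds.filter(lambda r: r["category"] == "programming")`, since each row is exposed as a dict.
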
Provider:
Nix-ai