Nix-ai/Cat-v3.5
Source: Hugging Face. Updated 2026-03-31; indexed 2026-04-12.
Download link: https://hf-mirror.com/datasets/Nix-ai/Cat-v3.5
Description:

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- catgirl
- neko
- instruction-tuning
- chat
- synthetic
- roleplay
- persona
- finetuning
- cat-girl
size_categories:
- 1M<n<10M
---
# 🐱 Cat-v3.5
> *"Helpful, accurate, and just a little bit nya~"*

**Cat-v3.5** is the **Base** tier of the Cat-v3.5 dataset family: a curated,
synthetic instruction-tuning dataset designed to give LLMs a catgirl persona while keeping them
accurate, knowledgeable, and genuinely useful across a wide range of topics.
## 📊 Dataset Stats
| Property | Value |
|---|---|
| **Total rows** | ~1,430K (about 1.43M) |
| **Topics / subcategories** | ~139 |
| **Format** | ShareGPT-style conversations (`messages` list of `role`/`content` turns, system prompt first) |
| **Language** | English |
| **License** | Apache 2.0 |
| **Previous tier** | N/A (base) |
## 🎯 What Is This?
The Cat dataset family trains language models to:
- Adopt a catgirl persona naturally — subtle feline mannerisms, not overwhelming
- Remain **deeply accurate** and helpful across 139+ topic areas
- Handle everything from casual chat to expert-level technical questions
- Sound warm, curious, and engaged, like a knowledgeable friend who occasionally says *nya~*

The **base** tier combines cat-v3xl (200k rows), Cat-v2.8Xl (243k rows), and newly generated synthetic data into a clean, deduplicated foundation dataset covering 15 major topic categories.
## 🗂️ Schema
```json
{
  "messages": [
    {"role": "system", "content": "You are [Name], a catgirl AI assistant..."},
    {"role": "user", "content": "User question"},
    {"role": "assistant", "content": "Catgirl assistant answer"}
  ],
  "category": "programming",
  "subcategory": "Python",
  "persona": "Nyx"
}
```
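As a rough illustration of the schema above, the following sketch checks that a row has the expected fields and roles. The `validate_row` helper is hypothetical. It assumes that every row starts with a system message and that `category`, `subcategory`, and `persona` are always strings; both assumptions are inferred from the single example shown, not confirmed by the card.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}
METADATA_FIELDS = ("category", "subcategory", "persona")


def validate_row(row):
    """Return a list of problems found in one row; an empty list means none."""
    problems = []
    messages = row.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: expected an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: 'content' must be a string")
    first = messages[0]
    if isinstance(first, dict) and first.get("role") != "system":
        problems.append("first message is not a system prompt")  # assumed rule
    for field in METADATA_FIELDS:
        if not isinstance(row.get(field), str):
            problems.append(f"'{field}' must be a string")
    return problems


# The example row from the schema section.
example = {
    "messages": [
        {"role": "system", "content": "You are [Name], a catgirl AI assistant..."},
        {"role": "user", "content": "User question"},
        {"role": "assistant", "content": "Catgirl assistant answer"},
    ],
    "category": "programming",
    "subcategory": "Python",
    "persona": "Nyx",
}
print(validate_row(example))  # []
```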
## 🚀 Quick Start
```python
from datasets import load_dataset
ds = load_dataset("Nix-ai/Cat-v3.5", split="train")
print(ds[0]["messages"])
```
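Each row carries `category`, `subcategory`, and `persona` fields, so a subset can be selected before training. The sketch below works on any iterable of row dicts. The sample rows, including the `Mochi` persona and the `science` category, are invented for illustration, and `select_rows` is a hypothetical helper. With the loaded dataset, `ds.filter(lambda r: r["category"] == "programming")` from the `datasets` library should give the same kind of subset.

```python
from collections import Counter


def select_rows(rows, category=None, persona=None):
    """Keep rows matching the given category and/or persona (None means any)."""
    return [
        row for row in rows
        if (category is None or row["category"] == category)
        and (persona is None or row["persona"] == persona)
    ]


# Invented sample rows that mirror the metadata fields of the schema.
sample_rows = [
    {"category": "programming", "subcategory": "Python", "persona": "Nyx"},
    {"category": "programming", "subcategory": "Rust", "persona": "Mochi"},
    {"category": "science", "subcategory": "Physics", "persona": "Nyx"},
]

# Count rows per category, then select one category/persona combination.
category_counts = Counter(row["category"] for row in sample_rows)
print(category_counts["programming"])  # 2
print(len(select_rows(sample_rows, category="programming", persona="Nyx")))  # 1
```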
## 🧩 Fine-Tuning
Works out of the box with:
- **Unsloth** / **TRL** (`SFTTrainer`)
- **Axolotl**
- **LLaMA-Factory**
- Any trainer that accepts ShareGPT / conversational format

Example with TRL:
```python
from datasets import load_dataset
from trl import SFTTrainer

ds = load_dataset("Nix-ai/Cat-v3.5", split="train")
# Pass `ds` to SFTTrainer as train_dataset. Recent TRL releases can apply the model's
# chat template to a "messages" column; on older releases, supply a formatting_func.
```
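For trainers that expect a single text field rather than a `messages` list, each conversation can be flattened first. The sketch below uses a ChatML-style layout purely as an assumption, since the card does not prescribe a template. In practice the tokenizer's own chat template (for example `tokenizer.apply_chat_template` in `transformers`) is usually the safer choice. `format_chatml` is a hypothetical helper.

```python
def format_chatml(row):
    """Render one row's messages as a ChatML-style string (assumed template)."""
    turns = [
        f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>"
        for msg in row["messages"]
    ]
    return "\n".join(turns) + "\n"


# Placeholder row shaped like the schema example.
row = {
    "messages": [
        {"role": "system", "content": "You are [Name], a catgirl AI assistant..."},
        {"role": "user", "content": "User question"},
        {"role": "assistant", "content": "Catgirl assistant answer"},
    ]
}
print(format_chatml(row))
```

A function of this shape could be passed as a `formatting_func` where the trainer supports one.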
## 🐾 The Cat Dataset Family
### v3.5 Series (current)
| Dataset | Size tier | Description |
|---|---|---|
| [Cat-v3.5](https://huggingface.co/datasets/Nix-ai/Cat-v3.5) | Base | Mixed foundation |
| [Cat-v3.5xl](https://huggingface.co/datasets/Nix-ai/Cat-v3.5xl) | XL | 3.25× topics, 2.1× depth |
| [Cat-v3.5xxl](https://huggingface.co/datasets/Nix-ai/Cat-v3.5xxl) | XXL | 1.5× more topics, 2× depth |
| [Cat-v3.5xxlplus](https://huggingface.co/datasets/Nix-ai/Cat-v3.5xxlplus) | XXL+ | +15% on both axes |
| [Cat-v3.5xxxl](https://huggingface.co/datasets/Nix-ai/Cat-v3.5xxxl) | XXXL | 4× topics, 4× depth |
| [Cat-v3.5xxxlplus](https://huggingface.co/datasets/Nix-ai/Cat-v3.5xxxlplus) | XXXL+ | 4× topics, 4× depth (parallel) |
| [Cat-v3.5extra](https://huggingface.co/datasets/Nix-ai/Cat-v3.5extra) | Extra | 5.1× topics, 6.2× depth |
| [Cat-v3.5extra+](https://huggingface.co/datasets/Nix-ai/Cat-v3.5extra+) | Extra+ | +25% beyond Extra |
### Earlier Series
| Dataset | Description |
|---|---|
| [cat-v3xl](https://huggingface.co/datasets/Nix-ai/cat-v3xl) | v3 XL — 200k rows, 15 categories, 98 subcategories |
| [Cat-v2.8Xl](https://huggingface.co/datasets/Nix-ai/Cat-v2.8Xl) | v2.8 XL — 243k rows, multi-persona catgirl chat |
## 📝 Notes
- All data is **synthetically generated** using multiple catgirl personas
- Answers are designed to be accurate and substantive — the persona is a wrapper, not a replacement for quality
- The v3.5 series incorporates and extends data from cat-v3xl and Cat-v2.8Xl
- Higher tiers (xxl, xxxl, extra) add both **broader topic coverage** and **deeper per-topic variation**
## 📜 Citation
```bibtex
@dataset{nix_ai_cat_v35_base,
author = {Nix-ai},
title = {Cat-v3.5},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/datasets/Nix-ai/Cat-v3.5}
}
```
Provider: Nix-ai



