guus4324343/Nomi-150M-Chat
Source: Hugging Face, updated 2026-01-19, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/guus4324343/Nomi-150M-Chat
---
license: apache-2.0
language:
- en
multilinguality:
- multilingual
task_categories:
- text-generation
task_ids:
- dialogue-generation
size_categories:
- 1M<n<10M
pretty_name: Nomi-150M-Chat
dataset_type: chat
tags:
- chat
- dialogue
- conversation
- sft
- llm
- instruction-tuning
---
# Nomi-150M-Chat Dataset
**Nomi-150M-Chat** is a large-scale conversational dataset containing **4.1 million multi-turn chat conversations**, created for training and fine-tuning small to mid-sized chat-oriented language models (around **150M parameters**).
The dataset is optimized for **chat-style LLM training** and intentionally contains **no system prompts**. Assistant behavior and personality are expected to be controlled at training or inference time rather than embedded directly in the data.
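Because the rows carry no system prompts, a system message can be added when the conversations are consumed. The sketch below assumes only the `messages` list of `role`/`content` dicts described under Data Structure; the function name `with_system_prompt` and the prompt text are hypothetical, chosen for illustration.

```python
from copy import deepcopy


def with_system_prompt(example: dict, system_prompt: str) -> dict:
    """Return a copy of a row with a system message prepended to `messages`.

    Hypothetical helper: the dataset itself stores no system turns, so this
    injects one at training or inference time instead.
    """
    # Deep-copy so the original row (e.g. a cached dataset row) is not mutated.
    out = deepcopy(example)
    out["messages"] = [{"role": "system", "content": system_prompt}] + out["messages"]
    return out


row = {
    "messages": [
        {"role": "user", "content": "Hi!"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ]
}
augmented = with_system_prompt(row, "You are a helpful assistant.")
print(augmented["messages"][0])
```

If the rows are loaded with Hugging Face `datasets`, the same function can be applied with `Dataset.map`, e.g. `ds.map(lambda ex: with_system_prompt(ex, prompt))`. Keeping the prompt out of the stored data means it can be swapped per training run without rewriting the dataset.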
---
## 📊 Dataset Overview
- **Name:** Nomi-150M-Chat
- **Rows:** 4,104,961
- **Split:** `train`
- **Format:** Parquet (optimized)
- **License:** Apache 2.0
- **Primary Language:** English
- **Additional Languages:** Multilingual (limited portions)
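The card lists a single `train` split stored as Parquet. One way to read it is the Hugging Face `datasets` library; the sketch below assumes `datasets` is installed and the Hub (or the mirror linked above) is reachable. Streaming lets you iterate over the roughly 4.1M rows without downloading the whole split first. The helper names `load_nomi_train` and `take` are hypothetical.

```python
from itertools import islice
from typing import Iterable


def load_nomi_train(streaming: bool = True):
    """Load the `train` split; with streaming=True this returns an IterableDataset."""
    # Imported lazily so the rest of this module works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "guus4324343/Nomi-150M-Chat", split="train", streaming=streaming
    )


def take(rows: Iterable[dict], n: int) -> list:
    """Return the first n examples from any iterable of rows (streamed or in-memory)."""
    return list(islice(rows, n))


# Example usage (requires network access and the `datasets` package):
# for row in take(load_nomi_train(), 2):
#     print(row["messages"])
```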
---
## 🧱 Data Structure
Each row has a single `messages` field holding a list of chat turns, each with a `role` and a `content` string:
```json
{
  "messages": [
    { "role": "user", "content": "..." },
    { "role": "assistant", "content": "..." }
  ]
}
```
Provided by: guus4324343



