JoaoGuiAlves/SFT-JoinPorTurgueseCorpora
收藏Hugging Face2025-11-28 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/JoaoGuiAlves/SFT-JoinPorTurgueseCorpora
下载链接
链接失效反馈官方服务:
资源简介:
---
license: mit
task_categories:
- text-generation
- conversational
language:
- pt
- en
size_categories:
- n<1K
---
# Portuguese SFT Corpora
## Dataset Description
A collection of Portuguese supervised fine-tuning datasets, including translations from OpenAssistant.
### Dataset Structure
This dataset contains translated conversations from the OpenAssistant dataset.
Each entry includes:
- Original English text (`text`)
- Portuguese translation (`text-pt`)
- Conversation metadata (message_id, parent_id, role, etc.)
### Languages
- English (original)
- Portuguese (translation)
### Data Fields
- `message_id`: Unique identifier for the message
- `parent_id`: ID of the parent message (null for root messages)
- `text`: Original English text
- `text-pt`: Portuguese translation
- `role`: Either "prompter" (user) or "assistant" (AI)
- `lang`: Original language code
- Additional metadata fields
### Usage
```python
from datasets import load_dataset
dataset = load_dataset("JoaoGuiAlves/SFT-JoinPorTurgueseCorpora")
```
### Source Data
Translated from [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
### License
MIT License
提供机构:
JoaoGuiAlves



