five

JoaoGuiAlves/SFT-JoinPorTurgueseCorpora

收藏
Hugging Face2025-11-28 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/JoaoGuiAlves/SFT-JoinPorTurgueseCorpora
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: mit task_categories: - text-generation - conversational language: - pt - en size_categories: - n<1K --- # Portuguese SFT Corpora ## Dataset Description A collection of Portuguese supervised fine-tuning datasets, including translations from OpenAssistant. ### Dataset Structure This dataset contains translated conversations from the OpenAssistant dataset. Each entry includes: - Original English text (`text`) - Portuguese translation (`text-pt`) - Conversation metadata (message_id, parent_id, role, etc.) ### Languages - English (original) - Portuguese (translation) ### Data Fields - `message_id`: Unique identifier for the message - `parent_id`: ID of the parent message (null for root messages) - `text`: Original English text - `text-pt`: Portuguese translation - `role`: Either "prompter" (user) or "assistant" (AI) - `lang`: Original language code - Additional metadata fields ### Usage ```python from datasets import load_dataset dataset = load_dataset("JoaoGuiAlves/SFT-JoinPorTurgueseCorpora") ``` ### Source Data Translated from [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) ### License MIT License
提供机构:
JoaoGuiAlves
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作