
Salman95s/Balochi-Multilingual-dataset

Source: Hugging Face. Updated: 2024-12-19. Indexed: 2024-12-21.
Download link:
https://hf-mirror.com/datasets/Salman95s/Balochi-Multilingual-dataset
Description:
The Balochi Language Dataset is a resource for training large language models (LLMs) in the Balochi language. It goes beyond basic translation tasks and is intended to support fully generative text and conversational AI in Balochi. The dataset includes monolingual Balochi text, multilingual translation corpora, and conversational and domain-specific texts, which makes it suitable for generative AI, multilingual translation, cultural insight, and domain-specific applications. It covers Balochi, English, Urdu, and Persian, and spans general conversation, literature, news, academic texts, technical content, cultural and historical narratives, travel and educational materials, and assistant and chat scenarios. Together, the monolingual text and multilingual pairs amount to tens of thousands of records, including conversational samples, professional scenarios, and technical terminology.
Provider:
Salman95s