Reubencf/marathi-czech-sentences

Name: Reubencf/marathi-czech-sentences
Creator: Reubencf
Published: 2026-04-24 09:38:11
License: No description available

Source: Hugging Face. Updated: 2026-04-24. Indexed: 2026-04-26.

Download link:

https://hf-mirror.com/datasets/Reubencf/marathi-czech-sentences

Description:

This dataset, named marathi_czech_sentences, is a multilingual collection of short sentences and questions, mainly in Marathi and Czech, covering a range of conversational contexts. It is a remastered version of the original dataset Reubencf/low-resource-audio-text, prepared using Adaption's Adaptive Data platform. It contains 3,704 data points and is intended primarily for instruction tuning. The final quality grade is A, with a relative quality improvement of 206.7%. The reported language shares are Marathi (58%), Czech (26%), and Hungarian (4%). The reported domain shares are Language (54%), Other (22%), and Fitness-sports (6%), and the reported tone shares are Informal (50%), Dramatic (12%), and Helpful (8%).
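For readers who want to check the reported shares themselves, the following is a minimal sketch rather than an official recipe. It assumes the Hugging Face `datasets` library is installed and that the repository exposes a `train` split. The split name and the helper names `load_sentences` and `share_by_label` are illustrative assumptions, and no column names are assumed because the listing does not state them.

```python
from collections import Counter


def share_by_label(labels):
    """Return each label's share of the total, as a percentage rounded to one decimal."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def load_sentences(repo_id="Reubencf/marathi-czech-sentences", split="train"):
    """Load the dataset from the Hugging Face Hub.

    Assumption: a "train" split exists; adjust if the repository differs.
    The import is deferred so the helper above works without `datasets` installed.
    """
    from datasets import load_dataset  # pip install datasets

    return load_dataset(repo_id, split=split)


# Example with made-up labels (not taken from the dataset):
print(share_by_label(["mr", "mr", "cs", "hu"]))
```

Once the dataset is loaded, inspecting its `column_names` would show which field, if any, carries a language, domain, or tone label that could be passed to `share_by_label`.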

Provider:

Reubencf
