Reubencf/marathi-czech-sentences
Source: Hugging Face. Updated: 2026-04-24. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Reubencf/marathi-czech-sentences
Description:
The dataset, named marathi_czech_sentences, is a multilingual collection consisting mainly of short sentences and questions in Marathi and Czech, covering a variety of conversational contexts. It is a remastered version of the original dataset Reubencf/low-resource-audio-text, produced with Adaption's Adaptive Data platform. It contains 3,704 data points and is intended primarily for instruction tuning. Its final quality grade is A, with a relative quality improvement of 206.7%. The language distribution is Marathi (58%), Czech (26%), and Hungarian (4%). Domains include Language (54%), Other (22%), and Fitness-sports (6%), and tones include Informal (50%), Dramatic (12%), and Helpful (8%).
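As a rough worked example, the language percentages can be turned into approximate row counts. This is only a sketch: it assumes the percentages are shares of the 3,704 data points and that they are rounded, so the counts are estimates. The helper name `approx_counts` is hypothetical and not part of the dataset or any library.

```python
# Figures quoted in the description; treating the percentages as shares
# of all 3,704 data points is an assumption.
TOTAL_ROWS = 3704
LANGUAGE_SHARE = {"Marathi": 0.58, "Czech": 0.26, "Hungarian": 0.04}


def approx_counts(total, shares):
    """Convert fractional shares into rounded row-count estimates."""
    return {name: round(total * share) for name, share in shares.items()}


counts = approx_counts(TOTAL_ROWS, LANGUAGE_SHARE)

# Share of rows not attributed to any of the three listed languages.
unlisted_share = round(1.0 - sum(LANGUAGE_SHARE.values()), 2)

print(counts)
print(unlisted_share)
```

The three listed languages account for 88% of the data points, so about 12% fall under languages the description does not name.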
Provider:
Reubencf



