lapa-llm/yodas-ukr
Source: Hugging Face. Updated: 2025-10-22. Indexed: 2025-11-15.
Download link:
https://hf-mirror.com/datasets/lapa-llm/yodas-ukr
Description:
Ukrainian subset of the YODAS dataset, containing both human-annotated and machine-generated transcriptions. The transcriptions in the dataset were artificially shortened, which makes them unsuitable for LLM pretraining; the `uk_000_stitched.parquet` version reverses this shortening. The dataset was created to support Ukrainian-language AI development and to improve the accessibility of language technology for Ukrainian speakers.
Provided by:
lapa-llm



