maikezu/iwslt2026-metrics-shared-train-dev

Hugging Face; updated 2025-12-19; indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/maikezu/iwslt2026-metrics-shared-train-dev
Description:
This dataset contains the training and development sets for the Speech Translation Metrics Shared Task at IWSLT 2026 and is intended mainly for research on speech translation quality estimation. The task is to estimate a score that reflects translation quality, given a speech sample and a system-generated translation. The dataset has three splits: train (human-annotated data drawn from IWSLT 2023, WMT 2024, and WMT 2025), train_synthetic (automatically annotated data based on Common Voice), and dev (human-annotated data from the IWSLT 2025 ACL Talks). Features include the audio waveform, audio path, document ID, source text, target text, language codes, domain, translation system, and score.
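The three splits described above can be loaded with the Hugging Face `datasets` library. The following is a minimal sketch, not an official loader: the repository ID and split names come from this page, while the helper names `load_split` and `peek_columns` are hypothetical, and the exact column names are not stated here, so the sketch prints them instead of assuming them. Streaming is used to avoid downloading the full audio up front; decoding audio may require extra packages depending on the installed `datasets` version.

```python
# Repository ID and split names as given in the dataset description.
REPO_ID = "maikezu/iwslt2026-metrics-shared-train-dev"
SPLITS = ("train", "train_synthetic", "dev")


def load_split(split: str, streaming: bool = True):
    """Load one split of the dataset (hypothetical helper).

    The split name is validated before the `datasets` library is imported,
    so a typo fails fast without any network access.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so that the validation above works without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=streaming)


def peek_columns(split: str = "dev"):
    """Return the sorted column names of the first example in a split."""
    first_example = next(iter(load_split(split)))
    return sorted(first_example.keys())


if __name__ == "__main__":
    # Requires network access and `pip install datasets`.
    print(peek_columns("dev"))
```

Inspecting the columns first is a cautious choice: the page lists the features in prose only, so the actual field names should be read from the data before writing any training or evaluation code against them.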
Provider:
maikezu