
ITALIC

Source: arXiv; updated 2023-06-14; indexed 2024-06-21
Download link:
https://github.com/RiTA-nlp/ITALIC
Overview:
ITALIC is the first large-scale Italian audio dataset for intent classification, created by the Politecnico di Torino and collaborating institutions. It contains 16,521 audio samples recorded by 70 speakers from different Italian regions, covering 18 domains and 60 intents. The recordings were crowdsourced through a custom web platform, and each sample carries self-reported speaker metadata such as region, gender, and age. ITALIC addresses the scarcity of Italian data among existing Spoken Language Understanding (SLU) resources and supports tasks such as intent classification and automatic speech recognition, providing a key resource for developing Italian SLU models.
Provider:
Politecnico di Torino, Italy
Created:
2023-06-14