Vaani

Name: Vaani
Creator: huggingface.co
License: No description available

Source: huggingface.co, indexed 2025-03-23

Download link:

https://huggingface.co/datasets/ARTPARK-IISc/Vaani

Description:

Project Vaani, by IISc, Bangalore and ARTPARK, is capturing the true diversity of India’s spoken languages to propel language AI technologies and content for an inclusive Digital India. We expect to create data corpora of over 150,000 hours of speech, part of which will be transcribed in local scripts, while ensuring linguistic, educational, urban-rural, age, and gender diversity (among other potential diversity characteristics). These diligently collected and curated datasets of natural… See the full description on the dataset page: https://huggingface.co/datasets/ARTPARK-IISc/Vaani.

Provider:

huggingface.co
