kalpalabs/sansad
Hugging Face | Updated 2025-05-24 | Indexed 2025-09-13
Download link:
https://hf-mirror.com/datasets/kalpalabs/sansad
Description:
This dataset contains recordings and transcripts from 18K sessions of the Indian Parliament held between 1992 and 2024, totaling roughly 17K hours of audio from approximately 3.2K speakers. About half of the audio has weakly labeled, timestamped transcripts. It is suited to applications such as speaker diarization, robust speech-to-text, and voice activity detection. The transcripts contain around 100M tokens of Hindi and English. Use of the dataset must comply with the original terms of use of the Parliament of India and is restricted to non-commercial research and informational purposes.
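For readers who want to inspect the data before committing to a 17K-hour download, the following is a minimal sketch of streaming a few records with the Hugging Face `datasets` library. It is an illustration, not the maintainers' documented loading procedure. The helper name `peek_first_records` is hypothetical, and the sketch assumes that the repository loads without a config name, that access is not gated, and that an audio decoding backend is installed. Field names are read from the data rather than assumed.

```python
from itertools import islice

REPO_ID = "kalpalabs/sansad"  # dataset name as listed on this page


def peek_first_records(repo_id=REPO_ID, n=1):
    """Stream the first n records of each split and return their field names.

    Hypothetical helper for illustration; not part of the dataset's tooling.
    """
    # Imported lazily so this module loads even where `datasets` is missing.
    from datasets import load_dataset

    # streaming=True reads records on demand instead of downloading everything.
    # Assumption: no config name is required; if the repository defines
    # several configs, pass the desired one via `name=...`.
    splits = load_dataset(repo_id, streaming=True)

    summary = {}
    for split_name, stream in splits.items():
        records = list(islice(stream, n))
        # Report whatever fields the first record actually carries.
        summary[split_name] = sorted(records[0].keys()) if records else []
    return summary
```

Calling `peek_first_records()` requires network access and returns a dictionary mapping each split name to the sorted field names of its first record, which is a cheap way to discover the schema before writing a full pipeline.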
Provider:
kalpalabs



