
svq

ModelScope community. Updated 2026-01-06; indexed 2025-07-12.
Download link:
https://modelscope.cn/datasets/google/svq
Description:
# Simple Voice Questions

Simple Voice Questions (SVQ) is a set of short audio questions recorded in 26 locales across 17 languages under multiple audio conditions.

## Data Collection

Speakers were presented with recording instructions specifying the recording environment and the text query to be recorded. They recorded using their own phones or tablets under four conditions:

- clean: Record in a quiet environment.
- background speech noise: Record while audio from sources like podcasts, talk radio, or YouTube plays on a separate device (e.g., TV, tablet, computer, or another phone) at a normal listening volume, ensuring it is audible in the recording.
- traffic noise: Record while the speaker is a passenger in a moving vehicle. This includes various forms of transport such as buses, trains, and cars (where someone else is driving).
- media noise: Record while background media (music, TV, movies, etc.) is playing on a separate device (TV, tablet, computer, or phone). The playback volume should be a normal listening level, sufficient to be audible in the recording.

In all conditions, speakers were instructed to minimize other background noise (like fans or conversations), hold their phone naturally, avoid extra sounds (like clicks or taps), use wired headphones if applicable, and speak naturally and expressively with emotion.

The query text comes from the validation and test sets of XTREME-UP's retrieval and question answering benchmark datasets. The XTREME-UP dataset is a collection of the TYDI QA datasets (question answering datasets covering 11 typologically diverse languages) and the professional translation of the cross-lingual open-retrieval question answering (XOR QA) dataset into 23 Indic languages.

The audio queries were distributed approximately uniformly across the four environmental conditions. To ensure speaker diversity, we attempted to cap the number of recordings per speaker at 250. This resulted in a total of 700 unique speakers.

We collected speaker gender information in four classes: female, male, non_binary, and no_answer. In addition, speakers were asked to report their age.

## Splits

The audio data in this release is presented as a single collection rather than being pre-divided into training, validation, or test subsets. This decision follows directly from the design of the data acquisition process: text prompts and recording environments were randomly allocated across the speaker cohort. While this approach promotes a rich variety of conditions, it complicates traditional data splitting. Creating partitions with no overlap of speakers and no overlap of text material between splits (a common best practice) would require discarding an estimated 40% of the total recordings.

The primary goal of this release strategy is to maximize the utility and volume of the data available to users. To avoid this data loss, the data is released in its entirety as an undivided evaluation set. Users who intend to train models with this data will need to devise and implement their own splitting strategies, keeping in mind the trade-off between data volume and strict speaker/text disjointness.
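
The trade-off described under Splits can be illustrated with a minimal sketch. It is not the dataset authors' procedure: the record keys `speaker_id` and `text`, the helper names, and the hash-based assignment are all assumptions made for illustration. Each speaker and each text is assigned to one side independently, and a recording is kept only when both land on the same side, which guarantees disjointness at the cost of dropped recordings.

```python
import hashlib
from collections import defaultdict


def bucket(key: str, eval_fraction: float, salt: str) -> str:
    """Deterministically map a key to 'train' or 'eval' via a salted hash."""
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    return "eval" if value < eval_fraction else "train"


def disjoint_split(records, eval_fraction=0.2):
    """Split records so no speaker and no text appears in both splits.

    Each record is a dict with the hypothetical keys 'speaker_id' and
    'text'. A record is kept only when its speaker and its text fall on
    the same side; every other record goes to 'dropped'.
    """
    splits = defaultdict(list)
    for rec in records:
        speaker_side = bucket(rec["speaker_id"], eval_fraction, salt="speaker")
        text_side = bucket(rec["text"], eval_fraction, salt="text")
        if speaker_side == text_side:
            splits[speaker_side].append(rec)
        else:
            splits["dropped"].append(rec)
    return splits
```

Calling `disjoint_split(records)` returns a mapping with `train`, `eval`, and `dropped` lists. Comparing `len(splits["dropped"])` with the total record count shows how much data a strict speaker/text-disjoint split costs for a given `eval_fraction`; the figure obtained this way need not match the estimate quoted above, which depends on how prompts were actually allocated.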

Provider:
maas
Created:
2025-05-23