
BAS4R: A Multi-Condition Bangla Speech Dataset for Gender-Aware Real and Fake Voice Analysis

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://doi.org/10.7910/DVN/YVWOAL
Description:
BAS4R is a structured, large-scale Bangla speech dataset developed to support research in replay attack detection and audio spoofing analysis within voice biometric systems. The dataset contains both authentic (real) and systematically manipulated (spoofed) speech recordings collected under controlled and realistic acoustic conditions.

The complete dataset comprises 143.88 hours of audio across 120,125 audio files, organized into five major categories:

Channel-based: 28,830 files (34.65 hours)
Signal Processing-based: 28,830 files (34.53 hours)
Effect-based: 28,830 files (34.48 hours)
Replay-based: 28,830 files (34.48 hours)
Real Data: 4,805 files (5.75 hours)

Speech samples were collected from 100 native Bangla speakers (50 male and 50 female) aged 20–26 years, ensuring balanced gender representation and demographic consistency. All recordings were captured in controlled environments and stored in high-quality digital audio format.

The dataset follows a structured hierarchical organization that separates real and spoofed samples by category and attack condition, facilitating reproducible research. The spoofed data were generated using real signal processing techniques, channel transmission effects, environmental distortions, and replay setups.

BAS4R is suitable for research in anti-spoofing systems, speaker verification robustness evaluation, replay attack detection, and deep learning–based audio classification.
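
The per-category figures in the description can be cross-checked against the stated totals. The following is a minimal sketch in Python that only uses the numbers quoted in the description; the variable names are illustrative and are not part of the dataset or any official tooling.

```python
from decimal import Decimal

# Per-category (file count, hours) as quoted in the dataset description.
CATEGORIES = {
    "Channel-based": (28830, Decimal("34.65")),
    "Signal Processing-based": (28830, Decimal("34.53")),
    "Effect-based": (28830, Decimal("34.48")),
    "Replay-based": (28830, Decimal("34.48")),
    "Real Data": (4805, Decimal("5.75")),
}

# Totals as stated in the description.
STATED_FILES = 120125
STATED_HOURS = Decimal("143.88")

# Sum the per-category values; Decimal avoids binary floating-point noise.
total_files = sum(count for count, _ in CATEGORIES.values())
total_hours = sum(hours for _, hours in CATEGORIES.values())

# Share of files that are spoofed (everything except "Real Data").
spoofed_files = total_files - CATEGORIES["Real Data"][0]
spoofed_share = Decimal(spoofed_files) / Decimal(total_files)

print("files:", total_files, "stated:", STATED_FILES)
print("hours:", total_hours, "stated:", STATED_HOURS)
print("spoofed share of files:", spoofed_share)
```

The file counts add up exactly to the stated total. The rounded per-category hours add up to 0.01 hours more than the stated total, which is consistent with each category having been rounded to two decimals independently.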
Created:
2026-02-11