
NyU-BU contextually controlled stories Corpus: NUBUC

Indexed by NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/4075182
Description:
The success of a language experiment relies heavily on selecting appropriate stimulus materials. This selection entails a critical trade-off between similarity to ‘real’ language (i.e. external validity) and experimental and analytic control (i.e. internal validity). To bridge these conflicting demands, we developed the NyU-BU contextually controlled stories Corpus (NUBUC) of spoken language. The corpus is both naturalistic and experimentally controlled: it comprises 16 high-quality recordings of 8 unique stories, with each story spoken by both a female and a male actor. Each story consists of 128 sentences (~2000 words per story) organized around critical keywords, which have been matched along multiple linguistic dimensions. The context surrounding each keyword is also parametrically manipulated, varying prior context (weak/strong), local context (weak/strong) and sentence position (early/late). Here we describe the corpus in detail, including how it compares to and builds on existing corpora. These materials show that the apparent dichotomy between control and generalizability can be overcome by presenting subjects with carefully curated linguistic materials in a naturalistic listening scenario.
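The design described above can be sketched as follows: 8 stories crossed with 2 speakers give the 16 recordings, and the three binary context factors give the keyword conditions. This is a minimal illustration only, assuming the three context factors are fully crossed (the description lists the factors but does not state this explicitly); the factor labels and function names are hypothetical and do not reflect the corpus's actual file layout or naming.

```python
from itertools import product

# Factor levels taken from the corpus description. The string labels and
# the helper functions below are hypothetical illustrations, not part of
# the released corpus.
STORIES = range(1, 9)          # 8 unique stories
SPEAKERS = ("female", "male")  # each story is recorded by both actors
PRIOR_CONTEXT = ("weak", "strong")
LOCAL_CONTEXT = ("weak", "strong")
POSITION = ("early", "late")


def recordings():
    """Return every (story, speaker) pair: 8 stories x 2 speakers."""
    return list(product(STORIES, SPEAKERS))


def context_conditions():
    """Return every cell of a 2 x 2 x 2 keyword-context design.

    Assumes (unconfirmed) that prior context, local context and sentence
    position are fully crossed.
    """
    return [
        {"prior": p, "local": l, "position": s}
        for p, l, s in product(PRIOR_CONTEXT, LOCAL_CONTEXT, POSITION)
    ]


if __name__ == "__main__":
    print(len(recordings()))          # 16
    print(len(context_conditions()))  # 8
    for cell in context_conditions():
        print(cell)
```

Enumerating the cells with `itertools.product` makes it easy to check that a stimulus list covers every condition before running an experiment.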
Created:
2021-05-19