
Creating a Large-Scale Audio-Aligned Parsed Corpus of Bilingual Russian Child and Child-Directed Speech (BiRCh): Challenges, Solutions, and Implications for Research

DataCite Commons. Updated: 2022-10-29. Indexed: 2024-07-29.
Download link:
https://scielo.figshare.com/articles/dataset/Creating_a_Large-Scale_Audio-Aligned_Parsed_Corpus_of_Bilingual_Russian_Child_and_Child-Directed_Speech_BiRCh_Challenges_Solutions_and_Implications_for_Research/21431332/1
Resource description:
ABSTRACT The BiRCh Project (The Corpus of Bilingual Russian Child Speech) involves collecting a longitudinal audio corpus of Russian spoken by children and their families in Russia, Ukraine, Germany, the U.S., and Canada. From a subset of these data, we are building a large-scale corpus, the "Parsed and Audio-aligned Corpus of Bilingual Russian Child and Child-directed Speech (BiRCh)", with two basic components: (1) 1 million words of transcripts that are time-aligned with the audio speech signal and fully text-searchable, and (2) a 500K-word portion of these transcripts that is morphologically annotated and parsed, and likewise audio-aligned. We are using this corpus to investigate various phenomena in the linguistic input and in the developmental trajectory of heritage bilinguals, e.g., case, gender, passives, impersonals, politeness markers, disfluencies, and discourse markers. This article focuses on the challenges encountered in developing BiRCh, the solutions adopted, and the implications of the corpus's richly annotated data for research.
Provided by:
SciELO journals
Created:
2022-10-29