Supporting Data for: Enhancing code-switching research through comparable corpora: Introducing the El Paso Bilingual Corpus

DataONE | Updated 2025-06-18 | Indexed 2025-06-21

Download link:

https://search.dataone.org/view/sha256:0dd50bd47dd6560c72f1704d503d0a53c6718d68e12eda492a38bc0e3cb4a0fb

Resource description:

Dataset description: This dataset contains two data files that the related publication is based on. In particular, the data file Dataset_Diminutives contains in total 1886 diminutive constructions extracted from the Bangor Miami Corpus and the El Paso Bilingual Corpus. These constructions are coded for intralinguistic variables relating to the linguistic properties of both the base and the diminutive marker. The data file Metadata_Conversations_El_Paso_Bilingual_Corpus contains metadata about the conversations in the El Paso Bilingual Corpus.

Article Abstract: Research on language contact outcomes, such as code-switching, continues to face theoretical and methodological challenges, particularly due to the difficulty of comparing findings across studies that use divergent data collection methods (Parafita Couto et al., 2021; Toribio, 2017). Accordingly, scholars have emphasized the need for publicly available and comparable bilingual corpora (Deuchar, 2020; Gullberg et al., 2009; Munarriz & Parafita Couto, 2014). This paper introduces the El Paso Bilingual Corpus, a new Spanish-English bilingual corpus recorded in El Paso (TX) in 2022, designed to be methodologically comparable to the Bangor Miami Corpus (Deuchar et al., 2014). The paper is structured in three main sections. First, we review existing Spanish-English corpora and examine the theoretical challenges posed by studies using non-comparable methodologies (Parafita Couto et al., 2021; Toribio, 2017), thereby underscoring the gap addressed by the El Paso Bilingual Corpus. Second, we outline the corpus creation process, discussing participant recruitment, data collection, and transcription, and provide an overview of these data, including participants’ sociolinguistic profiles. Third, to demonstrate the practical value of methodologically aligned corpora, we report a comparative case study on diminutive expressions in the El Paso and Bangor Miami corpora, illustrating how shared collection protocols can elucidate the role of community-specific social factors on bilinguals’ morphosyntactic choices.


Created:

2025-06-19
