Replication Data for: Learning to predict - second language perception of reduced multi-word sequences
Source: DataCite Commons. Updated 2025-03-03; indexed 2025-04-09.
Download link:
https://dataverse.no/citation?persistentId=doi:10.18710/TE5ZOG
Resource description:
DATASET ABSTRACT

This is the data and code from a word-monitoring task, in which advanced learners of English responded to the word 'to' in verb + to-infinitive structures (V-to-Vinf) in English, where 'to' could occur in a full or reduced pronunciation (e.g. "prefer to" [tʊ] or "preferda" [ɾə]). The design of this experiment is replicated from our earlier study with American English native speakers (Lorenz & Tizón-Couto, 2019, see link to paper and dataset below *).

We tested the effects of string frequency (V+to) and transitional probability (of 'to' given the V) on the accuracy and speed of recognition of "to" in spoken sentences. These effects were analysed with mixed-effects generalized additive models (GAMM); the code also includes visualisations of these models.

The experiment was run with OpenSesame (version 3.2.6 for Mac, see Mathôt et al. 2012). The data include information on frequencies of occurrence of words and bigrams; this was extracted from the Corpus of Contemporary American English (COCA, Davies 2008–). We used R (version 4.3.1, R Core Team 2023) for all data analyses, hence the code can best be replicated in R.

*) Lorenz, D. & Tizón-Couto, D. (2019). Chunking or predicting – frequency information and reduction in the perception of multi-word sequences. Cognitive Linguistics, 30(4), 751-784. https://doi.org/10.1515/cog-2017-0138 (the paper); https://doi.org/10.18710/7TSABU (the data)
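To make the two predictors concrete, here is a minimal sketch of how a transitional probability of 'to' given the verb could be derived from corpus counts. It assumes the common definition P(to | V) = freq(V + to) / freq(V); the abstract does not spell out the exact formula the authors used. The function name, the dictionary layout, and the counts are all hypothetical and are not taken from COCA or from the dataset. The dataset's own analysis code is in R; Python is used here only for illustration.

```python
def transitional_probability(bigram_count: int, verb_count: int) -> float:
    """Return P(to | V): the share of a verb's occurrences that are followed by 'to'.

    Assumed definition (not confirmed by the source): freq(V + to) / freq(V).
    """
    if verb_count <= 0:
        raise ValueError("verb_count must be positive")
    if not 0 <= bigram_count <= verb_count:
        raise ValueError("bigram_count must lie between 0 and verb_count")
    return bigram_count / verb_count


# Made-up counts for illustration only; NOT real corpus frequencies.
toy_counts = {
    "prefer": {"verb": 1000, "verb_to": 250},
    "want": {"verb": 8000, "verb_to": 6000},
}

for verb, counts in toy_counts.items():
    tp = transitional_probability(counts["verb_to"], counts["verb"])
    # String frequency is simply the raw count of the V + to bigram.
    print(f"{verb}: string frequency = {counts['verb_to']}, P(to|V) = {tp:.2f}")
```

The sketch shows why the two measures can diverge: a verb may have a high V+to string frequency simply because the verb itself is frequent, while its transitional probability depends only on the proportion of its occurrences that are followed by 'to'.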
Provider:
DataverseNO
Created:
2024-02-09



