Replication Data for: Less is More: Why All Paradigms are Defective, and Why that is a Good Thing

Source: DataONE. Updated: 2018-05-25. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:0e449b1fa43b1be1a96fd0d1951d008342f8b4ed24423fcd68976ea2d5f947da
Description:
Only a fraction of lexemes are encountered in all their paradigm forms in any corpus or even in the lifetime of any speaker. This raises a question as to how it is that native speakers confidently produce and comprehend word forms that they have never witnessed. We present the results of an experiment using a recurrent neural network computational learning model. In particular, we compare the model’s production of unencountered forms using two types of training data: full paradigms vs. single word forms for Russian nouns, verbs, and adjectives. In the long run, the model displays better performance when exposed to the more naturalistic training on single word forms, even though the other training data is much larger as it includes full paradigms for each and every word. We discuss why “defective” paradigms may be better for human learners as well. This post contains data and R code for the grammatical profiles of Russian nouns and the correspondence analysis carried out in Section 3 of the article.

Created:
2024-01-05