Seeing and Hearing a Word: Combining Eye and Ear Is More Efficient than Combining the Parts of a Word

Figshare · Updated 2016-01-18 · Indexed 2026-04-29

Download link:

https://figshare.com/articles/dataset/_Seeing_and_Hearing_a_Word_Combining_Eye_and_Ear_Is_More_Efficient_than_Combining_the_Parts_of_a_Word_/708864

Description:

To understand why human sensitivity for complex objects is so low, we study how word identification combines eye and ear or parts of a word (features, letters, syllables). Our observers identify printed and spoken words presented concurrently or separately. When researchers measure threshold (energy of the faintest visible or audible signal) they may report either sensitivity (one over the human threshold) or efficiency (ratio of the best possible threshold to the human threshold). When the best possible algorithm identifies an object (like a word) in noise, its threshold is independent of how many parts the object has. But, with human observers, efficiency depends on the task. In some tasks, human observers combine parts efficiently, needing hardly more energy to identify an object with more parts. In other tasks, they combine inefficiently, needing energy nearly proportional to the number of parts, over a 60∶1 range. Whether presented to eye or ear, efficiency for detecting a short sinusoid (tone or grating) with few features is a substantial 20%, while efficiency for identifying a word with many features is merely 1%. Why? We show that the low human sensitivity for words is a cost of combining their many parts. We report a dichotomy between inefficient combining of adjacent features and efficient combining across senses. Joining our results with a survey of the cue-combination literature reveals that cues combine efficiently only if they are perceived as aspects of the same object. Observers give different names to adjacent letters in a word, and combine them inefficiently. Observers give the same name to a word’s image and sound, and combine them efficiently. The brain’s machinery optimally combines only cues that are perceived as originating from the same object. Presumably such cues each find their own way through the brain to arrive at the same object representation.
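The two measures defined in the description can be written out as a minimal sketch. The function names and the threshold values below are hypothetical, chosen only to reproduce the 20% and 1% efficiencies quoted above; they are not data from the study.

```python
def sensitivity(human_threshold: float) -> float:
    """Sensitivity: one over the human threshold energy."""
    return 1.0 / human_threshold


def efficiency(ideal_threshold: float, human_threshold: float) -> float:
    """Efficiency: best possible threshold divided by the human threshold."""
    return ideal_threshold / human_threshold


def efficiency_with_proportional_cost(ideal_threshold: float,
                                      human_threshold_per_part: float,
                                      n_parts: int) -> float:
    """Assumed illustration of inefficient combining: the human threshold
    grows in proportion to the number of parts, while the ideal threshold
    stays fixed, so efficiency falls as 1 / n_parts."""
    return ideal_threshold / (human_threshold_per_part * n_parts)


# Hypothetical threshold energies in arbitrary units (not measured values).
ideal = 1.0
human_sinusoid = 5.0   # few features
human_word = 100.0     # many features

print(efficiency(ideal, human_sinusoid))  # 0.2
print(efficiency(ideal, human_word))      # 0.01
```

Under these assumed numbers, the ideal threshold is the same for both signals, so the drop in efficiency comes entirely from the rise in the human threshold.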

Created:

2016-01-18