Vocabulary tests considered in the experiment.

Figshare, updated 2024-12-12, indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/Vocabulary_tests_considered_in_the_experiment_/28018171

Resource description:

Vocabulary tests, once a cornerstone of language modeling evaluation, have been largely overlooked in the current landscape of Large Language Models (LLMs) like Llama 2, Mistral, and GPT. While most LLM evaluation benchmarks focus on specific tasks or domain-specific knowledge, they often neglect the fundamental linguistic aspects of language understanding. In this paper, we advocate for the revival of vocabulary tests as a valuable tool for assessing LLM performance. We evaluate seven LLMs using two vocabulary test formats across two languages and uncover surprising gaps in their lexical knowledge. These findings shed light on the intricacies of LLM word representations, their learning mechanisms, and performance variations across models and languages. Moreover, the ability to automatically generate and perform vocabulary tests offers new opportunities to expand the approach and provide a more complete picture of LLMs’ language skills.

Created:

2024-12-12
