LLMs considered in the experiments.

Source: Figshare. Updated 2024-12-12; indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/LLMs_considered_in_the_experiments_/28018168

Description:

Vocabulary tests, once a cornerstone of language modeling evaluation, have been largely overlooked in the current landscape of Large Language Models (LLMs) like Llama 2, Mistral, and GPT. While most LLM evaluation benchmarks focus on specific tasks or domain-specific knowledge, they often neglect the fundamental linguistic aspects of language understanding. In this paper, we advocate for the revival of vocabulary tests as a valuable tool for assessing LLM performance. We evaluate seven LLMs using two vocabulary test formats across two languages and uncover surprising gaps in their lexical knowledge. These findings shed light on the intricacies of LLM word representations, their learning mechanisms, and performance variations across models and languages. Moreover, the ability to automatically generate and perform vocabulary tests offers new opportunities to expand the approach and provide a more complete picture of LLMs’ language skills.

Created:

2024-12-12
