Vocabulary Tests LLMs

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/14453972


Description:

Vocabulary evaluation of LLMs

This repository contains the results of the vocabulary tests run on the LLM tools/models presented in the paper "The continued usefulness of vocabulary tests for evaluating large language models", published in PLOS ONE: https://doi.org/10.1371/journal.pone.0308259

The file names correspond to the vocabulary tests whose results are presented in Tables 3-6 of the paper (note that the questions for the TOEFL test are not public). In each file, the first column contains the question posed to the LLM tool/model, followed by the correct answer. The remaining columns contain the answers of the different LLM tools/models evaluated. The results and percentages are summarized at the bottom of each file, after the last test item.

The models evaluated are:

Model                 Link
Llama 2 7b            https://huggingface.co/meta-llama/Llama-2-7b-chat
Llama 2 13b           https://huggingface.co/meta-llama/Llama-2-13b-chat
Llama 2 70b           https://huggingface.co/meta-llama/Llama-2-70b-chat
Mistral 7b v0.1       https://huggingface.co/mistralai/Mistral-7B-v0.1
GPT 3.5 turbo 0613    https://platform.openai.com/docs/models/gpt-3-5-turbo
GPT 4 0613            https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4
Bard

To cite our work:

@article{10.1371/journal.pone.0308259,
  doi = {10.1371/journal.pone.0308259},
  author = {Martínez, Gonzalo AND Conde, Javier AND Merino-Gómez, Elena AND Bermúdez-Margaretto, Beatriz AND Hernández, José Alberto AND Reviriego, Pedro AND Brysbaert, Marc},
  journal = {PLOS ONE},
  publisher = {Public Library of Science},
  title = {Establishing vocabulary tests as a benchmark for evaluating large language models},
  year = {2024},
  month = {12},
  volume = {19},
  url = {https://doi.org/10.1371/journal.pone.0308259},
  pages = {1-17},
  number = {12},
}
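The file layout described above (question, correct answer, then one column per model) lends itself to a simple per-model accuracy check. The following is a minimal sketch, not the authors' scoring procedure. It assumes the results can be read as CSV rows with a header row, and that an answer counts as correct when it matches the key exactly after trimming whitespace and ignoring case. The function name `score_models`, the column names, and the sample rows are all hypothetical. The summary rows at the bottom of the real files would likely need to be dropped before scoring, since this sketch only skips rows that lack a question or a correct answer.

```python
import csv
import io


def score_models(rows):
    """Return {model: fraction correct} for rows laid out as
    [question, correct answer, model answer 1, model answer 2, ...].

    Assumption: the first row is a header naming the model columns.
    """
    header, *items = rows
    models = header[2:]
    correct = {m: 0 for m in models}
    total = 0
    for row in items:
        # Skip blank rows and rows without a question or a correct answer.
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue
        total += 1
        key = row[1].strip().lower()
        for model, answer in zip(models, row[2:]):
            # Assumed matching rule: exact match, case-insensitive.
            if answer.strip().lower() == key:
                correct[model] += 1
    return {m: (correct[m] / total if total else 0.0) for m in models}


# Made-up rows for illustration only; they are not taken from the dataset.
sample = """question,correct,model_a,model_b
Which word is closest to 'rapid'?,fast,fast,slow
Which word is closest to 'tiny'?,small,small,small
"""
rows = list(csv.reader(io.StringIO(sample)))
scores = score_models(rows)
print(scores)
```

Comparing the values returned here with the percentages summarized at the bottom of each file would be one way to check that a file was parsed as intended.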

Created:

2024-12-13
