Vocabulary Tests LLMs
Indexed by NIAID Data Ecosystem: 2026-05-02
Download link:
https://zenodo.org/record/14453972
Resource description:
Vocabulary evaluation of LLMs
This repository contains the results of the vocabulary tests run on the LLM tools/models evaluated in the paper "The continued usefulness of vocabulary tests for evaluating large language models", now published in PLOS ONE as "Establishing vocabulary tests as a benchmark for evaluating large language models": https://doi.org/10.1371/journal.pone.0308259
The file names correspond to the vocabulary tests whose results are presented in Tables 3-6 of the paper (note that the questions for the TOEFL test are not public).
In each file, the first column contains the question posed to the LLM tool/model and the next column the correct answer. The remaining columns contain the answers given by each evaluated LLM tool/model. The results and percentages are summarized at the bottom of each file, after the last test item.
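To illustrate the file layout described above, here is a minimal Python sketch that recomputes the percentage of correct answers per model. It rests on several assumptions not stated in the repository: that a file has been exported as CSV with a header row naming the model columns, that the number of test items is known so the summary rows at the bottom can be skipped, and that an answer counts as correct only when it matches the correct answer after trimming whitespace and ignoring case. The function name `score_models` and the toy data are hypothetical and not part of the dataset.

```python
import csv
import io
from typing import Dict, List


def score_models(rows: List[List[str]], n_items: int) -> Dict[str, float]:
    """Percentage of correct answers per model column (hypothetical layout).

    Assumes rows[0] is a header row, column 0 holds the question, column 1
    the correct answer, and columns 2 onward one answer per model. Only the
    first n_items rows after the header are scored, so the summary rows at
    the bottom of the file are skipped.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    header, items = rows[0], rows[1 : 1 + n_items]
    models = header[2:]
    correct = {model: 0 for model in models}
    for row in items:
        key = row[1].strip().lower()
        for model, answer in zip(models, row[2:]):
            # Case- and whitespace-insensitive exact match: an assumption,
            # not necessarily how the paper graded answers.
            if answer.strip().lower() == key:
                correct[model] += 1
    return {model: 100.0 * hits / len(items) for model, hits in correct.items()}


# Toy data in the assumed layout; the last row mimics a summary line.
sample = """question,correct,Model A,Model B
q1,a,a,b
q2,c,c,c
Summary,,100%,50%
"""
rows = list(csv.reader(io.StringIO(sample)))
print(score_models(rows, n_items=2))  # {'Model A': 100.0, 'Model B': 50.0}
```

Passing `n_items` explicitly keeps the sketch from guessing where the test items end and the summary begins, since the exact format of those summary rows is not documented.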
The models evaluated are:
- Llama 2 7B: https://huggingface.co/meta-llama/Llama-2-7b-chat
- Llama 2 13B: https://huggingface.co/meta-llama/Llama-2-13b-chat
- Llama 2 70B: https://huggingface.co/meta-llama/Llama-2-70b-chat
- Mistral 7B v0.1: https://huggingface.co/mistralai/Mistral-7B-v0.1
- GPT-3.5 Turbo 0613: https://platform.openai.com/docs/models/gpt-3-5-turbo
- GPT-4 0613: https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4
- Bard
To cite our work:
@article{10.1371/journal.pone.0308259,
doi = {10.1371/journal.pone.0308259},
author = {Martínez, Gonzalo AND Conde, Javier AND Merino-Gómez, Elena AND Bermúdez-Margaretto, Beatriz AND Hernández, José Alberto AND Reviriego, Pedro AND Brysbaert, Marc},
journal = {PLOS ONE},
publisher = {Public Library of Science},
title = {Establishing vocabulary tests as a benchmark for evaluating large language models},
year = {2024},
month = {12},
volume = {19},
url = {https://doi.org/10.1371/journal.pone.0308259},
pages = {1-17},
number = {12},
}
Created:
2024-12-13



