
Vocabulary Tests LLMs

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14453972
Description:
Vocabulary evaluation of LLMs

This repository contains the results of the different vocabulary tests run on the LLM tools/models evaluated in the paper "The continued usefulness of vocabulary tests for evaluating large language models", published in PLOS ONE (https://doi.org/10.1371/journal.pone.0308259) under the title "Establishing vocabulary tests as a benchmark for evaluating large language models".

Each file is named after the vocabulary test whose results are presented in Tables 3-6 of the paper (note that the questions for the TOEFL test are not public).

In each file, the first column contains the question posed to the LLM tool/model and the second column contains the correct answer. The remaining columns hold the answers given by each of the evaluated LLM tools/models. The results and percentages are summarized at the bottom of each file, after the last test item.

The models evaluated are:

Model                Link
Llama 2 7b           https://huggingface.co/meta-llama/Llama-2-7b-chat
Llama 2 13b          https://huggingface.co/meta-llama/Llama-2-13b-chat
Llama 2 70b          https://huggingface.co/meta-llama/Llama-2-70b-chat
Mistral 7b v0.1      https://huggingface.co/mistralai/Mistral-7B-v0.1
GPT 3.5 turbo 0613   https://platform.openai.com/docs/models/gpt-3-5-turbo
GPT 4 0613           https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4
Bard

To cite our work:

@article{10.1371/journal.pone.0308259,
  doi = {10.1371/journal.pone.0308259},
  author = {Martínez, Gonzalo AND Conde, Javier AND Merino-Gómez, Elena AND Bermúdez-Margaretto, Beatriz AND Hernández, José Alberto AND Reviriego, Pedro AND Brysbaert, Marc},
  journal = {PLOS ONE},
  publisher = {Public Library of Science},
  title = {Establishing vocabulary tests as a benchmark for evaluating large language models},
  year = {2024},
  month = {12},
  volume = {19},
  url = {https://doi.org/10.1371/journal.pone.0308259},
  pages = {1-17},
  number = {12},
}
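Given the file layout described above (question, correct answer, then one answer column per model, with a summary at the bottom), the per-model percentages can be recomputed from the item rows. The following is a minimal Python sketch, not the authors' scoring procedure. It assumes a CSV export of a result file (the description does not state the file format), a header row with the model names, a blank row separating the test items from the summary block, and case-insensitive exact matching of answers. The function name `score_models` and the toy data are hypothetical; the summary rows in each file remain the reference figures.

```python
import csv
import io
from typing import Dict, List


def score_models(rows: List[List[str]]) -> Dict[str, float]:
    """Return the percentage of correct answers per model.

    Assumes rows[0] is a header (question, correct answer, model names...)
    and that the item rows end at the first row whose question cell is
    empty, which is taken to mark the start of the summary block.
    """
    header, *body = rows
    models = header[2:]
    correct = {model: 0 for model in models}
    total = 0
    for row in body:
        if not row or not row[0].strip():
            break  # assumed separator before the summary rows
        total += 1
        expected = row[1].strip().lower()
        for col, model in enumerate(models, start=2):
            # Case-insensitive exact match; a missing cell counts as wrong.
            given = row[col].strip().lower() if col < len(row) else ""
            if given == expected:
                correct[model] += 1
    return {m: (100.0 * c / total if total else 0.0) for m, c in correct.items()}


# Made-up toy data for illustration only; not taken from the dataset.
SAMPLE = """question,correct,model_a,model_b
"Synonym of 'big': small / large",large,large,small
"Synonym of 'fast': quick / slow",quick,Quick,quick

Correct (%),,100,50
"""

result = score_models(list(csv.reader(io.StringIO(SAMPLE))))
print(result)  # {'model_a': 100.0, 'model_b': 50.0}
```

Exact matching is the simplest choice here; if the model columns contain free-text answers rather than a single option, a looser comparison (for example, checking whether the correct option appears in the answer) may be needed.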
Created:
2024-12-13