Leaderboard Spanish Language Benchmark for Artificial Intelligence Models (TELEIA)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13643393
Description:
TELEIA Datasets Leaderboard
This dataset contains the answers of different LLMs to the TELEIA (Spanish Language Benchmark for Artificial Intelligence Models) benchmark.
LLMs evaluated:
Yi-6B-Chat
Meta-Llama-3-8B-Instruct
Llama-2-7b-chat-hf
gemma-7b-it
Mistral-7B-Instruct-v0.1
occiglot-7b-es-en-instruct
GPT3.5
GPT4
Files:
TELEIA_Cervantes_AVE_results.xlsx: vocabulary and grammatical structures, following the format of the Cervantes AVE exam
TELEIA_PCE_results.xlsx: test on morphology and semantics resembling the style of the PCE exam, consisting of short questions or sentences to be completed
TELEIA_SIELE_results.xlsx: different texts with questions related to them, based on the reading comprehension task of the SIELE exam
Each .xlsx file contains one sheet per model with that model's results, with the following columns:
question: question from TELEIA
option_a: possible answer from TELEIA
option_b: possible answer from TELEIA
option_c: possible answer from TELEIA
option_d: possible answer from TELEIA
correct_answer: correct answer from TELEIA
llm_question: complete question made to the LLM
tokens_in: list of tokens that make up the question
tokens_in_count: number of tokens in the question
llm_answer: raw answer from the LLM
llm_answer_filtered: answer in format {A,B,C,D} from the LLM
tokens_out: list of tokens that make up the raw answer
tokens_out_count: number of tokens in the raw answer
word_count: number of words in the raw answer
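The columns above are enough to recompute a per-model accuracy. The following is a minimal sketch, not part of the dataset: it assumes that correct_answer is stored as a letter in the same {A,B,C,D} format as llm_answer_filtered, that each sheet name identifies a model, and that pandas and openpyxl are installed. The function names are hypothetical.

```python
from typing import Iterable, Mapping

VALID = {"A", "B", "C", "D"}


def normalize(value: object) -> str:
    """Strip and uppercase a cell value; non-strings (e.g. empty cells) become ''."""
    return value.strip().upper() if isinstance(value, str) else ""


def accuracy(rows: Iterable[Mapping[str, object]]) -> float:
    """Fraction of rows whose filtered answer matches the correct answer.

    Assumption: correct_answer holds a letter comparable to llm_answer_filtered.
    Rows with a missing or invalid filtered answer count as wrong.
    """
    rows = list(rows)
    if not rows:
        return 0.0
    hits = 0
    for row in rows:
        given = normalize(row.get("llm_answer_filtered"))
        if given in VALID and given == normalize(row.get("correct_answer")):
            hits += 1
    return hits / len(rows)


def accuracy_per_model(path: str) -> dict:
    """Read every sheet of a results workbook and score each one."""
    import pandas as pd  # requires pandas plus openpyxl for .xlsx files

    # sheet_name=None returns a dict mapping sheet name to DataFrame.
    sheets = pd.read_excel(path, sheet_name=None)
    return {name: accuracy(df.to_dict("records")) for name, df in sheets.items()}
```

For example, accuracy_per_model("TELEIA_PCE_results.xlsx") would return one score per sheet, provided the file has been downloaded to the working directory.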
Created:
2024-09-03



