Open-LLM-Leaderboard

Name: Open-LLM-Leaderboard
Creator: VILA Lab, Mohamed bin Zayed University of Artificial Intelligence
Published: 2024-06-12 01:59:47
License: Not specified

Source: arXiv. Updated 2024-06-12; indexed 2024-06-21.

Download link:

https://github.com/VILA-Lab/Open-LLM-Leaderboard

Overview:

Open-LLM-Leaderboard is a dataset for evaluating large language models (LLMs), created by the VILA Lab at Mohamed bin Zayed University of Artificial Intelligence. It evaluates models with open-ended questions rather than multiple-choice ones, with the aim of eliminating selection bias and random guessing. Questions suitable for open-ended answering are produced by an automated coarse-to-fine screening protocol with multi-stage filtering, and GPT-4 is used both to screen the questions and to score model responses. The dataset is intended mainly for LLM performance evaluation and ranking, addressing the bias and randomness found in existing evaluation methods.
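The coarse-to-fine screening idea can be illustrated with a small sketch. This is not the dataset's actual pipeline: the function `coarse_to_fine_filter`, the two toy judges, the 1-10 score scale, and the `min_score` threshold are all hypothetical stand-ins chosen for illustration, and the judges are simple string heuristics in place of the GPT-4 calls the description mentions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Question:
    text: str


def coarse_to_fine_filter(
    questions: Iterable[Question],
    coarse_judge: Callable[[Question], bool],
    fine_judge: Callable[[Question], int],
    min_score: int = 5,  # hypothetical threshold, not taken from the source
) -> List[Question]:
    """Keep questions that pass a cheap yes/no check, then a scored check."""
    kept = []
    for q in questions:
        # Coarse stage: binary decision on whether the question can be open-ended.
        if not coarse_judge(q):
            continue
        # Fine stage: keep only questions whose suitability score is high enough.
        if fine_judge(q) >= min_score:
            kept.append(q)
    return kept


# Stand-in judges (hypothetical heuristics; a real pipeline would query an LLM).
def toy_coarse_judge(q: Question) -> bool:
    # Reject stems that only make sense when the answer options are visible.
    return "which of the following" not in q.text.lower()


def toy_fine_judge(q: Question) -> int:
    # Crude score on an assumed 1-10 scale: direct questions score high.
    return 8 if q.text.strip().endswith("?") else 3


sample = [
    Question("What is the capital of France?"),
    Question("Which of the following is a prime number?"),
    Question("The capital of France is"),
]
filtered = coarse_to_fine_filter(sample, toy_coarse_judge, toy_fine_judge)
```

Passing the judges in as callables keeps the two stages independent, so the toy heuristics could be replaced by model-backed judges without changing the filtering logic.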

Provided by:

VILA Lab, Mohamed bin Zayed University of Artificial Intelligence

Created:

2024-06-12
