twinkle-ai/llama-4-eval-logs-and-scores

Name: twinkle-ai/llama-4-eval-logs-and-scores
Creator: twinkle-ai
Published: 2025-04-09 05:43:43
License: No description provided

Hugging Face, updated 2025-04-09, indexed 2025-04-12

Download link:

https://hf-mirror.com/datasets/twinkle-ai/llama-4-eval-logs-and-scores


Description:

This dataset provides the complete evaluation logs and per-question scores of Llama 4 models, including Scout and Maverick FP8, tested under a standardized and reproducible setting. All evaluations were run with Twinkle Eval, a high-precision and efficient benchmark framework developed by Twinkle AI. The benchmark shuffles multiple-choice options and reports the average of three repeated trials, which supports reliability analysis. The repository serves as a transparent and structured archive of how the models perform across different tasks, with every question's result available for analysis and verification.
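The option shuffling and three-run averaging described above can be illustrated with a minimal sketch. This is not Twinkle Eval's actual implementation: the function names (`shuffle_options`, `evaluate`), the question dictionary layout, and the `predict` callback are all hypothetical, chosen only to show how a per-question score can be averaged over repeated shuffled trials.

```python
import random
from statistics import mean


def shuffle_options(options, answer_index, rng):
    """Shuffle the options and return (shuffled_options, new_answer_index)."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    # The correct answer now sits wherever its original index landed.
    return shuffled, order.index(answer_index)


def evaluate(questions, predict, runs=3, seed=0):
    """Score each question over `runs` shuffled trials.

    `predict(question_text, options)` is a hypothetical stand-in for a model
    call; it must return the index of the chosen option.
    Returns (per_question_scores, overall_accuracy).
    """
    rng = random.Random(seed)
    per_question = []
    for q in questions:
        scores = []
        for _ in range(runs):
            opts, gold = shuffle_options(q["options"], q["answer"], rng)
            scores.append(1.0 if predict(q["question"], opts) == gold else 0.0)
        per_question.append(mean(scores))
    return per_question, mean(per_question)


if __name__ == "__main__":
    # Toy data and a toy predictor that always finds the right option text.
    questions = [{"question": "2 + 2 = ?", "options": ["3", "4", "5", "6"], "answer": 1}]
    oracle = lambda question, options: options.index("4")
    print(evaluate(questions, oracle))
```

Shuffling before each trial means a model that favors a fixed option position cannot score well by position alone, and the per-question mean across trials exposes answers that are unstable from run to run.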

Provided by:

twinkle-ai
