amazon-agi/Amazon-Nova-1.0-Micro-evals
Source: Hugging Face. Updated 2024-12-14. Indexed 2024-12-21.
Download link:
https://hf-mirror.com/datasets/amazon-agi/Amazon-Nova-1.0-Micro-evals
Description:
This dataset is used to evaluate the Amazon Nova Micro model on multiple benchmarks: multiple-choice question answering (MMLU), reading comprehension (DROP), question answering (GPQA), math problems (MATH), basic math problems (GSM8K), instruction following (IFEval), complex tasks (BBH), code generation (HumanEval), financial question answering (FinQA), and machine translation (FLORES). Evaluation uses greedy decoding with the maximum generation length set to 1600 tokens. Each benchmark has its own evaluation setting and prompting technique, such as 0-shot or 6-shot CoT, and reports a corresponding metric such as exact-match accuracy or F1 score.
Provided by:
amazon-agi
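
The description names exact-match accuracy and F1 score as metrics but does not say how they are computed. As a rough illustration only, the sketch below shows one common way to score a generated answer against a reference: normalized exact match and token-overlap F1. The function names (`normalize`, `exact_match`, `token_f1`) and the normalization rules (lowercasing, stripping punctuation, dropping English articles) are assumptions made for this example, not the scoring code used to produce this dataset.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase, strip punctuation, drop articles, tokenize."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall (bag-of-words overlap)."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Averaging either function over all examples of a benchmark would give a dataset-level score; the actual per-benchmark settings should be taken from the dataset card itself.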



