ContextualAI/LMUnit-llama3.1-70b-evals-results
Source: Hugging Face. Updated: 2025-07-20. Indexed: 2025-08-09.
Download link:
https://hf-mirror.com/datasets/ContextualAI/LMUnit-llama3.1-70b-evals-results
Resource overview:
The dataset consists of multiple sub-datasets, each configured for a specific task. Features include query, response, human scores, model scores, and others. Each sub-dataset is intended for testing and has its own number of examples and file size. The sub-datasets are:
- BigGenBench: Includes features such as query, natural unit test, and response; suitable for evaluating model performance.
- Flask: Similar to BigGenBench, with some feature differences such as model scores and labels.
- InfoBench: Focuses on information retrieval tasks; includes features such as response, input, and query.
- LFQA: Targets question-answering tasks; includes multiple responses and their associated scores.
- RewardBench2: Focuses on evaluating model performance on specific tasks; includes features such as prompt, chosen, and rejected.
- RewardBenchv1: Similar to RewardBench2, with different feature and task configurations.
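As a rough illustration of how the prompt/chosen/rejected layout described for the RewardBench sub-datasets might be used, here is a minimal Python sketch. It makes several assumptions that the listing does not confirm: that the sub-dataset names above are valid configuration names, that the split is called `test`, and that per-response scores are available under the hypothetical column names `chosen_score` and `rejected_score`. Check the actual column names on the dataset card before relying on it.

```python
from typing import Iterable, Mapping

REPO_ID = "ContextualAI/LMUnit-llama3.1-70b-evals-results"


def load_config(config_name: str, split: str = "test"):
    """Load one sub-dataset from the Hub.

    Assumes `config_name` (e.g. "RewardBench2") is a valid configuration
    and that a "test" split exists; requires `pip install datasets`
    and network access.
    """
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset(REPO_ID, config_name, split=split)


def pairwise_accuracy(
    rows: Iterable[Mapping[str, float]],
    chosen_key: str = "chosen_score",      # hypothetical column name
    rejected_key: str = "rejected_score",  # hypothetical column name
) -> float:
    """Fraction of rows where the chosen response outscores the rejected one."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to score")
    wins = sum(1 for r in rows if r[chosen_key] > r[rejected_key])
    return wins / len(rows)


# Toy rows standing in for real records; the values are made up.
toy_rows = [
    {"chosen_score": 4.5, "rejected_score": 2.0},
    {"chosen_score": 3.0, "rejected_score": 3.5},
    {"chosen_score": 5.0, "rejected_score": 1.0},
    {"chosen_score": 2.5, "rejected_score": 2.5},
]
print(pairwise_accuracy(toy_rows))  # 0.5
```

Ties count as misses here, which is a design choice rather than anything the dataset prescribes; a different tie rule would change the number.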
Provider:
ContextualAI



