google/bigbench

Hugging Face · Updated 2024-01-18 · Indexed 2024-06-15

Download link:

https://hf-mirror.com/datasets/google/bigbench

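Resource summary:

This is a multi-task, multilingual natural language processing dataset covering multiple choice, question answering, text classification, text generation, and other task types. It was created through crowdsourcing, expert generation, and machine generation, supports multiple languages, and is released under the Apache 2.0 license. The dataset contains many configurations, each with a feature description and size information.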

Provider:

google

Summary of original information

Dataset overview

Basic information

Dataset name: bigbench
Language: English (en)
License: Apache 2.0
Multilinguality: multilingual and monolingual
Size category: unknown
Source datasets: original

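Data creators

Annotation creators: crowdsourced, expert-generated, machine-generated
Language creators: crowdsourced, expert-generated, machine-generated, other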

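Task categories

Task categories: multiple choice, question answering, text classification, text generation, zero-shot classification, other
Task IDs: multiple-choice QA, extractive QA, open-domain QA, closed-domain QA, fact checking, acceptability classification, intent classification, multi-class classification, multi-label classification, text scoring, hate speech detection, language modeling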

Dataset configuration details

Configuration name: abstract_narrative_understanding

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 6574843 bytes, 3000 examples
- train: 5261643 bytes, 2400 examples
- validation: 1313224 bytes, 600 examples
Download size: 0 bytes
Dataset size: 13149710 bytes
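
The split figures for abstract_narrative_understanding can be cross-checked with simple arithmetic. The sketch below copies the numbers from the listing above and checks two observations: the reported dataset size equals the sum of the byte counts of the three splits, and the train and validation example counts add up to the default split. The helper names are hypothetical, and the 80/20 ratio is just what these numbers show, not a documented rule for every configuration.

```python
# Split statistics copied from the abstract_narrative_understanding listing.
splits = {
    "default": {"bytes": 6574843, "examples": 3000},
    "train": {"bytes": 5261643, "examples": 2400},
    "validation": {"bytes": 1313224, "examples": 600},
}
reported_dataset_size = 13149710  # "dataset size" line of the listing


def total_bytes(split_stats):
    """Sum the byte counts of all splits (hypothetical helper)."""
    return sum(s["bytes"] for s in split_stats.values())


def train_fraction(split_stats):
    """Share of the default split's examples that land in train."""
    return split_stats["train"]["examples"] / split_stats["default"]["examples"]


sizes_match = total_bytes(splits) == reported_dataset_size
counts_match = (
    splits["train"]["examples"] + splits["validation"]["examples"]
    == splits["default"]["examples"]
)
print(sizes_match, counts_match, train_fraction(splits))
```

The same check can be repeated for any other configuration by swapping in its figures.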

Configuration name: anachronisms

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 48937 bytes, 230 examples
- train: 39209 bytes, 184 examples
- validation: 9752 bytes, 46 examples
Download size: 0 bytes
Dataset size: 97898 bytes

Configuration name: analogical_similarity

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 1374163 bytes, 323 examples
- train: 1101796 bytes, 259 examples
- validation: 272391 bytes, 64 examples
Download size: 0 bytes
Dataset size: 2748350 bytes

Configuration name: analytic_entailment

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 17367 bytes, 70 examples
- train: 13413 bytes, 54 examples
- validation: 3978 bytes, 16 examples
Download size: 0 bytes
Dataset size: 34758 bytes

Configuration name: arithmetic

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 3848183 bytes, 15023 examples
- train: 3078715 bytes, 12019 examples
- validation: 769493 bytes, 3004 examples
Download size: 0 bytes
Dataset size: 7696391 bytes

Configuration name: ascii_word_recognition

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 4985315 bytes, 5000 examples
- train: 3997801 bytes, 4000 examples
- validation: 987542 bytes, 1000 examples
Download size: 0 bytes
Dataset size: 9970658 bytes

Configuration name: authorship_verification

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 14118946 bytes, 880 examples
- train: 11288769 bytes, 704 examples
- validation: 2830201 bytes, 176 examples
Download size: 0 bytes
Dataset size: 28237916 bytes

Configuration name: auto_categorization

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 40618 bytes, 328 examples
- train: 33053 bytes, 263 examples
- validation: 7594 bytes, 65 examples
Download size: 0 bytes
Dataset size: 81265 bytes

Configuration name: auto_debugging

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 5145 bytes, 34 examples
- train: 2682 bytes, 18 examples
- validation: 2491 bytes, 16 examples
Download size: 0 bytes
Dataset size: 10318 bytes

Configuration name: bbq_lite_json

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 6898580 bytes, 16076 examples
- train: 5515066 bytes, 12866 examples
- validation: 1383539 bytes, 3210 examples
Download size: 0 bytes
Dataset size: 13797185 bytes

Configuration name: bridging_anaphora_resolution_barqa

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 1971124 bytes, 648 examples
- train: 1537357 bytes, 519 examples
- validation: 433796 bytes, 129 examples
Download size: 0 bytes
Dataset size: 3942277 bytes

Configuration name: causal_judgment

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 204974 bytes, 190 examples
- train: 165021 bytes, 152 examples
- validation: 39977 bytes, 38 examples
Download size: 0 bytes
Dataset size: 409972 bytes

Configuration name: cause_and_effect

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 49397 bytes, 153 examples
- train: 39691 bytes, 123 examples
- validation: 9730 bytes, 30 examples
Download size: 0 bytes
Dataset size: 98818 bytes

Configuration name: checkmate_in_one

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 3140634 bytes, 3498 examples
- train: 2516239 bytes, 2799 examples
- validation: 624419 bytes, 699 examples
Download size: 0 bytes
Dataset size: 6281292 bytes

Configuration name: chess_state_tracking

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 3270710 bytes, 6000 examples
- train: 2616922 bytes, 4800 examples
- validation: 653816 bytes, 1200 examples
Download size: 0 bytes
Dataset size: 6541448 bytes

Configuration name: chinese_remainder_theorem

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 153313 bytes, 500 examples
- train: 122679 bytes, 400 examples
- validation: 30662 bytes, 100 examples
Download size: 0 bytes
Dataset size: 306654 bytes

Configuration name: cifar10_classification

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 111049748 bytes, 20000 examples
- train: 88804772 bytes, 16000 examples
- validation: 22245000 bytes, 4000 examples
Download size: 0 bytes
Dataset size: 222099520 bytes

Configuration name: code_line_description

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 33733 bytes, 60 examples
- train: 25583 bytes, 44 examples
- validation: 8174 bytes, 16 examples
Download size: 0 bytes
Dataset size: 67490 bytes

Configuration name: codenames

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 25234 bytes, 85 examples
- train: 20001 bytes, 68 examples
- validation: 5262 bytes, 17 examples
Download size: 0 bytes
Dataset size: 50497 bytes

Configuration name: color

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 1638787 bytes, 4000 examples
- train: 1311087 bytes, 3200 examples
- validation: 327724 bytes, 800 examples
Download size: 0 bytes
Dataset size: 3277598 bytes

Configuration name: common_morpheme

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 12444 bytes, 50 examples
- train: 8490 bytes, 34 examples
- validation: 3978 bytes, 16 examples
Download size: 0 bytes
Dataset size: 24912 bytes

Configuration name: conceptual_combinations

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 58948 bytes, 103 examples
- train: 48087 bytes, 84 examples
- validation: 10886 bytes, 19 examples
Download size: 0 bytes
Dataset size: 117921 bytes

Configuration name: conlang_translation

Features:
- idx: int32
- inputs: string
- targets: sequence of string
- multiple_choice_targets: sequence of string
- multiple_choice_scores: sequence of int32
Splits:
- default: 215239 bytes, 164 examples

Curated summary

Dataset introduction

Construction

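Evaluating large language models requires diverse and challenging benchmarks. BIG-bench was built by combining crowdsourced, expert-generated, and machine-generated data, which broadens the range of its sources. It spans more than 200 subtasks, from abstract narrative understanding to causal judgment, each designed to probe a specific aspect of human cognition. The tasks are built on original data and stored in a unified JSON format, which simplifies standardized processing and analysis.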
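
To make the link between JSON storage and the five listed features concrete, here is a minimal sketch that maps one JSON example onto a record with idx, inputs, targets, multiple_choice_targets, and multiple_choice_scores. The JSON field names `input`, `target`, and `target_scores`, the sample question, and the `to_record` helper are all assumptions made for illustration; this page does not document the raw JSON layout.

```python
import json

# Hypothetical task example; the field names "input" and "target_scores"
# are assumptions for illustration, not taken from this page.
raw_example = '{"input": "Is the sky green?", "target_scores": {"yes": 0, "no": 1}}'


def to_record(idx, example):
    """Map one JSON example onto the five features listed for each config."""
    scores = example.get("target_scores", {})
    choices = list(scores.keys())
    values = [int(v) for v in scores.values()]  # listed dtype is int32
    if "target" in example:
        # Free-form tasks: accept a single string or a list of strings.
        target = example["target"]
        targets = list(target) if isinstance(target, list) else [target]
    elif values:
        # Multiple-choice tasks: the best-scored choices act as targets.
        best = max(values)
        targets = [c for c, v in zip(choices, values) if v == best]
    else:
        targets = []
    return {
        "idx": idx,
        "inputs": example["input"],
        "targets": targets,
        "multiple_choice_targets": choices,
        "multiple_choice_scores": values,
    }


record = to_record(0, json.loads(raw_example))
print(record)
```

Keeping the choices and their scores in two parallel lists mirrors the listed schema, where both fields are sequences of the same length.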

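Usage

Researchers can load a specific BIG-bench configuration directly from the Hugging Face platform, for example abstract_narrative_understanding or causal_judgment, to evaluate a model on that task. Configurations are typically split into train and validation sets and support zero-shot or few-shot evaluation. Depending on the task, use the inputs and targets fields to match model inputs against expected outputs, or use multiple_choice_targets together with multiple_choice_scores for multiple-choice scoring, which gives an efficient and standardized benchmark procedure.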
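
The two scoring modes described above can be sketched in plain Python, assuming a record has already been loaded into a dictionary with the five listed features. The example record, the `exact_match` and `multiple_choice_correct` helpers, and the convention that a positive entry in multiple_choice_scores marks a correct choice are assumptions for illustration, not the benchmark's official metrics.

```python
from typing import List, TypedDict


class BigBenchRecord(TypedDict):
    """Shape of one record, following the listed features."""
    idx: int
    inputs: str
    targets: List[str]
    multiple_choice_targets: List[str]
    multiple_choice_scores: List[int]


def exact_match(record: BigBenchRecord, prediction: str) -> bool:
    """Generative scoring: the prediction must equal one of the targets."""
    return prediction.strip() in [t.strip() for t in record["targets"]]


def multiple_choice_correct(record: BigBenchRecord, model_scores: List[float]) -> bool:
    """Multiple-choice scoring: pick the choice the model scores highest.

    Assumes a positive gold score marks a correct choice.
    """
    choices = record["multiple_choice_targets"]
    if not choices or len(model_scores) != len(choices):
        raise ValueError("need exactly one model score per choice")
    best = max(range(len(model_scores)), key=model_scores.__getitem__)
    return record["multiple_choice_scores"][best] > 0


# Hypothetical record, not taken from the dataset.
example: BigBenchRecord = {
    "idx": 0,
    "inputs": "Q: Which number is larger, 2 or 3?\nA:",
    "targets": ["3"],
    "multiple_choice_targets": ["2", "3"],
    "multiple_choice_scores": [0, 1],
}

print(exact_match(example, " 3 "))
print(multiple_choice_correct(example, [-1.2, -0.3]))
```

Here `model_scores` stands for whatever per-choice score a model produces, such as a log-likelihood for each candidate answer.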

Background and challenges

Background overview

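Evaluating large language models has long been central to progress in artificial intelligence. BIG-bench, released by Google in 2022 and built by more than 450 researchers worldwide, aims to systematically probe the limits of language models across diverse and complex tasks. It covers 204 tasks, including abstract narrative understanding, causal reasoning, and code generation, and its central research question is how models perform on cognition and reasoning that go beyond traditional language understanding. Its creation marks a shift in evaluation from single performance metrics toward multidimensional capability assessment, with lasting influence on the development of general-purpose AI.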

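Current challenges

The domain challenge that BIG-bench targets is that existing language models often perform poorly on complex tasks requiring deep reasoning, cross-domain knowledge integration, and contextual understanding. For example, accuracy drops markedly on tasks that demand abstract thinking, such as counterfactual reasoning and spatiotemporal logic. Building the benchmark posed its own challenges: designing a task collection that is both academically rigorous and diverse, so that each subtask reflects a specific capability dimension, and coordinating a large interdisciplinary team to annotate and validate data while maintaining quality and consistency, which is a major engineering and organizational effort.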

Common scenarios

Classic use cases

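Evaluating large language models is a core part of advancing the field. With its broad and diverse set of tasks, BIG-bench has become a classic benchmark for measuring a model's overall cognitive ability. Through complex tasks such as abstract narrative understanding, causal judgment, and code description, it systematically tests logical reasoning, knowledge integration, and language generation, giving researchers a standardized tool for assessing how well a model generalizes.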

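Academic problems addressed

BIG-bench addresses the problem that language model evaluation has relied on too few tasks and too narrow a set of dimensions. By combining tasks from different areas, including mathematical reasoning, commonsense question answering, and multilingual processing, it gives a fine-grained picture of where a model's capabilities end. The dataset has supported research on model interpretability, bias detection, and few-shot learning, and has moved evaluation from single performance metrics toward a multidimensional cognitive framework, which matters for building more robust and trustworthy AI systems.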

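Practical applications

In practice, BIG-bench provides validation evidence for scenarios such as intelligent assistants, education technology, and content moderation. Its dialogue understanding, sentiment analysis, and fact-checking tasks can help developers improve how their products behave in real-world settings. For example, models trained on this dataset could improve a customer service system's grasp of context, or increase the accuracy with which an education platform answers complex questions, helping AI technology be deployed more safely and efficiently.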

Recent research on the dataset