five

Xieyiyiyi/ceshi0119

收藏
Hugging Face2024-01-29 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/Xieyiyiyi/ceshi0119
下载链接
链接失效反馈
官方服务:
资源简介:
--- annotations_creators: - expert-generated language_creators: - other language: - en license: - unknown multilinguality: - monolingual size_categories: - 10K<n<100K source_datasets: - extended|other task_categories: - text-classification - token-classification - question-answering task_ids: - natural-language-inference - word-sense-disambiguation - coreference-resolution - extractive-qa pretty_name: SuperGLUE tags: - superglue - NLU - natural language understanding dataset_info: - config_name: boolq features: - name: question dtype: string - name: passage dtype: string - name: idx dtype: int32 - name: label dtype: class_label: names: '0': 'False' '1': 'True' splits: - name: test num_bytes: 2107997 num_examples: 3245 - name: train num_bytes: 6179206 num_examples: 9427 - name: validation num_bytes: 2118505 num_examples: 3270 download_size: 4118001 dataset_size: 10405708 - config_name: cb features: - name: premise dtype: string - name: hypothesis dtype: string - name: idx dtype: int32 - name: label dtype: class_label: names: '0': entailment '1': contradiction '2': neutral splits: - name: test num_bytes: 93660 num_examples: 250 - name: train num_bytes: 87218 num_examples: 250 - name: validation num_bytes: 21894 num_examples: 56 download_size: 75482 dataset_size: 202772 - config_name: copa features: - name: premise dtype: string - name: choice1 dtype: string - name: choice2 dtype: string - name: question dtype: string - name: idx dtype: int32 - name: label dtype: class_label: names: '0': choice1 '1': choice2 splits: - name: test num_bytes: 60303 num_examples: 500 - name: train num_bytes: 49599 num_examples: 400 - name: validation num_bytes: 12586 num_examples: 100 download_size: 43986 dataset_size: 122488 - config_name: multirc features: - name: paragraph dtype: string - name: question dtype: string - name: answer dtype: string - name: idx struct: - name: paragraph dtype: int32 - name: question dtype: int32 - name: answer dtype: int32 - name: label dtype: class_label: names: '0': 'False' '1': 'True' splits: - name: test num_bytes: 14996451 num_examples: 9693 - name: train num_bytes: 46213579 num_examples: 27243 - name: validation num_bytes: 7758918 num_examples: 4848 download_size: 1116225 dataset_size: 68968948 - config_name: record features: - name: passage dtype: string - name: query dtype: string - name: entities sequence: string - name: entity_spans sequence: - name: text dtype: string - name: start dtype: int32 - name: end dtype: int32 - name: answers sequence: string - name: idx struct: - name: passage dtype: int32 - name: query dtype: int32 splits: - name: train num_bytes: 179232052 num_examples: 100730 - name: validation num_bytes: 17479084 num_examples: 10000 - name: test num_bytes: 17200575 num_examples: 10000 download_size: 51757880 dataset_size: 213911711 - config_name: rte features: - name: premise dtype: string - name: hypothesis dtype: string - name: idx dtype: int32 - name: label dtype: class_label: names: '0': entailment '1': not_entailment splits: - name: test num_bytes: 975799 num_examples: 3000 - name: train num_bytes: 848745 num_examples: 2490 - name: validation num_bytes: 90899 num_examples: 277 download_size: 750920 dataset_size: 1915443 - config_name: wic features: - name: word dtype: string - name: sentence1 dtype: string - name: sentence2 dtype: string - name: start1 dtype: int32 - name: start2 dtype: int32 - name: end1 dtype: int32 - name: end2 dtype: int32 - name: idx dtype: int32 - name: label dtype: class_label: names: '0': 'False' '1': 'True' splits: - name: test num_bytes: 180593 num_examples: 1400 - name: train num_bytes: 665183 num_examples: 5428 - name: validation num_bytes: 82623 num_examples: 638 download_size: 396213 dataset_size: 928399 - config_name: wsc features: - name: text dtype: string - name: span1_index dtype: int32 - name: span2_index dtype: int32 - name: span1_text dtype: string - name: span2_text dtype: string - name: idx dtype: int32 - name: label dtype: class_label: names: '0': 'False' '1': 'True' splits: - name: test num_bytes: 31572 num_examples: 146 - name: train num_bytes: 89883 num_examples: 554 - name: validation num_bytes: 21637 num_examples: 104 download_size: 32751 dataset_size: 143092 - config_name: wsc.fixed features: - name: text dtype: string - name: span1_index dtype: int32 - name: span2_index dtype: int32 - name: span1_text dtype: string - name: span2_text dtype: string - name: idx dtype: int32 - name: label dtype: class_label: names: '0': 'False' '1': 'True' splits: - name: test num_bytes: 31568 num_examples: 146 - name: train num_bytes: 89883 num_examples: 554 - name: validation num_bytes: 21637 num_examples: 104 download_size: 32751 dataset_size: 143088 - config_name: axb features: - name: sentence1 dtype: string - name: sentence2 dtype: string - name: idx dtype: int32 - name: label dtype: class_label: names: '0': entailment '1': not_entailment splits: - name: test num_bytes: 238392 num_examples: 1104 download_size: 33950 dataset_size: 238392 - config_name: axg features: - name: premise dtype: string - name: hypothesis dtype: string - name: idx dtype: int32 - name: label dtype: class_label: names: '0': entailment '1': not_entailment splits: - name: test num_bytes: 53581 num_examples: 356 download_size: 10413 dataset_size: 53581 --- # Dataset Card for "super_glue" ## Table of Contents - [Dataset Description](#dataset-description) - [Dataset Summary](#dataset-summary) - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards) - [Languages](#languages) - [Dataset Structure](#dataset-structure) - [Data Instances](#data-instances) - [Data Fields](#data-fields) - [Data Splits](#data-splits) - [Dataset Creation](#dataset-creation) - [Curation Rationale](#curation-rationale) - [Source Data](#source-data) - [Annotations](#annotations) - [Personal and Sensitive Information](#personal-and-sensitive-information) - [Considerations for Using the Data](#considerations-for-using-the-data) - [Social Impact of Dataset](#social-impact-of-dataset) - [Discussion of Biases](#discussion-of-biases) - [Other Known Limitations](#other-known-limitations) - [Additional Information](#additional-information) - [Dataset Curators](#dataset-curators) - [Licensing Information](#licensing-information) - [Citation Information](#citation-information) - [Contributions](#contributions) ## Dataset Description - **Homepage:** [https://github.com/google-research-datasets/boolean-questions](https://github.com/google-research-datasets/boolean-questions) - **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) - **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) - **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) - **Size of downloaded dataset files:** 55.66 MB - **Size of the generated dataset:** 238.01 MB - **Total amount of disk used:** 293.67 MB ### Dataset Summary SuperGLUE (https://super.gluebenchmark.com/) is a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, improved resources, and a new public leaderboard. BoolQ (Boolean Questions, Clark et al., 2019a) is a QA task where each example consists of a short passage and a yes/no question about the passage. The questions are provided anonymously and unsolicited by users of the Google search engine, and afterwards paired with a paragraph from a Wikipedia article containing the answer. Following the original work, we evaluate with accuracy. ### Supported Tasks and Leaderboards [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Languages [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ## Dataset Structure ### Data Instances #### axb - **Size of downloaded dataset files:** 0.03 MB - **Size of the generated dataset:** 0.23 MB - **Total amount of disk used:** 0.26 MB An example of 'test' looks as follows. ``` ``` #### axg - **Size of downloaded dataset files:** 0.01 MB - **Size of the generated dataset:** 0.05 MB - **Total amount of disk used:** 0.06 MB An example of 'test' looks as follows. ``` ``` #### boolq - **Size of downloaded dataset files:** 3.93 MB - **Size of the generated dataset:** 9.92 MB - **Total amount of disk used:** 13.85 MB An example of 'train' looks as follows. ``` ``` #### cb - **Size of downloaded dataset files:** 0.07 MB - **Size of the generated dataset:** 0.19 MB - **Total amount of disk used:** 0.27 MB An example of 'train' looks as follows. ``` ``` #### copa - **Size of downloaded dataset files:** 0.04 MB - **Size of the generated dataset:** 0.12 MB - **Total amount of disk used:** 0.16 MB An example of 'train' looks as follows. ``` ``` ### Data Fields The data fields are the same among all splits. #### axb - `sentence1`: a `string` feature. - `sentence2`: a `string` feature. - `idx`: a `int32` feature. - `label`: a classification label, with possible values including `entailment` (0), `not_entailment` (1). #### axg - `premise`: a `string` feature. - `hypothesis`: a `string` feature. - `idx`: a `int32` feature. - `label`: a classification label, with possible values including `entailment` (0), `not_entailment` (1). #### boolq - `question`: a `string` feature. - `passage`: a `string` feature. - `idx`: a `int32` feature. - `label`: a classification label, with possible values including `False` (0), `True` (1). #### cb - `premise`: a `string` feature. - `hypothesis`: a `string` feature. - `idx`: a `int32` feature. - `label`: a classification label, with possible values including `entailment` (0), `contradiction` (1), `neutral` (2). #### copa - `premise`: a `string` feature. - `choice1`: a `string` feature. - `choice2`: a `string` feature. - `question`: a `string` feature. - `idx`: a `int32` feature. - `label`: a classification label, with possible values including `choice1` (0), `choice2` (1). ### Data Splits #### axb | |test| |---|---:| |axb|1104| #### axg | |test| |---|---:| |axg| 356| #### boolq | |train|validation|test| |-----|----:|---------:|---:| |boolq| 9427| 3270|3245| #### cb | |train|validation|test| |---|----:|---------:|---:| |cb | 250| 56| 250| #### copa | |train|validation|test| |----|----:|---------:|---:| |copa| 400| 100| 500| ## Dataset Creation ### Curation Rationale [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Source Data #### Initial Data Collection and Normalization [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) #### Who are the source language producers? [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Annotations #### Annotation process [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) #### Who are the annotators? [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Personal and Sensitive Information [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ## Considerations for Using the Data ### Social Impact of Dataset [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Discussion of Biases [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Other Known Limitations [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ## Additional Information ### Dataset Curators [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Licensing Information [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards) ### Citation Information ``` @inproceedings{clark2019boolq, title={BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions}, author={Clark, Christopher and Lee, Kenton and Chang, Ming-Wei, and Kwiatkowski, Tom and Collins, Michael, and Toutanova, Kristina}, booktitle={NAACL}, year={2019} } @article{wang2019superglue, title={SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems}, author={Wang, Alex and Pruksachatkun, Yada and Nangia, Nikita and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R}, journal={arXiv preprint arXiv:1905.00537}, year={2019} } Note that each SuperGLUE dataset has its own citation. Please see the source to get the correct citation for each contained dataset. ``` ### Contributions Thanks to [@thomwolf](https://github.com/thomwolf), [@lewtun](https://github.com/lewtun), [@patrickvonplaten](https://github.com/patrickvonplaten) for adding this dataset.
提供机构:
Xieyiyiyi
原始信息汇总

数据集概述

基本信息

  • 名称: SuperGLUE
  • 语言: 英语 (en)
  • 许可证: 未知
  • 多语言性: 单语
  • 大小: 10K<n<100K
  • 来源: 扩展自其他数据集
  • 任务类别: 文本分类, 令牌分类, 问答
  • 任务ID: 自然语言推理, 词义消歧, 指代消解, 抽取式问答
  • 美观名称: SuperGLUE
  • 标签: superglue, NLU, 自然语言理解

数据集配置详情

配置: boolq
  • 特征:
    • question: 字符串
    • passage: 字符串
    • idx: 整数 (int32)
    • label: 分类标签 (0: False, 1: True)
  • 分割:
    • train: 9427个样本, 6179206字节
    • validation: 3270个样本, 2118505字节
    • test: 3245个样本, 2107997字节
    • 下载大小: 4118001字节
    • 数据集大小: 10405708字节
配置: cb
  • 特征:
    • premise: 字符串
    • hypothesis: 字符串
    • idx: 整数 (int32)
    • label: 分类标签 (0: entailment, 1: contradiction, 2: neutral)
  • 分割:
    • train: 250个样本, 87218字节
    • validation: 56个样本, 21894字节
    • test: 250个样本, 93660字节
    • 下载大小: 75482字节
    • 数据集大小: 202772字节
配置: copa
  • 特征:
    • premise: 字符串
    • choice1: 字符串
    • choice2: 字符串
    • question: 字符串
    • idx: 整数 (int32)
    • label: 分类标签 (0: choice1, 1: choice2)
  • 分割:
    • train: 400个样本, 49599字节
    • validation: 100个样本, 12586字节
    • test: 500个样本, 60303字节
    • 下载大小: 43986字节
    • 数据集大小: 122488字节
配置: multirc
  • 特征:
    • paragraph: 字符串
    • question: 字符串
    • answer: 字符串
    • idx: 结构化 (paragraph, question, answer 均为整数 int32)
    • label: 分类标签 (0: False, 1: True)
  • 分割:
    • train: 27243个样本, 46213579字节
    • validation: 4848个样本, 7758918字节
    • test: 9693个样本, 14996451字节
    • 下载大小: 1116225字节
    • 数据集大小: 68968948字节
配置: record
  • 特征:
    • passage: 字符串
    • query: 字符串
    • entities: 字符串序列
    • entity_spans: 结构化序列 (text: 字符串, start: 整数 int32, end: 整数 int32)
    • answers: 字符串序列
    • idx: 结构化 (passage, query 均为整数 int32)
  • 分割:
    • train: 100730个样本, 179232052字节
    • validation: 10000个样本, 17479084字节
    • test: 10000个样本, 17200575字节
    • 下载大小: 51757880字节
    • 数据集大小: 213911711字节
配置: rte
  • 特征:
    • premise: 字符串
    • hypothesis: 字符串
    • idx: 整数 (int32)
    • label: 分类标签 (0: entailment, 1: not_entailment)
  • 分割:
    • train: 2490个样本, 848745字节
    • validation: 277个样本, 90899字节
    • test: 3000个样本, 975799字节
    • 下载大小: 750920字节
    • 数据集大小: 1915443字节
配置: wic
  • 特征:
    • word: 字符串
    • sentence1: 字符串
    • sentence2: 字符串
    • start1: 整数 (int32)
    • start2: 整数 (int32)
    • end1: 整数 (int32)
    • end2: 整数 (int32)
    • idx: 整数 (int32)
    • label: 分类标签 (0: False, 1: True)
  • 分割:
    • train: 5428个样本, 665183字节
    • validation: 638个样本, 82623字节
    • test: 1400个样本, 180593字节
    • 下载大小: 396213字节
    • 数据集大小: 928399字节
配置: wsc
  • 特征:
    • text: 字符串
    • span1_index: 整数 (int32)
    • span2_index: 整数 (int32)
    • span1_text: 字符串
    • span2_text: 字符串
    • idx: 整数 (int32)
    • label: 分类标签 (0: False, 1: True)
  • 分割:
    • train: 554个样本, 89883字节
    • validation: 104个样本, 21637字节
    • test: 146个样本, 31572字节
    • 下载大小: 32751字节
    • 数据集大小: 143092字节
配置: wsc.fixed
  • 特征:
    • text: 字符串
    • span1_index: 整数 (int32)
    • span2_index: 整数 (int32)
    • span1_text: 字符串
    • span2_text: 字符串
    • idx: 整数 (int32)
    • label: 分类标签 (0: False, 1: True)
  • 分割:
    • train: 554个样本, 89883字节
    • validation: 104个样本, 21637字节
    • test: 146个样本, 31568字节
    • 下载大小: 32751字节
    • 数据集大小: 143088字节
配置: axb
  • 特征:
    • sentence1: 字符串
    • sentence2: 字符串
    • idx: 整数 (int32)
    • label: 分类标签 (0: entailment, 1: not_entailment)
  • 分割:
    • test: 1104个样本, 238392字节
    • 下载大小: 33950字节
    • 数据集大小: 238392字节
配置: axg
  • 特征:
    • premise: 字符串
    • hypothesis: 字符串
    • idx: 整数 (int32)
    • label: 分类标签 (0: entailment, 1: not_entailment)
  • 分割:
    • test: 356个样本, 53581字节
    • 下载大小: 10413字节
    • 数据集大小: 53581字节
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作