MtCelesteMa/multiglue
Source: Hugging Face. Updated 2023-01-30; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/MtCelesteMa/multiglue
Resource description:
---
license: cc-by-4.0
task_categories:
- text-classification
size_categories:
- 100K<n<1M
language:
- en
multilinguality:
- monolingual
pretty_name: MultiGLUE
source_datasets:
- extended|glue
language_creators:
- found
annotations_creators:
- found
---
# Dataset Card for MultiGLUE
## Dataset Description
- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
This dataset is a combination of the cola, mrpc, qnli, qqp, rte, sst2, and wnli subsets of the GLUE dataset. Its intended use is to benchmark language models on multitask binary classification.
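As a rough illustration of multitask benchmarking, the sketch below groups predictions by the `task` field and reports accuracy per task plus an unweighted macro average. The function name `per_task_accuracy` and the toy rows are hypothetical, and plain accuracy is a simplification: GLUE itself scores some tasks with other metrics (for example, Matthews correlation for cola).

```python
from collections import defaultdict


def per_task_accuracy(rows, predictions):
    """Return {task: accuracy} for rows carrying `task` and `label` fields."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row, pred in zip(rows, predictions):
        total[row["task"]] += 1
        correct[row["task"]] += int(pred == row["label"])
    return {task: correct[task] / total[task] for task in total}


# Toy rows, made up for illustration (not taken from the dataset).
rows = [
    {"task": "cola", "label": 1},
    {"task": "cola", "label": 0},
    {"task": "rte", "label": 1},
    {"task": "rte", "label": 1},
]
predictions = [1, 1, 1, 1]  # a hypothetical model that always predicts 1

scores = per_task_accuracy(rows, predictions)
macro_average = sum(scores.values()) / len(scores)
print(scores)
print(macro_average)
```

Reporting scores per task keeps the large subsets from dominating the result, which a single pooled accuracy would not.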
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
Like the GLUE dataset, this dataset is in English.
## Dataset Structure
### Data Instances
An example instance looks like this:
```json
{
"label": 1,
"task": "cola",
"sentence1": "The sailors rode the breeze clear of the rocks.",
"sentence2": null
}
```
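Since `sentence2` can be `null`, as in the cola instance above, code that consumes an instance usually has to branch on it. A minimal sketch, assuming a hypothetical helper `to_model_input` and a hypothetical `[SEP]` separator convention (neither is prescribed by the dataset):

```python
import json


def to_model_input(instance, separator=" [SEP] "):
    """Prefix the task name and join the sentences, skipping a null sentence2."""
    parts = [instance["sentence1"]]
    if instance["sentence2"] is not None:
        parts.append(instance["sentence2"])
    return f"{instance['task']}: " + separator.join(parts)


# The example instance from this card, parsed from JSON (null becomes None).
instance = json.loads(
    '{"label": 1, "task": "cola", '
    '"sentence1": "The sailors rode the breeze clear of the rocks.", '
    '"sentence2": null}'
)
text = to_model_input(instance)
print(text)
```

Prefixing the task name is one common way to tell a single multitask model which task a row belongs to. Other encodings would work equally well.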
### Data Fields
- `task`: A `string` feature, indicating the GLUE task the instance is from.
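- `sentence1`: A `string` feature, holding the first (or only) text of the instance.
- `sentence2`: A `string` feature, holding the second text of the instance; `null` when the instance has only one sentence, as in the cola example above.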
- `label`: A classification label, either 0 or 1.
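The fields above can be written down as a typed record with a small sanity check. This is only a sketch: the names `Instance`, `TASKS`, and `is_valid` are hypothetical, and the check assumes a labelled split (`train` or `validation`).

```python
from typing import Optional, TypedDict

# Task names listed in the Dataset Summary.
TASKS = {"cola", "mrpc", "qnli", "qqp", "rte", "sst2", "wnli"}
FIELDS = {"task", "sentence1", "sentence2", "label"}


class Instance(TypedDict):
    task: str
    sentence1: str
    sentence2: Optional[str]  # None when the instance has a single sentence
    label: int


def is_valid(instance: dict) -> bool:
    """Check a labelled record against the field descriptions above."""
    if set(instance) != FIELDS:
        return False
    second = instance["sentence2"]
    return (
        instance["task"] in TASKS
        and isinstance(instance["sentence1"], str)
        and (second is None or isinstance(second, str))
        and instance["label"] in (0, 1)
    )


example = {
    "label": 1,
    "task": "cola",
    "sentence1": "The sailors rode the breeze clear of the rocks.",
    "sentence2": None,
}
print(is_valid(example))
```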
### Data Splits
- `train`: 551,282 instances
- `validation`: 48,564 instances
- `test`: 404,183 instances, no classification label (same as GLUE)
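For a quick sense of scale, the split sizes listed above can be summed directly. The variable names are illustrative, and the figures are copied from this card:

```python
# Split sizes as stated in this card.
split_sizes = {"train": 551_282, "validation": 48_564, "test": 404_183}

total = sum(split_sizes.values())
labelled = split_sizes["train"] + split_sizes["validation"]
validation_share = split_sizes["validation"] / labelled

print(total)                      # 1004029
print(labelled)                   # 599846
print(f"{validation_share:.1%}")  # 8.1%
```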
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
This dataset was created by combining the cola, mrpc, qnli, qqp, rte, sst2, and wnli subsets of the GLUE dataset.
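The card does not describe the merging procedure. One plausible approach, sketched here purely as an assumption, is to map each subset's text columns onto `sentence1`/`sentence2` and tag every row with its task. The entries in `TEXT_COLUMNS` follow the column names used by the GLUE dataset on the Hugging Face Hub. The helper `normalize` and the qnli toy row are hypothetical.

```python
# (first_column, second_column) per subset; None marks a single-sentence task.
TEXT_COLUMNS = {
    "cola": ("sentence", None),
    "mrpc": ("sentence1", "sentence2"),
    "qnli": ("question", "sentence"),
    "qqp": ("question1", "question2"),
    "rte": ("sentence1", "sentence2"),
    "sst2": ("sentence", None),
    "wnli": ("sentence1", "sentence2"),
}


def normalize(task, row):
    """Map one subset row onto the shared task/sentence1/sentence2/label schema."""
    first, second = TEXT_COLUMNS[task]
    return {
        "label": row["label"],
        "task": task,
        "sentence1": row[first],
        "sentence2": row[second] if second is not None else None,
    }


# Toy stand-ins for the per-task subsets (the qnli row is made up).
subsets = {
    "cola": [{"sentence": "The sailors rode the breeze clear of the rocks.", "label": 1}],
    "qnli": [{"question": "Who wrote it?", "sentence": "It was written by Ann.", "label": 0}],
}
combined = [normalize(task, row) for task, rows in subsets.items() for row in rows]
print(combined[0])
```

Applied to the cola row, this reproduces the example instance shown under Data Instances.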
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
The card metadata lists the license as `cc-by-4.0`.
### Citation Information
[More Information Needed]
### Contributions
[More Information Needed]
Provider:
MtCelesteMa



