ML-Projects-Kiel/tweetyface_debug
Hugging Face, updated 2022-12-05, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/ML-Projects-Kiel/tweetyface_debug
Resource description:
---
annotations_creators:
- machine-generated
language:
- en
- de
language_creators:
- crowdsourced
license:
- apache-2.0
multilinguality:
- multilingual
pretty_name: tweetyface_debug
size_categories:
- 10K<n<100K
source_datasets: []
tags: []
task_categories:
- text-generation
task_ids: []
---
# DEBUG Dataset Card for "tweetyface"
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:**
- **Repository:** [GitHub](https://github.com/ml-projects-kiel/OpenCampus-ApplicationofTransformers)
### Dataset Summary
DEBUG
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
English, German
## Dataset Structure
### Data Instances
#### english
- **Size of downloaded dataset files:** 4.77 MB
- **Size of the generated dataset:** 5.92 MB
- **Total amount of disk used:** 4.77 MB
#### german
- **Size of downloaded dataset files:** 2.58 MB
- **Size of the generated dataset:** 3.10 MB
- **Total amount of disk used:** 2.59 MB
An example of 'validation' looks as follows.
```
{
    "text": "@SpaceX @Space_Station About twice as much useful mass to orbit as rest of Earth combined",
    "label": "elonmusk",
    "idx": 1001283
}
```
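A minimal sketch of how such a record might be loaded with the Hugging Face `datasets` library. It assumes the repository ID from the download link, and it assumes that the `english` and `german` subsections above are configuration names; this card confirms neither. The helper name `load_tweetyface_split` is hypothetical.

```python
def load_tweetyface_split(config: str = "english", split: str = "validation"):
    """Load one split of the debug dataset (needs network access).

    Assumptions, not confirmed by the card: the repository ID below and
    the configuration names "english" / "german".
    """
    # Imported lazily so that defining this helper does not require
    # the `datasets` package to be installed.
    from datasets import load_dataset

    return load_dataset("ML-Projects-Kiel/tweetyface_debug", config, split=split)
```

If those assumptions hold, `load_tweetyface_split("german", "train")` would return the German training split, and indexing the result with `[0]` would give one record as a plain dictionary.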
### Data Fields
The data fields are the same among all splits and languages.
- `text`: a `string` feature.
- `label`: a classification label.
- `idx`: a `string` feature.
- `ref_tweet`: a `bool` feature.
- `reply_tweet`: a `bool` feature.
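The field list can be mirrored as a small typed record, which is sketched below. Several things here are assumptions rather than statements from the card: the names `TweetRecord` and `parse_record` are hypothetical, `label` is treated as a plain account name, `idx` is coerced to a string because the sample record shows a number while the field list declares a string, and the two boolean flags default to `False` when absent, as they are in the sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TweetRecord:
    """One row, mirroring the field list above."""
    text: str
    label: str         # assumption: label exposed as an account name
    idx: str           # declared as a string; the sample shows an integer
    ref_tweet: bool
    reply_tweet: bool


def parse_record(raw: dict) -> TweetRecord:
    """Build a TweetRecord, coercing idx to str and defaulting missing flags to False."""
    return TweetRecord(
        text=str(raw["text"]),
        label=str(raw["label"]),
        idx=str(raw["idx"]),
        ref_tweet=bool(raw.get("ref_tweet", False)),
        reply_tweet=bool(raw.get("reply_tweet", False)),
    )
```

Coercing `idx` keeps downstream code independent of which representation the data files actually use.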
### Data Splits
| name | train | validation |
| ------- | ----: | ---------: |
| english | 27857 | 6965 |
| german | 10254 | 2564 |
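As a quick check on the table, the snippet below computes the validation share for each language from the listed row counts. The counts come from the table; the conclusion that they match a roughly 80/20 split is an observation only, since the card does not say how the splits were produced.

```python
# Row counts copied from the Data Splits table.
SPLITS = {
    "english": {"train": 27857, "validation": 6965},
    "german": {"train": 10254, "validation": 2564},
}


def validation_fraction(counts: dict) -> float:
    """Share of rows that fall in the validation split."""
    return counts["validation"] / (counts["train"] + counts["validation"])


for name, counts in SPLITS.items():
    total = counts["train"] + counts["validation"]
    print(f"{name}: {total} rows, validation share {validation_fraction(counts):.3f}")
```

Both languages come out at about 0.200, and the combined 47,640 rows fall within the `10K<n<100K` size category declared in the metadata.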
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
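Provider:
ML-Projects-Kiel
Summary of original information
Dataset overview
Dataset name
- Name: tweetyface_debug
Languages
- Languages: English, German
- Language creators: crowdsourced
License
- License: Apache-2.0
Multilinguality
- Multilinguality: multilingual
Size category
- Size category: 10K<n<100K
Task categories
- Task category: text generation
Dataset structure
Data instances
- English:
  - Size of downloaded dataset files: 4.77 MB
  - Size of the generated dataset: 5.92 MB
  - Total amount of disk used: 4.77 MB
- German:
  - Size of downloaded dataset files: 2.58 MB
  - Size of the generated dataset: 3.10 MB
  - Total amount of disk used: 2.59 MB
Data fields
- text: string
- label: classification label
- idx: string
- ref_tweet: boolean
- reply_tweet: boolean
Data splits
- English:
  - Train: 27857
  - Validation: 6965
- German:
  - Train: 10254
  - Validation: 2564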



