thomasavare/waste-classification-v2
Hugging Face · Updated 2023-05-23 · Indexed 2024-03-04
Download link: https://hf-mirror.com/datasets/thomasavare/waste-classification-v2
---
language:
- en
size_categories:
- 10K<n<100K
---
# Dataset Card for waste-classification-v2
## Dataset Description
- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
Dataset used to train a language model to classify English phrases into 50 different waste classes.
### Languages
English
## Dataset Structure
### Data Instances
Phrase | Class | Class_index
-------|-------|-------
"I have this apple phone charger to throw, where should I put it ?" | PHONE CHARGER | 26
"Should I recycle a disposable cup ?" | Plastic Cup | 32
"I have a milk brick" | Tetrapack | 45
### Data Fields
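- Phrase: the natural-language sentence mentioning a waste item
- Class: the name of the waste class
- Class_index: the integer index of the class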
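A minimal sketch of reading rows with these three fields, assuming the data is stored as CSV with a header row `Phrase,Class,Class_index` (the exact file layout is not stated in this card). The sample text reuses the three example rows above; `read_rows` and `build_label_map` are hypothetical helpers, and the label map also checks that each index is used for only one class name.

```python
import csv
import io
from typing import Dict, List, Tuple

# Assumption: CSV with a header row naming the fields Phrase, Class, Class_index.
# The rows are the examples from the Data Instances table.
SAMPLE_CSV = '''Phrase,Class,Class_index
"I have this apple phone charger to throw, where should I put it ?",PHONE CHARGER,26
"Should I recycle a disposable cup ?",Plastic Cup,32
"I have a milk brick",Tetrapack,45
'''


def read_rows(text: str) -> List[Tuple[str, str, int]]:
    """Parse CSV text into (phrase, class_name, class_index) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(r["Phrase"], r["Class"], int(r["Class_index"])) for r in reader]


def build_label_map(rows: List[Tuple[str, str, int]]) -> Dict[int, str]:
    """Map each class index to its class name, rejecting conflicting pairs."""
    index_to_class: Dict[int, str] = {}
    for _, name, idx in rows:
        if index_to_class.setdefault(idx, name) != name:
            raise ValueError(
                f"index {idx} used for both {index_to_class[idx]!r} and {name!r}"
            )
    return index_to_class


rows = read_rows(SAMPLE_CSV)
labels = build_label_map(rows)
print(labels)  # {26: 'PHONE CHARGER', 32: 'Plastic Cup', 45: 'Tetrapack'}
```

Quoting the phrase column matters because some phrases contain commas, as in the first example.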
### Data Splits
- train: 12.5K rows
- test: 5.38K rows
- additional data: 7.24K rows (unseen_phrases.csv)
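For reference, the train and test sizes above correspond to roughly a 70/30 split. A quick check using the rounded sizes from the card (in thousands of rows); `split_fractions` is a hypothetical helper:

```python
from typing import Dict, List

# Split sizes as reported in the card, in thousands of rows (rounded).
SPLITS: Dict[str, float] = {"train": 12.5, "test": 5.38, "unseen_phrases": 7.24}


def split_fractions(sizes: Dict[str, float], keys: List[str]) -> Dict[str, float]:
    """Return each selected split's share of the selected total."""
    total = sum(sizes[k] for k in keys)
    return {k: sizes[k] / total for k in keys}


frac = split_fractions(SPLITS, ["train", "test"])
for name, f in frac.items():
    print(f"{name}: {f:.1%}")  # train: 69.9%, then test: 30.1%
```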
## Dataset Creation
Created manually by combining a list of objects with phrase templates.
### Annotations
#### Annotation process
Each object was annotated first; each phrase was then annotated with the label of the object it mentions.
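The creation and annotation steps above amount to filling phrase templates with annotated objects, so every phrase inherits its object's label. A minimal sketch under that reading; the templates and the `generate` helper are hypothetical, and only the object/class/index triples come from the examples in this card:

```python
from itertools import product
from typing import List, Tuple

# Object annotations: (object name, class, class index), taken from the card's examples.
OBJECTS: List[Tuple[str, str, int]] = [
    ("disposable cup", "Plastic Cup", 32),
    ("milk brick", "Tetrapack", 45),
]

# Hypothetical phrase templates; {obj} is replaced by the object name.
TEMPLATES: List[str] = [
    "I have a {obj}",
    "Should I recycle a {obj} ?",
]


def generate(objects, templates) -> List[Tuple[str, str, int]]:
    """Fill every template with every object; each phrase inherits the object's label."""
    return [
        (tpl.format(obj=name), cls, idx)
        for (name, cls, idx), tpl in product(objects, templates)
    ]


for row in generate(OBJECTS, TEMPLATES):
    print(row)
```

Because every object is paired with the same small set of templates, this construction also explains the repeated phrase patterns noted under Other Known Limitations.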
#### Who are the annotators?
The dataset creator.
### Personal and Sensitive Information
None
## Considerations for Using the Data
### Social Impact of Dataset
None
### Discussion of Biases
Some classes appear more often than others, but the dataset is balanced overall. Because it was generated from phrase patterns, models trained on it may not be very robust.
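One way to check class frequencies is to count labels per class and compare the most and least frequent classes. A sketch with a toy label list (not the real distribution); `class_balance` is a hypothetical helper:

```python
from collections import Counter
from typing import Iterable, Tuple


def class_balance(labels: Iterable[str]) -> Tuple[Counter, float]:
    """Return per-class counts and the max/min count ratio (1.0 means perfectly balanced)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    return counts, max(counts.values()) / min(counts.values())


# Toy labels for illustration only; not taken from the dataset.
toy = ["Tetrapack"] * 3 + ["Plastic Cup"] * 2 + ["PHONE CHARGER"] * 2
counts, ratio = class_balance(toy)
print(counts.most_common(1), ratio)  # [('Tetrapack', 3)] 1.5
```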
### Other Known Limitations
Phrase patterns repeat across the dataset, so a model's performance should be verified on external phrases to assess its robustness.
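A sketch of this robustness check: compare accuracy on the template-style test split with accuracy on externally written phrases (the additional unseen_phrases.csv file is presumably intended for this). `keyword_predict` is a hypothetical stand-in for a trained model, and the external phrases and their labels are invented for illustration:

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[str, int]  # (phrase, class_index)


def accuracy(predict: Callable[[str], int], examples: Sequence[Example]) -> float:
    """Fraction of examples whose predicted class index matches the label."""
    if not examples:
        raise ValueError("empty evaluation set")
    return sum(predict(p) == y for p, y in examples) / len(examples)


def robustness_gap(predict, in_dist: Sequence[Example], external: Sequence[Example]) -> float:
    """Accuracy drop from template-style phrases to external phrases."""
    return accuracy(predict, in_dist) - accuracy(predict, external)


def keyword_predict(phrase: str) -> int:
    # Hypothetical model stand-in: exact keyword lookup, -1 when nothing matches.
    keywords = {"milk brick": 45, "disposable cup": 32, "phone charger": 26}
    for kw, idx in keywords.items():
        if kw in phrase.lower():
            return idx
    return -1


test = [("I have a milk brick", 45), ("Should I recycle a disposable cup ?", 32)]
external = [("Where does an empty juice carton go?", 45), ("Is my old charger recyclable?", 26)]
print(robustness_gap(keyword_predict, test, external))  # 1.0
```

A large positive gap suggests the model has learned the templates rather than the waste categories themselves.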
Provided by: thomasavare