thomasavare/waste-classification-v2

Name: thomasavare/waste-classification-v2
Creator: thomasavare
Published: 2023-05-23 14:30:38
License: No description available

Source: Hugging Face. Updated 2023-05-23. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/thomasavare/waste-classification-v2


Resource overview:

---
language:
- en
size_categories:
- 10K<n<100K
---

# Dataset Card for Dataset Name

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

Dataset used to train a language model to classify phrases into 50 different waste classes.

### Languages

English

## Dataset Structure

### Data Instances

Phrase | Class | Index
-------|-------|-------
"I have this apple phone charger to throw, where should I put it ?" | PHONE CHARGER | 26
"Should I recycle a disposable cup ?" | Plastic Cup | 32
"I have a milk brick" | Tetrapack | 45

### Data Fields

- Phrase
- Class
- Class_index

### Data Splits

- train: 12.5K rows
- test: 5.38K rows
- additional data: 7.24K rows (unseen_phrases.csv)

## Dataset Creation

Created manually from object and phrase templates.

### Annotations

#### Annotation process

Each object was annotated first, and each phrase was then annotated according to the annotation of its object.

#### Who are the annotators?

Myself

### Personal and Sensitive Information

None

## Considerations for Using the Data

### Social Impact of Dataset

None

### Discussion of Biases

Some classes are more present than others, but the dataset is balanced overall. Because it was created using patterns, it might not be very robust.

### Other Known Limitations

Phrase patterns are repeated, so model performance has to be verified on external phrases to check robustness.

Provided by:

thomasavare

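Summary of original information

Dataset overview

Dataset description

Purpose: Training a language model to classify phrases into 50 different waste classes.
Language: English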

Dataset structure

Data instances

Phrase	Class	Index
"I have this apple phone charger to throw, where should I put it ?"	PHONE CHARGER	26
"Should I recycle a disposable cup ?"	Plastic Cup	32
"I have a milk brick"	Tetrapack	45

Data fields

Phrase: the phrase
Class: the class name
Class_index: the class index
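
As a rough illustration of the three fields, the sample rows above can be parsed into typed records. This is only a sketch: the `WasteExample` class and `parse_rows` helper are hypothetical names, and the tab-separated layout with a `Phrase`/`Class`/`Class_index` header is an assumption based on the field list, not a confirmed description of the actual files.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class WasteExample:
    phrase: str       # the user utterance
    label: str        # class name as written in the card (casing varies)
    class_index: int  # integer class id


# The three sample rows from the card, laid out as tab-separated text.
# Header names follow the field list; the real file layout is an assumption.
SAMPLE_TSV = (
    "Phrase\tClass\tClass_index\n"
    "I have this apple phone charger to throw, where should I put it ?\tPHONE CHARGER\t26\n"
    "Should I recycle a disposable cup ?\tPlastic Cup\t32\n"
    "I have a milk brick\tTetrapack\t45\n"
)


def parse_rows(text: str) -> list[WasteExample]:
    """Turn tab-separated text with a header row into typed records."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        WasteExample(row["Phrase"], row["Class"], int(row["Class_index"]))
        for row in reader
    ]


examples = parse_rows(SAMPLE_TSV)
for ex in examples:
    print(ex.class_index, ex.label, "<-", ex.phrase)
```

Note that class names are not consistently cased in the samples (`PHONE CHARGER` versus `Plastic Cup`), so the integer index is probably the safer training target.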

Data splits

Training set: 12.5K rows
Test set: 5.38K rows
Additional data: 7.24K rows (unseen_phrases.csv)
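
The split sizes above imply roughly a 70/30 train/test ratio. A small sketch of that arithmetic follows; the counts are the rounded figures from the card (exact row counts are not given), and the `SPLITS` dictionary and `fractions` helper are hypothetical names.

```python
# Approximate row counts as stated in the card (rounded figures, not exact).
SPLITS = {"train": 12_500, "test": 5_380, "unseen_phrases": 7_240}


def fractions(counts: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total row count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Share of train vs. test, leaving the additional unseen phrases aside.
train_test = {name: SPLITS[name] for name in ("train", "test")}
for name, share in fractions(train_test).items():
    print(f"{name}: {share:.1%}")

# Total number of rows across all three files, using the rounded figures.
print("total rows (approx.):", sum(SPLITS.values()))
```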

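Dataset creation

Annotation process

Object annotation: each object was annotated first.
Phrase annotation: each phrase was annotated according to the annotation of its object.
Annotator: the dataset creator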

Personal and sensitive information

Status: None

Considerations for using the data

Social impact of the dataset

Impact: None

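Biases in the dataset

Description: Some classes are more common than others, but the dataset is balanced overall. Because it was created from patterns, it may not be very robust.

Other known limitations

Description: Phrase patterns repeat, so model performance needs to be verified on unseen phrases to assess robustness.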
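
One way to act on this limitation is to compare accuracy on templated phrases with accuracy on phrases written independently of the templates. The sketch below assumes a generic `predict(phrase) -> class_index` callable; `toy_predict` is a hypothetical keyword lookup standing in for a trained model, and the `external` phrases are invented for illustration and are not drawn from the dataset.

```python
from typing import Callable, Iterable, Tuple

Example = Tuple[str, int]  # (phrase, class_index)


def accuracy(predict: Callable[[str], int], data: Iterable[Example]) -> float:
    """Fraction of phrases whose predicted class index matches the label."""
    rows = list(data)
    if not rows:
        raise ValueError("empty evaluation set")
    correct = sum(1 for phrase, label in rows if predict(phrase) == label)
    return correct / len(rows)


# Hypothetical stand-in for a trained model: a plain keyword lookup.
KEYWORDS = {"charger": 26, "cup": 32, "milk brick": 45}


def toy_predict(phrase: str) -> int:
    lowered = phrase.lower()
    for keyword, index in KEYWORDS.items():
        if keyword in lowered:
            return index
    return -1  # no class matched


# Phrases and indices taken from the card's sample rows.
card_samples = [
    ("I have this apple phone charger to throw, where should I put it ?", 26),
    ("Should I recycle a disposable cup ?", 32),
    ("I have a milk brick", 45),
]

# Invented paraphrases with assumed labels, for illustration only.
external = [
    ("My iPhone cable and plug are broken, which bin ?", 26),
    ("Where do I throw a plastic cup ?", 32),
]

print("card samples:", accuracy(toy_predict, card_samples))
print("external phrases:", accuracy(toy_predict, external))
```

A gap between the two scores would suggest the model has learned the phrase templates rather than the objects. The `unseen_phrases.csv` file mentioned above seems intended for this kind of check.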
