
thomasavare/waste-classification-v2

Hugging Face | Updated: 2023-05-23 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/thomasavare/waste-classification-v2
Resource description:
---
language:
- en
size_categories:
- 10K<n<100K
---

# Dataset Card for Dataset Name

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

Dataset used to train a language model to classify phrases into 50 different waste classes.

### Languages

English

## Dataset Structure

### Data Instances

| Phrase | Class | Index |
|--------|-------|-------|
| "I have this apple phone charger to throw, where should I put it ?" | PHONE CHARGER | 26 |
| "Should I recycle a disposable cup ?" | Plastic Cup | 32 |
| "I have a milk brick" | Tetrapack | 45 |

### Data Fields

- Phrase
- Class
- Class_index

### Data Splits

- train: 12.5K rows
- test: 5.38K rows
- additional data: 7.24K rows (unseen_phrases.csv)

## Dataset Creation

Created manually from object and phrase templates.

### Annotations

#### Annotation process

Each object was annotated first; each phrase was then labeled according to the annotation of the object it mentions.

#### Who are the annotators?

Myself

### Personal and Sensitive Information

None

## Considerations for Using the Data

### Social Impact of Dataset

None

### Discussion of Biases

Some classes are more frequent than others, but the dataset is balanced overall. Because it was created from patterns, models trained on it may not be very robust.

### Other Known Limitations

Phrase patterns repeat, so model performance should be verified on external phrases to assess robustness.
Provider:
thomasavare
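Summary of original information

Dataset overview

Dataset description

  • Purpose: training a language model to classify phrases into 50 different waste classes.
  • Language: English

Dataset structure

Data instances

| Phrase | Class | Index |
|--------|-------|-------|
| "I have this apple phone charger to throw, where should I put it ?" | PHONE CHARGER | 26 |
| "Should I recycle a disposable cup ?" | Plastic Cup | 32 |
| "I have a milk brick" | Tetrapack | 45 |

Data fields

  • Phrase: the phrase text
  • Class: the waste class label
  • Class_index: the integer index of the class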
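
As a rough illustration of the three fields above, the sketch below parses rows shaped like the sample instances using Python's standard `csv` module. The inline CSV text, its column order, and the `load_rows` helper are assumptions made for illustration; the actual files in the repository may be laid out differently.

```python
import csv
import io

# Hypothetical CSV excerpt built from the sample instances in the card;
# the real file layout is an assumption.
SAMPLE_CSV = '''Phrase,Class,Class_index
"I have this apple phone charger to throw, where should I put it ?",PHONE CHARGER,26
"Should I recycle a disposable cup ?",Plastic Cup,32
"I have a milk brick",Tetrapack,45
'''


def load_rows(text):
    """Parse CSV text into dicts, converting Class_index to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Class_index"] = int(row["Class_index"])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)

# Map each class index back to its label, as a decoder for model outputs would.
index_to_class = {r["Class_index"]: r["Class"] for r in rows}
print(index_to_class)
```

Note that the sample labels mix casing ("PHONE CHARGER" vs. "Plastic Cup"), so matching on `Class_index` is likely safer than matching on the `Class` string.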

Data splits

  • Training set: 12.5K rows
  • Test set: 5.38K rows
  • Additional data: 7.24K rows (unseen_phrases.csv)
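
The split sizes above imply roughly a 70/30 train/test ratio. The short calculation below works that out from the rounded row counts reported in the card, so the shares are approximate; the dictionary keys are illustrative names, not confirmed split identifiers.

```python
# Row counts as reported (rounded) in the card.
splits = {"train": 12_500, "test": 5_380, "unseen_phrases": 7_240}

# Share of each split within the labelled train/test pool.
labelled = splits["train"] + splits["test"]
train_share = splits["train"] / labelled
test_share = splits["test"] / labelled
print(f"train {train_share:.1%}, test {test_share:.1%}")

# Total across all three files, including unseen_phrases.csv.
total = sum(splits.values())
print(total)
```

The total of about 25.1K rows is consistent with the `10K<n<100K` size category declared in the card metadata.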

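Dataset creation

Annotation process

  • Object annotation: each object was annotated first.
  • Phrase annotation: each phrase was then annotated according to its object's annotation.
  • Annotator: the dataset creator

Personal and sensitive information

  • Status: none

Considerations for using the dataset

Social impact of the dataset

  • Status: none

Dataset biases

  • Description: some classes are more frequent than others, but the dataset is balanced overall. Because it was created from patterns, models trained on it may not be very robust.

Other known limitations

  • Description: phrase patterns repeat, so model performance should be verified on unseen phrases to assess robustness.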
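
One way to act on the limitation above is to compare accuracy on template-style phrases against accuracy on differently worded ones. The sketch below is only an illustration: `keyword_predict` is a hypothetical stand-in for a trained model, and the `unseen` examples are made-up paraphrases, not rows taken from unseen_phrases.csv.

```python
def accuracy(predict, examples):
    """Fraction of (phrase, class_index) pairs that predict() gets right."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(1 for phrase, label in examples if predict(phrase) == label)
    return correct / len(examples)


# Hypothetical stand-in for a trained classifier: a simple keyword lookup.
KEYWORDS = {"charger": 26, "cup": 32, "milk brick": 45}


def keyword_predict(phrase):
    text = phrase.lower()
    for keyword, index in KEYWORDS.items():
        if keyword in text:
            return index
    return -1  # no class matched


# Template-style phrases taken from the card's sample instances.
in_template = [
    ("I have this apple phone charger to throw, where should I put it ?", 26),
    ("Should I recycle a disposable cup ?", 32),
    ("I have a milk brick", 45),
]

# Made-up paraphrases standing in for externally written phrases.
unseen = [
    ("Where does a dead USB adapter go?", 26),
    ("Is a throwaway cup recyclable?", 32),
]

# A large positive gap suggests the model has overfit the phrase templates.
gap = accuracy(keyword_predict, in_template) - accuracy(keyword_predict, unseen)
print(gap)
```

With a real model, `predict` would wrap the model's inference call, and the two example lists would come from the test split and from unseen_phrases.csv or other external phrases.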