Nilanjan-2002/fashion-second-hand-front-only-rgb
Source: Hugging Face. Updated 2026-04-07. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Nilanjan-2002/fashion-second-hand-front-only-rgb
Resource description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: brand
    dtype: string
  - name: usage
    dtype: string
  - name: condition
    dtype: int64
  - name: type
    dtype: string
  - name: category
    dtype: string
  - name: price
    dtype: string
  - name: trend
    dtype: string
  - name: colors
    dtype: string
  - name: cut
    dtype: string
  - name: pattern
    dtype: string
  - name: season
    dtype: string
  - name: text
    dtype: string
  - name: pilling
    dtype: int64
  - name: damage
    dtype: string
  - name: stains
    dtype: string
  - name: holes
    dtype: string
  - name: smell
    dtype: string
  - name: material
    dtype: string
  splits:
  - name: train
    num_bytes: 6264676808.824
    num_examples: 28248
  - name: test
    num_bytes: 281469483.32
    num_examples: 3390
  download_size: 2843670317
  dataset_size: 6546146292.144
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
license: cc-by-4.0
language:
- en
- sv
pretty_name: second-ha
---
# Clothing Dataset for Second-Hand Fashion
This dataset contains only the front images and labels from **version 3** of the following dataset released on Zenodo:
[Clothing Dataset for Second-Hand Fashion](https://zenodo.org/records/13788681)
Three changes were made:
- **Front image**: Only the front image of each garment is uploaded here. The back and brand-label images are not included.
- **Background removal**: The background from the front image was removed using [BiRefNet](https://huggingface.co/ZhengPeng7/BiRefNet), which only supports up to 1024x1024 images - larger images were resized. The background removal is not perfect - some artifacts remain.
- **Rotation**: The images were rotated to have a vertical orientation. Our internal experiments showed that just re-orienting the images can boost zero-shot performance.
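
The resizing and re-orientation steps can be illustrated with a little arithmetic. The sketch below is not the code used to build this dataset: it assumes the aspect ratio is preserved when an image is shrunk to fit within 1024x1024, that images are never upscaled, and that a landscape image is made vertical by a quarter turn. The function names `fit_within` and `to_vertical` are hypothetical.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Return (width, height) scaled down to fit inside max_side x max_side.

    Assumption: the aspect ratio is preserved and images are never upscaled.
    """
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_vertical(width: int, height: int) -> tuple[int, int]:
    """Return the size after a quarter turn if the image is landscape."""
    return (height, width) if width > height else (width, height)


# The two source resolutions mentioned in the dataset description.
for size in [(1280, 720), (1920, 1080)]:
    resized = fit_within(*size)
    print(size, "->", resized, "->", to_vertical(*resized))
```

Under these assumptions both source resolutions end up as 576x1024 portrait images, since they share the same 16:9 aspect ratio.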
The remainder of this card reproduces most of the details from the original source on Zenodo.
## Code
```python
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("fnauman/fashion-second-hand-front-only-rgb")
# Access the training split
train_dataset = dataset["train"]
# Print basic information
print(f"Dataset size: {len(train_dataset)} images") # 28248
print(f"Features: {train_dataset.features}") # 19 features
# Access an example
example = train_dataset[0]
image = example["image"]
# To display the image in a notebook, uncomment:
# from IPython.display import display
# display(example["image"])
print(f"Brand: {example['brand']}, Category: {example['category']}")
# Output: Brand: Soc (stadium), Category: Ladies
```
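Before training on a label such as `usage`, it usually helps to look at how the classes are distributed. The following is a minimal sketch using only the standard library; `label_distribution` is a hypothetical helper, and the toy labels are for illustration only, not real dataset statistics.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: (count, share)} ordered by descending count."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: (count, count / total)
        for label, count in counts.most_common()
    }


# Toy labels for illustration only; these are not real dataset statistics.
toy_usage = ["Reuse", "Export", "Reuse", "Recycle", "Reuse", "Export"]
for label, (count, share) in label_distribution(toy_usage).items():
    print(f"{label:<10} {count:>3} {share:6.1%}")
```

With the dataset loaded as above, `label_distribution(train_dataset["usage"])` should give the same summary for the real labels. Reading a single column this way avoids decoding the images.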
## Dataset Details
### Dataset Description
The dataset originates from projects focused on the sorting of used clothes within a sorting facility. The primary objective is to classify each garment into one of several categories to determine its ultimate destination: reuse, reuse outside Sweden (export), recycling, repair, remake, or thermal waste.
The dataset has **31,638** clothing items, a large increase over the 3,000 items in version 1. Data collection started in spring 2022 under the Vinnova-funded project "AI for resource-efficient circular fashion" and involves collaboration among three institutions: RISE Research Institutes of Sweden AB, Wargön Innovation AB, and Myrorna AB. The dataset has received further support through the EU project CISUTAC (cisutac.eu).
- **Data collected by:** [Wargön Innovation AB](https://wargoninnovation.se/), [Myrorna AB](https://www.myrorna.se/)
- **Curation, cleaning and release by**: [RISE Research Institutes of Sweden AB](https://www.ri.se/en)
- **Funded by:** [Vinnova](https://www.vinnova.se/en/p/ai-for-resource-efficient-circular-fashion/), [CISUTAC - EU Horizon](https://www.cisutac.eu/)
- **License:** CC-BY 4.0
### Dataset Sources
- **Repository:** [Clothing Dataset for Second-Hand Fashion](https://zenodo.org/records/13788681)
## Uses
- Usage prediction in sorting facilities.
- Detection of attributes that are specific to used or second-hand garments: condition scale (1-5), stains, holes, etc.
## Dataset Structure
- The dataset contains 31,638 clothing items, each with a unique item ID in a datetime format. The items are divided into three stations: `station1`, `station2`, and `station3`. The `station1` and `station2` folders contain images and annotations from Wargön Innovation AB, while the `station3` folder contains data from Myrorna AB. Each clothing item has three images and a JSON file containing annotations.
- Three images are provided for each clothing item:
  1. Front view.
  2. Back view.
  3. Brand label close-up. About 4,000-5,000 brand images are missing because of privacy concerns: people's hands, faces, etc. Some clothing items did not have a brand label to begin with.
- Images come primarily in two resolutions: `1280x720` and `1920x1080`. The background is a table: images taken before January 2023 show a measuring tape on it, while later images show a square grid pattern with each square measuring `10x10` cm.
- Each JSON file contains a list of annotations, some of which require nuanced interpretation (see `labels.py` for the options):
  - `usage`: Arguably the most critical label, usage indicates the garment's intended pathway. Options include 'Reuse,' 'Repair,' 'Remake,' 'Recycle,' 'Export' (reuse outside Sweden), and 'Energy recovery' (thermal waste). About 99% of the garments fall into the 'Reuse,' 'Export,' or 'Recycle' categories.
  - `trend`: This field refers to the general style of the garment, not a time-dependent trend as in some other datasets (e.g., Visuelle 2.0). It might be more accurately labeled as 'style.'
  - `material`: Material annotations are mostly based on the readings from a Near Infrared (NIR) scanner and in some cases from the garment's brand label.
  - Damage-related attributes include:
    - `condition` (1-5 scale, 5 being the best)
    - `pilling` (1-5 scale, 5 meaning no pilling)
    - `stains`, `holes`, `smell` (each with options 'None,' 'Minor,' 'Major').

    Note: 'holes' and 'smell' were introduced after November 17th, 2022, and stains previously only had 'Yes'/'No' options. For `station1` and `station2`, we introduced additional damage location labels to assist in damage detection:

    ```json
    "damageimage": "back",
    "damageloc": "bottom left",
    "damage": "stain ",
    "damage2image": "front",
    "damage2loc": "None",
    "damage2": "",
    "damage3image": "back",
    "damage3loc": "bottom right",
    "damage3": "stain"
    ```

    Taken from `labels_2024_04_05_08_47_35.json` file. Additionally, we annotated a few hundred images with bounding box annotations that we aim to release at a later date.
  - `comments`: The comments field is mostly empty, but sometimes contains important information about the garment, such as a detailed text description of the damage.
- Whenever possible, ISO standards have been followed to define these attributes on a 1-5 scale (e.g., `pilling`).
- Gold dataset: 100 garments were annotated multiple times by different annotators for **annotator agreement comparisons**. These 100 garments are placed inside a separate folder `test100`.
- The data has been annotated by a group of expert second-hand sorters at Wargön Innovation AB and Myrorna AB.
- Some attributes, such as `price`, should be considered with caution. Many distinct pricing models exist in the second-hand industry:
  - Price by weight
  - Price by brand and demand (similar to first-hand fashion)
  - Generic pricing at a fixed value (e.g., 1 Euro or 10 SEK)

  Wargön Innovation AB (`station1` and `station2`) does not set prices in practice, so its prices are only suggestions. Myrorna AB (`station3`), in contrast, resells garments and sets the actual prices.
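
The numbered damage fields and the change in the `stains` options both need some handling before the annotations can be used uniformly. Below is a minimal sketch that is not part of the original release. The helper names are hypothetical, treating an empty string or 'None' as "no entry" is an assumption based on the excerpt above, and collapsing `stains` to a boolean is just one possible way to reconcile the older 'Yes'/'No' values with the newer 'None'/'Minor'/'Major' ones.

```python
def collect_damages(annotation: dict) -> list[dict]:
    """Group the damage/damage2/damage3 fields into one record per damage."""
    damages = []
    for prefix in ("damage", "damage2", "damage3"):
        kind = (annotation.get(prefix) or "").strip()
        if not kind or kind.lower() == "none":
            continue  # Assumption: empty or 'None' means no damage recorded.
        damages.append({
            "kind": kind,
            "image": annotation.get(f"{prefix}image"),
            "location": annotation.get(f"{prefix}loc"),
        })
    return damages


def has_stains(value: str) -> bool:
    """Map both generations of the `stains` label to a single boolean."""
    return value.strip().lower() in {"yes", "minor", "major"}


# Field values copied from the annotation excerpt in this card.
annotation = {
    "damageimage": "back", "damageloc": "bottom left", "damage": "stain ",
    "damage2image": "front", "damage2loc": "None", "damage2": "",
    "damage3image": "back", "damage3loc": "bottom right", "damage3": "stain",
}
for record in collect_damages(annotation):
    print(record)
print(has_stains("No"), has_stains("Minor"))
```

For the excerpt, this yields two stain records (both on the back image) and skips the empty `damage2` entry. Note that the boolean mapping discards the Minor/Major severity, so it only suits analyses that span the label change.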
## Citation
Nauman, F. (2024). Clothing Dataset for Second-Hand Fashion (Version 3) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.13788681
Provided by:
Nilanjan-2002



