Synthetic Retail Dataset Machine Learning Evaluation

Name: Synthetic Retail Dataset Machine Learning Evaluation
Creator: IEEE DataPort
Published: 2023-07-23 20:56:53
License: No description available

DataCite Commons: updated 2023-07-23, indexed 2025-04-16

Download link:

https://ieee-dataport.org/documents/synthetic-retail-dataset-machine-learning-evaluation

Description:

This study describes the creation of a synthetic dataset designed to emulate real-world retail scenarios for machine learning (ML) evaluation. The dataset was generated in Python with 39,500 unique product identities and approximately 70,000 samples in total. The samples were distributed across the product identities according to a chi-square distribution. Each product was assigned a set of attributes, including weight, aisle and shelf numbers, and a restocking threshold. The dataset also records the time elapsed since the last restocking for each product, giving a fuller view of the retail environment.

Preprocessing was a critical part of dataset preparation. It involved feature engineering, in which new variables were added to the dataset: a binary indicator of whether a product's weight on the shelf is below a certain threshold, and the time elapsed since the last restock. These features were designed to improve the performance of the ML models by providing additional, relevant information.

The initial dataset was imbalanced with respect to the 'need_restock' label. To address this, the sklearn resample utility was used to undersample the majority class to match the minority class count. This produced a balanced dataset, with each class containing 35,040 samples. The dataset was then randomized to ensure diversity and prevent potential bias in ML model evaluation.
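The pipeline described above (chi-square allocation of samples to products, a below-threshold indicator, and undersampling with the sklearn resample utility) could look roughly like the following sketch. It is not the authors' code. The column names, value ranges, degrees of freedom, the rule that assigns 'need_restock', and the reduced sizes are all assumptions chosen for illustration, and "randomized" is interpreted here as a row shuffle.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample


def make_synthetic_retail(n_products=500, n_samples=2000, seed=0):
    """Generate a small synthetic retail table (sizes are illustrative only)."""
    rng = np.random.default_rng(seed)

    # Chi-square draws act as per-product weights that decide how many
    # samples each product receives (df=2 is an assumption).
    weights = rng.chisquare(df=2, size=n_products)
    product_id = rng.choice(n_products, size=n_samples, p=weights / weights.sum())

    # Static per-product attributes (ranges are hypothetical).
    aisle = rng.integers(1, 21, size=n_products)
    shelf = rng.integers(1, 7, size=n_products)
    threshold = rng.uniform(1.0, 5.0, size=n_products)

    df = pd.DataFrame({
        "product_id": product_id,
        "aisle": aisle[product_id],
        "shelf": shelf[product_id],
        "restock_threshold": threshold[product_id],
        "weight": rng.uniform(0.0, 20.0, size=n_samples),
        "hours_since_restock": rng.uniform(0.0, 72.0, size=n_samples),
    })

    # Engineered feature: is the shelf weight below the restocking threshold?
    df["below_threshold"] = (df["weight"] < df["restock_threshold"]).astype(int)

    # Hypothetical labeling rule; the source does not state how the
    # 'need_restock' label was derived.
    df["need_restock"] = (
        (df["below_threshold"] == 1) | (df["hours_since_restock"] > 60.0)
    ).astype(int)
    return df


def balance_by_undersampling(df, label="need_restock", seed=0):
    """Undersample every larger class down to the minority class count."""
    n_min = int(df[label].value_counts().min())
    parts = []
    for _, group in df.groupby(label):
        if len(group) > n_min:
            group = resample(group, replace=False, n_samples=n_min,
                             random_state=seed)
        parts.append(group)
    # Shuffle the rows so the classes are interleaved.
    return (pd.concat(parts)
            .sample(frac=1.0, random_state=seed)
            .reset_index(drop=True))


if __name__ == "__main__":
    raw = make_synthetic_retail()
    balanced = balance_by_undersampling(raw)
    print(raw["need_restock"].value_counts())
    print(balanced["need_restock"].value_counts())
```

Sampling without replacement (`replace=False`) keeps every retained majority-class row unique, which matches the undersampling described in the abstract; the actual dataset reports 35,040 rows per class after this step.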

Provider:

IEEE DataPort

Created:

2023-07-23
