
Synthetic Retail Dataset Machine Learning Evaluation

Source: DataCite Commons. Updated: 2023-07-23. Indexed: 2025-04-16.
Download link:
https://ieee-dataport.org/documents/synthetic-retail-dataset-machine-learning-evaluation
Description:
This study delves into the creation of a synthetic dataset, designed to emulate real-world retail scenarios for the purpose of machine learning (ML) evaluation. Utilizing Python, the dataset was generated with 39,500 unique product identities, with a total of approximately 70,000 samples. These samples were distributed across the product identities based on a chi-square distribution. Each product was assigned a set of attributes, including weight, aisle and shelf numbers, and a restocking threshold. In addition, the dataset incorporated the time elapsed since the last restocking for each product, providing a more comprehensive view of the retail environment.

The preprocessing stage was a critical part of the dataset preparation. It involved feature engineering, where new variables were introduced to the dataset. These variables included a binary indicator of whether a product's weight on the shelf is below a certain threshold and the time elapsed since the last restock. These new features were designed to enhance the performance of the ML models by providing additional, relevant information.

The initial dataset exhibited an imbalance with respect to the 'need_restock' label. To address this issue, the sklearn resample utility was used to undersample the majority class, aligning it with the minority class count. This process resulted in a balanced dataset, with each class containing 35,040 samples. The dataset was then randomized to ensure diversity and prevent any potential bias in the ML model evaluation.
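
The three steps described above (generation, feature engineering, undersampling) can be sketched roughly as follows. This is a minimal illustration, not the authors' script. The function names, field names other than 'need_restock', value ranges, the small sample sizes, and the rule used to assign the label are all assumptions made for the example. The description names the sklearn resample utility for undersampling; the sketch swaps in `random.sample` so that it runs with the Python standard library alone. Reading "distributed based on a chi-square distribution" as chi-square sampling weights per product is likewise one possible interpretation.

```python
import random


def make_samples(n_products=200, n_samples=1000, df=2.0, seed=0):
    """Generate synthetic shelf readings. All ranges here are illustrative assumptions."""
    rng = random.Random(seed)
    # One fixed attribute set per product identity.
    products = [
        {
            "product_id": pid,
            "aisle": rng.randint(1, 20),
            "shelf": rng.randint(1, 6),
            "restock_threshold": rng.uniform(1.0, 5.0),  # assumed weight units
        }
        for pid in range(n_products)
    ]
    # A chi-square(df) draw equals a Gamma(shape=df/2, scale=2) draw. The draws are
    # used as sampling weights, so some products receive many samples and others few.
    weights = [rng.gammavariate(df / 2.0, 2.0) for _ in products]
    rows = []
    for product in rng.choices(products, weights=weights, k=n_samples):
        row = dict(product)
        row["weight"] = rng.uniform(0.0, 20.0)
        row["hours_since_restock"] = rng.uniform(0.0, 72.0)
        rows.append(row)
    return rows


def add_features(rows, max_hours=48.0):
    """Add the binary below-threshold indicator and a label."""
    for row in rows:
        row["below_threshold"] = int(row["weight"] < row["restock_threshold"])
        # Hypothetical labelling rule (not stated in the description):
        # restock when the shelf is light or the last restock was long ago.
        row["need_restock"] = int(
            row["below_threshold"] == 1 or row["hours_since_restock"] > max_hours
        )
    return rows


def undersample(rows, label="need_restock", seed=0):
    """Downsample the majority class to the minority count, then shuffle."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label] == 1]
    negatives = [r for r in rows if r[label] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    # Sampling without replacement keeps every retained row unique.
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


balanced = undersample(add_features(make_samples()))
n_pos = sum(r["need_restock"] for r in balanced)
print(len(balanced), n_pos, len(balanced) - n_pos)
```

With scikit-learn installed, the undersampling step could instead call `sklearn.utils.resample(majority, replace=False, n_samples=len(minority), random_state=seed)`, which matches the utility named in the description.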
Provider:
IEEE DataPort
Created:
2023-07-23