超市销售数据集
收藏阿里云天池2026-06-03 更新2026-05-16 收录
下载链接:
https://tianchi.aliyun.com/dataset/226664
下载链接
链接失效反馈官方服务:
资源简介:
某超市零食类商品销售的原始明细数据,涵盖商品编号、商品名称、商品类别、销售日期、销售数量、单价、销售金额、销售区域、客户群体 9 个核心业务字段。
零食销售数据的采集采取自动化批量获得、多维度整理、规范化转换这一体系,主要目的就是把散布在各种电子表格(CSV、Excel)里的初始数据加以汇集,创建起标准的数据集。该过程具体的执行步骤如下所示:
数据整合方案是根据“年份月份”把每天从超市POS系统导出的销售CSV文件按年份月份分门别类存入服务器中指定的目录里,并且将运营部门辅助性的Excel文件(包含门店信息、促销计划、节日安排等内容)一同加入到这个统一的存储环境之中。
该连锁机构在某一个地级市有8家门店,主城区有4家,周边区县各有两家,市场分布比较均衡。主营零食品类为膨化食品、糖果巧克力、坚果炒货、饼干糕点和肉干蜜饯五大类共120个品种。
自动化读取:编写 Python 脚本,通过 Pandas 的pd.read_csv()和pd.read_excel()函数批量读取目录下的所有文件,生成基础数据框(DataFrame);
关联融合机制就是用“门店编号”把销售数据和门店基本信息对接起来,“交易时间+品类编码”双重标识担当起了连接销售记录、促销活动、节假日常规信息的功能。
本模块的主要目的就是将数据文件中的数据编码格式统一成UTF-8,然后生成出符合预处理和数据库存储要求的CSV格式输出文件。
Raw detailed sales data of snack products from a supermarket, covering 9 core business fields: product ID, product name, product category, sales date, sales quantity, unit price, sales amount, sales region and customer group.
The collection of snack sales data follows a framework of automated batch acquisition, multi-dimensional organization and standardized conversion, with the primary goal of aggregating raw data scattered across various spreadsheets (CSV, Excel) and establishing a standardized dataset. The specific implementation steps of this process are as follows:
The data integration scheme categorizes daily sales CSV files exported from the supermarket's POS system by year and month, stores them in a designated server directory, and also incorporates auxiliary Excel files from the operations department (including store information, promotion plans, holiday schedules and other relevant content) into this unified storage environment.
This supermarket chain operates 8 stores in a prefecture-level city: 4 located in the urban core and 2 in each of the surrounding counties and districts, achieving a relatively balanced market distribution. Its main snack categories include puffed food, candy and chocolate, nuts and roasted seeds and fruits, biscuits and pastries, as well as dried meat and preserved fruits, totaling 120 product varieties.
Automated reading: Write Python scripts to batch-read all files in the directory using Pandas' pd.read_csv() and pd.read_excel() functions, generating basic DataFrames;
Association and integration mechanism: Use "store ID" to link sales data with basic store information, while the dual identifier of "transaction time + category code" functions to connect sales records, promotional activities and regular holiday-related information.
The main purpose of this module is to unify the data encoding format in all data files to UTF-8, and then generate CSV format output files that meet the requirements of preprocessing and database storage.
提供机构:
阿里云天池
创建时间:
2026-05-15
搜集汇总
数据集介绍

背景与挑战
背景概述
该数据集为某超市零食类商品的销售明细数据,包含商品编号、销售日期等9个核心字段,覆盖膨化食品、糖果巧克力等五大类共120个品种。数据通过自动化脚本从POS系统和运营文件中采集并整合,涉及8家门店的销售记录,最终以规范的CSV格式提供。
以上内容由遇见数据集搜集并总结生成



