BigMart Retail Sales
Source: Mendeley Data. Updated 2024-05-10; indexed 2024-06-27.
Download link:
https://zenodo.org/records/6509955
Description:
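"Nothing ever becomes real till it is experienced." (John Keats)
While we don't know the context in which John Keats said this, its implication for data science is clear. You will have gained exposure to real-world problems in this challenge; this practice problem is another opportunity to get your hands dirty.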
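## Problem Statement
The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities, and have defined certain attributes of each product and store. The aim is to build a predictive model that estimates the sales of each product at a particular store. Using this model, BigMart will try to understand which properties of products and stores play a key role in increasing sales.
Note that the data may contain missing values, since some stores might not report all of their data due to technical glitches. These values need to be treated accordingly.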
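
The note on missing values can be illustrated with a minimal sketch. It assumes rows are held as dictionaries with `None` marking a missing entry; the keys `Item_Weight` and `Outlet_Size` and the sample rows are made up for illustration, and mean/mode imputation is only one of several reasonable treatments, not one the problem statement prescribes.

```python
from collections import Counter
from statistics import mean


def impute_missing(rows, numeric_cols, categorical_cols):
    """Return copies of rows with None replaced by the column mean
    (numeric columns) or the most frequent value (categorical columns)."""
    fill = {}
    for col in numeric_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        fill[col] = mean(observed)
    for col in categorical_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        fill[col] = Counter(observed).most_common(1)[0][0]
    # Build new dictionaries so the input rows are left untouched.
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]


# Hypothetical sample rows, not taken from the data set.
rows = [
    {"Item_Weight": 9.0, "Outlet_Size": "Medium"},
    {"Item_Weight": None, "Outlet_Size": "Medium"},
    {"Item_Weight": 11.0, "Outlet_Size": None},
    {"Item_Weight": 10.0, "Outlet_Size": "Small"},
]
filled = impute_missing(rows, ["Item_Weight"], ["Outlet_Size"])
```

Filling with a single global statistic is the simplest choice; grouping by item or outlet before computing the fill value may be more appropriate, depending on which columns turn out to have gaps.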
## Data
The data set contains 14204 samples. The variables are described below:
1. **Item Identifier**: A unique code identifying the item on sale
2. **Item Weight**: Weight of the item
3. **Item Fat Content**: Categorical column describing how much fat the item contains. Values: 'Low Fat', 'Regular', 'low fat', 'LF' (abbreviation of low fat), 'reg' (abbreviation of regular)
4. **Item Visibility**: Numeric value for how visible the item is on display
5. **Item Type**: Categorical column for the category the item belongs to. Values: 'Dairy', 'Soft Drinks', 'Meat', 'Fruits and Vegetables', 'Household', 'Baking Goods', 'Snack Foods', 'Frozen Foods', 'Breakfast', 'Health and Hygiene', 'Hard Drinks', 'Canned', 'Breads', 'Starchy Foods', 'Others', 'Seafood'
6. **Item MRP**: The maximum retail price (MRP) of the item
7. **Outlet Identifier**: Code of the outlet where the item was sold; a categorical column
8. **Outlet Establishment Year**: Year in which the outlet was established
9. **Outlet Size**: Categorical column describing the size of the outlet. Values: 'Medium', 'High' (large), 'Small'
10. **Outlet Location Type**: Categorical column describing the city tier of the outlet's location. Values: 'Tier 1', 'Tier 2', 'Tier 3'
11. **Outlet Type**: Categorical column for the type of outlet. Values: 'Supermarket Type1', 'Supermarket Type2', 'Supermarket Type3', 'Grocery Store'
12. **Item Outlet Sales**: The sales figure for the item at the given outlet
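
The five labels listed for Item Fat Content appear to be spelling variants of just two categories, so a model would likely benefit from collapsing them first. The sketch below shows one way to do that; the mapping table and the function name `normalize_fat_content` are illustrative assumptions, not part of the data set.

```python
# Assumed mapping from lower-cased raw labels to two canonical categories.
FAT_CONTENT_ALIASES = {
    "low fat": "Low Fat",
    "lf": "Low Fat",
    "regular": "Regular",
    "reg": "Regular",
}


def normalize_fat_content(label):
    """Map a raw Item Fat Content label to its canonical form.

    Labels not covered by the alias table are returned unchanged so that
    unexpected values stay visible instead of being silently rewritten.
    """
    key = label.strip().lower()
    return FAT_CONTENT_ALIASES.get(key, label)


raw_labels = ["Low Fat", "Regular", "low fat", "LF", "reg"]
clean_labels = [normalize_fat_content(x) for x in raw_labels]
```

After this step the column holds only 'Low Fat' and 'Regular', which avoids treating the same category as several distinct levels during encoding.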
## Evaluation Metric
Submissions are judged by Root Mean Square Error (RMSE).
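
For reference, RMSE is the square root of the mean squared difference between actual and predicted values. A minimal sketch follows; the function name `rmse` and the sample numbers are illustrative, not taken from the challenge.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length numeric sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sales figures and predictions.
actual = [100.0, 200.0, 300.0]
predicted = [103.0, 197.0, 303.0]
score = rmse(actual, predicted)  # every error is 3 in magnitude, so the score is 3.0
```

Because the errors are squared before averaging, a few large misses raise the score more than many small ones; lower is better.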
Created:
2023-06-28



