
Peter14352/Amazon_Sports_and_Outdoors_2023

Hugging Face · Updated 2026-04-16 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/Peter14352/Amazon_Sports_and_Outdoors_2023
Overview:
---
dataset_info:
  features:
  - name: main_category
    dtype: string
  - name: title
    dtype: string
  - name: average_rating
    dtype: float64
  - name: rating_number
    dtype: int64
  - name: features
    dtype: string
  - name: description
    dtype: string
  - name: price
    dtype: float64
  - name: images
    list:
    - name: thumb
      dtype: string
    - name: large
      dtype: string
    - name: variant
      dtype: string
    - name: hi_res
      dtype: string
  - name: videos
    list:
    - name: title
      dtype: string
    - name: url
      dtype: string
    - name: user_id
      dtype: string
  - name: store
    dtype: string
  - name: categories
    sequence: string
  - name: parent_asin
    dtype: string
  - name: date_first_available
    dtype: int64
  - name: manufacturer
    dtype: string
  - name: brand_name
    dtype: string
  - name: color
    dtype: string
  - name: package_weight
    dtype: string
  - name: item_package_dimensions_l_x_w_x_h
    dtype: string
  - name: part_number
    dtype: string
  - name: material
    dtype: string
  - name: best_sellers_rank
    dtype: string
  - name: size
    dtype: string
  - name: style
    dtype: string
  - name: brand
    dtype: string
  - name: suggested_users
    dtype: string
  - name: item_weight
    dtype: string
  - name: item_dimensions__lxwxh
    dtype: string
  - name: department
    dtype: string
  - name: sport_type
    dtype: string
  splits:
  - name: train
    num_bytes: 1293209275
    num_examples: 535206
  download_size: 607190296
  dataset_size: 1293209275
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for Amazon Sports and Outdoors 2023

The original dataset can be found at: https://amazon-reviews-2023.github.io/

## Dataset Details

This dataset was downloaded from the link above; it is the item metadata ("meta") file for the Sports and Outdoors category.

### Dataset Description

This dataset is a refined version of the Amazon Sports and Outdoors 2023 meta dataset, which contains metadata for sports and outdoors products sold on Amazon. It includes detailed product information such as descriptions, ratings, prices, images, and features.
The main goal of the refinement was to ensure that key fields are complete while simplifying the dataset by removing irrelevant or empty columns. The table below shows the original structure of the dataset.

<table border="1" cellpadding="5" cellspacing="0">
<tr> <th>Field</th> <th>Type</th> <th>Explanation</th> </tr>
<tr> <td>main_category</td> <td>str</td> <td>Main category (i.e., domain) of the product.</td> </tr>
<tr> <td>title</td> <td>str</td> <td>Name of the product.</td> </tr>
<tr> <td>average_rating</td> <td>float</td> <td>Rating of the product shown on the product page.</td> </tr>
<tr> <td>rating_number</td> <td>int</td> <td>Number of ratings the product has received.</td> </tr>
<tr> <td>features</td> <td>list</td> <td>Bullet-point features of the product.</td> </tr>
<tr> <td>description</td> <td>list</td> <td>Description of the product.</td> </tr>
<tr> <td>price</td> <td>float</td> <td>Price in US dollars (at time of crawling).</td> </tr>
<tr> <td>images</td> <td>list</td> <td>Images of the product. Each image has different sizes (thumb, large, hi_res). The “variant” field indicates the position of the image.</td> </tr>
<tr> <td>videos</td> <td>list</td> <td>Videos of the product, including title and URL.</td> </tr>
<tr> <td>store</td> <td>str</td> <td>Store name of the product.</td> </tr>
<tr> <td>categories</td> <td>list</td> <td>Hierarchical categories of the product.</td> </tr>
<tr> <td>details</td> <td>dict</td> <td>Product details, including materials, brand, sizes, etc.</td> </tr>
<tr> <td>parent_asin</td> <td>str</td> <td>Parent ID of the product.</td> </tr>
<tr> <td>bought_together</td> <td>list</td> <td>Recommended bundles from the website.</td> </tr>
</table>

### Modifications made

<ul>
<li>Products without a description, title, images, or details were removed.</li>
<li>Lists in features and description were converted to strings by joining their items with newlines.</li>
<li>For the details column, only the 16 most frequent detail types were kept. The details column was then split into 16 new columns, one per kept detail type.</li>
<li>Products first available before 2015 were dropped.</li>
<li>Products with is_discontinued_by_manufacturer set to 'true' or 'yes' were dropped.
The column itself was then removed.</li>
<li>The bought_together column was dropped because of missing values.</li>
</ul>

### Dataset Size

<ul>
<li>Total entries: 535,206</li>
<li>Total columns: 29</li>
</ul>

### Final Structure

<table border="1" cellpadding="5" cellspacing="0">
<tr> <th>Field</th> <th>Type</th> <th>Explanation</th> </tr>
<tr> <td>main_category</td> <td>str</td> <td>Main category (i.e., domain) of the product.</td> </tr>
<tr> <td>title</td> <td>str</td> <td>Name of the product.</td> </tr>
<tr> <td>average_rating</td> <td>float</td> <td>Rating of the product shown on the product page.</td> </tr>
<tr> <td>rating_number</td> <td>int</td> <td>Number of ratings the product has received.</td> </tr>
<tr> <td>features</td> <td>str</td> <td>Bullet-point features of the product, joined with newlines.</td> </tr>
<tr> <td>description</td> <td>str</td> <td>Description of the product, joined with newlines.</td> </tr>
<tr> <td>price</td> <td>float</td> <td>Price in US dollars (at time of crawling).</td> </tr>
<tr> <td>images</td> <td>list</td> <td>Images of the product. Each image has different sizes (thumb, large, hi_res). The “variant” field indicates the position of the image.</td> </tr>
<tr> <td>videos</td> <td>list</td> <td>Videos of the product, including title, url, and user_id.</td> </tr>
<tr> <td>store</td> <td>str</td> <td>Store name of the product.</td> </tr>
<tr> <td>categories</td> <td>list</td> <td>Hierarchical categories of the product.</td> </tr>
<tr> <td>parent_asin</td> <td>str</td> <td>Parent ID of the product.</td> </tr>
<tr> <td>date_first_available</td> <td>int</td> <td>Date the product first became available.</td> </tr>
<tr> <td>manufacturer</td> <td>str</td> <td>Manufacturer of the product.</td> </tr>
<tr> <td>brand_name</td> <td>str</td> <td>Brand name.</td> </tr>
<tr> <td>color</td> <td>str</td> <td>Color.</td> </tr>
<tr> <td>package_weight</td> <td>str</td> <td>Package weight.</td> </tr>
<tr> <td>item_package_dimensions_l_x_w_x_h</td> <td>str</td> <td>Package dimensions (L x W x H).</td> </tr>
<tr> <td>part_number</td> <td>str</td> <td>Part number.</td> </tr>
<tr> <td>material</td> <td>str</td> <td>Material.</td> </tr>
<tr> <td>best_sellers_rank</td> <td>str</td> <td>Best Sellers Rank.</td> </tr>
<tr> <td>size</td> <td>str</td> <td>Size.</td> </tr>
<tr> <td>style</td> <td>str</td> <td>Style.</td> </tr>
<tr> <td>brand</td> <td>str</td> <td>Brand.</td> </tr>
<tr> <td>suggested_users</td> <td>str</td> <td>Suggested users.</td> </tr>
<tr> <td>item_weight</td> <td>str</td> <td>Weight of the item.</td> </tr>
<tr> <td>item_dimensions__lxwxh</td> <td>str</td> <td>Item dimensions (L x W x H).</td> </tr>
<tr> <td>department</td> <td>str</td> <td>Department.</td> </tr>
<tr> <td>sport_type</td> <td>str</td> <td>Sport type.</td> </tr>
</table>
Provided by:
Peter14352