Name: smartcat/Amazon_All_Beauty_2023
Creator: smartcat
Published: 2024-10-29 12:18:02
License: No description available

Download link:

https://hf-mirror.com/datasets/smartcat/Amazon_All_Beauty_2023



Resource overview:

---
dataset_info:
  features:
  - name: title
    dtype: string
  - name: average_rating
    dtype: float64
  - name: rating_number
    dtype: int64
  - name: features
    dtype: string
  - name: description
    dtype: string
  - name: price
    dtype: float64
  - name: images
    list:
    - name: thumb
      dtype: string
    - name: large
      dtype: string
    - name: variant
      dtype: string
    - name: hi_res
      dtype: string
  - name: videos
    list:
    - name: title
      dtype: string
    - name: url
      dtype: string
    - name: user_id
      dtype: string
  - name: store
    dtype: string
  - name: categories
    sequence: 'null'
  - name: parent_asin
    dtype: string
  - name: upc
    dtype: string
  - name: brand
    dtype: string
  - name: manufacturer
    dtype: string
  - name: product_dimensions
    dtype: string
  - name: color
    dtype: string
  - name: material
    dtype: string
  - name: is_discontinued_by_manufacturer
    dtype: string
  - name: item_form
    dtype: string
  - name: item_model_number
    dtype: string
  - name: age_range_(description)
    dtype: string
  - name: skin_type
    dtype: string
  - name: scent
    dtype: string
  - name: package_dimensions
    dtype: string
  - name: hair_type
    dtype: string
  - name: unit_count
    dtype: string
  - name: number_of_items
    dtype: string
  splits:
  - name: train
    num_bytes: 35604373
    num_examples: 18956
  download_size: 17113574
  dataset_size: 35604373
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for Amazon_All_Beauty_2023

The original dataset can be found at: https://amazon-reviews-2023.github.io/

## Dataset Details

This dataset was downloaded from the link above, from the item metadata of the All Beauty category.

### Dataset Description

This dataset is a refined version of the Amazon All Beauty 2023 meta dataset, which contains product metadata for beauty products sold on Amazon. It includes detailed information about each product, such as its description, rating, price, images, and features. The refinement had two goals: to ensure that key fields are complete, and to simplify the dataset by removing irrelevant or empty columns.

The table below shows the original structure of the dataset.

<table border="1" cellpadding="5" cellspacing="0">
  <tr> <th>Field</th> <th>Type</th> <th>Explanation</th> </tr>
  <tr> <td>main_category</td> <td>str</td> <td>Main category (i.e., domain) of the product.</td> </tr>
  <tr> <td>title</td> <td>str</td> <td>Name of the product.</td> </tr>
  <tr> <td>average_rating</td> <td>float</td> <td>Rating of the product shown on the product page.</td> </tr>
  <tr> <td>rating_number</td> <td>int</td> <td>Number of ratings for the product.</td> </tr>
  <tr> <td>features</td> <td>list</td> <td>Bullet-point format features of the product.</td> </tr>
  <tr> <td>description</td> <td>list</td> <td>Description of the product.</td> </tr>
  <tr> <td>price</td> <td>float</td> <td>Price in US dollars (at time of crawling).</td> </tr>
  <tr> <td>images</td> <td>list</td> <td>Images of the product. Each image comes in several sizes (thumb, large, hi_res). The “variant” field shows the position of the image.</td> </tr>
  <tr> <td>videos</td> <td>list</td> <td>Videos of the product, including title and url.</td> </tr>
  <tr> <td>store</td> <td>str</td> <td>Store name of the product.</td> </tr>
  <tr> <td>categories</td> <td>list</td> <td>Hierarchical categories of the product.</td> </tr>
  <tr> <td>details</td> <td>dict</td> <td>Product details, including materials, brand, sizes, etc.</td> </tr>
  <tr> <td>parent_asin</td> <td>str</td> <td>Parent ID of the product.</td> </tr>
  <tr> <td>bought_together</td> <td>list</td> <td>Recommended bundles from the website.</td> </tr>
</table>

### Modifications Made

<ul>
  <li>Products without a description, title, or images were removed.</li>
  <li>The main_category column was dropped because it carried no useful information. The bought_together column was entirely empty, so it was also removed.</li>
  <li>For the details column, only the 16 most frequent detail types were kept. Products with empty details fields were excluded from the final dataset. The details column was then split into 16 new columns, one per retained detail type.</li>
</ul>

### Dataset Size

<ul>
  <li>Total entries: 18,956</li>
  <li>Total columns: 27</li>
</ul>

### Final structure

<table border="1" cellpadding="5" cellspacing="0">
  <tr> <th>Field</th> <th>Type</th> <th>Explanation</th> </tr>
  <tr> <td>title</td> <td>str</td> <td>Name of the product.</td> </tr>
  <tr> <td>average_rating</td> <td>float</td> <td>Rating of the product shown on the product page.</td> </tr>
  <tr> <td>rating_number</td> <td>int</td> <td>Number of ratings for the product.</td> </tr>
  <tr> <td>features</td> <td>str</td> <td>Bullet-point format features of the product.</td> </tr>
  <tr> <td>description</td> <td>str</td> <td>Description of the product.</td> </tr>
  <tr> <td>price</td> <td>float</td> <td>Price in US dollars (at time of crawling).</td> </tr>
  <tr> <td>images</td> <td>list</td> <td>Images of the product. Each image comes in several sizes (thumb, large, hi_res). The “variant” field shows the position of the image.</td> </tr>
  <tr> <td>videos</td> <td>list</td> <td>Videos of the product, including title and url.</td> </tr>
  <tr> <td>store</td> <td>str</td> <td>Store name of the product.</td> </tr>
  <tr> <td>categories</td> <td>list</td> <td>Hierarchical categories of the product.</td> </tr>
  <tr> <td>parent_asin</td> <td>str</td> <td>Parent ID of the product.</td> </tr>
  <tr> <td>upc</td> <td>str</td> <td>Universal Product Code (UPC), a barcode for uniquely identifying the product.</td> </tr>
  <tr> <td>brand</td> <td>str</td> <td>Brand name associated with the product.</td> </tr>
  <tr> <td>manufacturer</td> <td>str</td> <td>Name of the company or manufacturer responsible for producing the product.</td> </tr>
  <tr> <td>product_dimensions</td> <td>str</td> <td>Dimensions of the product, typically including length, width, and height.</td> </tr>
  <tr> <td>color</td> <td>str</td> <td>Primary color or color variants of the product.</td> </tr>
  <tr> <td>material</td> <td>str</td> <td>Main materials used in the product’s construction.</td> </tr>
  <tr> <td>is_discontinued_by_manufacturer</td> <td>str</td> <td>Indicates whether the product has been discontinued by the manufacturer.</td> </tr>
  <tr> <td>item_form</td> <td>str</td> <td>Form of the item (e.g., liquid, solid, gel).</td> </tr>
  <tr> <td>item_model_number</td> <td>str</td> <td>Model number of the product as assigned by the manufacturer.</td> </tr>
  <tr> <td>age_range_(description)</td> <td>str</td> <td>Recommended age range for the product, often used for toys or children’s products.</td> </tr>
  <tr> <td>skin_type</td> <td>str</td> <td>Suitable skin types for the product (e.g., oily, dry, sensitive).</td> </tr>
  <tr> <td>scent</td> <td>str</td> <td>Fragrance or scent associated with the product.</td> </tr>
  <tr> <td>package_dimensions</td> <td>str</td> <td>Dimensions of the product’s packaging.</td> </tr>
  <tr> <td>hair_type</td> <td>str</td> <td>Suitable hair types for the product (e.g., curly, straight, fine).</td> </tr>
  <tr> <td>unit_count</td> <td>str</td> <td>Total quantity or units contained in the product package.</td> </tr>
  <tr> <td>number_of_items</td> <td>str</td> <td>Total number of individual items in the product package.</td> </tr>
</table>
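
### Example: sketch of the refinement steps

The steps listed under "Modifications Made" could be reproduced roughly as follows. This is a minimal sketch, not the authors' actual script: the toy records, the `refine` and `to_column_name` helpers, and the `TOP_N` value are all hypothetical, and the card does not say whether detail frequencies were counted before or after the row filtering, so the order used here is an assumption.

```python
from collections import Counter

# Toy records shaped like the original metadata; all values are made up.
records = [
    {"title": "Shampoo", "description": ["Gentle cleanser"],
     "images": [{"large": "a.jpg"}], "main_category": "All Beauty",
     "bought_together": None,
     "details": {"Brand": "Acme", "Hair Type": "Curly", "Scent": "Mint"}},
    {"title": "", "description": ["Missing title"],
     "images": [{"large": "b.jpg"}], "main_category": "All Beauty",
     "bought_together": None, "details": {"Brand": "Acme"}},
    {"title": "Soap", "description": ["Bar soap"],
     "images": [{"large": "c.jpg"}], "main_category": "All Beauty",
     "bought_together": None, "details": {}},
    {"title": "Lotion", "description": ["Body lotion"],
     "images": [{"large": "d.jpg"}], "main_category": "All Beauty",
     "bought_together": None, "details": {"Brand": "Zen", "Scent": "Rose"}},
]

TOP_N = 2  # the dataset card keeps 16; 2 keeps the toy example readable
DROPPED = ("main_category", "bought_together", "details")


def to_column_name(key):
    """Assumed naming rule: 'Hair Type' -> 'hair_type'."""
    return key.strip().lower().replace(" ", "_")


def refine(records, top_n):
    # Drop products with an empty title, description, images, or details.
    kept = [r for r in records
            if r.get("title") and r.get("description")
            and r.get("images") and r.get("details")]

    # Find the most frequent detail types among the remaining products.
    counts = Counter(key for r in kept for key in r["details"])
    top_keys = [key for key, _ in counts.most_common(top_n)]

    # Remove the dropped columns and expand details into one column per kept type.
    rows = []
    for r in kept:
        row = {k: v for k, v in r.items() if k not in DROPPED}
        for key in top_keys:
            row[to_column_name(key)] = r["details"].get(key)
        rows.append(row)
    return rows


rows = refine(records, TOP_N)
for row in rows:
    print(row["title"], row["brand"], row["scent"])
```

Products that lack one of the retained detail types get `None` in that column, which is one plausible reason the 16 detail columns are all stored as nullable strings.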

Use cases: