
smartcat/Health_and_Personal_Care_2023

Source: Hugging Face. Updated 2024-10-31. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/smartcat/Health_and_Personal_Care_2023
Resource overview:
---
dataset_info:
  features:
  - name: main_category
    dtype: string
  - name: title
    dtype: string
  - name: average_rating
    dtype: float64
  - name: rating_number
    dtype: int64
  - name: features
    dtype: string
  - name: description
    dtype: string
  - name: price
    dtype: float64
  - name: images
    list:
    - name: thumb
      dtype: string
    - name: large
      dtype: string
    - name: variant
      dtype: string
    - name: hi_res
      dtype: string
  - name: videos
    list:
    - name: title
      dtype: string
    - name: url
      dtype: string
    - name: user_id
      dtype: string
  - name: store
    dtype: string
  - name: categories
    sequence: 'null'
  - name: parent_asin
    dtype: string
  - name: date_first_available
    dtype: int64
  - name: manufacturer
    dtype: string
  - name: brand
    dtype: string
  - name: package_dimensions
    dtype: string
  - name: item_model_number
    dtype: string
  - name: unit_count
    dtype: string
  - name: item_form
    dtype: string
  - name: age_range_(description)
    dtype: string
  - name: item_weight
    dtype: string
  - name: number_of_items
    dtype: string
  - name: material
    dtype: string
  - name: department
    dtype: string
  - name: product_dimensions
    dtype: string
  - name: color
    dtype: string
  - name: flavor
    dtype: string
  - name: item_dimensions_lxwxh
    dtype: string
  splits:
  - name: train
    num_bytes: 21064468
    num_examples: 10730
  download_size: 9655052
  dataset_size: 21064468
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for Health_and_Personal_Care_2023

The original dataset can be found at: https://amazon-reviews-2023.github.io/

## Dataset Details

This dataset was downloaded from the link above; it is the meta dataset for the Health and Personal Care category.

### Dataset Description

This dataset is a refined version of the Amazon Health and Personal Care 2023 meta dataset, which originally contained metadata for health and personal care products sold on Amazon. It includes detailed product information such as descriptions, ratings, prices, images, and features.

The primary focus of this modification was to ensure the completeness of key fields while simplifying the dataset by removing irrelevant or empty columns. The table below shows the original structure of the dataset.

<table border="1" cellpadding="5" cellspacing="0">
  <tr> <th>Field</th> <th>Type</th> <th>Explanation</th> </tr>
  <tr> <td>main_category</td> <td>str</td> <td>Main category (i.e., domain) of the product.</td> </tr>
  <tr> <td>title</td> <td>str</td> <td>Name of the product.</td> </tr>
  <tr> <td>average_rating</td> <td>float</td> <td>Rating of the product shown on the product page.</td> </tr>
  <tr> <td>rating_number</td> <td>int</td> <td>Number of ratings of the product.</td> </tr>
  <tr> <td>features</td> <td>list</td> <td>Bullet-point format features of the product.</td> </tr>
  <tr> <td>description</td> <td>list</td> <td>Description of the product.</td> </tr>
  <tr> <td>price</td> <td>float</td> <td>Price in US dollars (at time of crawling).</td> </tr>
  <tr> <td>images</td> <td>list</td> <td>Images of the product. Each image has different sizes (thumb, large, hi_res). The “variant” field gives the position of the image.</td> </tr>
  <tr> <td>videos</td> <td>list</td> <td>Videos of the product including title and url.</td> </tr>
  <tr> <td>store</td> <td>str</td> <td>Store name of the product.</td> </tr>
  <tr> <td>categories</td> <td>list</td> <td>Hierarchical categories of the product.</td> </tr>
  <tr> <td>details</td> <td>dict</td> <td>Product details, including materials, brand, sizes, etc.</td> </tr>
  <tr> <td>parent_asin</td> <td>str</td> <td>Parent ID of the product.</td> </tr>
  <tr> <td>bought_together</td> <td>list</td> <td>Recommended bundles from the website.</td> </tr>
</table>

### Modifications made

<ul>
  <li>Products without a description, title, images, or details were removed.</li>
  <li>Lists in features and description were converted to strings, with items joined by newlines.</li>
  <li>For the details column, only the 16 most frequent detail types were kept. The details column was then split into 16 new columns, one per retained detail type.</li>
  <li>Products with a date first available before the year 2015 were dropped.</li>
  <li>Products with is_discontinued_by_manufacturer set to 'true' or 'yes' were dropped. That column was then dropped.</li>
  <li>The bought_together column was dropped due to missing values.</li>
</ul>

### Dataset Size

<ul>
  <li>Total entries: 10,730</li>
  <li>Total columns: 28</li>
</ul>

### Final Structure

<table border="1" cellpadding="5" cellspacing="0">
  <tr> <th>Field</th> <th>Type</th> <th>Explanation</th> </tr>
  <tr> <td>main_category</td> <td>str</td> <td>Main category (i.e., domain) of the product.</td> </tr>
  <tr> <td>title</td> <td>str</td> <td>Name of the product.</td> </tr>
  <tr> <td>average_rating</td> <td>float</td> <td>Rating of the product shown on the product page.</td> </tr>
  <tr> <td>rating_number</td> <td>int</td> <td>Number of ratings of the product.</td> </tr>
  <tr> <td>features</td> <td>str</td> <td>Bullet-point features of the product, joined by newlines.</td> </tr>
  <tr> <td>description</td> <td>str</td> <td>Description of the product, joined by newlines.</td> </tr>
  <tr> <td>price</td> <td>float</td> <td>Price in US dollars (at time of crawling).</td> </tr>
  <tr> <td>images</td> <td>list</td> <td>Images of the product. Each image has different sizes (thumb, large, hi_res). The “variant” field gives the position of the image.</td> </tr>
  <tr> <td>videos</td> <td>list</td> <td>Videos of the product including title, url, and user_id.</td> </tr>
  <tr> <td>store</td> <td>str</td> <td>Store name of the product.</td> </tr>
  <tr> <td>categories</td> <td>list</td> <td>Hierarchical categories of the product.</td> </tr>
  <tr> <td>parent_asin</td> <td>str</td> <td>Parent ID of the product.</td> </tr>
  <tr> <td>date_first_available</td> <td>int64</td> <td>Date the product was first available.</td> </tr>
  <tr> <td>manufacturer</td> <td>str</td> <td>Manufacturer.</td> </tr>
  <tr> <td>brand</td> <td>str</td> <td>Name of the brand.</td> </tr>
  <tr> <td>package_dimensions</td> <td>str</td> <td>Package dimensions.</td> </tr>
  <tr> <td>item_model_number</td> <td>str</td> <td>Item model number.</td> </tr>
  <tr> <td>unit_count</td> <td>str</td> <td>Units of the product.</td> </tr>
  <tr> <td>item_form</td> <td>str</td> <td>Item form.</td> </tr>
  <tr> <td>age_range_(description)</td> <td>str</td> <td>Age range.</td> </tr>
  <tr> <td>item_weight</td> <td>str</td> <td>Weight of the item.</td> </tr>
  <tr> <td>number_of_items</td> <td>str</td> <td>Number of items.</td> </tr>
  <tr> <td>material</td> <td>str</td> <td>Material.</td> </tr>
  <tr> <td>department</td> <td>str</td> <td>Department.</td> </tr>
  <tr> <td>product_dimensions</td> <td>str</td> <td>Product dimensions.</td> </tr>
  <tr> <td>color</td> <td>str</td> <td>Color.</td> </tr>
  <tr> <td>flavor</td> <td>str</td> <td>Flavor.</td> </tr>
  <tr> <td>item_dimensions_lxwxh</td> <td>str</td> <td>Item dimensions (L x W x H).</td> </tr>
</table>
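
### Illustrative sketch of the modifications

The card does not include the processing code, so the following is only a minimal sketch of how the modifications listed under "Modifications made" could be applied to raw metadata records, using the Python standard library. The function names (`refine`, `normalize_key`, `first_available_year`), the detail labels `"Date First Available"` and `"Is Discontinued By Manufacturer"`, the date format `"June 5, 2018"`, the order of the steps, and the choice to keep products with no parseable date are all assumptions made for illustration. The sketch also leaves the date as a string, whereas the released dataset stores `date_first_available` as int64.

```python
from collections import Counter
from datetime import datetime

# Assumed raw label of the discontinued flag inside the details dict.
DISCONTINUED_KEY = "Is Discontinued By Manufacturer"


def normalize_key(key):
    """Turn a detail label such as 'Item Weight' into 'item_weight'."""
    return key.strip().lower().replace(" ", "_")


def first_available_year(details):
    """Return the year of 'Date First Available', or None if absent or unparseable."""
    raw = details.get("Date First Available")
    if not raw:
        return None
    try:
        # Assumed format, e.g. "June 5, 2018".
        return datetime.strptime(raw, "%B %d, %Y").year
    except ValueError:
        return None


def refine(records, top_n=16, min_year=2015):
    # Drop products missing a title, description, images, or details.
    required = ("title", "description", "images", "details")
    kept = [r for r in records if all(r.get(field) for field in required)]

    # Find the top_n most frequent detail types (the discontinued flag is
    # excluded here because the card says that column was dropped afterwards).
    counts = Counter(
        key for r in kept for key in r["details"] if key != DISCONTINUED_KEY
    )
    top_keys = [key for key, _ in counts.most_common(top_n)]

    refined = []
    for r in kept:
        details = r["details"]

        # Drop products first available before min_year (unknown dates are kept).
        year = first_available_year(details)
        if year is not None and year < min_year:
            continue

        # Drop products flagged as discontinued by the manufacturer.
        flag = str(details.get(DISCONTINUED_KEY, "")).strip().lower()
        if flag in ("true", "yes"):
            continue

        # Copy every field except details and bought_together.
        row = {k: v for k, v in r.items() if k not in ("details", "bought_together")}

        # Join list-valued features/description into newline-separated strings.
        for field in ("features", "description"):
            if isinstance(row.get(field), list):
                row[field] = "\n".join(row[field])

        # One column per retained detail type; None when a product lacks it.
        for key in top_keys:
            row[normalize_key(key)] = details.get(key)

        refined.append(row)
    return refined


# Tiny made-up sample (not taken from the dataset) to exercise the sketch.
sample = [
    {"title": "Vitamin C", "description": ["Tablets.", "60 count."],
     "features": ["Vegan", "Non-GMO"], "images": [{"large": "a.jpg"}],
     "details": {"Brand": "Acme", "Date First Available": "June 5, 2018"},
     "bought_together": None},
    {"title": "Old Balm", "description": ["Balm."], "features": [],
     "images": [{"large": "b.jpg"}],
     "details": {"Brand": "Acme", "Date First Available": "March 1, 2012"},
     "bought_together": None},
    {"title": "No Details", "description": ["x"], "features": [],
     "images": [{"large": "c.jpg"}], "details": {}, "bought_together": None},
    {"title": "Gone Gel", "description": ["Gel."], "features": [],
     "images": [{"large": "d.jpg"}],
     "details": {"Brand": "Other", DISCONTINUED_KEY: "Yes"},
     "bought_together": None},
]
rows = refine(sample, top_n=2)
```

With this sample, only the first product should survive: the second predates 2015, the third has no details, and the fourth is flagged as discontinued.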
Provider:
smartcat