smartcat/Health_and_Personal_Care_2023
Hugging Face. Updated 2024-10-31. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/smartcat/Health_and_Personal_Care_2023
Resource summary:
---
dataset_info:
  features:
  - name: main_category
    dtype: string
  - name: title
    dtype: string
  - name: average_rating
    dtype: float64
  - name: rating_number
    dtype: int64
  - name: features
    dtype: string
  - name: description
    dtype: string
  - name: price
    dtype: float64
  - name: images
    list:
    - name: thumb
      dtype: string
    - name: large
      dtype: string
    - name: variant
      dtype: string
    - name: hi_res
      dtype: string
  - name: videos
    list:
    - name: title
      dtype: string
    - name: url
      dtype: string
    - name: user_id
      dtype: string
  - name: store
    dtype: string
  - name: categories
    sequence: 'null'
  - name: parent_asin
    dtype: string
  - name: date_first_available
    dtype: int64
  - name: manufacturer
    dtype: string
  - name: brand
    dtype: string
  - name: package_dimensions
    dtype: string
  - name: item_model_number
    dtype: string
  - name: unit_count
    dtype: string
  - name: item_form
    dtype: string
  - name: age_range_(description)
    dtype: string
  - name: item_weight
    dtype: string
  - name: number_of_items
    dtype: string
  - name: material
    dtype: string
  - name: department
    dtype: string
  - name: product_dimensions
    dtype: string
  - name: color
    dtype: string
  - name: flavor
    dtype: string
  - name: item_dimensions_lxwxh
    dtype: string
  splits:
  - name: train
    num_bytes: 21064468
    num_examples: 10730
  download_size: 9655052
  dataset_size: 21064468
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for Health_and_Personal_Care_2023
The original dataset can be found at: https://amazon-reviews-2023.github.io/
## Dataset Details
This dataset is derived from the Health and Personal Care metadata file available at the link above.
### Dataset Description
This dataset is a refined version of the Amazon Health and Personal Care 2023 metadata dataset, which contains metadata for health and personal care products sold on Amazon. It includes detailed product information such as descriptions, ratings, prices, images, and features. The modifications focus on ensuring that key fields are complete and on simplifying the dataset by removing irrelevant or empty columns.
The table below shows the structure of the original dataset.
<table border="1" cellpadding="5" cellspacing="0">
<tr>
<th>Field</th>
<th>Type</th>
<th>Explanation</th>
</tr>
<tr>
<td>main_category</td>
<td>str</td>
<td>Main category (i.e., domain) of the product.</td>
</tr>
<tr>
<td>title</td>
<td>str</td>
<td>Name of the product.</td>
</tr>
<tr>
<td>average_rating</td>
<td>float</td>
<td>Rating of the product shown on the product page.</td>
</tr>
<tr>
<td>rating_number</td>
<td>int</td>
<td>Number of ratings for the product.</td>
</tr>
<tr>
<td>features</td>
<td>list</td>
<td>Bullet-point format features of the product.</td>
</tr>
<tr>
<td>description</td>
<td>list</td>
<td>Description of the product.</td>
</tr>
<tr>
<td>price</td>
<td>float</td>
<td>Price in US dollars (at time of crawling).</td>
</tr>
<tr>
<td>images</td>
<td>list</td>
<td>Images of the product. Each image is available in several sizes (thumb, large, hi_res). The “variant” field gives the position of the image.</td>
</tr>
<tr>
<td>videos</td>
<td>list</td>
<td>Videos of the product including title and url.</td>
</tr>
<tr>
<td>store</td>
<td>str</td>
<td>Store name of the product.</td>
</tr>
<tr>
<td>categories</td>
<td>list</td>
<td>Hierarchical categories of the product.</td>
</tr>
<tr>
<td>details</td>
<td>dict</td>
<td>Product details, including materials, brand, sizes, etc.</td>
</tr>
<tr>
<td>parent_asin</td>
<td>str</td>
<td>Parent ID of the product.</td>
</tr>
<tr>
<td>bought_together</td>
<td>list</td>
<td>Recommended bundles from the website.</td>
</tr>
</table>
### Modifications made
<ul>
<li>Products without a description, title, images, or details were removed.</li>
<li>The lists in features and description were converted to strings, with items joined by newlines.</li>
<li>For the details column, only the 16 most frequent detail types were kept. The details column was then split into 16 new columns, one per retained detail type.</li>
<li>Products with a date first available before 2015 were dropped.</li>
<li>Products with is_discontinued_by_manufacturer set to 'true' or 'yes' were dropped, and that column was then removed.</li>
<li>The bought_together column was dropped due to missing values.</li>
</ul>
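The steps above can be sketched in plain Python. This is a minimal illustration, not the code used to build the dataset. The function names (`refine`, `column_name`, `first_available_year`), the detail key spellings, the `'February 28, 2021'` date format, and the choice to drop records with no parseable date are all assumptions made for the example. The conversion of `date_first_available` to `int64` is not reproduced here because the card does not describe its encoding.

```python
from collections import Counter
from datetime import datetime

REQUIRED = ("title", "description", "images", "details")
DISCONTINUED_KEY = "Is Discontinued By Manufacturer"  # assumed key spelling
DATE_KEY = "Date First Available"                      # assumed key spelling


def column_name(detail_key):
    """Turn a detail key such as 'Item Form' into a column name such as 'item_form'."""
    return detail_key.lower().replace(" ", "_")


def first_available_year(details):
    """Return the year of a date like 'February 28, 2021' (assumed format), else None."""
    try:
        return datetime.strptime(details[DATE_KEY], "%B %d, %Y").year
    except (KeyError, ValueError, TypeError):
        return None


def refine(records, top_k=16, min_year=2015):
    """Apply the listed modifications to raw metadata records (a list of dicts)."""
    # Drop products without a title, description, images or details.
    kept = [r for r in records if all(r.get(k) for k in REQUIRED)]

    # Find the most frequent detail types. Assumption: the discontinued flag
    # is not counted, since it does not appear in the final columns.
    counts = Counter(k for r in kept for k in r["details"] if k != DISCONTINUED_KEY)
    top_keys = [k for k, _ in counts.most_common(top_k)]

    rows = []
    for r in kept:
        details = r["details"]
        # Drop products first available before min_year.
        # Assumption: records without a parseable date are dropped as well.
        year = first_available_year(details)
        if year is None or year < min_year:
            continue
        # Drop products flagged as discontinued by the manufacturer.
        if str(details.get(DISCONTINUED_KEY, "")).strip().lower() in {"true", "yes"}:
            continue
        # Leave out bought_together and the nested details column.
        row = {k: v for k, v in r.items() if k not in ("details", "bought_together")}
        # Join list-valued text fields into newline-separated strings.
        for field in ("features", "description"):
            if isinstance(row.get(field), list):
                row[field] = "\n".join(row[field])
        # Add one column per retained detail type.
        for key in top_keys:
            row[column_name(key)] = details.get(key)
        rows.append(row)
    return rows
```

Calling `refine(raw_records)` on a list of dicts shaped like the original structure returns flattened rows. With the default `top_k=16`, each row gains up to 16 detail columns.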
### Dataset Size
<ul>
<li>Total entries: 10,730</li>
<li>Total columns: 28</li>
</ul>
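One way to check these counts against a downloaded copy is sketched below. It assumes the Hugging Face `datasets` library is installed and that the repository id matches the page title. `shape` and `load_train_split` are hypothetical helper names introduced only for this example.

```python
def shape(rows):
    """Return (number of rows, number of distinct column names) for dict-like records."""
    columns = set()
    count = 0
    for row in rows:
        columns.update(row.keys())
        count += 1
    return count, len(columns)


def load_train_split():
    """Load the train split from the Hugging Face Hub (needs `datasets` and network access)."""
    # Imported lazily so that shape() can be used without the library installed.
    from datasets import load_dataset
    return load_dataset("smartcat/Health_and_Personal_Care_2023", split="train")
```

Running `shape(load_train_split())` returns the row and column counts of the hosted copy, which can then be compared with the figures listed above.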
### Final Structure
<table border="1" cellpadding="5" cellspacing="0">
<tr>
<th>Field</th>
<th>Type</th>
<th>Explanation</th>
</tr>
<tr>
<td>main_category</td>
<td>str</td>
<td>Main category</td>
</tr>
<tr>
<td>title</td>
<td>str</td>
<td>Name of the product</td>
</tr>
<tr>
<td>average_rating</td>
<td>float</td>
<td>Rating of the product shown on the product page.</td>
</tr>
<tr>
<td>rating_number</td>
<td>int</td>
<td>Number of ratings for the product.</td>
</tr>
<tr>
<td>features</td>
<td>str</td>
<td>Bullet-point features of the product, joined into one string with newlines.</td>
</tr>
<tr>
<td>description</td>
<td>str</td>
<td>Description of the product, joined into one string with newlines.</td>
</tr>
<tr>
<td>price</td>
<td>float</td>
<td>Price in US dollars (at time of crawling).</td>
</tr>
<tr>
<td>images</td>
<td>list</td>
<td>Images of the product. Each image is available in several sizes (thumb, large, hi_res). The “variant” field gives the position of the image.</td>
</tr>
<tr>
<td>videos</td>
<td>list</td>
<td>Videos of the product including title and url.</td>
</tr>
<tr>
<td>store</td>
<td>str</td>
<td>Store name of the product.</td>
</tr>
<tr>
<td>categories</td>
<td>list</td>
<td>Hierarchical categories of the product.</td>
</tr>
<tr>
<td>parent_asin</td>
<td>str</td>
<td>Parent ID of the product.</td>
</tr>
<tr>
<td>date_first_available</td>
<td>int64</td>
<td>Date the product was first available</td>
</tr>
<tr>
<td>manufacturer</td>
<td>str</td>
<td>Manufacturer</td>
</tr>
<tr>
<td>brand</td>
<td>str</td>
<td>Name of the brand</td>
</tr>
<tr>
<td>package_dimensions</td>
<td>str</td>
<td>Package dimensions</td>
</tr>
<tr>
<td>item_model_number</td>
<td>str</td>
<td>Item model number</td>
</tr>
<tr>
<td>unit_count</td>
<td>str</td>
<td>Units of the product</td>
</tr>
<tr>
<td>item_form</td>
<td>str</td>
<td>Item form</td>
</tr>
<tr>
<td>age_range_(description)</td>
<td>str</td>
<td>Age range</td>
</tr>
<tr>
<td>item_weight</td>
<td>str</td>
<td>Weight of the item</td>
</tr>
<tr>
<td>number_of_items</td>
<td>str</td>
<td>Number of items</td>
</tr>
<tr>
<td>material</td>
<td>str</td>
<td>Material</td>
</tr>
<tr>
<td>department</td>
<td>str</td>
<td>Department</td>
</tr>
<tr>
<td>product_dimensions</td>
<td>str</td>
<td>Product dimensions</td>
</tr>
<tr>
<td>color</td>
<td>str</td>
<td>Color</td>
</tr>
<tr>
<td>flavor</td>
<td>str</td>
<td>Flavor</td>
</tr>
<tr>
<td>item_dimensions_lxwxh</td>
<td>str</td>
<td>Item dimensions LxWxH</td>
</tr>
</table>
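As an illustration of working with a single record, the sketch below splits the newline-joined `features` string back into bullet points and picks one image URL. The helper names and the hi_res, large, thumb preference order are assumptions for this example, and it assumes `images` is read as a list of dicts with the keys shown in the table.

```python
def feature_bullets(record):
    """Split the newline-joined `features` string back into a list of bullet points."""
    text = record.get("features") or ""
    return [line.strip() for line in text.split("\n") if line.strip()]


def best_image_url(record):
    """Return the first available image URL, preferring hi_res, then large, then thumb."""
    for image in record.get("images") or []:
        for size in ("hi_res", "large", "thumb"):
            url = image.get(size)
            if url:
                return url
    return None
```

The same splitting approach applies to `description`, which is stored in the same newline-joined form.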
Provided by:
smartcat



