crawlfeeds/HomeDepot-Smart-Home-Dataset
Hugging Face | Updated 2026-04-09 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/crawlfeeds/HomeDepot-Smart-Home-Dataset
Resource description:
---
language:
- en
license: cc-by-nc-4.0
task_categories:
- text-classification
- text-generation
- feature-extraction
- zero-shot-classification
pretty_name: Home Depot Smart Home Product Dataset
size_categories:
- n<1K
tags:
- home-depot
- smart-home
- iot
- ecommerce
- product-data
- llm-training
- ai-training
- nlp
- fine-tuning
- recommendation-systems
- price-monitoring
- retail-intelligence
- product-classification
- specifications
configs:
- config_name: default
  data_files:
  - split: train
    path: crawlfeeds_homedepot__limit-2000000_category_1-smart-home_20260409_191051.csv
---
# Home Depot Smart Home Product Dataset
A structured dataset of Smart Home products from Home Depot, with product specifications, pricing, a four-level category taxonomy, highlights, color variants, dimensions, and brand data. It is intended for training product recommendation models, smart home AI assistants, price intelligence systems, and e-commerce search engines.
---
## Dataset Overview
| Field | Details |
|-------|---------|
| **Source** | Home Depot |
| **Total Records** | 230+ |
| **Category Focus** | Smart Home, IoT, Connected Devices |
| **Language** | English |
| **Formats** | CSV |
| **Data Quality** | Validated and structured |
| **Provider** | [Crawl Feeds](https://crawlfeeds.com) |
---
## What Makes This Dataset Valuable
- **Smart Home category focus** — one of the fastest-growing product categories in retail, covering connected devices, automation, and IoT products in high demand for AI training
- **Full specifications field** — rich structured technical specs (100% coverage) ideal for product compatibility modeling, voice assistant training, and smart home recommendation engines
- **Highlights field** — curated product selling points (100% coverage) perfect for marketing copy generation and product summarization models
- **Real pricing data** — current price (97.5% coverage) and original price (10% coverage) fields enable price intelligence, discount modeling, and dynamic pricing AI
- **4-level category hierarchy** — granular taxonomy from category_1 through category_4 enables precise product classification across smart home subcategories
- **Brand + model number** — 100% coverage on brand and model fields, valuable for product matching, deduplication, and compatibility graph building
- **UPC + GTIN13** — standard retail identifiers enable cross-platform product matching and catalog enrichment
- **Multi-dimensional attributes** — color, weight, height, width, depth fields support physical product modeling and logistics AI
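
As an illustration of the category hierarchy described above, the four category columns can be collapsed into one path string. This is a minimal sketch: `category_path` is a hypothetical helper, and it assumes that a missing level (for example `category_4`, which has 74.1% coverage) shows up as `None`, an empty string, or a pandas `NaN`.

```python
from typing import Mapping

CATEGORY_FIELDS = ("category_1", "category_2", "category_3", "category_4")


def category_path(record: Mapping[str, object], sep: str = " > ") -> str:
    """Join the populated category levels of one record into a single path."""
    levels = []
    for field in CATEGORY_FIELDS:
        value = record.get(field)
        # Treat None, NaN (value != value) and blank strings as missing.
        if value is None or value != value or not str(value).strip():
            break  # stop at the first gap so the path never skips a level
        levels.append(str(value).strip())
    return sep.join(levels)


example = {
    "category_1": "Smart Home",
    "category_2": "Smart Plugs & Outlets",
    "category_3": "Smart Plugs",
    "category_4": None,
}
print(category_path(example))
```

With a DataFrame `df`, the same helper could be applied row by row with `df.apply(category_path, axis=1)`.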
---
## Data Fields
| Field | Type | Coverage | Description |
|-------|------|----------|-------------|
| product_url | String | 100% | Direct product page URL |
| product_name | String | 100% | Full product name/title |
| gtin13 | String | 91.2% | Global Trade Item Number (GTIN-13) |
| description | String | 100% | Full product description |
| brand | String | 100% | Product brand name |
| model_number | String | 100% | Manufacturer model number |
| store_id | String | 100% | Home Depot store identifier |
| price | String | 97.5% | Current selling price |
| original_price | String | 10% | Original price before discount |
| discount | String | 10% | Discount amount or percentage |
| upc | String | 100% | Universal Product Code |
| item_id | String | 100% | Home Depot item identifier |
| store_sku | String | 98.3% | Store-level SKU |
| images | Array | 0% | Product image URLs array (empty in this sample) |
| depth | String | 35.4% | Product depth dimension |
| height | String | 59.3% | Product height dimension |
| width | String | 69.5% | Product width dimension |
| color | String | 90.5% | Product color variant |
| weight | String | 38.4% | Product weight |
| currency | String | 100% | Currency code |
| category_1 | String | 100% | Top-level category |
| category_2 | String | 100% | Second-level category |
| category_3 | String | 100% | Third-level category |
| category_4 | String | 74.1% | Fourth-level category |
| breadcrumbs | String | 100% | Full navigation breadcrumb path |
| unique_id | String | 100% | Unique record identifier |
| scraped_at | DateTime | — | Data collection timestamp |
| site_name | String | 100% | Source site name |
| country | String | 100% | Country of the store |
| specifications | String | 100% | Full technical specifications |
| highlights | String | 100% | Key product highlights/selling points |
| product_type | String | 100% | Product type classification |
| average_rating | String | 56.5% | Average customer rating |
| total_reviews | String | 63.3% | Total number of customer reviews |
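
Because the table lists `price`, `average_rating`, and `total_reviews` as strings, numeric work needs a conversion step first. The sketch below is one possible approach, not part of the dataset tooling: `to_float` and `to_int` are hypothetical helpers, and the handling of `$` and thousands separators is an assumption (the sample record shows plain values such as `"29.99"`).

```python
import math
from typing import Optional


def to_float(value: object) -> Optional[float]:
    """Convert a value such as "29.99" to float; return None if it cannot be parsed."""
    if value is None:
        return None
    # Assumption: currency symbols and thousands separators may appear in some rows.
    text = str(value).strip().replace("$", "").replace(",", "")
    try:
        number = float(text)
    except ValueError:
        return None
    # float("nan") and float("inf") parse successfully, so reject them explicitly.
    return number if math.isfinite(number) else None


def to_int(value: object) -> Optional[int]:
    """Convert a count such as "1842" to int; return None if it cannot be parsed."""
    number = to_float(value)
    return None if number is None else int(number)


print(to_float("29.99"), to_int("1842"), to_float(None))
```

On a DataFrame `df`, these could be applied per column, for example `df["price"].map(to_float)`.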
---
## Use Cases
### Smart Home & IoT AI
- **Product compatibility modeling** — use specifications and model numbers to build device compatibility graphs
- **Smart home recommendation engines** — train models to suggest complementary connected devices
- **Voice assistant product knowledge** — enrich Alexa, Google Home, or custom assistant product databases
- **IoT device classification** — categorize smart home devices by protocol, connectivity, and function
### Natural Language Processing
- **Product description generation** — fine-tune LLMs on Home Depot's product copy style
- **Highlights summarization** — train abstractive summarization models on the highlights field
- **Specification parsing** — extract structured attributes from raw specification text
- **Semantic product search** — build dense retrieval systems using descriptions and specifications
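
For the specification parsing use case, a rough starting point is sketched below. It assumes the `specifications` string is a comma-separated list of `key: value` pairs, as the sample record suggests; the real field may use a different layout, so `parse_specifications` (a hypothetical helper) should be checked against actual rows before use.

```python
from typing import Dict


def parse_specifications(text: str) -> Dict[str, str]:
    """Split a "Key: value, Key: value" string into a dictionary."""
    specs: Dict[str, str] = {}
    for part in text.split(","):
        key, sep, value = part.partition(":")
        if not sep:
            continue  # fragment has no "key: value" shape, skip it
        key, value = key.strip(), value.strip()
        if key and value:
            specs[key] = value
    return specs


raw = "Voltage: 120V, Max Load: 15A, Wireless: Wi-Fi 2.4GHz"
print(parse_specifications(raw))
```

Values that themselves contain commas would be split incorrectly by this approach, which is one reason to treat it as a baseline rather than a finished parser.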
### Price Intelligence & Retail Analytics
- **Price monitoring** — track smart home product pricing over time by comparing repeated snapshots (this sample is a single scrape)
- **Discount pattern analysis** — use original_price and discount fields to model promotional behavior
- **Competitive pricing models** — compare Home Depot smart home pricing vs competitors
- **Demand forecasting** — combine ratings and review counts with pricing for demand signals
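
For discount pattern analysis, the percentage off can be recomputed from `price` and `original_price` rather than read from the `discount` string. This is a minimal sketch under the assumption that both fields hold plain decimal strings, as in the sample record; `discount_percent` is a hypothetical helper.

```python
from typing import Optional


def discount_percent(price: Optional[str], original_price: Optional[str]) -> Optional[float]:
    """Return the percentage discount, or None when it cannot be computed."""
    try:
        current = float(price)
        original = float(original_price)
    except (TypeError, ValueError):
        return None  # one of the fields is missing or not numeric
    # Reject non-positive originals and prices above the original (also filters NaN).
    if not (original > 0 and 0 <= current <= original):
        return None
    return round((original - current) / original * 100, 1)


print(discount_percent("29.99", "39.99"))
```

Since `original_price` is populated for only about 10% of records, most rows will return `None`.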
### E-commerce & Product Data
- **Catalog enrichment** — use GTIN13 and UPC to match and enrich existing product catalogs
- **Product deduplication** — use model_number and brand for cross-platform matching
- **Taxonomy mapping** — map other retailer catalogs to Home Depot's smart home category structure
- **Image dataset building** — use image URL arrays for visual smart home product models (full dataset only; the images field is empty in this sample)
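
Before matching catalogs on `gtin13` or `upc`, it can help to verify the identifiers. The sketch below applies the standard GTIN-13 check-digit rule (weights 1 and 3 alternating over the first 12 digits). The helper names are hypothetical, and padding a UPC with leading zeros assumes the value is a 12-digit UPC-A code that may have lost leading zeros during CSV loading.

```python
def is_valid_gtin13(code: str) -> bool:
    """Check the length, digits, and check digit of a GTIN-13 string."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def upc_to_gtin13(upc: str) -> str:
    """Left-pad a UPC-A code with zeros to the 13-digit GTIN form."""
    return upc.strip().zfill(13)


print(is_valid_gtin13("4006381333931"))
print(is_valid_gtin13(upc_to_gtin13("036000291452")))
```

The two codes used here are generic textbook examples, not records from this dataset.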
---
## Sample Data
```json
{
  "product_name": "Smart Wi-Fi Plug Mini, Works with Alexa and Google Home",
  "brand": "TP-Link",
  "model_number": "EP25P4",
  "category_1": "Smart Home",
  "category_2": "Smart Plugs & Outlets",
  "category_3": "Smart Plugs",
  "price": "29.99",
  "original_price": "39.99",
  "discount": "25%",
  "color": "White",
  "specifications": "Voltage: 120V, Max Load: 15A, Wireless: Wi-Fi 2.4GHz...",
  "highlights": "Voice control compatible, Schedule and timer support, Energy monitoring",
  "average_rating": "4.6",
  "total_reviews": "1842",
  "currency": "USD",
  "country": "US"
}
```
---
## Loading the Dataset
```python
from datasets import load_dataset

# Load the sample from the Hugging Face Hub and convert the train split to pandas
dataset = load_dataset("crawlfeeds/HomeDepot-Smart-Home-Dataset")
df = dataset["train"].to_pandas()

# Filter by brand
tp_link = df[df["brand"] == "TP-Link"]

# Get all products with ratings
rated_products = df[df["average_rating"].notna()]

# Text fields for LLM training (rows missing any of the four fields are dropped)
text_data = df[["product_name", "description", "highlights", "specifications"]].dropna()

# Products with pricing data
priced = df[df["price"].notna()]
```
---
## Data Quality Notes
- **Images field (0%)** — the images column is empty in this sample; image URLs are included in the full dataset available via crawlfeeds.com
- **Depth/weight dimensions** — coverage is partial (depth 35.4%, weight 38.4%) because many smart home products are small electronics for which dimensions are less standardized
- **Discount/original_price (10%)** — low coverage reflects real-world promotional availability at the time of scraping
- **scraped_at** — timestamp available in full commercial dataset
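
The coverage figures quoted in this card can be rechecked against whatever copy of the data is loaded. The sketch below is one way to do that on a list of record dictionaries; `field_coverage` and `is_missing` are hypothetical helpers, and treating empty strings and empty lists as missing is an assumption about how gaps are encoded.

```python
from typing import Dict, List, Mapping


def is_missing(value: object) -> bool:
    """True for None, NaN, blank strings, and empty lists."""
    if value is None or value != value:  # value != value is True only for NaN
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def field_coverage(records: List[Mapping[str, object]]) -> Dict[str, float]:
    """Return the percentage of records with a populated value for each field."""
    if not records:
        return {}
    fields = sorted({key for record in records for key in record})
    return {
        field: round(100 * sum(not is_missing(r.get(field)) for r in records) / len(records), 1)
        for field in fields
    }


rows = [
    {"price": "29.99", "original_price": "39.99"},
    {"price": "19.99", "original_price": None},
    {"price": "", "original_price": None},
    {"price": "9.99"},
]
print(field_coverage(rows))
```

With a DataFrame `df`, the input list can be produced with `df.to_dict("records")`.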
---
## Full Dataset & Custom Data
This is a **sample dataset**. The full Crawl Feeds Home Depot Smart Home dataset contains:
- Thousands of records across all Smart Home subcategories
- Complete image URL arrays
- Full scraped_at timestamps
- Weekly and monthly refresh options
- Expandable to all Home Depot categories (3.4M+ records available)
- Custom subsets by brand, category, price range, or rating
**Get the full dataset:** [crawlfeeds.com](https://crawlfeeds.com)
**Request custom Home Depot data:** [crawlfeeds.com/contact](https://crawlfeeds.com/contact)
---
## Related Datasets by Crawl Feeds
- [IKEA Home Decor & Furniture Dataset](https://huggingface.co/datasets/crawlfeeds/IKEA-Home-Decor-Furniture-Dataset)
- [Trustpilot Reviews Dataset – 20K Sample](https://huggingface.co/datasets/crawlfeeds/Trustpilot-Reviews-Dataset-20K-Sample)
- [Walmart Reviews Dataset](https://huggingface.co/datasets/crawlfeeds/walmart-reviews-dataset)
- [Medium Articles Corpus](https://huggingface.co/datasets/crawlfeeds/Medium-Articles-Corpus)
- [Fox News Headlines Dataset](https://huggingface.co/datasets/crawlfeeds/Curated-Fox-News-Headlines-and-Full-Text)
- Airbnb Reviews Dataset *(coming soon)*
- Google Playstore Reviews Dataset *(coming soon)*
---
## License
This dataset is made available under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) and is intended for research and non-commercial use. For commercial licensing, please contact [crawlfeeds.com/contact](https://crawlfeeds.com/contact).
---
## Citation
If you use this dataset in your research or project, please cite:
```bibtex
@dataset{crawlfeeds_homedepot_smarthome_2025,
  author    = {{Crawl Feeds}},
  title     = {Home Depot Smart Home Product Dataset},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/crawlfeeds/HomeDepot-Smart-Home-Dataset}
}
```
---
*Data collected and maintained by [Crawl Feeds](https://crawlfeeds.com) — structured web data for AI, analytics, and business intelligence.*
Provider:
crawlfeeds



