
microsoft/hnm-search-data

Hugging Face · Updated 2026-02-11 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/microsoft/hnm-search-data
Resource description:
---
configs:
- config_name: articles
  data_files:
  - split: train
    path: data/raw/articles.csv
- config_name: customers
  data_files:
  - split: train
    path: data/raw/customers.csv
- config_name: transactions
  data_files:
  - split: train
    path: data/raw/transactions_train.csv
dataset_info:
- config_name: articles
  features:
  - name: article_id
    dtype: int64
  - name: product_code
    dtype: int64
  - name: prod_name
    dtype: string
  - name: product_type_no
    dtype: int64
  - name: product_type_name
    dtype: string
  - name: product_group_name
    dtype: string
  - name: graphical_appearance_no
    dtype: int64
  - name: graphical_appearance_name
    dtype: string
  - name: colour_group_code
    dtype: int64
  - name: colour_group_name
    dtype: string
  - name: perceived_colour_value_id
    dtype: int64
  - name: perceived_colour_value_name
    dtype: string
  - name: perceived_colour_master_id
    dtype: int64
  - name: perceived_colour_master_name
    dtype: string
  - name: department_no
    dtype: int64
  - name: department_name
    dtype: string
  - name: index_code
    dtype: string
  - name: index_name
    dtype: string
  - name: index_group_no
    dtype: int64
  - name: index_group_name
    dtype: string
  - name: section_no
    dtype: int64
  - name: section_name
    dtype: string
  - name: garment_group_no
    dtype: int64
  - name: garment_group_name
    dtype: string
  - name: detail_desc
    dtype: string
- config_name: customers
  features:
  - name: customer_id
    dtype: string
  - name: FN
    dtype: float64
  - name: Active
    dtype: float64
  - name: club_member_status
    dtype: string
  - name: fashion_news_frequency
    dtype: string
  - name: age
    dtype: float64
  - name: postal_code
    dtype: string
- config_name: transactions
  features:
  - name: t_dat
    dtype: string
  - name: customer_id
    dtype: string
  - name: article_id
    dtype: int64
  - name: price
    dtype: float64
  - name: sales_channel_id
    dtype: int64
task_categories:
- text-ranking
- text-retrieval
- text-classification
language:
- en
pretty_name: 'H&M Search Queries and Personalized Results '
size_categories:
- 10M<n<100M
tags:
- fashion
- e-commerce
- customer-behavior
- tabular
- recommendation-systems
- search
- ranking
---

# HnM Search Dataset Created from Recommendations Dataset

This synthetic dataset is built on top of the following recommendations dataset:

* https://huggingface.co/datasets/einrafh/hnm-fashion-recommendations-data (Use of this dataset is subject to the terms and conditions set forth on the original distribution page. This dataset is intended for non-commercial and research use.)
* https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations/data (DATA ACCESS AND USE: Non-Commercial Purposes & Academic Research.)

The base dataset is a recommendations dataset whose transactions data records the articles purchased by users. This dataset adds the search queries that a user may have issued before buying the article, along with the candidate results.

The license for our additions is https://cdla.dev/permissive-2-0/

## Search Queries Dataset

* **`queries.csv`** (`253685` rows): List of queries for transactions.
* **`qrels.csv`** (`253685` rows): List of positive and negative article IDs that were retrieved for each query.

## Base Dataset

* **`articles.csv`** (`105542` rows): List of unique products/articles with their properties/features.
* **`customers.csv`** (`1371980` rows): List of unique customers/users with their properties/features.
* **`transactions_train.csv`** (`31788324` rows): List of historical transactions/purchases of different articles by customers.

## 📂 Dataset Structure & Components

All search query data is located in the `data/search/` directory.

* **`data/search/queries.csv`**
  Queries generated from individual transactions (`transactions_train.csv`).
  *(253685 rows, 3 columns: query_id, transaction_id, and query_text)*
* **`data/search/qrels.csv`**
  Candidate results for each query: positives (from the transaction) and close negative article IDs (from `articles.csv`).
  *(253685 rows, 3 columns: query_id, positive_ids, and negative_ids (space-separated))*

All raw (recommendations) data is located in the `data/raw/` directory.

* **`data/raw/transactions_train.csv`**
  A historical record of all purchase transactions. This file serves as a central table connecting customers with the articles they purchased.
  *(31,788,324 rows, 5 columns)*
* **`data/raw/customers.csv`**
  This dimension table contains attributes for each unique customer.
  *(1,371,980 rows, 7 columns)*
* **`data/raw/articles.csv`**
  This dimension table contains highly detailed attributes for each unique product (article).
  *(105,542 rows, 25 columns)*
* **`data/raw/images/`**
  This directory contains product images, organized into subdirectories based on the first 3 digits of the `article_id`.

## 🔗 Relationships Between Search Data

These files can be joined to create a comprehensive dataset for analysis:

* `query_id` joins `queries.csv` and `qrels.csv`, giving the textual queries together with their result articles.
* `transaction_id` (from `queries.csv`) can be used to look up the details of the corresponding transaction in `transactions_train.csv`.
* `positive_ids` and `negative_ids` (from `qrels.csv`) can be joined with `articles.csv` to get the details of the result articles, both the positives (which the user purchased) and the negatives.

## 📊 Data Schema

The schemas for `transactions_train.csv`, `customers.csv`, and `articles.csv` are available at https://huggingface.co/datasets/einrafh/hnm-fashion-recommendations-data. The schema for the search data follows.

### `queries.csv`

| Column | Description | Type |
|---|---|---|
| `query_id` | Unique ID for the query (Primary Key) | `object` (String) |
| `transaction_id` | Unique ID for the transaction (Foreign Key) | `object` (String) |
| `query_text` | Text of the query | `object` (String) |

### `qrels.csv`

| Column | Description | Type |
|---|---|---|
| `query_id` | ID for the query (Foreign Key) | `object` (String) |
| `positive_ids` | ID for the positive result (Foreign Key), which the user clicked/purchased | `object` (String) |
| `negative_ids` | Space-separated list of IDs for the negative results (Foreign Key), which the user didn't click/purchase | `object` (String) |

## 📌 Source

The base dataset is provided to the public by H&M Group through the Kaggle platform for analysis and research purposes. We have added search queries over the base dataset.

- **Platform**: Kaggle, [H&M Personalized Fashion Recommendations](https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations)

## ⚠️ License

The use of this dataset is subject to the terms and conditions stated on its original distribution page. This dataset is intended for non-commercial and research purposes.
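
As an illustration of the joins described under the relationships between the search files, here is a minimal pandas sketch. It is not the authors' pipeline. The rows are invented toy data standing in for `data/search/queries.csv`, `data/search/qrels.csv`, and `data/raw/articles.csv`; the helper name `build_candidates` and the `label` column are hypothetical; and the column name `negative_ids` is assumed to match the schema table. The link from `transaction_id` to `transactions_train.csv` is left out because the card does not say which column or row index it matches.

```python
import io

import pandas as pd

# Toy stand-ins for the real CSV files. Every value here is invented.
QUERIES_CSV = """query_id,transaction_id,query_text
q1,t1,black strap top
q2,t2,blue slim jeans
"""
QRELS_CSV = """query_id,positive_ids,negative_ids
q1,101,102 103
q2,201,202
"""
ARTICLES_CSV = """article_id,prod_name
101,Strap top
102,Strap top (2)
103,Basic tee
201,Slim jeans
202,Wide jeans
"""

# The search files store IDs as strings; articles.csv stores article_id as int64.
queries = pd.read_csv(io.StringIO(QUERIES_CSV), dtype=str)
qrels = pd.read_csv(io.StringIO(QRELS_CSV), dtype=str)
articles = pd.read_csv(io.StringIO(ARTICLES_CSV))


def build_candidates(queries, qrels, articles):
    """Return one row per (query, candidate article) with a 0/1 label.

    Hypothetical helper: label 1 marks the purchased (positive) article,
    label 0 marks a negative candidate.
    """
    # Step 1: attach the qrels row to each query via query_id.
    merged = queries.merge(qrels, on="query_id", how="inner")

    # Step 2: stack the positive and negative ID columns into one column.
    long = merged.melt(
        id_vars=["query_id", "query_text"],
        value_vars=["positive_ids", "negative_ids"],
        var_name="source",
        value_name="article_id",
    )
    long["label"] = (long["source"] == "positive_ids").astype(int)

    # Step 3: split the space-separated ID lists into one row per ID.
    long["article_id"] = long["article_id"].fillna("").str.split()
    long = long.explode("article_id").dropna(subset=["article_id"])

    # Step 4: cast to int64 so the key matches articles.article_id, then join.
    long["article_id"] = long["article_id"].astype("int64")
    return long.drop(columns="source").merge(articles, on="article_id", how="left")


candidates = build_candidates(queries, qrels, articles)
print(candidates)
```

With the real files, the three `pd.read_csv` calls would point at the CSV paths listed under the dataset structure. The left join keeps candidates whose ID has no match in `articles.csv`, which makes missing articles easy to spot.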
Provider:
microsoft