Aurora-chasing/data_sample_1000

Name: Aurora-chasing/data_sample_1000
Creator: Aurora-chasing
Published: 2026-04-08 07:45:21
License: no description provided

Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/Aurora-chasing/data_sample_1000


Description:

---
license: cc-by-nc-4.0
---

# TAAC2026 Demo Dataset (1000 Samples)

A sample dataset containing 1000 user-item interaction records for the [TAAC2026 competition](https://algo.qq.com).

## Dataset Description

- **Rows**: 1,000
- **Format**: Parquet (`sample_data.parquet`)
- **File Size**: ~68 MB

## Columns

| Column | Type | Description |
|---|---|---|
| `item_id` | `int64` | **Target item** identifier. |
| `item_feature` | `array[struct]` | Array of **target item** feature dicts. Each element has `feature_id`, `feature_value_type`, and value fields (`float_value`, `int_array`, `int_value`). |
| `label` | `array[struct]` | Array of label dicts. Each element contains `action_time` and `action_type`. |
| `seq_feature` | `struct` | Sequence features dict with keys `action_seq`, `content_seq`, and `item_seq`. Each sub-key contains an array of feature structs. |
| `timestamp` | `int64` | Event timestamp. |
| `user_feature` | `array[struct]` | Array of user feature dicts. Each element has `feature_id`, `feature_value_type`, and value fields (`float_array`, `int_array`, `int_value`). |
| `user_id` | `string` | User identifier. |

## Feature Struct Schema

Each feature element contains `feature_id`, `feature_value_type`, and several value fields. The value fields named by `feature_value_type` are populated; all other value fields are `null`.

**`item_feature`**: value fields `int_value`, `float_value`, `int_array`

```json
{
  "feature_id": 6,
  "feature_value_type": "int_value",
  "float_value": null,
  "int_array": null,
  "int_value": 96
}
```

**`user_feature`**: value fields `int_value`, `float_array`, `int_array`

```json
{
  "feature_id": 65,
  "feature_value_type": "int_value",
  "float_array": null,
  "int_array": null,
  "int_value": 19
}
```

**`seq_feature`**: value fields `int_array`

```json
{
  "feature_id": 19,
  "feature_value_type": "int_array",
  "int_array": [1, 1, 1, ...]
}
```

Possible `"feature_value_type"` values and their corresponding fields:

- `"int_value"` → `int_value`
- `"float_value"` → `float_value`
- `"int_array"` → `int_array`
- `"float_array"` → `float_array`
- Combined types also occur, e.g. `"int_array_and_float_array"`, in which case both `int_array` and `float_array` are populated.

## Label Schema

Each element in the `label` array:

```json
{
  "action_time": 1770694299,
  "action_type": 1
}
```

## Usage

With pandas:

```python
import pandas as pd

df = pd.read_parquet("sample_data.parquet")
print(df.shape)             # (1000, 7)
print(df.columns.tolist())  # ['item_id', 'item_feature', 'label', 'seq_feature', 'timestamp', 'user_feature', 'user_id']
```

With Hugging Face `datasets`:

```python
from datasets import load_dataset

ds = load_dataset("TAAC2026/data_sample_1000")
print(ds)
```
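The feature and label schemas above suggest a small decoding step before modelling: keep only the value fields that are actually populated, and index them by `feature_id`. The following is a minimal sketch of that idea, not part of the dataset's official tooling. The helper names (`populated_values`, `features_by_id`, `action_types`) are hypothetical, and the code assumes that `null` fields surface as Python `None` once a row has been loaded (for example via `pd.read_parquet`).

```python
from typing import Any, Dict, Iterable, List, Optional

# Value fields named in the dataset card's feature struct schema.
VALUE_FIELDS = ("int_value", "float_value", "int_array", "float_array")


def populated_values(feature: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the value fields that are present and not null.

    Assumption: null fields appear as None after loading.
    """
    return {
        name: feature[name]
        for name in VALUE_FIELDS
        if feature.get(name) is not None
    }


def features_by_id(
    features: Optional[Iterable[Dict[str, Any]]],
) -> Dict[int, Dict[str, Any]]:
    """Map feature_id -> populated value fields for one row's feature array."""
    if features is None:
        return {}
    return {int(f["feature_id"]): populated_values(f) for f in features}


def action_types(labels: Optional[Iterable[Dict[str, Any]]]) -> List[int]:
    """Collect the action_type of every element in one row's label array."""
    if labels is None:
        return []
    return [int(label["action_type"]) for label in labels]


# The item_feature and label examples from the card, written as Python dicts.
item_example = {
    "feature_id": 6,
    "feature_value_type": "int_value",
    "float_value": None,
    "int_array": None,
    "int_value": 96,
}
label_example = {"action_time": 1770694299, "action_type": 1}

print(features_by_id([item_example]))
print(action_types([label_example]))
```

Checking for non-null fields, rather than parsing `feature_value_type`, also covers combined types such as `"int_array_and_float_array"` without relying on how those type names are spelled. On a loaded DataFrame, something like `df["item_feature"].map(features_by_id)` would apply the same decoding row by row, under the same assumption about how nulls are represented.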

License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Provided by:

Aurora-chasing
