Import Customs Declaration Rule Mining Dataset
Source: 国家基础学科公共科学数据中心 (National Basic Science Data Center), indexed 2025-08-30
Download link:
https://nbsdc.cn/general/dataDetail?id=68989a7e195d26317b036f19&type=1
Resource description:
The import customs declaration rule mining dataset is built from publicly available simulated data: virtual customs declaration records exported from the code hosting site GitHub. After outlier removal, it contains 153,000 records in two parts. The first part has 13 attributes, such as commodity ID and import time, plus a binary inspection-result label. The second part has 20 attributes, again including commodity ID and import time, plus the same binary label. The dataset was created for research on customs risk rule mining and rule generation, and it can be reused for further data mining research on customs declarations. It is tabular and consists of only two data files with no hierarchical structure. Each record contains 20 fields and a binary classification label, and each field is either discrete or continuous. Fields include commodity name, commodity code, and import/export date; temporal precision, spatial scope, and spatial precision vary across fields. To use the dataset, read the data with Pandas, split it into training, validation, and test sets, and then train a model with a supervised learning framework such as scikit-learn (sklearn).
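The read, split, and train workflow described above can be sketched roughly as follows. This is a minimal illustration, not the provider's pipeline: the column names (`commodity_code`, `origin_country`, `declared_value`, `gross_weight`, `inspection_result`), the 60/20/20 split ratio, and the random forest model are all assumptions, and the placeholder frame stands in for the real file, which would normally be loaded with `pd.read_csv`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

LABEL = "inspection_result"                           # hypothetical label column name
DISCRETE = ["commodity_code", "origin_country"]       # hypothetical discrete fields


def make_placeholder_frame(n_rows=1000, seed=0):
    """Random stand-in for the real table; replace with pd.read_csv on the downloaded file."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "commodity_code": rng.choice(["A01", "B02", "C03"], size=n_rows),
        "origin_country": rng.choice(["X", "Y", "Z"], size=n_rows),
        "declared_value": rng.gamma(2.0, 500.0, size=n_rows),
        "gross_weight": rng.gamma(2.0, 50.0, size=n_rows),
        LABEL: rng.integers(0, 2, size=n_rows),
    })


def split_frame(df, label=LABEL, seed=0):
    """Stratified 60/20/20 split into training, validation, and test sets."""
    train_df, rest_df = train_test_split(
        df, test_size=0.4, stratify=df[label], random_state=seed)
    val_df, test_df = train_test_split(
        rest_df, test_size=0.5, stratify=rest_df[label], random_state=seed)
    return train_df, val_df, test_df


def build_model(discrete_cols=DISCRETE):
    """One-hot encode discrete fields, pass continuous fields through unchanged."""
    encode = ColumnTransformer(
        [("discrete", OneHotEncoder(handle_unknown="ignore"), discrete_cols)],
        remainder="passthrough",
    )
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    return Pipeline([("encode", encode), ("clf", clf)])


def train_and_validate(df, label=LABEL):
    """Fit on the training split and report accuracy on the validation split."""
    train_df, val_df, _ = split_frame(df, label)
    model = build_model()
    model.fit(train_df.drop(columns=[label]), train_df[label])
    val_acc = model.score(val_df.drop(columns=[label]), val_df[label])
    return model, val_acc
```

The test split is held back on purpose: the validation score guides model choices, and the test set is only touched once at the end. Because the placeholder labels are random, the validation accuracy here carries no meaning beyond showing that the pipeline runs.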
Provider:
全国海关信息中心 (National Customs Information Center)
Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
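The dataset contains 153,000 simulated customs declaration records in two parts, with 13 and 20 attributes respectively, each with a binary classification label. It is intended for research on customs risk rule mining algorithms. The data is provided in csv and docx formats and needs to be processed with Pandas before training supervised learning models.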
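The source does not say which rule mining algorithm the dataset was built for. As one hedged illustration of turning a supervised model into readable risk rules, the sketch below fits a shallow decision tree and prints its branches as if/else conditions with scikit-learn's `export_text`. The decision tree is a stand-in chosen for readability, and the toy data, the column names, and the threshold of 1000 are all invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text


def make_toy_declarations(n_rows=500, seed=0):
    """Toy data: a record is flagged when its (hypothetical) declared value exceeds 1000."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "declared_value": rng.uniform(0, 2000, size=n_rows),
        "gross_weight": rng.uniform(0, 100, size=n_rows),
    })
    df["inspection_result"] = (df["declared_value"] > 1000).astype(int)
    return df


def extract_rules(df, label="inspection_result", max_depth=3):
    """Fit a shallow tree and return it with a text rendering of its decision rules."""
    features = [c for c in df.columns if c != label]
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(df[features], df[label])
    return tree, export_text(tree, feature_names=features)


if __name__ == "__main__":
    _, rules = extract_rules(make_toy_declarations())
    print(rules)  # each root-to-leaf path reads as one candidate rule
```

Keeping `max_depth` small limits each rule to a few conditions, which makes the output easier to review by hand at the cost of some accuracy on real data.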
The content above was collected and summarized by 遇见数据集.



