DGraph-Fin
Source: DataCite Commons. Updated 2025-04-27. Indexed 2025-05-18.
Download link:
https://www.scidb.cn/detail?dataSetId=c0f36252829c48debe2cc95356f3041c
Description:
This dataset comes from an intelligent risk-control setting. It contains a desensitized directed dynamic graph, including fraudulent users, that reflects the social network relationships among credit users. The data are sampled from different time periods of real business operations. Each node represents an internet credit user, and a directed edge from node A to node B means that user A recorded user B as an emergency contact. The edge type gives the category of the emergency contact, and the edge attribute is temporal information (desensitized to a positive integer starting from 1, in days), so the result is a heterogeneous dynamic graph.
The network contains over 3.7 million nodes, divided into the following three categories:
(1) Fraudulent users: users whose past lending behavior includes credit overdue and fraudulent behavior. Marked as positive samples; 15509 nodes, accounting for 0.42%.
(2) Normal users: users whose past lending behavior includes no credit overdue or fraudulent behavior. Marked as negative samples; 1210092 nodes, accounting for 32.7%.
(3) Background nodes: users with no past lending behavior. They are unlabeled and are not fraud detection targets; they only supplement the connectivity and neighborhood context of the social network. 2474949 nodes in total, accounting for 66.9%.
Each node carries a 17-dimensional desensitized attribute vector. Each dimension corresponds to a different element of attribute information, and missing values are filled with -1. For privacy protection, the original data disclose only the categories of the attributes:
(1) User ID: the unique identification number of the user's credit account.
(2) Basic personal information: basic identity information such as the user's age and gender.
(3) Communication method: information related to contact details such as phone numbers.
(4) Lending behavior: information such as "repayment maturity date" and "actual repayment date" that describes the user's lending behavior.
(5) Emergency contact information: basic information about the emergency contact provided when the credit account was registered, such as the contact's name, contact details, and last update date.
The network also contains over 4.3 million edges, divided into 11 categories according to the type of emergency contact, which corresponds to the "emergency contact information" node attribute. For privacy protection, the original dataset encodes these categories as 1 to 11 and does not disclose what each category actually means.
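As a rough illustration of the structure described above, the following Python sketch recomputes the class shares from the quoted node counts and models a typed, timestamped edge. It is not the dataset's official loader or file format: the names `Edge`, `class_shares`, `missing_mask`, and `edges_up_to`, and the toy edges, are hypothetical and exist only for this example.

```python
from collections import Counter
from dataclasses import dataclass

# Node counts quoted in the dataset description.
CLASS_COUNTS = {
    "fraudulent": 15509,
    "normal": 1210092,
    "background": 2474949,
}

MISSING = -1  # value the description says is used for missing attributes


def class_shares(counts):
    """Return each class's share of all nodes, as a percentage."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


@dataclass(frozen=True)
class Edge:
    """Hypothetical record for one directed emergency-contact edge."""
    src: int    # user who recorded the contact
    dst: int    # user recorded as the emergency contact
    etype: int  # anonymized contact category, 1 to 11
    day: int    # desensitized timestamp, positive integer in days


def missing_mask(features):
    """Mark which attribute values are the -1 placeholder."""
    return [[value == MISSING for value in row] for row in features]


def edges_up_to(edges, day):
    """Snapshot of the dynamic graph: edges with timestamp <= day."""
    return [e for e in edges if e.day <= day]


# Toy edges, invented for illustration only.
toy_edges = [Edge(0, 1, 3, 1), Edge(1, 2, 11, 5), Edge(2, 0, 3, 9)]
snapshot = edges_up_to(toy_edges, 5)
type_counts = Counter(e.etype for e in snapshot)
shares = class_shares(CLASS_COUNTS)

if __name__ == "__main__":
    for name, pct in shares.items():
        print(f"{name}: {pct:.2f}%")
    print("edge types in snapshot:", dict(type_counts))
```

Filtering edges by `day` is one simple way to view a dynamic graph as a sequence of snapshots; other treatments of the timestamps are equally possible.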
Provider:
Science Data Bank
Created:
2023-11-10
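Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
DGraph-Fin is a dynamic graph dataset for credit risk control. It contains desensitized user social network relationships and lending behavior data, with clearly defined node categories, multi-dimensional node attributes, and strict privacy protection.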
The content above was collected and summarized by 遇见数据集.



