Fraud Detection with Synthetic Data
Snowflake | Updated: 2023-04-05 | Indexed: 2024-05-01
Download link:
https://app.snowflake.com/marketplace/listing/GZSYZKLF6O
Resource description:
This synthetic dataset simulates mortgage applications in a banking context, with the aim of identifying potentially fraudulent instances. Each row represents a unique mortgage application and includes applicant details, property information, and financial metrics. The dataset has been engineered to include a mix of legitimate and fraudulent applications, providing a base for developing and evaluating fraud detection models. Different variations of the dataset can be generated depending on specific needs; feel free to contact us for more information.
This synthetic dataset can be utilized for various purposes, including:
- Developing machine learning models for detecting mortgage fraud in a banking context.
- Evaluating the performance of different fraud detection algorithms.
- Analyzing and understanding the patterns and characteristics of fraudulent mortgage applications.
- Training and validating models that can generalize well to real-world mortgage application data.
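As a minimal sketch of the evaluation use case above: if fraudulent applications are a minority of the rows (an assumption here, the listing does not state the class ratio), plain accuracy can look good even when most fraud is missed, so precision and recall on the fraud class are usually more informative. The function name `fraud_metrics` and the toy labels are hypothetical and not part of the dataset.

```python
from typing import Sequence


def fraud_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Precision, recall and F1 for the fraud class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels invented for illustration; 1 = fraudulent, 0 = legitimate.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
metrics = fraud_metrics(y_true, y_pred)
```

In this toy case 8 of 10 predictions are correct, yet only half of the fraudulent applications are caught, which is the kind of gap these metrics make visible.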
The synthetic dataset was generated using an agent-based method calibrated on a real-life scenario and augmented using generative models. Agents are designed to represent individuals from specific geographical areas. A small number of agents are assigned fraudulent behavior through specific patterns and rules (provided with the final documentation), which characterize the fraudulent records in the final dataset.
Tables Included:
The dataset includes two tables:
- Customers, a table describing the personal and demographic details of the applicants
- Applications, a table describing each mortgage request
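The two tables are meant to be combined through the applicant identifier. The sketch below shows one way to attach customer attributes to each application in plain Python. The snake_case keys (`applicant_id`, `credit_score`, and so on) and the sample rows are assumptions made for illustration; the actual column names in the Snowflake listing may differ.

```python
# Hypothetical rows; keys are snake_case guesses at the listed field names.
customers = [
    {"applicant_id": 1, "employment_status": "Employed", "credit_score": 710},
    {"applicant_id": 2, "employment_status": "Self-employed", "credit_score": 640},
]
applications = [
    {"application_id": 100, "applicant_id": 1, "loan_amount": 200000.0, "fraud_indicator": 0},
    {"application_id": 101, "applicant_id": 2, "loan_amount": 350000.0, "fraud_indicator": 1},
    {"application_id": 102, "applicant_id": 1, "loan_amount": 120000.0, "fraud_indicator": 0},
]


def join_on_applicant(applications, customers):
    """Left-join each application to its customer row via applicant_id."""
    by_id = {c["applicant_id"]: c for c in customers}
    joined = []
    for app in applications:
        # An application with no matching customer keeps only its own fields.
        customer = by_id.get(app["applicant_id"], {})
        joined.append({**customer, **app})
    return joined


rows = join_on_applicant(applications, customers)
```

One applicant can appear on several applications (applicant 1 above), so the joined result has one row per application, not per customer.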
Fields Included:
Customers Table:
- Applicant ID (integer): A unique identifier for each applicant.
- Applicant Name (string): First name of the applicant.
- Applicant Last name (string): Last name of the applicant.
- Date of Birth (date): The applicant's date of birth.
- City of Birth (string): The applicant's city of birth.
- Country of Birth (string): The applicant's country of birth.
- Employment Status (string): The applicant's current employment status (e.g., Employed, Self-employed, Unemployed, Retired).
- Annual Income (float): The applicant's total annual income in USD.
- Credit Score (integer): The applicant's credit score at the time of application.
- Previous Foreclosures (integer): The number of previous foreclosures, if any, on the applicant's record.
- Bankruptcies (integer): The number of bankruptcies, if any, on the applicant's record.
Applications Table:
- Application ID (integer): A unique identifier for each mortgage application.
- Applicant ID (integer, foreign key): The identifier of the applicant who submitted the application; references Applicant ID in the Customers table.
- Date of Application (date): The date on which the mortgage application was submitted.
- Property Address (string): The address of the property for which the mortgage is being sought.
- Property Type (string): The type of property (e.g., Single Family Home, Multi-family Home, Condominium, Townhouse).
- Property Value (float): The current market value of the property in USD.
- Loan Amount (float): The requested mortgage loan amount in USD.
- Loan Term (integer): The number of years for the mortgage loan term.
- Loan-to-Value Ratio (float): The ratio of the loan amount to the property value, expressed as a percentage.
- Debt-to-Income Ratio (float): The applicant's total monthly debt payments divided by their gross monthly income, expressed as a percentage.
- Fraud Indicator (boolean): A binary indicator of whether the mortgage application is fraudulent (1) or legitimate (0).
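The two ratio fields follow directly from their definitions above, which makes them useful for sanity-checking rows. The sketch below recomputes both as percentages. The function names and input values are hypothetical; in particular, monthly debt payments are not among the listed fields, so the debt-to-income inputs are purely illustrative.

```python
def loan_to_value(loan_amount: float, property_value: float) -> float:
    """Loan amount divided by property value, as a percentage."""
    if property_value <= 0:
        raise ValueError("property_value must be positive")
    return 100.0 * loan_amount / property_value


def debt_to_income(monthly_debt: float, gross_monthly_income: float) -> float:
    """Total monthly debt payments divided by gross monthly income, as a percentage."""
    if gross_monthly_income <= 0:
        raise ValueError("gross_monthly_income must be positive")
    return 100.0 * monthly_debt / gross_monthly_income


# Illustrative values only (USD), not taken from the dataset.
ltv = loan_to_value(loan_amount=240_000.0, property_value=300_000.0)
dti = debt_to_income(monthly_debt=1_500.0, gross_monthly_income=5_000.0)
```

With these inputs the loan covers 80% of the property value and debt payments take 30% of gross monthly income. A stored Loan-to-Value Ratio that disagrees with the value recomputed from Loan Amount and Property Value could be worth inspecting, though whether such mismatches are among the dataset's fraud patterns is not stated in the listing.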
Sources:
- https://www.jpmorgan.com/technology/technology-blog/synthetic-data-for-real-insights
- https://www.clearbox.ai/use-cases/improving-fraud-detection
- https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022
Provider:
Clearbox AI
Created:
2023-04-05
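Summary
Background overview
This is a synthetic dataset that simulates bank mortgage applications, with the aim of identifying potentially fraudulent behavior. It contains two tables, Customers and Applications, covering applicant details, financial metrics, and a fraud indicator, and is suited to developing and evaluating fraud detection models.
The summary above was collected and generated by 遇见数据集.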



