
gretel-text-to-python-fintech-en-v1

ModelScope Community | updated 2025-11-27 | indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/gretelai/gretel-text-to-python-fintech-en-v1
Resource description:
# Gretel Synthetic Text-to-Python Dataset for FinTech

This dataset is a synthetically generated collection of natural language prompts paired with corresponding Python code snippets, tailored specifically to the FinTech industry. It was created with Gretel Navigator's Data Designer, using `mistral-nemo-2407` and `Qwen/Qwen2.5-Coder-7B` as the backend models. It aims to bridge the gap between natural language inputs and high-quality Python code, helping professionals implement financial analytics without extensive coding skills.

## Key Features

- **Domain-Specific Focus**: Covers a wide range of FinTech domains, such as banking, digital payments, regulatory compliance, and fraud detection.
- **Synthetic Data Generation**: Generated entirely with Gretel Navigator using a specialized configuration, producing diverse, realistic samples that have undergone automated validation for quality and consistency.
- **Complexity Levels**: Includes Python code ranging from beginner to expert level, accommodating different skill sets.
- **Comprehensive Metadata**: Each record includes metadata such as industry sector, topic, code complexity, and code concept.
- **Validation and Quality Assurance**: Applies code syntax checks and LLM-as-a-Critic evaluations to screen the generated code for functionality and quality.
## Dataset Details

### Dataset Size and Composition

- **Total Samples**: 25,000
- **Training Set**: 22,500 records
- **Validation Set**: 2,500 records
- **Test Set**: 2,500 records
- **Industry Sectors and Topics Covered**:
  - **Banking and Finance**:
    - Topics: Account Management, Trading Algorithms, Cryptocurrency, Credit Scoring, Stock Market Analysis, Securities Trading, Online Banking, Mobile Banking, Wealth Management, Payment Gateway, Fraud Detection, Banking APIs, Loan Origination, ATM Processing, Financial Data Analytics, Banking Regulations, Money Transfer, Tax Calculations, Insurance Underwriting, Risk Assessment.
  - **FinTech and Digital Payments**:
    - Topics: Digital Banking, Fraud Detection, Merchant Services, Payment Gateway, Payment Fraud Prevention, Peer-to-Peer Payments, Open Banking, Instant Payments, Real-Time Analytics, Mobile Wallets, Digital Currency, Payment Orchestration, Payment Processing, Cross-Border Payments, API Management, Micropayments, Blockchain, Smart Contracts, Banking-as-a-Service, KYC/AML Compliance.
  - **Financial Regulation and Compliance**:
    - Topics: Capital Adequacy, Transparency, GDPR, Solvency II, Compliance Management, Risk Management, MiFID II, Fintech Regulation, AML/KYC, Data Privacy, EMIR, Operational Risk, Regulatory Reporting, IFRS 9, Basel III, Virtual Currencies, Financial Crime, Sanctions Screening, PTSD, Credit Risk.
  - **Fraud Detection and Prevention**:
    - Topics: Chargeback Management, Device Fingerprinting, Synthetic Identity Detection, Fraudulent Transaction Patterns, Behavioral Biometrics, Risk Scoring, Phishing Detection, Transaction Monitoring, Fraudulent Bot Detection, Money Laundering Detection, Cross-Channel Fraud Detection, Machine Learning, Identity Verification, Real-Time Alerts, Insurance Fraud Detection, Ad Fraud Detection, False Positive Reduction, Fraudulent Account Takeover, Anomaly Detection, Rule-Based Systems.
  - **Insurance and Risk Management**:
    - Topics: Insurance Reporting, Insurance Analytics, Actuarial Science, Policy Cancellation, Policy Billing, Fraud Detection, Reinsurance, Regulatory Compliance, Policy Administration, Underwriting, Policy Renewal, Agent Management, Policy Document Management, Risk Assessment, Claims Processing, Catastrophe Modeling, Insurance Marketplace, Insurance Portfolio Management, Customer Relationship Management, Policy Servicing.
  - **Mobile Banking**:
    - Topics: Security Features, Budget Tracking, Mobile Check Deposit Limits, Educational Resources, Bill Payment, Fraud Detection, Account Management, Mobile Check Deposit, Push Notifications, Savings Goals, Biometric Login, Real-time Alerts, Investment Options, Location Services, Customer Support, Fund Transfer, App Customization, Account Lock/Unlock, Money Management Tools, Transaction History.
  - **Mortgage and Lending Platforms**:
    - Topics: Loan Term, Loan-to-Value Ratio, Property Value Assessment, Mortgage Calculator, Lender Comparison, E-signature, Loan Approval Status, Credit Score Check, Loan Modification, Amortization Schedule, Loan Application, Loan Forbearance, Mortgage Servicing, Loan Pre-approval, Underwriting Process, Loan Closing, Document Upload, Monthly Repayment, Interest Rates, Refinance Calculator.
  - **Smart Contracts**:
    - Topics: Governance Mechanisms, Privacy Preservation, Access Control, Contract Auditing, Oracle Services, Gas Management, Zero-Knowledge Proofs, Interoperability, Token Standards, Smart Contract Lifecycle, Cross-Chain Communication, Fungible Tokens, Smart Contract Platforms, Decentralized Finance, Contract Interaction, Voting Systems, Non-Fungible Tokens, Contract Deployment, Formal Verification, Decentralized Autonomous Organizations.
  - **Tax Technology**:
    - Topics: Tax Compliance, Tax Consulting Services, Tax Data Management, Tax Software Integration, Tax Document Management, ERP for Tax, Tax Regulatory Tracking, CRM for Tax, Tax Filing, Tax Audit Support, Tax Process Optimization, Tax Software Implementation, Tax Data Visualization, Tax Analytics, Tax Workflow Management, Tax Reporting, Tax Research, Tax Training and Education, Tax Automation, Tax Calculation.
  - **Trading and Investment**:
    - Topics: Wealth Management Software, Financial Data Feed, Algorithmic Trading, Trading Platforms, Pricing Engines, Securities Lending, Cryptocurrency Trading, Risk Management Software, Equity Trading, Investment Portfolio Management, Market Analysis Tools, Derivatives Trading, High-Frequency Trading, Smart Order Routing, Trading Signals, Order Management System, Transaction Cost Analysis, Automated Order Execution, Brokerage Software, Charting Tools.

### Complexity Levels and Code Concepts

- **Beginner**:
  - **Code Concepts**: Variables, Data Types, Functions, Loops, Classes
- **Intermediate**:
  - **Code Concepts**: List Comprehensions, Object-Oriented Programming, Lambda Functions, Web Frameworks, Pandas
- **Advanced**:
  - **Code Concepts**: Multithreading, Context Managers, Performance Optimization, Modules and Packages, Regular Expressions
- **Expert**:
  - **Code Concepts**: Custom Data Structures, Metaclasses, Coroutines, Memory Management

### Prompt Types

- **Instruction**: Commands to perform a specific task.
- **Question**: Inquiries about solving problems in the FinTech industry using Python.
- **Description**: Explanations of the purpose of Python code for FinTech functions.
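The complexity levels above pair each level with a fixed set of code concepts. The sketch below encodes that pairing so records can be checked or filtered by it. It assumes each record's `code_concept` is drawn from the list for its `code_complexity` level, which the layout of this card suggests but does not state outright. `Record`, `COMPLEXITY_CONCEPTS`, and `concept_matches_level` are illustrative names, not part of any Gretel API; the field names follow the dataset's column list.

```python
from dataclasses import dataclass

# Level -> concepts, transcribed from "Complexity Levels and Code Concepts".
COMPLEXITY_CONCEPTS: dict[str, frozenset[str]] = {
    "Beginner": frozenset(
        {"Variables", "Data Types", "Functions", "Loops", "Classes"}
    ),
    "Intermediate": frozenset(
        {"List Comprehensions", "Object-Oriented Programming",
         "Lambda Functions", "Web Frameworks", "Pandas"}
    ),
    "Advanced": frozenset(
        {"Multithreading", "Context Managers", "Performance Optimization",
         "Modules and Packages", "Regular Expressions"}
    ),
    "Expert": frozenset(
        {"Custom Data Structures", "Metaclasses", "Coroutines",
         "Memory Management"}
    ),
}


@dataclass(frozen=True)
class Record:
    """Hypothetical container for one row; fields mirror the dataset columns."""
    industry_sector: str
    topic: str
    code_complexity: str
    code_concept: str
    text: str
    code: str


def concept_matches_level(record: Record) -> bool:
    """True if record.code_concept is listed under record.code_complexity.

    Unknown levels map to an empty set, so they never match.
    """
    allowed = COMPLEXITY_CONCEPTS.get(record.code_complexity, frozenset())
    return record.code_concept in allowed


# Usage with a made-up record (not taken from the dataset).
example = Record(
    industry_sector="Mortgage and Lending Platforms",
    topic="Monthly Repayment",
    code_complexity="Beginner",
    code_concept="Functions",
    text="Write a function that returns a fixed monthly repayment amount.",
    code="def repayment(total, months):\n    return total / months\n",
)
print(concept_matches_level(example))  # True
```

A record that fails this check is not necessarily wrong; the check only flags rows whose level/concept pairing differs from the lists given in this card.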
## Fields and Structure

Each record in the dataset includes the following columns:

| Field             | Type   | Description                                                                                   |
|-------------------|--------|-----------------------------------------------------------------------------------------------|
| `industry_sector` | string | Specific FinTech domain (e.g., Banking and Finance, Smart Contracts).                          |
| `topic`           | string | Specific topic within the industry sector (e.g., Loan Forbearance, Mobile Banking).           |
| `code_complexity` | string | Complexity level of the code (Beginner, Intermediate, Advanced, or Expert).                   |
| `code_concept`    | string | Programming concept demonstrated (e.g., Variables, Loops, Functions, Custom Data Structures). |
| `text`            | string | Natural language instruction describing the desired functionality.                            |
| `code`            | string | The Python code generated from the instruction.                                               |

## Validation and Evaluation Metrics

To ensure the dataset's reliability and usefulness, we conducted extensive validation, including:

- **Syntactic Validation**: All code snippets were checked for syntax errors to ensure they parse as valid Python.
- **LLM-as-a-Critic Evaluation**: A large language model was used to assess:
  - **Relevance**: Alignment of the code with the original instruction.
  - **Correctness**: Accuracy and functionality of the code.
  - **Readability**: Clarity and ease of understanding.
  - **Efficiency**: Performance and scalability considerations.
  - **Pythonic Best Practices**: Adherence to standard Python coding conventions.

## Usage

This dataset can be used to train and fine-tune language models to generate Python code from natural language instructions, specifically in the FinTech domain.
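As a concrete illustration of this fine-tuning use, the sketch below turns records into prompt/completion pairs. It first applies a syntax-only check in the spirit of the Syntactic Validation step: `compile(..., "exec")` parses and compiles each snippet without running it, so it catches syntax errors (and a few compile-time errors such as `return` outside a function) but does not prove the code executes correctly. The card does not describe Gretel's actual validation code or any prompt template; `parses_as_python`, `to_training_pairs`, and the comment-style template are assumptions for illustration. Records are plain dicts keyed by the column names listed above.

```python
from collections.abc import Iterable, Iterator


def parses_as_python(source: str) -> bool:
    """Syntax-only check: compile `source` without executing it."""
    try:
        compile(source, "<snippet>", "exec")
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True


def to_training_pairs(records: Iterable[dict]) -> Iterator[dict]:
    """Yield prompt/completion dicts for records whose `code` compiles.

    The prompt template is a hypothetical choice, not defined by the dataset.
    """
    for rec in records:
        if not parses_as_python(rec["code"]):
            continue  # skip snippets that fail the syntax check
        prompt = (
            f"# Sector: {rec['industry_sector']} | Topic: {rec['topic']}\n"
            f"# Level: {rec['code_complexity']} | Concept: {rec['code_concept']}\n"
            f"# Task: {rec['text']}\n"
        )
        yield {"prompt": prompt, "completion": rec["code"]}


# Usage with two made-up records: the second has a syntax error and is dropped.
rows = [
    {"industry_sector": "Banking and Finance", "topic": "Tax Calculations",
     "code_complexity": "Beginner", "code_concept": "Functions",
     "text": "Write a function that applies a flat tax rate to an amount.",
     "code": "def tax(amount, rate=0.2):\n    return amount * rate\n"},
    {"industry_sector": "Banking and Finance", "topic": "Tax Calculations",
     "code_complexity": "Beginner", "code_concept": "Functions",
     "text": "Same task, but with a broken snippet.",
     "code": "def tax(amount:\n    return amount\n"},
]
pairs = list(to_training_pairs(rows))
print(len(pairs))  # 1
```

Putting the metadata in comment lines keeps the prompt valid Python context, which suits completion-style code models; an instruction/response chat format would be an equally reasonable choice.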
The dataset is well suited to tasks such as:

- Text-to-Code Generation
- Code Completion
- Programming Education Tools
- AI-assisted Code Synthesis

## Citation and Usage

If you use this dataset in your research or applications, please cite it as:

```bibtex
@dataset{gretel-text-to-python-fintech-v1,
  author    = {Gretel AI},
  title     = {Synthetic Text-to-Python Dataset for FinTech Applications},
  year      = {2024},
  month     = {10},
  publisher = {Gretel},
}
```

For questions, issues, or additional information, please visit our [Synthetic Data Discord](https://gretel.ai/discord) community or reach out via [gretel.ai](https://gretel.ai/).

Provider: maas
Created: 2025-05-20