
RakeshMadasani/banking-finance-qa-dataset

Source: Hugging Face · Updated 2026-04-06 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/RakeshMadasani/banking-finance-qa-dataset
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- banking
- finance
- aml
- kyc
- compliance
- instruction-tuning
- alpaca
- qlora
pretty_name: Banking & Finance QA Dataset
size_categories:
- 1K<n<10K
---

# Banking & Finance QA Dataset

This dataset is a custom instruction-response dataset created for banking, finance, AML/KYC, compliance, and regulatory question answering.

It was built as part of a 3-project GenAI portfolio:

- Banking RAG Assistant
- Banking & Finance QA Dataset
- Banking Finance QLoRA Fine-Tuned Model

## Dataset Summary

- **Total samples:** 3,002
- **Format:** Alpaca-style instruction / input / output
- **Language:** English
- **Domain:** Banking and Finance
- **Primary use case:** Instruction tuning / QLoRA fine-tuning

## Data Format

Each record follows this structure:

```json
{
  "instruction": "What is the FDIC deposit insurance limit in the United States?",
  "input": "",
  "output": "The FDIC deposit insurance limit is $250,000 per depositor, per insured bank, per account category."
}
```

## Dataset Creation

This dataset was generated from banking and finance domain material, including regulatory and compliance-oriented content related to:

- AML and KYC
- FDIC and deposit insurance
- Basel III and capital adequacy
- RBI and Indian banking regulation
- SAR / CTR / compliance reporting
- credit risk and financial concepts

The pipeline included:

- document collection and curation
- chunking and preprocessing
- instruction-response generation
- duplicate removal
- validation and formatting checks
- train / validation split preparation

## Topics Covered

The dataset includes questions and answers across areas such as:

- Anti-money laundering (AML)
- Know Your Customer (KYC)
- Customer due diligence (CDD / EDD)
- FDIC insurance
- Basel norms
- RBI regulations
- SAR and CTR reporting
- credit risk and financial ratios
- compliance and banking operations

## Intended Use

This dataset is intended for:

- instruction tuning
- domain adaptation of LLMs
- QLoRA / PEFT fine-tuning
- educational and portfolio projects
- banking and compliance QA experiments

## Limitations

- This dataset is designed for educational and portfolio use
- It should not be treated as authoritative legal or compliance guidance
- Generated instruction-response pairs may still require human review before production use
- Regulatory rules may change over time

## Splits

The dataset is published with:

- `train`
- `validation`

## Related Projects

- RAG Assistant: `RakeshMadasani/banking-finance-rag`
- Fine-Tuned Model: `RakeshMadasani/banking-finance-mistral-qlora`

## Author

Created by Rakesh Madasani as part of an end-to-end GenAI portfolio covering:

- RAG application development
- dataset generation
- instruction tuning
- QLoRA fine-tuning
- evaluation and deployment
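
The record structure shown under Data Format, together with the duplicate-removal and validation steps listed in the pipeline, can be illustrated with a minimal sketch. This is not the author's actual pipeline, which the card does not publish. The helper names (`is_valid_record`, `deduplicate`, `to_prompt`), the whitespace- and case-insensitive duplicate rule, and the use of the widely circulated Alpaca prompt template are all assumptions made for illustration. The second and third sample lines are hypothetical records added only to exercise the checks.

```python
import json

# Field names taken from the record structure in the dataset card.
REQUIRED_KEYS = ("instruction", "input", "output")


def is_valid_record(record):
    """Hypothetical check: exactly the three string fields are present,
    and instruction and output are non-empty (input may be empty)."""
    if not isinstance(record, dict) or set(record) != set(REQUIRED_KEYS):
        return False
    if not all(isinstance(record[key], str) for key in REQUIRED_KEYS):
        return False
    return bool(record["instruction"].strip()) and bool(record["output"].strip())


def _normalize(text):
    # Lowercase and collapse whitespace so trivially different copies match.
    return " ".join(text.lower().split())


def deduplicate(records):
    """Keep the first record for each normalized (instruction, input) pair.
    The matching rule is an assumption, not the author's documented method."""
    seen = set()
    unique = []
    for record in records:
        key = (_normalize(record["instruction"]), _normalize(record["input"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def to_prompt(record):
    """Render a record with the common Alpaca-style template (assumed here;
    the card does not state which template was used for fine-tuning)."""
    if record["input"].strip():
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{record['instruction']}\n\n"
            f"### Input:\n{record['input']}\n\n"
            f"### Response:\n{record['output']}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Response:\n{record['output']}"
    )


# First line is the example from the card; the other two are hypothetical.
raw_lines = [
    '{"instruction": "What is the FDIC deposit insurance limit in the United States?", "input": "", "output": "The FDIC deposit insurance limit is $250,000 per depositor, per insured bank, per account category."}',
    '{"instruction": "what is the FDIC deposit insurance limit  in the United States?", "input": "", "output": "The FDIC deposit insurance limit is $250,000 per depositor, per insured bank, per account category."}',
    '{"instruction": "What is KYC?", "input": "", "output": ""}',
]

records = [json.loads(line) for line in raw_lines]
valid = [r for r in records if is_valid_record(r)]  # drops the empty-output record
unique = deduplicate(valid)                         # drops the near-identical copy

print(len(records), len(valid), len(unique))  # 3 2 1
print(to_prompt(unique[0]))
```

The same functions should apply unchanged to rows obtained from the published `train` and `validation` splits, for example after loading them with the Hugging Face `datasets` library, since each row is a mapping with the three fields above.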
Provider:
RakeshMadasani