RakeshMadasani/banking-finance-qa-dataset
Source: Hugging Face | Updated: 2026-04-06 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/RakeshMadasani/banking-finance-qa-dataset
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- banking
- finance
- aml
- kyc
- compliance
- instruction-tuning
- alpaca
- qlora
pretty_name: Banking & Finance QA Dataset
size_categories:
- 1K<n<10K
---
# Banking & Finance QA Dataset
This is a custom instruction-response dataset for question answering on banking, finance, AML/KYC, compliance, and regulatory topics.
It was built as part of a three-project GenAI portfolio:
- Banking RAG Assistant
- Banking & Finance QA Dataset
- Banking Finance QLoRA Fine-Tuned Model
## Dataset Summary
- **Total samples:** 3,002
- **Format:** Alpaca-style instruction / input / output
- **Language:** English
- **Domain:** Banking and Finance
- **Primary use case:** Instruction tuning / QLoRA fine-tuning
## Data Format
Each record follows this structure:
```json
{
  "instruction": "What is the FDIC deposit insurance limit in the United States?",
  "input": "",
  "output": "The FDIC deposit insurance limit is $250,000 per depositor, per insured bank, per account category."
}
```
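As a rough illustration of how a record in this layout might be checked and turned into a training prompt, the sketch below validates the three fields and renders them with an Alpaca-like template. The `### Instruction:` / `### Input:` / `### Response:` headings and the helper names `validate_record` and `build_prompt` are assumptions made for this example; the card does not say which prompt template was used for fine-tuning.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")


def validate_record(record: dict) -> None:
    """Raise ValueError if a record does not match the three-field layout."""
    for key in REQUIRED_KEYS:
        if key not in record:
            raise ValueError(f"missing key: {key}")
        if not isinstance(record[key], str):
            raise ValueError(f"{key} must be a string")
    # "input" may be empty, but the other two fields must carry text.
    if not record["instruction"].strip() or not record["output"].strip():
        raise ValueError("instruction and output must be non-empty")


def build_prompt(record: dict) -> str:
    """Render a record as one prompt string (assumed template, not from the card)."""
    validate_record(record)
    parts = [f"### Instruction:\n{record['instruction'].strip()}"]
    if record["input"].strip():
        # The input section is only emitted when the field is non-empty.
        parts.append(f"### Input:\n{record['input'].strip()}")
    parts.append(f"### Response:\n{record['output'].strip()}")
    return "\n\n".join(parts)


# The example record shown in this section, parsed from JSON.
raw = """{
  "instruction": "What is the FDIC deposit insurance limit in the United States?",
  "input": "",
  "output": "The FDIC deposit insurance limit is $250,000 per depositor, per insured bank, per account category."
}"""
example = json.loads(raw)
print(build_prompt(example))
```

Because the example record has an empty `input`, the rendered prompt contains only the instruction and response sections.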
## Dataset Creation
This dataset was generated from banking and finance domain material, including regulatory and compliance-oriented content related to:
- AML and KYC
- FDIC and deposit insurance
- Basel III and capital adequacy
- RBI and Indian banking regulation
- SAR / CTR / compliance reporting
- credit risk and financial concepts
The pipeline included:
- document collection and curation
- chunking and preprocessing
- instruction-response generation
- duplicate removal
- validation and formatting checks
- train / validation split preparation
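The last three pipeline steps (duplicate removal, validation, and split preparation) could look roughly like the sketch below. It is not the author's code: the duplicate criterion (case- and whitespace-insensitive match on `instruction` plus `input`), the 10% validation fraction, the seed, and the toy records are all assumptions chosen for illustration.

```python
import random


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(text.lower().split())


def deduplicate(records):
    """Keep the first record for each normalized (instruction, input) pair."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize(rec["instruction"]), normalize(rec["input"]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def is_valid(rec) -> bool:
    """Check that all three fields are strings and that instruction/output have text."""
    return (
        all(isinstance(rec.get(k), str) for k in ("instruction", "input", "output"))
        and bool(rec["instruction"].strip())
        and bool(rec["output"].strip())
    )


def train_validation_split(records, validation_fraction=0.1, seed=42):
    """Shuffle deterministically and return (train, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]


# Toy records made up for this example; they are not taken from the dataset.
sample = [
    {"instruction": "What does KYC stand for?", "input": "", "output": "Know Your Customer."},
    {"instruction": "what does  KYC stand for?", "input": "", "output": "Know Your Customer."},
    {"instruction": "What does AML stand for?", "input": "", "output": "Anti-money laundering."},
    {"instruction": "What does CTR stand for?", "input": "", "output": ""},
]

clean = [rec for rec in deduplicate(sample) if is_valid(rec)]
train, validation = train_validation_split(clean, validation_fraction=0.5)
print(len(clean), len(train), len(validation))
```

A fixed seed keeps the split reproducible across runs, which matters when the validation set is used to compare fine-tuning experiments.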
## Topics Covered
The dataset includes questions and answers across areas such as:
- Anti-money laundering (AML)
- Know Your Customer (KYC)
- Customer due diligence (CDD / EDD)
- FDIC insurance
- Basel norms
- RBI regulations
- SAR and CTR reporting
- credit risk and financial ratios
- compliance and banking operations
## Intended Use
This dataset is intended for:
- instruction tuning
- domain adaptation of LLMs
- QLoRA / PEFT fine-tuning
- educational and portfolio projects
- banking and compliance QA experiments
## Limitations
- This dataset is designed for educational and portfolio use
- It should not be treated as authoritative legal or compliance guidance
- Generated instruction-response pairs may still require human review before production use
- Regulatory rules may change over time
## Splits
The dataset is published with two splits:
- `train`
- `validation`
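A minimal sketch of loading both splits with the Hugging Face `datasets` library follows. It assumes the library is installed, that the repository loads with its default configuration, and that network access is available; `load_splits` and `split_sizes` are helper names introduced here, not part of the dataset.

```python
def load_splits(repo_id: str = "RakeshMadasani/banking-finance-qa-dataset"):
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the helper below works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


def split_sizes(splits) -> dict:
    """Map each split name to its number of rows."""
    return {name: len(rows) for name, rows in splits.items()}


# Typical use (left as comments because it downloads data):
#   splits = load_splits()
#   print(split_sizes(splits))
#   print(splits["train"][0]["instruction"])
```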
## Related Projects
- RAG Assistant: `RakeshMadasani/banking-finance-rag`
- Fine-Tuned Model: `RakeshMadasani/banking-finance-mistral-qlora`
## Author
Created by Rakesh Madasani as part of an end-to-end GenAI portfolio covering:
- RAG application development
- dataset generation
- instruction tuning
- QLoRA fine-tuning
- evaluation and deployment
Provider:
RakeshMadasani



