duykhangh/VNFinsQA
Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/duykhangh/VNFinsQA
Description:
---
license: cc-by-nc-4.0
language:
- vi
task_categories:
- question-answering
- text-generation
tags:
- finance
- vietnamese
- financial-qa
- benchmark
- knowledge-graph
- rag
- stock-market
size_categories:
- n<1K
pretty_name: VNFinsQA - Vietnamese Financial Question Answering Benchmark
dataset_info:
  features:
  - name: vnfinsqa_id
    dtype: string
  - name: question
    dtype: string
  - name: tickers
    dtype: string
  - name: question_category
    dtype: string
  - name: difficulty
    dtype: string
  - name: requires_multi_doc
    dtype: bool
  - name: required_data_sources
    dtype: string
  - name: ground_truth
    dtype: string
  - name: ground_truth_short
    dtype: string
  splits:
  - name: test
    num_examples: 790
configs:
- config_name: default
  data_files:
  - split: test
    path: test.jsonl
---
# VNFinsQA: Vietnamese Financial Question Answering Benchmark
## Dataset Description
VNFinsQA is a benchmark dataset for evaluating Vietnamese financial question-answering systems. It contains **790 expert-annotated Vietnamese questions** with ground-truth answers, collected from production financial QA systems and curated by securities analysts.
The dataset covers diverse financial question types including factual lookups, stock analysis, technical analysis, valuation, cross-stock comparisons, screening, investment recommendations, and macroeconomic questions. All questions are in Vietnamese and target the Vietnamese stock market (HOSE, HNX).
### Supported Tasks
- **Financial Question Answering**: Given a Vietnamese financial question, generate an accurate answer grounded in financial data.
- **Information Retrieval Evaluation**: Evaluate retrieval systems on their ability to find relevant financial documents for answering questions.
- **RAG System Benchmarking**: Benchmark Retrieval-Augmented Generation systems on domain-specific Vietnamese financial queries.
### Languages
Vietnamese (vi)
## Dataset Structure
### Data Fields
| Field | Type | Description |
|-------|------|-------------|
| `vnfinsqa_id` | string | Unique sequential ID (`VNFQ-0001`, `VNFQ-0002`, ...) |
| `question` | string | The question text in Vietnamese |
| `tickers` | string \| null | Comma-separated stock tickers (e.g., `"HPG,VNM"`) |
| `question_category` | string | Question category (see below) |
| `difficulty` | string | `"easy"`, `"medium"`, or `"hard"` |
| `requires_multi_doc` | bool | Whether answering likely requires multiple documents |
| `required_data_sources` | string | Comma-separated data source types needed to answer (see below) |
| `ground_truth` | string | Gold-standard answer from domain experts |
| `ground_truth_short` | string \| null | Short-form answer |
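Because `tickers` and `required_data_sources` are stored as comma-separated strings, most consumers will want to split them into lists first. The helper below is a minimal sketch: the name `split_field` is hypothetical, and it assumes items may or may not have a space after the comma, which the card does not specify.

```python
from typing import List, Optional


def split_field(value: Optional[str]) -> List[str]:
    """Split a comma-separated field into trimmed, non-empty items.

    Returns an empty list for None (JSON null) or an empty string.
    """
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


# Hypothetical record fragment using the field names from the table above.
record = {"tickers": "HPG,VNM", "required_data_sources": "market_data, financial_report"}
tickers = split_field(record["tickers"])
sources = split_field(record["required_data_sources"])
print(tickers, sources)
```

Returning an empty list for `null` keeps downstream code free of `None` checks, for example when counting questions that mention no ticker.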
### Data Splits
| Split | Examples | Description |
|-------|----------|-------------|
| test | 790 | Full benchmark evaluation set |
This is a **test-only benchmark** intended for evaluating financial QA systems. It is not designed for training.
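The front matter declares a single `test` split stored in `test.jsonl`. As a minimal sketch, assuming that file has been downloaded locally and holds one JSON object per line, it can be read with the standard library alone. The function name `read_jsonl` is illustrative.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

Calling `list(read_jsonl("test.jsonl"))` should then give 790 records if the file matches this card. With the Hugging Face `datasets` library installed, `load_dataset("duykhangh/VNFinsQA", split="test")` is the usual alternative, assuming the repository is publicly accessible.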
## Dataset Statistics
### Question Categories
| Category | Count | Description |
|----------|-------|-------------|
| other | 191 | General financial queries |
| technical | 121 | Technical analysis (charts, indicators) |
| recommendation | 119 | Buy/sell/hold recommendations |
| factual | 89 | Data lookup (price, volume, financials) |
| analysis | 81 | Stock analysis and assessment |
| screening | 70 | Stock screening and filtering |
| macro | 49 | Macroeconomic questions |
| valuation | 43 | Valuation methods and fair value |
| comparison | 27 | Cross-stock or cross-sector comparisons |
### Difficulty Distribution
| Level | Count | Criteria |
|-------|-------|----------|
| easy | 127 | Single-ticker factual lookups |
| medium | 497 | Standard analysis, single/few-ticker questions |
| hard | 166 | Multi-ticker comparisons, macro questions without tickers, valuations |
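The category and difficulty counts above each sum to 790, so they can serve as a sanity check on a downloaded copy. The sketch below tallies any field across loaded records and reports where the tally differs from the published numbers. The helper names are hypothetical, and the expected values are copied from the two tables.

```python
from collections import Counter
from typing import Dict, Iterable

# Counts copied from the Question Categories and Difficulty Distribution tables.
EXPECTED_CATEGORY = {
    "other": 191, "technical": 121, "recommendation": 119,
    "factual": 89, "analysis": 81, "screening": 70,
    "macro": 49, "valuation": 43, "comparison": 27,
}
EXPECTED_DIFFICULTY = {"easy": 127, "medium": 497, "hard": 166}


def count_by(records: Iterable[Dict], field: str) -> Counter:
    """Count how often each value of `field` occurs across the records."""
    return Counter(record[field] for record in records)


def diff_from_expected(observed: Counter, expected: Dict[str, int]) -> Dict[str, int]:
    """Return observed minus expected for every key where the two differ."""
    keys = set(observed) | set(expected)
    return {
        key: observed.get(key, 0) - expected.get(key, 0)
        for key in keys
        if observed.get(key, 0) != expected.get(key, 0)
    }
```

An empty dict from `diff_from_expected(count_by(records, "difficulty"), EXPECTED_DIFFICULTY)` means the local copy matches the published distribution.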
### Required Data Sources Distribution
Each question is labeled with 1-3 data source types needed to answer it. The labels are multi-label, so the counts below sum to more than the number of questions. Labels were assigned with Gemini 2.0 Flash using few-shot prompting.
| Data Source | Count | % | Description |
|-------------|-------|---|-------------|
| market_data | 606 | 64.6% | Stock prices, volume, technical indicators (RSI, MA, MACD) |
| financial_report | 469 | 50.0% | Financial statements, ratios (ROE, P/E, EPS) |
| general_knowledge | 173 | 18.4% | Financial concepts, definitions, methodology |
| news_events | 149 | 15.9% | Corporate news, dividends, M&A, announcements |
| macro_economic | 149 | 15.9% | GDP, CPI, interest rates, monetary/fiscal policy |
| company_profile | 61 | 6.5% | Company overview, business sector, management |
Labels per question: 42.3% single-label, 44.5% dual-label, 12.8% triple-label.
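The percentages in the table can be checked against the counts. The sketch below uses only the numbers printed above and tests two candidate bases. The reported percentages are reproduced by a base of 938, not by the 790 questions in the released split. The value 938 is inferred from the arithmetic and is not stated anywhere in this card, so it is worth confirming against the data before quoting either set of figures.

```python
# Counts and percentages copied from the Required Data Sources table.
source_counts = {
    "market_data": 606,
    "financial_report": 469,
    "general_knowledge": 173,
    "news_events": 149,
    "macro_economic": 149,
    "company_profile": 61,
}
reported_pct = {
    "market_data": 64.6,
    "financial_report": 50.0,
    "general_knowledge": 18.4,
    "news_events": 15.9,
    "macro_economic": 15.9,
    "company_profile": 6.5,
}


def pct(count: int, base: int) -> float:
    """Share of `base` expressed as a percentage, rounded to one decimal."""
    return round(100 * count / base, 1)


def matches(base: int) -> bool:
    """True if every reported percentage is reproduced with this base."""
    return all(pct(source_counts[k], base) == reported_pct[k] for k in source_counts)


print(matches(790))  # base = size of the released test split
print(matches(938))  # base inferred from the arithmetic (assumption)
print({k: pct(v, 790) for k, v in source_counts.items()})  # shares of the 790 questions
```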
## Dataset Construction
1. Merged questions from 3 sources (production system logs and analyst contributions)
2. Filtered to Vietnamese single-turn questions only
3. Deduplicated by normalized text
4. Auto-assigned difficulty and multi-document requirement labels using heuristics
5. Ground-truth answers provided by certified financial analysts (CFA or equivalent)
6. Filtered to questions with verified ground-truth answers only
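Step 3 above does not say how the text was normalized. The sketch below shows one plausible recipe, which is an assumption rather than the authors' procedure: Unicode NFC normalization (Vietnamese diacritics can be encoded in composed or decomposed form), lowercasing, and whitespace collapsing. The first occurrence of each question is kept.

```python
import re
import unicodedata
from typing import Iterable, List


def normalize_question(text: str) -> str:
    """NFC-normalize, lowercase, and collapse whitespace (an assumed recipe)."""
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(questions: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each question, compared by normalized text."""
    seen = set()
    unique = []
    for question in questions:
        key = normalize_question(question)
        if key not in seen:
            seen.add(key)
            unique.append(question)  # keep the original, un-normalized text
    return unique
```

Comparing on the normalized key while keeping the original string preserves the casing and spacing of the first occurrence.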
## Example
```json
{
  "vnfinsqa_id": "VNFQ-0001",
  "question": "Phân tích doanh thu, biên lợi nhuận năm trước của cổ phiếu HPG",
  "tickers": "HPG",
  "question_category": "factual",
  "difficulty": "easy",
  "requires_multi_doc": false,
  "required_data_sources": "financial_report",
  "ground_truth": "HPG năm 2024: Doanh thu thuần đạt 138.855 nghìn tỷ VND (tăng 16,7% so với 2023). Lợi nhuận gộp 18.498 nghìn tỷ VND, biên lợi nhuận gộp ~13,3%. LNST đạt 12.020 nghìn tỷ VND (tăng 76,7% YoY), biên lợi nhuận ròng ~8,7%.",
  "ground_truth_short": "DT thuần 2024: 138.855 nghìn tỷ (+16,7% YoY); biên LN gộp 13,3%; LNST 12.020 nghìn tỷ (+76,7% YoY)"
}
```
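A record shaped like the example can be checked against the Data Fields table before it enters an evaluation pipeline. The validator below is a minimal sketch: `SCHEMA` and `validate` are hypothetical names, and the accepted types follow the table, with `tickers` and `ground_truth_short` allowed to be null.

```python
from typing import Dict, List

# Field name -> accepted Python types, following the Data Fields table.
SCHEMA = {
    "vnfinsqa_id": (str,),
    "question": (str,),
    "tickers": (str, type(None)),
    "question_category": (str,),
    "difficulty": (str,),
    "requires_multi_doc": (bool,),
    "required_data_sources": (str,),
    "ground_truth": (str,),
    "ground_truth_short": (str, type(None)),
}
DIFFICULTIES = ("easy", "medium", "hard")


def validate(record: Dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, types in SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], types):
            problems.append(f"bad type for {field}: {type(record[field]).__name__}")
    if "difficulty" in record and record["difficulty"] not in DIFFICULTIES:
        problems.append(f"unknown difficulty: {record['difficulty']!r}")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to audit a whole file in a single pass.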
## Limitations
- **Single institution**: Questions collected from one Vietnamese securities firm
- **Vietnamese only**: All questions and answers are in Vietnamese
- **Broad "other" category**: 191/790 questions are categorized as "other"
- **Temporal scope**: Questions primarily reference 2020-2024 financial data
- **Ground truth coverage**: Answers reflect analyst knowledge at time of annotation; financial data may have since been updated
## Citation
If you use this dataset, please cite:
```bibtex
@inproceedings{vnfinsqa2026,
  title = {Production-Ready Hierarchical Knowledge Graph RAG for Vietnamese Financial Question Answering},
  author = {Anonymous},
  booktitle = {Proceedings of the 35th ACM International Conference on Information and Knowledge Management (CIKM '26)},
  year = {2026},
  address = {Rome, Italy},
  note = {Under review}
}
```
## License
This dataset is released under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license. It may be used for non-commercial research purposes with attribution.
Provider:
duykhangh



