
duykhangh/VNFinsQA

Source: Hugging Face · Updated 2026-03-25 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/duykhangh/VNFinsQA
Resource description:
---
license: cc-by-nc-4.0
language:
- vi
task_categories:
- question-answering
- text-generation
tags:
- finance
- vietnamese
- financial-qa
- benchmark
- knowledge-graph
- rag
- stock-market
size_categories:
- n<1K
pretty_name: VNFinsQA - Vietnamese Financial Question Answering Benchmark
dataset_info:
  features:
  - name: vnfinsqa_id
    dtype: string
  - name: question
    dtype: string
  - name: tickers
    dtype: string
  - name: question_category
    dtype: string
  - name: difficulty
    dtype: string
  - name: requires_multi_doc
    dtype: bool
  - name: required_data_sources
    dtype: string
  - name: ground_truth
    dtype: string
  - name: ground_truth_short
    dtype: string
  splits:
  - name: test
    num_examples: 790
configs:
- config_name: default
  data_files:
  - split: test
    path: test.jsonl
---

# VNFinsQA: Vietnamese Financial Question Answering Benchmark

## Dataset Description

VNFinsQA is a benchmark dataset for evaluating Vietnamese financial question-answering systems. It contains **790 expert-annotated Vietnamese questions** with ground-truth answers, collected from production financial QA systems and curated by securities analysts.

The dataset covers diverse financial question types including factual lookups, stock analysis, technical analysis, valuation, cross-stock comparisons, screening, investment recommendations, and macroeconomic questions. All questions are in Vietnamese and target the Vietnamese stock market (HOSE, HNX).

### Supported Tasks

- **Financial Question Answering**: Given a Vietnamese financial question, generate an accurate answer grounded in financial data.
- **Information Retrieval Evaluation**: Evaluate retrieval systems on their ability to find relevant financial documents for answering questions.
- **RAG System Benchmarking**: Benchmark Retrieval-Augmented Generation systems on domain-specific Vietnamese financial queries.

### Languages

Vietnamese (vi)

## Dataset Structure

### Data Fields

| Field | Type | Description |
|-------|------|-------------|
| `vnfinsqa_id` | string | Unique sequential ID (`VNFQ-0001`, `VNFQ-0002`, ...) |
| `question` | string | The question text in Vietnamese |
| `tickers` | string \| null | Comma-separated stock tickers (e.g., `"HPG,VNM"`) |
| `question_category` | string | Question category (see below) |
| `difficulty` | string | `"easy"`, `"medium"`, or `"hard"` |
| `requires_multi_doc` | bool | Whether answering likely requires multiple documents |
| `required_data_sources` | string | Comma-separated data source types needed to answer (see below) |
| `ground_truth` | string | Gold-standard answer from domain experts |
| `ground_truth_short` | string \| null | Short-form answer |

### Data Splits

| Split | Examples | Description |
|-------|----------|-------------|
| test | 790 | Full benchmark evaluation set |

This is a **test-only benchmark** intended for evaluating financial QA systems. It is not designed for training.

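Because `tickers` and `required_data_sources` are stored as comma-separated strings and two fields may be null, a small parsing layer can make records easier to work with. The following is a minimal sketch using only the Python standard library; the names `VNFinsQARecord`, `split_csv`, `parse_record`, and `read_test_split` are hypothetical helpers, not part of the dataset, and the sample line contains made-up illustrative values rather than a real record.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, List, Optional


def split_csv(value: Optional[str]) -> List[str]:
    """Split a comma-separated field; null or empty values become []."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


@dataclass
class VNFinsQARecord:
    """Typed view of one row, mirroring the Data Fields table."""
    vnfinsqa_id: str
    question: str
    tickers: List[str]
    question_category: str
    difficulty: str
    requires_multi_doc: bool
    required_data_sources: List[str]
    ground_truth: str
    ground_truth_short: Optional[str]


def parse_record(raw: dict) -> VNFinsQARecord:
    """Convert one decoded JSON object into a VNFinsQARecord."""
    return VNFinsQARecord(
        vnfinsqa_id=raw["vnfinsqa_id"],
        question=raw["question"],
        tickers=split_csv(raw.get("tickers")),
        question_category=raw["question_category"],
        difficulty=raw["difficulty"],
        requires_multi_doc=bool(raw["requires_multi_doc"]),
        required_data_sources=split_csv(raw.get("required_data_sources")),
        ground_truth=raw["ground_truth"],
        ground_truth_short=raw.get("ground_truth_short"),
    )


def read_test_split(path: str) -> Iterator[VNFinsQARecord]:
    """Yield records from a local copy of test.jsonl (one JSON object per line)."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_record(json.loads(line))


# Illustrative line only: the values below are invented for the example.
SAMPLE_LINE = (
    '{"vnfinsqa_id": "VNFQ-0000", "question": "(question text)", '
    '"tickers": "HPG,VNM", "question_category": "comparison", '
    '"difficulty": "hard", "requires_multi_doc": true, '
    '"required_data_sources": "market_data,financial_report", '
    '"ground_truth": "(answer text)", "ground_truth_short": null}'
)

record = parse_record(json.loads(SAMPLE_LINE))
print(record.tickers, record.required_data_sources)
```

Keeping the list conversion in one helper means a null `tickers` value (for example, a macro question with no ticker) yields an empty list instead of raising an error.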
## Dataset Statistics

### Question Categories

| Category | Count | Description |
|----------|-------|-------------|
| other | 191 | General financial queries |
| technical | 121 | Technical analysis (charts, indicators) |
| recommendation | 119 | Buy/sell/hold recommendations |
| factual | 89 | Data lookup (price, volume, financials) |
| analysis | 81 | Stock analysis and assessment |
| screening | 70 | Stock screening and filtering |
| macro | 49 | Macroeconomic questions |
| valuation | 43 | Valuation methods and fair value |
| comparison | 27 | Cross-stock or cross-sector comparisons |

### Difficulty Distribution

| Level | Count | Criteria |
|-------|-------|----------|
| easy | 127 | Single-ticker factual lookups |
| medium | 497 | Standard analysis, single/few-ticker questions |
| hard | 166 | Multi-ticker comparisons, macro without tickers, valuations |

### Required Data Sources Distribution

Each question is labeled with 1-3 data source types needed to answer it (multi-label, classified using Gemini 2.0 Flash with few-shot prompting).

| Data Source | Count | % | Description |
|-------------|-------|---|-------------|
| market_data | 606 | 64.6% | Stock prices, volume, technical indicators (RSI, MA, MACD) |
| financial_report | 469 | 50.0% | Financial statements, ratios (ROE, P/E, EPS) |
| general_knowledge | 173 | 18.4% | Financial concepts, definitions, methodology |
| news_events | 149 | 15.9% | Corporate news, dividends, M&A, announcements |
| macro_economic | 149 | 15.9% | GDP, CPI, interest rates, monetary/fiscal policy |
| company_profile | 61 | 6.5% | Company overview, business sector, management |

Labels per question: 42.3% single-label, 44.5% dual-label, 12.8% triple-label.

## Dataset Construction

1. Merged questions from 3 sources (production system logs and analyst contributions)
2. Filtered to Vietnamese single-turn questions only
3. Deduplicated by normalized text
4. Auto-assigned difficulty and multi-document requirement heuristics
5. Ground-truth answers provided by certified financial analysts (CFA or equivalent)
6. Filtered to questions with verified ground-truth answers only

## Example

```json
{
  "vnfinsqa_id": "VNFQ-0001",
  "question": "Phân tích doanh thu, biên lợi nhuận năm trước của cổ phiếu HPG",
  "tickers": "HPG",
  "question_category": "factual",
  "difficulty": "easy",
  "requires_multi_doc": false,
  "required_data_sources": "financial_report",
  "ground_truth": "HPG năm 2024: Doanh thu thuần đạt 138.855 nghìn tỷ VND (tăng 16,7% so với 2023). Lợi nhuận gộp 18.498 nghìn tỷ VND, biên lợi nhuận gộp ~13,3%. LNST đạt 12.020 nghìn tỷ VND (tăng 76,7% YoY), biên lợi nhuận ròng ~8,7%.",
  "ground_truth_short": "DT thuần 2024: 138.855 nghìn tỷ (+16,7% YoY); biên LN gộp 13,3%; LNST 12.020 nghìn tỷ (+76,7% YoY)"
}
```

## Limitations

- **Single institution**: Questions collected from one Vietnamese securities firm
- **Vietnamese only**: All questions and answers are in Vietnamese
- **Broad "other" category**: 191/790 questions are categorized as "other"
- **Temporal scope**: Questions primarily reference 2020-2024 financial data
- **Ground truth coverage**: Answers reflect analyst knowledge at time of annotation; financial data may have since been updated

## Citation

If you use this dataset, please cite:

```bibtex
@inproceedings{vnfinsqa2026,
  title     = {Production-Ready Hierarchical Knowledge Graph RAG for Vietnamese Financial Question Answering},
  author    = {Anonymous},
  booktitle = {Proceedings of the 35th ACM International Conference on Information and Knowledge Management (CIKM '26)},
  year      = {2026},
  address   = {Rome, Italy},
  note      = {Under review}
}
```

## License

This dataset is released under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license. It may be used for non-commercial research purposes with attribution.
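
The tables under Dataset Statistics can be cross-checked with simple arithmetic. The sketch below copies the counts and percentages from the card into hypothetical dictionaries and confirms that the category and difficulty counts each sum to 790. It also computes the denominator implied by each data-source percentage. Those denominators come out near 938 rather than 790, which suggests that the data-source figures were computed against a different question total; this is an inference from the numbers only, not something the card states.

```python
# Counts copied from the Question Categories and Difficulty Distribution tables.
TOTAL_QUESTIONS = 790

category_counts = {
    "other": 191, "technical": 121, "recommendation": 119,
    "factual": 89, "analysis": 81, "screening": 70,
    "macro": 49, "valuation": 43, "comparison": 27,
}
difficulty_counts = {"easy": 127, "medium": 497, "hard": 166}

# (count, stated percent) pairs copied from the Required Data Sources table.
source_rows = {
    "market_data": (606, 64.6),
    "financial_report": (469, 50.0),
    "general_knowledge": (173, 18.4),
    "news_events": (149, 15.9),
    "macro_economic": (149, 15.9),
    "company_profile": (61, 6.5),
}


def implied_denominator(count: int, percent: float) -> float:
    """Return the total N for which count / N equals the stated percent."""
    return count / (percent / 100.0)


category_total = sum(category_counts.values())
difficulty_total = sum(difficulty_counts.values())
label_total = sum(count for count, _ in source_rows.values())

# Denominator implied by each stated percentage, rounded to a whole question.
implied = {
    name: round(implied_denominator(count, percent))
    for name, (count, percent) in source_rows.items()
}

# Share each source would have if the denominator were the 790-question split.
share_of_790 = {
    name: round(100.0 * count / TOTAL_QUESTIONS, 1)
    for name, (count, _) in source_rows.items()
}

print("category total:", category_total)
print("difficulty total:", difficulty_total)
print("label assignments:", label_total)
print("implied denominators:", implied)
print("shares if N = 790:", share_of_790)
```

If data-source shares relative to the 790-question test split are needed, recomputing both counts and percentages directly from `test.jsonl` is the safer route.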
Provider:
duykhangh