ssunggun2/LG_ConvFin_MCQ
Source: Hugging Face · Updated: 2026-03-26 · Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/ssunggun2/LG_ConvFin_MCQ
Resource description:
---
language:
- en
task_categories:
- question-answering
- text-generation
pretty_name: LG ConvFinQA MCQ for Reward Model Training
size_categories:
- n<1K
tags:
- finance
- reasoning
- reward-model
- multiple-choice
- consistency-verified
license: mit
---
# LG ConvFinQA MCQ Dataset
## Dataset Description
This dataset contains high-quality Multiple-Choice Questions (MCQs) generated from the [ConvFinQA dataset](https://huggingface.co/datasets/AdaptLLM/ConvFinQA) for Reward Model (RM) training.
Each question has been:
1. **Generated** with four answer choices, one for each combination of correctness (correct or incorrect) and format (detailed reasoning or answer only)
2. **Verified** using hybrid consistency checking:
   - Self-consistency: 10 samples (N=10) drawn from the same model
   - Multi-model verification: cross-validation with 3 different models
3. **Filtered** to keep only high-quality samples (combined confidence ≥ 0.5)
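
The verification step above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names are hypothetical, the agreement rate is assumed to be the share of samples that match the majority answer, and the consensus rate is assumed to be the share of verifier models that match a reference answer.

```python
from collections import Counter


def self_consistency(samples, gold):
    """Return (majority answer, agreement rate, gold match) for N sampled answers.

    Assumption: agreement rate = fraction of samples equal to the majority answer.
    """
    counts = Counter(samples)
    majority_answer, majority_count = counts.most_common(1)[0]
    agreement_rate = majority_count / len(samples)
    return majority_answer, agreement_rate, majority_answer == gold


def multi_model_consensus(verifier_answers, reference):
    """Return (consensus rate, consensus flag) across verifier models.

    Assumption: a verifier "agrees" when its answer equals the reference,
    and consensus holds when more than half of the verifiers agree.
    """
    agree = sum(1 for answer in verifier_answers if answer == reference)
    rate = agree / len(verifier_answers)
    return rate, rate > 0.5


# Hypothetical answers: 10 samples from one model, then 3 verifier models.
samples = ["A"] * 8 + ["B", "C"]
majority, sc_rate, gold_match = self_consistency(samples, gold="A")
mm_rate, consensus = multi_model_consensus(["A", "A", "D"], reference="A")
```

The two rates produced here correspond in spirit to the `self_consistency_agreement_rate` and `multi_model_consensus_rate` fields; the exact definitions used to build the dataset are not stated in this card.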
## Dataset Statistics
- **Total samples**: 98
- **Average self-consistency score**: 0.8224
- **Self-consistency gold match rate**: 0.8673
- **Average multi-model consensus rate**: 0.7653
- **Multi-model consensus count**: 78 of 98 samples
- **Average combined confidence**: 0.7996
## Generation Details
- **Generation model**: meta-llama/Meta-Llama-3.1-8B-Instruct
- **Self-consistency samples**: 10 per question
- **Self-consistency temperature**: 0.7
- **Multi-model verifiers**: `DragonLLM/Llama-Open-Finance-8B`, `Qwen/Qwen2.5-7B-Instruct`, `gpt-4o-mini`
- **Confidence weights**: Self-consistency=0.6, Multi-model=0.4
- **Generated at**: 20260327_022516
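
As a rough sketch of how the confidence weights and the 0.5 filtering threshold fit together, assuming the combined score is a plain weighted sum of the self-consistency agreement rate and the multi-model consensus rate (the function and constant names are illustrative, not taken from the generation code):

```python
SELF_CONSISTENCY_WEIGHT = 0.6  # from "Confidence weights"
MULTI_MODEL_WEIGHT = 0.4       # from "Confidence weights"
CONFIDENCE_THRESHOLD = 0.5     # filtering threshold stated in the description


def combined_confidence(sc_rate, mm_rate):
    """Weighted sum of the two agreement rates (assumed formula)."""
    return SELF_CONSISTENCY_WEIGHT * sc_rate + MULTI_MODEL_WEIGHT * mm_rate


def keep_sample(sc_rate, mm_rate):
    """True when a sample would pass the combined-confidence filter."""
    return combined_confidence(sc_rate, mm_rate) >= CONFIDENCE_THRESHOLD


# Sanity check against the reported dataset averages (0.8224 and 0.7653).
avg_combined = combined_confidence(0.8224, 0.7653)
```

Because a weighted sum is linear, applying it to the two reported averages gives about 0.7996, which agrees with the reported average combined confidence and supports the assumed formula.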
## Dataset Fields
Each sample contains the original ConvFinQA context plus the generated MCQ and its verification metrics:
- `id`: Unique identifier from ConvFinQA
- `question`: The financial question
- `pre_text`: Text preceding the table (JSON-serialized string)
- `table`: The financial table (JSON-serialized string)
- `post_text`: Text following the table (JSON-serialized string)
- `annotation`: Original dataset annotations and calculation paths (JSON-serialized string to preserve schema stability)
- `gold_answer`: The correct answer
- `generated_mcq`: The complete generated MCQ dictionary containing all 4 choices and their types (JSON-serialized string)
- `reasoning_text`: Detailed reasoning for the correct answer
- `self_consistency_agreement_rate`: Agreement rate across N samplings (0-1)
- `self_consistency_gold_match`: Whether majority answer matches gold answer
- `self_consistency_majority_answer`: Most common answer from N samplings
- `multi_model_consensus`: Whether majority of models agreed
- `multi_model_consensus_rate`: Agreement rate across models (0-1)
- `combined_confidence`: Weighted average of the self-consistency and multi-model scores (weights 0.6 and 0.4)
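
Several fields are stored as JSON-serialized strings, so they need decoding before use. The sketch below shows one way to do that; `decode_row` and the example row are hypothetical, and the internal structure of each decoded field (for example, the table as a list of rows) is an assumption made only for illustration.

```python
import json

# Fields documented above as JSON-serialized strings.
JSON_FIELDS = ("pre_text", "table", "post_text", "annotation", "generated_mcq")


def decode_row(row):
    """Return a copy of `row` with the JSON-serialized fields parsed."""
    decoded = dict(row)
    for field in JSON_FIELDS:
        value = decoded.get(field)
        if isinstance(value, str):
            decoded[field] = json.loads(value)
    return decoded


# Hypothetical row: the values are made up and do not come from the dataset.
example_row = {
    "id": "hypothetical-id",
    "question": "What was the change in revenue?",
    "pre_text": json.dumps(["Revenue is summarized below."]),
    "table": json.dumps([["year", "revenue"], ["2020", "120"], ["2019", "100"]]),
    "post_text": json.dumps([]),
    "gold_answer": "20",
    "combined_confidence": 0.8,
}

decoded = decode_row(example_row)
```

With the Hugging Face `datasets` library, rows could be obtained with `load_dataset("ssunggun2/LG_ConvFin_MCQ")`. The split names are not stated in this card, so inspect the returned object before indexing into it.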
## Use Cases
This dataset is designed for:
1. **Reward Model Training**: Train models to distinguish high-quality from low-quality reasoning
2. **Preference Learning**: Learn to rank answers by reasoning quality
3. **Financial QA Evaluation**: Benchmark financial reasoning capabilities
## Citation
If you use this dataset, please cite:
```bibtex
@misc{lg_convfin_mcq_2026,
title={LG ConvFinQA MCQ Dataset for Reward Model Training},
author={LG AI Research},
year={2026},
howpublished={\url{https://huggingface.co/datasets/ssunggun2/LG_ConvFin_MCQ}}
}
```
Original ConvFinQA dataset:
```bibtex
@article{chen2022convfinqa,
title={ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering},
author={Chen, Zhiyu and others},
journal={arXiv preprint arXiv:2210.03849},
year={2022}
}
```
## License
MIT License
Provider:
ssunggun2



