Vietnamese-Entrance-Exam

Name: Vietnamese-Entrance-Exam
Creator: maas
Published: 2025-11-27 16:28:11
License: No description available

ModelScope community. Updated 2025-11-27; indexed 2025-04-05.

Download link:

https://modelscope.cn/datasets/Intelligent-Internet/Vietnamese-Entrance-Exam

Description:

## Vietnamese Entrance Exam Dataset

The Vietnamese Entrance Exam dataset is a collection of 432 problems derived from Vietnamese university entrance examinations. It aims to provide a new benchmark for testing the reasoning capabilities of language models in several low-resource domains, and it is designed to minimize potential data contamination from pre-training or post-training exposure.

| Domain | Count |
|-----------|-------|
| Physics | 95 |
| Chemistry | 94 |
| Math | 243 |

### Data Collection Process

1. Source Collection
   * Problems gathered from high-quality TeX sources.
   * Additional problems extracted via OCR using Gemini-2.0-Flash.
2. Translation
   * Original Vietnamese problems translated to English.
3. Reformulation Process
   * Multiple-choice questions converted to direct numerical answer format.
   * Transformation process inspired by BigMath methodology.
   * Each question reformulated using predefined criteria.
4. Validation
   * LLM judge evaluation of reformulated questions.
   * Verification of transformation validity.
   * Final rewriting based on judge's criteria.

### Benchmark

Bold marks the best score in each column.

| Model | Chemistry benchmark | Physics benchmark | Math benchmark |
|-------------------------------|---------------------|-------------------|----------------|
| O1 | 21.27 | 52.63 | 50.2 |
| O1-mini | 22.34 | 56.84 | 65.02 |
| O3-mini | 18.08 | 60.00 | 72.48 |
| DeepSeek-R1 | **30.85** | **74.73** | 80.24 |
| DeepSeek-R1-Distill-Qwen-32B | 19.14 | 57.89 | 72.43 |
| Qwen/QwQ-32B | 26.59 | 73.68 | **81.89** |

### Citation

If you find our work useful, please cite our technical report:

```bib
@misc{2025iithought,
  title={II-Thought : A Large-Scale, High-Quality Reasoning Dataset},
  author={Intelligent Internet},
  year={2025},
}
```
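
Because the problems are reformulated to direct numerical answers, per-domain scores like those in the benchmark table could in principle be computed by comparing each model answer to a reference number. The sketch below is only an illustration of that idea and is not the authors' grading procedure, which the card does not describe. The record fields `domain`, `prediction`, and `answer`, the relative tolerance, and the toy records are all assumptions made for this example.

```python
import math
from collections import defaultdict

# Domain sizes as stated in the dataset card.
DOMAIN_COUNTS = {"Physics": 95, "Chemistry": 94, "Math": 243}


def is_correct(predicted: str, reference: float, rel_tol: float = 1e-3) -> bool:
    """Return True if `predicted` parses to a number close to `reference`.

    The tolerance is a hypothetical choice, not taken from the dataset card.
    """
    try:
        value = float(predicted.strip().replace(",", ""))
    except ValueError:
        # Non-numeric output counts as a wrong answer.
        return False
    return math.isclose(value, reference, rel_tol=rel_tol, abs_tol=1e-9)


def accuracy_by_domain(records):
    """Compute the percentage of correct answers per domain.

    Each record is a dict with hypothetical keys:
    'domain' (str), 'prediction' (str), 'answer' (float).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["domain"]] += 1
        if is_correct(rec["prediction"], rec["answer"]):
            hits[rec["domain"]] += 1
    return {domain: 100.0 * hits[domain] / totals[domain] for domain in totals}


# Toy records for illustration only; they are not real dataset entries.
toy_records = [
    {"domain": "Math", "prediction": "12.5", "answer": 12.5},
    {"domain": "Math", "prediction": "seven", "answer": 7.0},
    {"domain": "Physics", "prediction": " 9.81 ", "answer": 9.81},
]

if __name__ == "__main__":
    print(sum(DOMAIN_COUNTS.values()))   # total number of problems
    print(accuracy_by_domain(toy_records))
```

A tolerance-based comparison is used here so that formatting differences such as `9.81` versus `9.810` do not count as errors. A real evaluation would also need to decide how to handle units and answers that are not plain numbers.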

Provider:

maas

Created:

2025-03-31
