projudge-173k-full
Source: ModelScope community. Updated 2025-06-16. Indexed 2025-06-07.
Download link:
https://modelscope.cn/datasets/yzzthu/projudge-173k-full
Description:
# ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges

**ProJudge-173k** is the first large-scale instruction-tuning dataset specifically designed for process evaluation with fine-grained step-level annotations. It features:

- Multi-Modal: various modalities, including pure text, single-image, and multi-image interleaved content;
- Multi-Discipline: four scientific disciplines (mathematics, physics, chemistry, and biology);
- Multi-Difficulty: difficulty levels ranging from primary school to competition level.

# An Example to load the data

```python
from datasets import load_dataset

# Load the entire dataset
dataset = load_dataset("julyai/ProJudge-173k", split="train")
print(dataset[0])

# Load an individual subset
camel_dataset = load_dataset("julyai/ProJudge-173k/Camel-AI", split="train")
print(camel_dataset[0])

k12_dataset = load_dataset("julyai/ProJudge-173k/k12", split="train")
print(k12_dataset[0])

olympiadbench_dataset = load_dataset("julyai/ProJudge-173k/OlympiadBench", split="train")
print(olympiadbench_dataset[0])
```

More details on loading and using the data are available on our [GitHub page](https://github.com/jiaxin-ai/ProJudge).

If you find our code helpful or use our benchmark dataset, please cite our paper.

```
@article{ai2025projudge,
  title={ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges},
  author={Jiaxin Ai and Pengfei Zhou and Zhaopan Xu and Ming Li and Fanrui Zhang and Zizhen Li and Jianwen Sun and Yukang Feng and Baojin Huang and Zhongyuan Wang and Kaipeng Zhang},
  journal={arXiv preprint arXiv:2503.06553},
  year={2025}
}
```
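The card does not list the fields of each record, so a reasonable first step after loading a split is to check which keys the records actually carry. The sketch below is one way to do that. It works on any iterable of dict-like records, which is what iterating over a loaded split yields. The helper name `summarize_fields` and the field names in the stand-in sample (`question`, `steps`, `image`) are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from itertools import islice


def summarize_fields(records, limit=100):
    """Count how often each field name appears in the first `limit` records."""
    counts = Counter()
    for record in islice(records, limit):
        counts.update(record.keys())
    return dict(counts)


# Stand-in records with made-up field names (an assumption, not the real schema).
# Replace `sample` with a split returned by `load_dataset` to inspect real data.
sample = [
    {"question": "1+1=?", "steps": ["1+1=2"]},
    {"question": "2+2=?", "steps": ["2+2=4"], "image": None},
]

print(summarize_fields(sample))
```

Fields that appear in only some records (such as an image field on multimodal items) show up with a lower count, which makes optional columns easy to spot before writing any processing code.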
Provider: maas
Created: 2025-06-05