
happy8825/MMLongBench_baseline_reproduce_seed5

Source: Hugging Face. Updated 2025-12-14; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/happy8825/MMLongBench_baseline_reproduce_seed5
Resource overview:
<!-- SIMPLEDOC_AUTO_SUMMARIES_START -->

### MMLongBench – 2025-12-14 11:33 UTC

```
Average accuracy: 48.69% (1072 samples with scores)

Subset metrics by evidence source:
Pure-text (Plain-text): samples=302, accuracy=44.70%
Figure: samples=299, accuracy=38.80%
Table: samples=217, accuracy=41.47%
Chart: samples=175, accuracy=34.86%
Generalized-text (Layout): samples=119, accuracy=31.93%

Subset metrics by evidence pages length:
no_pages: samples=226, accuracy=69.91%
single_page: samples=489, accuracy=52.15%
multiple_pages: samples=357, accuracy=30.53%

Done: Results saved to /hub_data2/seohyun/outputs/baseline_reproduce_seed5/simpledoc_eval/MMLongBench/eval_results.jsonl
Results source: /hub_data2/seohyun/outputs/baseline_reproduce_seed5/results.json
```

---

### MMLongBench – 2025-12-11 23:42 UTC

```
Average accuracy: 48.69% (1072 samples with scores)

Subset metrics by evidence source:
Pure-text (Plain-text): samples=302, accuracy=44.70%
Figure: samples=299, accuracy=38.80%
Table: samples=217, accuracy=41.47%
Chart: samples=175, accuracy=34.86%
Generalized-text (Layout): samples=119, accuracy=31.93%

Subset metrics by evidence pages length:
no_pages: samples=226, accuracy=69.91%
single_page: samples=489, accuracy=52.15%
multiple_pages: samples=357, accuracy=30.53%

Done: Results saved to /hub_data2/seohyun/outputs/baseline_reproduce_seed5/simpledoc_eval/MMLongBench/eval_results.jsonl
Results source: /hub_data2/seohyun/outputs/baseline_reproduce_seed5/results.json
```

<!-- SIMPLEDOC_AUTO_SUMMARIES_END -->

---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: relevant_pages
    list: int64
  - name: evidence_pages
    list: int64
  - name: score
    dtype: int64
  - name: doc_id
    dtype: string
  - name: doc_type
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: evidence_sources
    list: string
  - name: final_answer
    dtype: string
  - name: turn1_colqwen_query
    dtype: string
  - name: turn1_colqwen_retrieval_results
    struct:
    - name: top_pages
      list: int64
    - name: top_pages_with_scores
      list:
      - name: page
        dtype: int64
      - name: score
        dtype: float64
  - name: turn1_llm_query_input
    dtype: string
  - name: turn1_llm_retrieval_results
    struct:
    - name: document_summary
      dtype: string
    - name: relevant_pages
      list: int64
  - name: turn1_llm_raw_output
    dtype: string
  - name: turn1_memory_out
    dtype: string
  - name: turn2_memory_in
    dtype: string
  - name: turn2_vlm_prompt_input
    dtype: string
  - name: turn2_vlm_raw_output
    dtype: string
  - name: turn2_final_answer
    dtype: string
  - name: turn2_response_type
    dtype: string
  - name: turn2_updated_question
    dtype: string
  - name: turn2_notes
    dtype: string
  - name: turn2_vlm_turn1_input_image0_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image10_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image11_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image12_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image13_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image14_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image15_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image16_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image17_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image18_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image19_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image1_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image20_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image21_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image22_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image23_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image24_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image2_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image3_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image4_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image5_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image6_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image7_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image8_ref
    dtype: string
  - name: turn2_vlm_turn1_input_image9_ref
    dtype: string
  - name: turn2_vlm_turn1_input_messages
    list: 'null'
  - name: turn2_vlm_turn1_prompt
    dtype: string
  - name: turn2_vlm_turn1_raw_output
    dtype: string
  - name: turn3_colqwen_query
    dtype: string
  - name: turn3_colqwen_retrieval_results
    struct:
    - name: top_pages
      list: int64
    - name: top_pages_with_scores
      list:
      - name: page
        dtype: int64
      - name: score
        dtype: float64
  - name: turn3_llm_query_input
    dtype: string
  - name: turn3_llm_retrieval_results
    struct:
    - name: document_summary
      dtype: string
    - name: relevant_pages
      list: int64
  - name: turn3_llm_raw_output
    dtype: string
  - name: turn3_memory_out
    dtype: string
  - name: turn4_memory_in
    dtype: string
  - name: turn4_vlm_prompt_input
    dtype: string
  - name: turn4_vlm_raw_output
    dtype: string
  - name: turn4_final_answer
    dtype: string
  - name: turn4_response_type
    dtype: string
  - name: turn4_updated_question
    dtype: 'null'
  - name: turn4_notes
    dtype: 'null'
  - name: turn4_vlm_turn1_input_image0_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image10_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image11_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image12_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image13_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image14_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image15_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image16_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image17_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image1_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image2_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image3_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image4_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image5_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image6_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image7_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image8_ref
    dtype: string
  - name: turn4_vlm_turn1_input_image9_ref
    dtype: string
  - name: turn4_vlm_turn1_input_messages
    list: 'null'
  - name: turn4_vlm_turn1_prompt
    dtype: string
  - name: turn4_vlm_turn1_raw_output
    dtype: string
  splits:
  - name: train
    num_bytes: 55503978
    num_examples: 1073
  download_size: 15080292
  dataset_size: 55503978
---
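The page-length breakdown reported in the evaluation summaries can be cross-checked against the overall figure, and a similar breakdown can be recomputed from the rows themselves. The sketch below is illustrative only and is not the evaluation script that produced the summaries. It assumes that `score` is 1 for a correct answer and 0 otherwise, that unscored rows carry `None`, and that the page-length bucket is determined by the length of `evidence_pages`; the function names are hypothetical.

```python
from collections import defaultdict

# Figures copied from the summary: bucket -> (samples, accuracy in %).
REPORTED_BY_PAGES = {
    "no_pages": (226, 69.91),
    "single_page": (489, 52.15),
    "multiple_pages": (357, 30.53),
}


def weighted_accuracy(subsets):
    """Sample-weighted mean of per-subset accuracies (in %)."""
    total = sum(n for n, _ in subsets.values())
    return sum(n * acc for n, acc in subsets.values()) / total


def pages_bucket(evidence_pages):
    """Assumed bucketing rule: count the annotated evidence pages."""
    n = len(evidence_pages or [])
    if n == 0:
        return "no_pages"
    return "single_page" if n == 1 else "multiple_pages"


def accuracy_by_pages(rows):
    """Return bucket -> (scored samples, accuracy in %), skipping unscored rows."""
    counts = defaultdict(int)
    correct = defaultdict(int)
    for row in rows:
        if row.get("score") is None:  # assumption: unscored rows are excluded
            continue
        bucket = pages_bucket(row["evidence_pages"])
        counts[bucket] += 1
        correct[bucket] += row["score"]  # assumption: score is 0 or 1
    return {b: (counts[b], 100.0 * correct[b] / counts[b]) for b in counts}


# The three page-length subsets partition the 1072 scored samples, so their
# weighted mean should reproduce the reported average accuracy.
print(round(weighted_accuracy(REPORTED_BY_PAGES), 2))

# To run on the real data (requires the `datasets` package and network access):
#   from datasets import load_dataset
#   ds = load_dataset("happy8825/MMLongBench_baseline_reproduce_seed5", split="train")
#   print(accuracy_by_pages(ds))
```

The evidence-source subsets (Pure-text, Figure, Table, Chart, Generalized-text) sum to more than 1072 samples, presumably because a question can draw on several sources, so the same weighted-mean check does not apply to them.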
Provider:
happy8825