xiaofff/omnievalkit-data-test

Hugging Face | Updated: 2026-03-26 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/xiaofff/omnievalkit-data-test
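A minimal Python sketch of how one configuration of this dataset might be loaded through the mirror given in the download link. It assumes the `datasets` library is installed, that the mirror honors the `HF_ENDPOINT` environment variable read by `huggingface_hub`, and that a config named `aishell1_test` with a `test` split exists as declared in the card's `dataset_info` metadata. The helper names `dataset_url` and `load_config` are hypothetical and introduced only for illustration.

```python
import os

REPO_ID = "xiaofff/omnievalkit-data-test"  # repository name from the listing
MIRROR = "https://hf-mirror.com"           # mirror host from the download link


def dataset_url(repo_id: str, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL for a given endpoint (pure string helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_config(config_name: str, split: str = "test", streaming: bool = True):
    """Load one config of the dataset; needs `pip install datasets` and network access.

    Assumption: HF_ENDPOINT must be set before huggingface_hub is imported,
    so it is set here before the lazy import below.
    """
    os.environ.setdefault("HF_ENDPOINT", MIRROR)
    # Imported lazily so that dataset_url() stays usable without the library.
    from datasets import load_dataset

    # streaming=True avoids downloading a whole config up front; several
    # configs in this card declare tens of gigabytes in num_bytes.
    return load_dataset(REPO_ID, config_name, split=split, streaming=streaming)
```

Calling `load_config("aishell1_test")` would be expected to return an iterable dataset whose rows carry the features declared for that config; decoding the `audio` column may require additional audio dependencies, which this sketch does not install.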
Resource description:
---
license: apache-2.0
task_categories:
- audio-classification
- automatic-speech-recognition
- visual-question-answering
- video-text-to-text
language:
- en
- zh
tags:
- omni-modal
- evaluation
- benchmark
size_categories:
- 100K<n<1M
pretty_name: OmniEvalKit Data
dataset_info:
- config_name: aishell1_test
  features:
  - name: name
    dtype: string
  - name: WavPath
    dtype: string
  - name: text
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 1136965760
    num_examples: 7176
- config_name: aishell2_test
  features:
  - name: WavPath
    dtype: string
  - name: text
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 452586858
    num_examples: 5000
- config_name: audio_trivia_qa
  features:
  - name: answer
    dtype: string
  - name: question
    dtype: string
  - name: question_id
    dtype: string
  - name: save_name
    dtype: string
  - name: WavPath
    dtype: string
  - name: id
    dtype: int64
  - name: wer%
    dtype: int64
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 337519271
    num_examples: 1024
- config_name: audio_web_questions
  features:
  - name: url
    dtype: string
  - name: question
    dtype: string
  - name: answers
    dtype: string
  - name: save_name
    dtype: string
  - name: WavPath
    dtype: string
  - name: error
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 232278818
    num_examples: 2032
- config_name: audiocaps_test
  features:
  - name: audiocap_id
    dtype: int64
  - name: youtube_id
    dtype: string
  - name: start_time
    dtype: int64
  - name: caption
    dtype: string
  - name: WavPath
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 149505076933
    num_examples: 3985
- config_name: av_odyssey
  features:
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question_id
    dtype: string
  - name: question_type_id
    dtype: string
  - name: data_type
    dtype: string
  - name: subfield
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: image_paths_dict
    dtype: string
  - name: audio_paths_dict
    dtype: string
  - name: video_paths_dict
    dtype: string
  - name: video_duration
    dtype: float64
  - name: audio_bytes_dict
    dtype: string
  - name: image_bytes_dict
    dtype: string
  splits:
  - name: test
    num_bytes: 33545681273
    num_examples: 4555
- config_name: avmeme_full
  features:
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: sample_id
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: category
    dtype: string
  - name: question_type
    dtype: string
  - name: language
    dtype: string
  - name: original_date
    dtype: string
  - name: emotion
    dtype: string
  - name: sensitivity
    dtype: string
  - name: visual_hint
    dtype: string
  - name: visual_cheat
    dtype: bool
  - name: name
    dtype: string
  - name: summary
    dtype: string
  - name: usage
    dtype: string
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 407920890
    num_examples: 1032
- config_name: avmeme_main
  features:
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: sample_id
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: category
    dtype: string
  - name: question_type
    dtype: string
  - name: language
    dtype: string
  - name: original_date
    dtype: string
  - name: emotion
    dtype: string
  - name: sensitivity
    dtype: string
  - name: visual_hint
    dtype: string
  - name: visual_cheat
    dtype: bool
  - name: name
    dtype: string
  - name: summary
    dtype: string
  - name: usage
    dtype: string
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 336767213
    num_examples: 846
- config_name: avut_benchmark_gemini
  features:
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: task_type
    dtype: string
  - name: video_type
    dtype: string
  - name: video_id
    dtype: int64
  - name: qa_id
    dtype: int64
  - name: source
    dtype: string
  - name: url
    dtype: string
  - name: video_duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 20746411298
    num_examples: 9874
- config_name: avut_benchmark_human
  features:
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: task_type
    dtype: string
  - name: video_type
    dtype: string
  - name: video_id
    dtype: int64
  - name: qa_id
    dtype: int64
  - name: source
    dtype: string
  - name: url
    dtype: string
  - name: video_duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 1507383196
    num_examples: 1734
- config_name: clothocaption_test
  features:
  - name: file_name
    dtype: string
  - name: caption_1
    dtype: string
  - name: caption_2
    dtype: string
  - name: caption_3
    dtype: string
  - name: caption_4
    dtype: string
  - name: caption_5
    dtype: string
  - name: WavPath
    dtype: string
  - name: answer
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 1975652463
    num_examples: 1045
- config_name: commonvoice_en
  features:
  - name: client_id
    dtype: string
  - name: path
    dtype: string
  - name: sentence
    dtype: string
  - name: up_votes
    dtype: int64
  - name: down_votes
    dtype: int64
  - name: age
    dtype: string
  - name: gender
    dtype: string
  - name: accents
    dtype: string
  - name: variant
    dtype: string
  - name: locale
    dtype: string
  - name: segment
    dtype: string
  - name: WavPath
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 696464627
    num_examples: 16386
- config_name: commonvoice_fr
  features:
  - name: client_id
    dtype: string
  - name: path
    dtype: string
  - name: sentence
    dtype: string
  - name: up_votes
    dtype: int64
  - name: down_votes
    dtype: int64
  - name: age
    dtype: string
  - name: gender
    dtype: string
  - name: accents
    dtype: string
  - name: variant
    dtype: string
  - name: locale
    dtype: string
  - name: segment
    dtype: string
  - name: WavPath
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 685180590
    num_examples: 16132
- config_name: commonvoice_yue
  features:
  - name: client_id
    dtype: string
  - name: path
    dtype: string
  - name: sentence
    dtype: string
  - name: up_votes
    dtype: int64
  - name: down_votes
    dtype: int64
  - name: age
    dtype: string
  - name: gender
    dtype: string
  - name: accents
    dtype: string
  - name: variant
    dtype: string
  - name: locale
    dtype: string
  - name: segment
    dtype: string
  - name: WavPath
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 227582144
    num_examples: 5593
- config_name: commonvoice_zh
  features:
  - name: client_id
    dtype: string
  - name: path
    dtype: string
  - name: sentence
    dtype: string
  - name: up_votes
    dtype: int64
  - name: down_votes
    dtype: int64
  - name: age
    dtype: string
  - name: gender
    dtype: string
  - name: accents
    dtype: string
  - name: variant
    dtype: string
  - name: locale
    dtype: string
  - name: segment
    dtype: string
  - name: WavPath
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 493187729
    num_examples: 10625
- config_name: covost2_en_zh
  features:
  - name: audio
    dtype: string
  - name: text
    dtype: string
  - name: split
    dtype: string
  - name: WavPath
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 704625980
    num_examples: 15530
- config_name: covost2_zh_en
  features:
  - name: audio
    dtype: string
  - name: text
    dtype: string
  - name: split
    dtype: string
  - name: WavPath
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 236430218
    num_examples: 4897
- config_name: daily_omni
  features:
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: explanation
    dtype: string
  - name: video_id
    dtype: string
  - name: qa_type
    dtype: string
  - name: content_parent_category
    dtype: string
  - name: content_fine_category
    dtype: string
  - name: video_category
    dtype: string
  - name: video_duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 1013263841
    num_examples: 1197
- config_name: fleurs_en
  features:
  - name: name
    dtype: string
  - name: WavPath
    dtype: string
  - name: text
    dtype: string
  - name: id
    dtype: int64
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 401974485
    num_examples: 647
- config_name: fleurs_zh
  features:
  - name: name
    dtype: string
  - name: WavPath
    dtype: string
  - name: text
    dtype: string
  - name: id
    dtype: int64
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 696186036
    num_examples: 945
- config_name: futureomni
  features:
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: sample_id
    dtype: int64
  - name: original_video
    dtype: string
  - name: split_point
    dtype: int64
  - name: video_domain
    dtype: string
  - name: audio_type
    dtype: string
  - name: forecasting_pattern
    dtype: string
  - name: video_duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 5112719517
    num_examples: 1034
- config_name: gigaspeech_test
  features:
  - name: sid
    dtype: string
  - name: speaker
    dtype: string
  - name: text_tn
    dtype: string
  - name: begin_time
    dtype: float64
  - name: end_time
    dtype: float64
  - name: title
    dtype: string
  - name: url
    dtype: string
  - name: path
    dtype: string
  - name: aid
    dtype: string
  - name: source
    dtype: string
  - name: codec
    dtype: string
  - name: channels
    dtype: int64
  - name: md5
    dtype: string
  - name: speaker.1
    dtype: string
  - name: category
    dtype: string
  - name: text
    dtype: string
  - name: WavPath
    dtype: string
  splits:
  - name: test
    num_bytes: 3851653
    num_examples: 19870
- config_name: jointavbench
  features:
  - name: qid
    dtype: string
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: task
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: explanation
    dtype: string
  - name: video_duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 17660930149
    num_examples: 2853
- config_name: kespeech_test
  features:
  - name: ID
    dtype: string
  - name: text
    dtype: string
  - name: Dialect
    dtype: string
  - name: WavPath
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 3450929588
    num_examples: 19723
- config_name: librispeech_dev_clean
  features:
  - name: audio
    dtype: string
  - name: gt
    dtype: string
  - name: source
    dtype: string
  - name: text
    dtype: string
  - name: WavPath
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 341928446
    num_examples: 2703
- config_name: librispeech_dev_other
  features:
  - name: audio
    dtype: string
  - name: gt
    dtype: string
  - name: source
    dtype: string
  - name: text
    dtype: string
  - name: WavPath
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 318468791
    num_examples: 2864
- config_name: librispeech_test_clean
  features:
  - name: audio
    dtype: string
  - name: gt
    dtype: string
  - name: source
    dtype: string
  - name: text
    dtype: string
  - name: WavPath
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 350316656
    num_examples: 2620
- config_name: librispeech_test_other
  features:
  - name: audio
    dtype: string
  - name: gt
    dtype: string
  - name: source
    dtype: string
  - name: text
    dtype: string
  - name: WavPath
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 333102761
    num_examples: 2939
- config_name: livesports3k_cc
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_1fps_10
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_1fps_10_prompt
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_1fps_10_uniform
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_1fps_10_uniform_prompt
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_1fps_40
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_1fps_90
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_2fps_11
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_3fps_12
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_5fps_14
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_5fps_14_uniform
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_8fps_44
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_8fps_44_prompt
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_prompt
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 2932050
    num_examples: 1702
- config_name: livesports3k_cc_under50s_1fps_10
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 1979130
    num_examples: 1310
- config_name: livesports3k_cc_under50s_1fps_10_fixed
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 1979130
    num_examples: 1310
- config_name: livesports3k_cc_under50s_1fps_10_prompt
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 1979130
    num_examples: 1310
- config_name: livesports3k_cc_under50s_1fps_40
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 1979130
    num_examples: 1310
- config_name: livesports3k_cc_under50s_1fps_90
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 1979130
    num_examples: 1310
- config_name: livesports3k_cc_under50s_2fps_11
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 1979130
    num_examples: 1310
- config_name: livesports3k_cc_under50s_5fps_14
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 1979130
    num_examples: 1310
- config_name: livesports3k_cc_under50s_5fps_14_fixed
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 1979130
    num_examples: 1310
- config_name: livesports3k_cc_under50s_8fps_44
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 1979130
    num_examples: 1310
- config_name: livesports3k_cc_under50s_8fps_44_prompt
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: event_title
    dtype: string
  - name: event_asr_text
    dtype: string
  - name: preasr_text
    dtype: string
  - name: video_id
    dtype: string
  - name: url
    dtype: string
  - name: event_id
    dtype: int64
  - name: begin
    dtype: float64
  - name: end
    dtype: float64
  - name: event_type
    dtype: int64
  - name: class
    dtype: string
  - name: event_asr
    dtype: string
  - name: preasr
    dtype: string
  splits:
  - name: test
    num_bytes: 1979130
    num_examples: 1310
- config_name: meld
  features:
  - name: Sr No.
    dtype: int64
  - name: Utterance
    dtype: string
  - name: Speaker
    dtype: string
  - name: Emotion
    dtype: string
  - name: Sentiment
    dtype: string
  - name: Dialogue_ID
    dtype: int64
  - name: Utterance_ID
    dtype: int64
  - name: Season
    dtype: int64
  - name: Episode
    dtype: int64
  - name: StartTime
    dtype: string
  - name: EndTime
    dtype: string
  - name: video_path
    dtype: string
  - name: WavPath
    dtype: string
  - name: gt
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 1529972866
    num_examples: 2610
- config_name: mmar_bench
  features:
  - name: name
    dtype: string
  - name: id
    dtype: string
  - name: WavPath
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: duration
    dtype: float64
  - name: choices
    dtype: string
  - name: modality
    dtype: string
  - name: category
    dtype: string
  - name: sub_category
    dtype: string
  - name: language
    dtype: string
  - name: source
    dtype: string
  - name: url
    dtype: string
  - name: timestamp
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 3583995753
    num_examples: 1000
- config_name: mmau_test_mini
  features:
  - name: name
    dtype: string
  - name: id
    dtype: string
  - name: WavPath
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: duration
    dtype: float64
  - name: choices
    dtype: string
  - name: dataset
    dtype: string
  - name: task
    dtype: string
  - name: split
    dtype: string
  - name: category
    dtype: string
  - name: sub_category
    dtype: string
  - name: difficulty
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 1487414290
    num_examples: 1000
- config_name: mmsu_bench
  features:
  - name: name
    dtype: string
  - name: id
    dtype: string
  - name: WavPath
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: choices
    dtype: string
  - name: duration
    dtype: float64
  - name: task_name
    dtype: string
  - name: category
    dtype: string
  - name: sub_category
    dtype: string
  - name: sub_sub_category
    dtype: string
  - name: linguistics_sub_discipline
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 1381462133
    num_examples: 4996
- config_name: omni_openqa
  features:
  - name: sample_id
    dtype: int64
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: original_source
    dtype: string
  - name: duration
    dtype: float64
  - name: video_duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 1191141668
    num_examples: 425
- config_name: omnibench
  features:
  - name: WavPath
    dtype: string
  - name: ImagePath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: task_type
    dtype: string
  - name: audio_type
    dtype: string
  - name: audio_content
    dtype: string
  - name: image_content
    dtype: string
  - name: index
    dtype: int64
  - name: video_duration
    dtype: string
  - name: audio
    dtype: audio
  - name: image
    dtype: image
  splits:
  - name: test
    num_bytes: 1263000056
    num_examples: 1142
- config_name: ovavel
  features:
  - name: VideoPath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: sample_id
    dtype: string
  - name: gt_label
    dtype: string
  - name: event_category
    dtype: string
  - name: cls_type
    dtype: string
  - name: split
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 3589494663
    num_examples: 5818
- config_name: ovobench
  features:
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: ImagePath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: task
    dtype: string
  - name: task_group
    dtype: string
  - name: realtime
    dtype: float64
  - name: sample_id
    dtype: int64
  - name: video_duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 2983728818
    num_examples: 1468
- config_name: peoples_speech_test
  features:
  - name: duration_ms
    dtype: int64
  - name: label
    dtype: string
  - name: name
    dtype: string
  - name: WavPath
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 4087774844
    num_examples: 34898
- config_name: streamingbench_omni_fix
  features:
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: ImagePath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: task_type
    dtype: string
  - name: required_ability
    dtype: string
  - name: video_categories
    dtype: string
  - name: time_range
    dtype: string
  - name: time_stamp
    dtype: string
  - name: sample_id
    dtype: int64
  - name: video_duration
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 1798910721
    num_examples: 1000
- config_name: streamingbench_real
  features:
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: ImagePath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: task_type
    dtype: string
  - name: required_ability
    dtype: string
  - name: video_categories
    dtype: string
  - name: time_range
    dtype: string
  - name: time_stamp
    dtype: string
  - name: sample_id
    dtype: int64
  - name: video_duration
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 3932413717
    num_examples: 2499
- config_name: streamingbench_sqa
  features:
  - name: VideoPath
    dtype: string
  - name: WavPath
    dtype: string
  - name: ImagePath
    dtype: string
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: task_type
    dtype: string
  - name: required_ability
    dtype: string
  - name: video_categories
    dtype: string
  - name: time_range
    dtype: string
  - name: time_stamp
    dtype: string
  - name: sample_id
    dtype: int64
  - name: sqa_context
    dtype: string
  - name: video_duration
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 426916203
    num_examples: 250
- config_name: tedlium3_test
  features:
  - name: WavPath
    dtype: string
  - name: text
    dtype: string
  - name: speaker_id
    dtype: string
  - name: gender
    dtype: string
  - name: file
    dtype: string
  - name: id
    dtype: string
  - name: duration
    dtype: float64
  - name: audio
    dtype: audio
  splits:
  - name: test
    num_bytes: 828876996
    num_examples: 3737
- config_name: unobench
  features:
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: qid
    dtype: int64
  - name: question
    dtype: string
  - name: gt_answer
    dtype: string
  - name: subset_name
    dtype: string
  - name: ability
    dtype: string
  - name: task
    dtype: string
  - name: source
    dtype: string
  - name: audio_type
    dtype: string
  - name: score_type
    dtype: int64
  - name: audio_paths_dict
    dtype: string
  - name: video_duration
    dtype: float64
  - name: audio_bytes_dict
    dtype: string
  - name: image_paths_dict
    dtype: string
  - name: audio_caption
    dtype: string
  - name: image_caption
    dtype: string
  - name: image_bytes_dict
    dtype: string
  - name: video_paths_dict
    dtype: string
  - name: video_caption
    dtype: string
  splits:
  - name: test
    num_bytes: 8350862746
    num_examples: 3730
- config_name: unobench_mc
  features:
  - name: dataset_type
    dtype: string
  - name: dataset_name
    dtype: string
  - name: qid
    dtype: int64
  - name: question
    dtype: string
  - name: choices
    dtype: string
  - name: gt_answer
    dtype: string
  - name: subset_name
    dtype: string
  - name: ability
    dtype: string
  - name: task
    dtype: string
  - name: source
    dtype: string
  - name: audio_type
    dtype: string
  - name: audio_paths_dict
    dtype: string
  - name: image_paths_dict
    dtype: string
  - name: video_duration
    dtype: float64
  - name: audio_bytes_dict
    dtype: string
- name: image_bytes_dict dtype: string - name: video_paths_dict dtype: string splits: - name: test num_bytes: 3906205826 num_examples: 1000 - config_name: video_holmes features: - name: VideoPath dtype: string - name: dataset_type dtype: string - name: dataset_name dtype: string - name: question dtype: string - name: choices dtype: string - name: gt_answer dtype: string - name: video_id dtype: string - name: question_id dtype: int64 - name: question_type dtype: string - name: explanation dtype: string - name: video_duration dtype: float64 - name: audio dtype: audio splits: - name: test num_bytes: 3502030631 num_examples: 1837 - config_name: videomme features: - name: VideoPath dtype: string - name: WavPath dtype: string - name: ImagePath dtype: string - name: dataset_type dtype: string - name: dataset_name dtype: string - name: question dtype: string - name: choices dtype: string - name: gt_answer dtype: string - name: task_type dtype: string - name: duration dtype: string - name: domain dtype: string - name: sub_category dtype: string - name: video_id dtype: string - name: url dtype: string - name: raw_options dtype: string - name: video_duration dtype: float64 - name: audio dtype: audio splits: - name: test num_bytes: 57682096151 num_examples: 2700 - config_name: videomme_short features: - name: VideoPath dtype: string - name: WavPath dtype: string - name: ImagePath dtype: string - name: dataset_type dtype: string - name: dataset_name dtype: string - name: question dtype: string - name: choices dtype: string - name: gt_answer dtype: string - name: task_type dtype: string - name: duration dtype: string - name: domain dtype: string - name: sub_category dtype: string - name: video_id dtype: string - name: url dtype: string - name: raw_options dtype: string - name: video_duration dtype: float64 - name: audio dtype: audio splits: - name: test num_bytes: 860997438 num_examples: 900 - config_name: vocalsound features: - name: WavPath dtype: string - name: name dtype: 
string - name: text dtype: string - name: duration dtype: float64 - name: audio dtype: audio splits: - name: test num_bytes: 1107997569 num_examples: 3591 - config_name: voice_cmmlu features: - name: Question dtype: string - name: A dtype: string - name: B dtype: string - name: C dtype: string - name: D dtype: string - name: Answer dtype: string - name: question dtype: string - name: choices dtype: string - name: question_text dtype: string - name: WavPath dtype: string - name: question_WavPath dtype: string - name: choice_WavPath dtype: string - name: prompt_prefix_WavPath dtype: string - name: subject dtype: string - name: subject_zh dtype: string - name: id dtype: int64 - name: question_WavPath_asr dtype: string - name: choice_WavPath_asr dtype: string - name: asr-wer dtype: string - name: clean dtype: bool - name: duration dtype: float64 - name: audio_bytes dtype: binary splits: - name: test num_bytes: 14245704762 num_examples: 11582 - config_name: voicebench_advbench features: - name: name dtype: string - name: WavPath dtype: string - name: prompt dtype: string splits: - name: test num_bytes: 25938 num_examples: 520 - config_name: voicebench_alpacaeval features: - name: name dtype: string - name: WavPath dtype: string - name: prompt dtype: string splits: - name: test num_bytes: 15797 num_examples: 199 - config_name: voicebench_alpacaeval_full features: - name: name dtype: string - name: WavPath dtype: string - name: prompt dtype: string splits: - name: test num_bytes: 55020 num_examples: 636 - config_name: voicebench_bbh features: - name: name dtype: string - name: WavPath dtype: string - name: prompt dtype: string - name: reference dtype: string - name: id dtype: string splits: - name: test num_bytes: 76851 num_examples: 1000 - config_name: voicebench_commoneval features: - name: name dtype: string - name: WavPath dtype: string - name: prompt dtype: string splits: - name: test num_bytes: 12522 num_examples: 200 - config_name: voicebench_ifeval features: - 
name: name dtype: string - name: WavPath dtype: string - name: prompt dtype: string - name: key dtype: int64 - name: instruction_id_list dtype: string - name: kwargs dtype: string splits: - name: test num_bytes: 68545 num_examples: 345 - config_name: voicebench_mmsu features: - name: name dtype: string - name: WavPath dtype: string - name: prompt dtype: string - name: reference dtype: string splits: - name: test num_bytes: 445272 num_examples: 3074 - config_name: voicebench_openbookqa features: - name: name dtype: string - name: WavPath dtype: string - name: prompt dtype: string - name: reference dtype: string splits: - name: test num_bytes: 53620 num_examples: 455 - config_name: voicebench_sdqa features: - name: name dtype: string - name: WavPath dtype: string - name: prompt dtype: string - name: reference dtype: string splits: - name: test num_bytes: 270125 num_examples: 6083 - config_name: voicebench_wildvoice features: - name: name dtype: string - name: WavPath dtype: string - name: prompt dtype: string - name: conversation_hash dtype: string splits: - name: test num_bytes: 131410 num_examples: 1000 - config_name: voxpopuli_en features: - name: id dtype: string - name: raw_text dtype: string - name: normalized_text dtype: string - name: speaker_id dtype: float64 - name: split dtype: string - name: gender dtype: string - name: is_gold_transcript dtype: bool - name: accent dtype: string - name: WavPath dtype: string - name: duration dtype: float64 - name: audio dtype: audio splits: - name: test num_bytes: 942842121 num_examples: 1842 - config_name: wavcaps_audioset_sl features: - name: id dtype: string - name: caption dtype: string - name: audio dtype: string - name: duration dtype: float64 - name: WavPath dtype: string - name: audio dtype: audio splits: - name: test num_bytes: 3921918450 num_examples: 11676 - config_name: wavcaps_freesound features: - name: id dtype: string - name: file_name dtype: string - name: href dtype: string - name: tags dtype: string - 
name: description dtype: string - name: author dtype: string - name: duration dtype: float64 - name: download_link dtype: string - name: caption dtype: string - name: audio dtype: string - name: WavPath dtype: string - name: audio dtype: audio splits: - name: test num_bytes: 2914825277 num_examples: 1060 - config_name: wavcaps_soundbible features: - name: title dtype: string - name: description dtype: string - name: author dtype: string - name: href dtype: string - name: caption dtype: string - name: id dtype: string - name: duration dtype: float64 - name: audio dtype: string - name: download_link dtype: string - name: WavPath dtype: string - name: audio dtype: audio splits: - name: test num_bytes: 556140073 num_examples: 1232 - config_name: wenetspeech_test_meeting features: - name: sid dtype: string - name: confidence dtype: int64 - name: begin_time dtype: int64 - name: end_time dtype: float64 - name: subsets dtype: string - name: text dtype: string - name: WavPath dtype: string - name: duration dtype: float64 - name: audio dtype: audio splits: - name: test num_bytes: 395817022 num_examples: 8370 - config_name: wenetspeech_test_net features: - name: sid dtype: string - name: confidence dtype: int64 - name: begin_time dtype: float64 - name: end_time dtype: float64 - name: subsets dtype: string - name: text dtype: string - name: WavPath dtype: string - name: duration dtype: float64 - name: audio dtype: audio splits: - name: test num_bytes: 715273827 num_examples: 24774 - config_name: worldsense features: - name: VideoPath dtype: string - name: WavPath dtype: string - name: dataset_type dtype: string - name: dataset_name dtype: string - name: question dtype: string - name: choices dtype: string - name: gt_answer dtype: string - name: video_id dtype: string - name: task_id dtype: string - name: task_domain dtype: string - name: task_type dtype: string - name: domain dtype: string - name: sub_category dtype: string - name: video_duration dtype: float64 - name: 
duration dtype: string - name: audio_class dtype: string - name: video_caption dtype: string - name: audio dtype: audio splits: - name: test num_bytes: 8139002233 num_examples: 3172 configs: - config_name: aishell1_test default: true data_files: - split: test path: aishell1_test/test-* - config_name: aishell2_test data_files: - split: test path: aishell2_test/test-* - config_name: audio_trivia_qa data_files: - split: test path: audio_trivia_qa/test-* - config_name: audio_web_questions data_files: - split: test path: audio_web_questions/test-* - config_name: audiocaps_test data_files: - split: test path: audiocaps_test/test-* - config_name: av_odyssey data_files: - split: test path: av_odyssey/test-* - config_name: avmeme_full data_files: - split: test path: avmeme_full/test-* - config_name: avmeme_main data_files: - split: test path: avmeme_main/test-* - config_name: avut_benchmark_gemini data_files: - split: test path: avut_benchmark_gemini/test-* - config_name: avut_benchmark_human data_files: - split: test path: avut_benchmark_human/test-* - config_name: clothocaption_test data_files: - split: test path: clothocaption_test/test-* - config_name: commonvoice_en data_files: - split: test path: commonvoice_en/test-* - config_name: commonvoice_fr data_files: - split: test path: commonvoice_fr/test-* - config_name: commonvoice_yue data_files: - split: test path: commonvoice_yue/test-* - config_name: commonvoice_zh data_files: - split: test path: commonvoice_zh/test-* - config_name: covost2_en_zh data_files: - split: test path: covost2_en_zh/test-* - config_name: covost2_zh_en data_files: - split: test path: covost2_zh_en/test-* - config_name: daily_omni data_files: - split: test path: daily_omni/test-* - config_name: fleurs_en data_files: - split: test path: fleurs_en/test-* - config_name: fleurs_zh data_files: - split: test path: fleurs_zh/test-* - config_name: futureomni data_files: - split: test path: futureomni/test-* - config_name: gigaspeech_test data_files: - 
split: test path: gigaspeech_test/test-* - config_name: jointavbench data_files: - split: test path: jointavbench/test-* - config_name: kespeech_test data_files: - split: test path: kespeech_test/test-* - config_name: librispeech_dev_clean data_files: - split: test path: librispeech_dev_clean/test-* - config_name: librispeech_dev_other data_files: - split: test path: librispeech_dev_other/test-* - config_name: librispeech_test_clean data_files: - split: test path: librispeech_test_clean/test-* - config_name: librispeech_test_other data_files: - split: test path: librispeech_test_other/test-* - config_name: livesports3k_cc data_files: - split: test path: livesports3k_cc/test-* - config_name: livesports3k_cc_1fps_10 data_files: - split: test path: livesports3k_cc_1fps_10/test-* - config_name: livesports3k_cc_1fps_10_prompt data_files: - split: test path: livesports3k_cc_1fps_10_prompt/test-* - config_name: livesports3k_cc_1fps_10_uniform data_files: - split: test path: livesports3k_cc_1fps_10_uniform/test-* - config_name: livesports3k_cc_1fps_10_uniform_prompt data_files: - split: test path: livesports3k_cc_1fps_10_uniform_prompt/test-* - config_name: livesports3k_cc_1fps_40 data_files: - split: test path: livesports3k_cc_1fps_40/test-* - config_name: livesports3k_cc_1fps_90 data_files: - split: test path: livesports3k_cc_1fps_90/test-* - config_name: livesports3k_cc_2fps_11 data_files: - split: test path: livesports3k_cc_2fps_11/test-* - config_name: livesports3k_cc_3fps_12 data_files: - split: test path: livesports3k_cc_3fps_12/test-* - config_name: livesports3k_cc_5fps_14 data_files: - split: test path: livesports3k_cc_5fps_14/test-* - config_name: livesports3k_cc_5fps_14_uniform data_files: - split: test path: livesports3k_cc_5fps_14_uniform/test-* - config_name: livesports3k_cc_8fps_44 data_files: - split: test path: livesports3k_cc_8fps_44/test-* - config_name: livesports3k_cc_8fps_44_prompt data_files: - split: test path: livesports3k_cc_8fps_44_prompt/test-* 
- config_name: livesports3k_cc_prompt data_files: - split: test path: livesports3k_cc_prompt/test-* - config_name: livesports3k_cc_under50s_1fps_10 data_files: - split: test path: livesports3k_cc_under50s_1fps_10/test-* - config_name: livesports3k_cc_under50s_1fps_10_fixed data_files: - split: test path: livesports3k_cc_under50s_1fps_10_fixed/test-* - config_name: livesports3k_cc_under50s_1fps_10_prompt data_files: - split: test path: livesports3k_cc_under50s_1fps_10_prompt/test-* - config_name: livesports3k_cc_under50s_1fps_40 data_files: - split: test path: livesports3k_cc_under50s_1fps_40/test-* - config_name: livesports3k_cc_under50s_1fps_90 data_files: - split: test path: livesports3k_cc_under50s_1fps_90/test-* - config_name: livesports3k_cc_under50s_2fps_11 data_files: - split: test path: livesports3k_cc_under50s_2fps_11/test-* - config_name: livesports3k_cc_under50s_5fps_14 data_files: - split: test path: livesports3k_cc_under50s_5fps_14/test-* - config_name: livesports3k_cc_under50s_5fps_14_fixed data_files: - split: test path: livesports3k_cc_under50s_5fps_14_fixed/test-* - config_name: livesports3k_cc_under50s_8fps_44 data_files: - split: test path: livesports3k_cc_under50s_8fps_44/test-* - config_name: livesports3k_cc_under50s_8fps_44_prompt data_files: - split: test path: livesports3k_cc_under50s_8fps_44_prompt/test-* - config_name: meld data_files: - split: test path: meld/test-* - config_name: mmar_bench data_files: - split: test path: mmar_bench/test-* - config_name: mmau_test_mini data_files: - split: test path: mmau_test_mini/test-* - config_name: mmsu_bench data_files: - split: test path: mmsu_bench/test-* - config_name: omni_openqa data_files: - split: test path: omni_openqa/test-* - config_name: omnibench data_files: - split: test path: omnibench/test-* - config_name: ovavel data_files: - split: test path: ovavel/test-* - config_name: ovobench data_files: - split: test path: ovobench/test-* - config_name: peoples_speech_test data_files: - split: 
test path: peoples_speech_test/test-* - config_name: streamingbench_omni_fix data_files: - split: test path: streamingbench_omni_fix/test-* - config_name: streamingbench_real data_files: - split: test path: streamingbench_real/test-* - config_name: streamingbench_sqa data_files: - split: test path: streamingbench_sqa/test-* - config_name: tedlium3_test data_files: - split: test path: tedlium3_test/test-* - config_name: unobench data_files: - split: test path: unobench/test-* - config_name: unobench_mc data_files: - split: test path: unobench_mc/test-* - config_name: video_holmes data_files: - split: test path: video_holmes/test-* - config_name: videomme data_files: - split: test path: videomme/test-* - config_name: videomme_short data_files: - split: test path: videomme_short/test-* - config_name: vocalsound data_files: - split: test path: vocalsound/test-* - config_name: voice_cmmlu data_files: - split: test path: voice_cmmlu/test-* - config_name: voicebench_advbench data_files: - split: test path: voicebench_advbench/test-* - config_name: voicebench_alpacaeval data_files: - split: test path: voicebench_alpacaeval/test-* - config_name: voicebench_alpacaeval_full data_files: - split: test path: voicebench_alpacaeval_full/test-* - config_name: voicebench_bbh data_files: - split: test path: voicebench_bbh/test-* - config_name: voicebench_commoneval data_files: - split: test path: voicebench_commoneval/test-* - config_name: voicebench_ifeval data_files: - split: test path: voicebench_ifeval/test-* - config_name: voicebench_mmsu data_files: - split: test path: voicebench_mmsu/test-* - config_name: voicebench_openbookqa data_files: - split: test path: voicebench_openbookqa/test-* - config_name: voicebench_sdqa data_files: - split: test path: voicebench_sdqa/test-* - config_name: voicebench_wildvoice data_files: - split: test path: voicebench_wildvoice/test-* - config_name: voxpopuli_en data_files: - split: test path: voxpopuli_en/test-* - config_name: 
wavcaps_audioset_sl data_files: - split: test path: wavcaps_audioset_sl/test-* - config_name: wavcaps_freesound data_files: - split: test path: wavcaps_freesound/test-* - config_name: wavcaps_soundbible data_files: - split: test path: wavcaps_soundbible/test-* - config_name: wenetspeech_test_meeting data_files: - split: test path: wenetspeech_test_meeting/test-* - config_name: wenetspeech_test_net data_files: - split: test path: wenetspeech_test_net/test-* - config_name: worldsense data_files: - split: test path: worldsense/test-*
---

# OmniEvalKit Evaluation Datasets

Evaluation datasets for [OmniEvalKit](https://github.com/openbmb/omnievalkit), a comprehensive evaluation framework for omni-modal (audio + video + image + text) models.

## Overview

- **Total subsets**: 89
- **Total samples**: 353,610
- **Total size**: 352.3 GB (Parquet with embedded audio/image, no video)
- **Subsets requiring video download**: 42

> **Note**: Video files are NOT embedded in the Parquet files due to size constraints.

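Because the videos are not embedded, evaluation code has to pair each row with a separately downloaded file. The following is a minimal sketch of that lookup, not part of OmniEvalKit: the helper name `resolve_video` and the `video_root` parameter are hypothetical, and it assumes the path stored in the `VideoPath` / `video_path` field is relative to the directory where the videos were downloaded.

```python
from pathlib import Path
from typing import Optional

# Field names used for video references in the subset schemas.
VIDEO_FIELDS = ("VideoPath", "video_path")


def resolve_video(sample: dict, video_root: str) -> Optional[Path]:
    """Return the local path of the sample's video, or None if unavailable.

    Assumption: the stored path is relative to `video_root`
    (a hypothetical local download directory).
    """
    for field in VIDEO_FIELDS:
        rel = sample.get(field)
        if rel:
            candidate = Path(video_root) / rel
            # Only return paths that actually exist on disk.
            return candidate if candidate.is_file() else None
    # The row has no video reference at all (audio-only subsets).
    return None
```

Rows for which this returns `None` can be skipped or reported before an evaluation run starts, which avoids failing partway through a subset.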
## Usage

```python
from datasets import load_dataset

# Each subset is a separate config; pass its name as the second argument.
ds = load_dataset("xiaofff/omnievalkit-data-test", "aishell1_test")

# Every subset exposes a single "test" split.
for sample in ds["test"]:
    print(sample)
```

## Available Subsets

### Audio (44 subsets, 267,616 samples, 186.5GB)

| Subset | Display Name | Samples | Size | Subcategory | Video |
|--------|-------------|---------|------|-------------|-------|
| `aishell1_test` | AISHELL-1 Test | 7,176 | 1.1GB | asr | No |
| `aishell2_test` | AISHELL-2 Test | 5,000 | 432MB | asr | No |
| `audio_trivia_qa` | Audio Trivia QA | 1,024 | 322MB | qa | No |
| `audio_web_questions` | Audio Web Questions | 2,032 | 222MB | qa | No |
| `audiocaps_test` | AudioCaps Test | 3,985 | 139.2GB | caption | No |
| `clothocaption_test` | ClothoCaption Test | 1,045 | 1.8GB | caption | No |
| `commonvoice_en` | CommonVoice English v15 | 16,386 | 664MB | asr | No |
| `commonvoice_fr` | CommonVoice French v15 | 16,132 | 653MB | asr | No |
| `commonvoice_yue` | CommonVoice Cantonese v15 | 5,593 | 217MB | asr | No |
| `commonvoice_zh` | CommonVoice Chinese v15 | 10,625 | 470MB | asr | No |
| `covost2_en_zh` | CoVoST2 EN-ZH | 15,530 | 672MB | ast | No |
| `covost2_zh_en` | CoVoST2 ZH-EN | 4,897 | 225MB | ast | No |
| `fleurs_en` | FLEURS English | 647 | 383MB | asr | No |
| `fleurs_zh` | FLEURS Chinese | 945 | 664MB | asr | No |
| `gigaspeech_test` | GigaSpeech Test | 19,870 | 4MB | asr | No |
| `kespeech_test` | KeSpeech Test | 19,723 | 3.2GB | asr | No |
| `librispeech_dev_clean` | LibriSpeech Dev Clean | 2,703 | 326MB | asr | No |
| `librispeech_dev_other` | LibriSpeech Dev Other | 2,864 | 304MB | asr | No |
| `librispeech_test_clean` | LibriSpeech Test Clean | 2,620 | 334MB | asr | No |
| `librispeech_test_other` | LibriSpeech Test Other | 2,939 | 318MB | asr | No |
| `meld` | MELD | 2,610 | 1.4GB | cls | Yes |
| `mmar_bench` | MMAR Bench | 1,000 | 3.3GB | qa | No |
| `mmau_test_mini` | MMAU Test Mini | 1,000 | 1.4GB | qa | No |
| `mmsu_bench` | MMSU Bench | 4,996 | 1.3GB | qa | No |
| `peoples_speech_test` | People's Speech Test | 34,898 | 3.8GB | asr | No |
| `tedlium3_test` | TED-LIUM v3 Test | 3,737 | 790MB | asr | No |
| `vocalsound` | VocalSound | 3,591 | 1.0GB | cls | No |
| `voice_cmmlu` | Voice CMMLU | 11,582 | 13.3GB | qa | No |
| `voicebench_advbench` | VoiceBench AdvBench | 520 | <1MB | qa | No |
| `voicebench_alpacaeval` | VoiceBench AlpacaEval | 199 | <1MB | qa | No |
| `voicebench_alpacaeval_full` | VoiceBench AlpacaEval Full | 636 | <1MB | qa | No |
| `voicebench_bbh` | VoiceBench BBH | 1,000 | <1MB | qa | No |
| `voicebench_commoneval` | VoiceBench CommonEval | 200 | <1MB | qa | No |
| `voicebench_ifeval` | VoiceBench IFEval | 345 | <1MB | qa | No |
| `voicebench_mmsu` | VoiceBench MMSU | 3,074 | <1MB | qa | No |
| `voicebench_openbookqa` | VoiceBench OpenBookQA | 455 | <1MB | qa | No |
| `voicebench_sdqa` | VoiceBench SD-QA | 6,083 | <1MB | qa | No |
| `voicebench_wildvoice` | VoiceBench WildVoice | 1,000 | <1MB | qa | No |
| `voxpopuli_en` | VoxPopuli English | 1,842 | 899MB | asr | No |
| `wavcaps_audioset_sl` | WavCaps AudioSet_SL | 11,676 | 3.7GB | caption | No |
| `wavcaps_freesound` | WavCaps FreeSound | 1,060 | 2.7GB | caption | No |
| `wavcaps_soundbible` | WavCaps SoundBible | 1,232 | 530MB | caption | No |
| `wenetspeech_test_meeting` | WenetSpeech Test Meeting | 8,370 | 377MB | asr | No |
| `wenetspeech_test_net` | WenetSpeech Test Net | 24,774 | 682MB | asr | No |

### Omni (45 subsets, 85,994 samples, 165.8GB)

| Subset | Display Name | Samples | Size | Subcategory | Video |
|--------|-------------|---------|------|-------------|-------|
| `av_odyssey` | AV-Odyssey | 4,555 | 31.2GB | qa | No |
| `avmeme_full` | AVMeme-Exam Full | 1,032 | 389MB | meme_understanding | Yes |
| `avmeme_main` | AVMeme-Exam Main | 846 | 321MB | meme_understanding | Yes |
| `avut_benchmark_gemini` | AVUT-Benchmark Gemini | 9,874 | 19.3GB | qa | Yes |
| `avut_benchmark_human` | AVUT-Benchmark Human | 1,734 | 1.4GB | qa | Yes |
| `daily_omni` | Daily-Omni | 1,197 | 966MB | qa | Yes |
| `futureomni` | FutureOmni | 1,034 | 4.8GB | qa | Yes |
| `jointavbench` | JointAVBench | 2,853 | 16.4GB | qa | Yes |
| `livesports3k_cc` | LiveSports-3K CC | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_1fps_10` | LiveSports-3K CC (1fps, 10 input) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_1fps_10_prompt` | LiveSports-3K CC (1fps + prompt) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_1fps_10_uniform` | LiveSports-3K CC (1fps, 10 input, uniform) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_1fps_10_uniform_prompt` | LiveSports-3K CC (1fps uniform + prompt) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_1fps_40` | LiveSports-3K CC (1fps, 4-frame stack) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_1fps_90` | LiveSports-3K CC (1fps, 9-frame stack) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_2fps_11` | LiveSports-3K CC (2fps, 1+1) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_3fps_12` | LiveSports-3K CC (3fps, 1+2) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_5fps_14` | LiveSports-3K CC (5fps->2, 1+4) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_5fps_14_uniform` | LiveSports-3K CC (5fps->2, 1+4, uniform) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_8fps_44` | LiveSports-3K CC (8fps->2, 4+4) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_8fps_44_prompt` | LiveSports-3K CC (8fps + prompt) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_prompt` | LiveSports-3K CC (with prompt) | 1,702 | 3MB | caption | Yes |
| `livesports3k_cc_under50s_1fps_10` | LiveSports-3K CC (<=50s, 1fps nm=[1,0]) | 1,310 | 2MB | caption | Yes |
| `livesports3k_cc_under50s_1fps_10_fixed` | LiveSports-3K CC (<=50s, 1fps fixed) | 1,310 | 2MB | caption | Yes |
| `livesports3k_cc_under50s_1fps_10_prompt` | LiveSports-3K CC (<=50s, 1fps + prompt) | 1,310 | 2MB | caption | Yes |
| `livesports3k_cc_under50s_1fps_40` | LiveSports-3K CC (<=50s, 1fps nm=[4,0]) | 1,310 | 2MB | caption | Yes |
| `livesports3k_cc_under50s_1fps_90` | LiveSports-3K CC (<=50s, 1fps nm=[9,0]) | 1,310 | 2MB | caption | Yes |
| `livesports3k_cc_under50s_2fps_11` | LiveSports-3K CC (<=50s, 2fps nm=[1,1]) | 1,310 | 2MB | caption | Yes |
| `livesports3k_cc_under50s_5fps_14` | LiveSports-3K CC (<=50s, 5fps nm=[1,4]) | 1,310 | 2MB | caption | Yes |
| `livesports3k_cc_under50s_5fps_14_fixed` | LiveSports-3K CC (<=50s, 5fps fixed) | 1,310 | 2MB | caption | Yes |
| `livesports3k_cc_under50s_8fps_44` | LiveSports-3K CC (<=50s, 8fps nm=[4,4]) | 1,310 | 2MB | caption | Yes |
| `livesports3k_cc_under50s_8fps_44_prompt` | LiveSports-3K CC (<=50s, 8fps + prompt) | 1,310 | 2MB | caption | Yes |
| `omni_openqa` | Omni OpenQA | 425 | 1.1GB | qa | Yes |
| `omnibench` | OmniBench | 1,142 | 1.2GB | qa | No |
| `ovavel` | OV-AVEL Test | 5,818 | 3.3GB | event_localization | Yes |
| `ovobench` | OVO-Bench | 1,468 | 2.8GB | qa | Yes |
| `streamingbench_omni_fix` | StreamingBench-Omni-Fix (Offline) | 1,000 | 1.7GB | qa | Yes |
| `streamingbench_real` | StreamingBench-Real (Offline) | 2,499 | 3.7GB | qa | Yes |
| `streamingbench_sqa` | StreamingBench-SQA (Offline) | 250 | 407MB | qa | Yes |
| `unobench` | UNO-Bench | 3,730 | 7.8GB | qa | No |
| `unobench_mc` | UNO-Bench-MC | 1,000 | 3.6GB | qa | No |
| `video_holmes` | Video-Holmes | 1,837 | 3.3GB | qa | Yes |
| `videomme` | Video-MME | 2,700 | 53.7GB | qa | Yes |
| `videomme_short` | Video-MME (Short) | 900 | 821MB | qa | Yes |
| `worldsense` | WorldSense | 3,172 | 7.6GB | qa | Yes |

## Data Format

Each subset is stored as Parquet file(s):

- **Metadata fields**: All original JSONL fields (question, answer, paths, etc.)
- **`audio`**: HuggingFace `Audio` type (`struct<bytes: binary, path: string>`), playable in the Dataset Viewer
- **`image`**: HuggingFace `Image` type (`struct<bytes: binary, path: string>`), viewable in the Dataset Viewer
- **`audio_bytes_dict`**: Multiple audio files as a base64-encoded JSON string
- **`image_bytes_dict`**: Multiple image files as a base64-encoded JSON string

Video files are referenced by the `VideoPath` / `video_path` fields but are NOT embedded.

## License

Apache 2.0. Individual datasets may have their own licenses.
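The embedded media fields described under Data Format can be read with the standard library alone. The sketch below rests on two assumptions that the card does not confirm: that the bytes in an `audio` struct are PCM WAV data, and that an `audio_bytes_dict` / `image_bytes_dict` value is a JSON object mapping each key to base64 text. Both helper names are hypothetical.

```python
import base64
import io
import json
import wave


def wav_duration_seconds(audio: dict) -> float:
    """Duration of an undecoded `audio` struct ({"bytes": ..., "path": ...}).

    Assumption: the embedded bytes are PCM WAV data.
    """
    with wave.open(io.BytesIO(audio["bytes"]), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def decode_bytes_dict(encoded: str) -> dict:
    """Decode an `audio_bytes_dict` / `image_bytes_dict` string to raw bytes.

    Assumption: the string is a JSON object whose values are base64 text.
    """
    return {
        key: base64.b64decode(value)
        for key, value in json.loads(encoded).items()
    }
```

By default `datasets` decodes the `audio` column on access; casting it with `ds.cast_column("audio", Audio(decode=False))` should yield the raw `bytes` / `path` struct that `wav_duration_seconds` expects. If the layout of the `*_bytes_dict` fields differs from the assumption above, `decode_bytes_dict` will need adjusting.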
Provider: xiaofff