
tsinghua-ee/SAVEBench

Source: Hugging Face. Updated: 2024-10-14. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/tsinghua-ee/SAVEBench
Resource summary:
License: apache-2.0

Audio:
1. LibriSpeech: test-clean, full set.
2. AudioCaps: test, full set.

Video:
1. NExTQA: nextqa_test.json. ID provided in the "image" field.

Image:
1. Flickr30k: flickr30k_captions.json (this is the standard 1k test set). ID provided in the "image" field.
2. TextVQA: textvqa.json. ID provided in the "image" field.
3. GQA: testdev_balanced_questions_with_images.json. ID provided in the "image" field.

Audio-visual:
1. How2: how2_test.json. ID provided in the "image" field. Format: <video_id>_<start_second>_<end_second>.mp4 or .wav.
2. Audio-Visual Sound Source Detection (AVSSD): testdata_formatted.json. ID provided in the "image" field. The first value is the image and the second is the corresponding audio.
3. Audio-Visual Matching (AVM): audiovisualmatching_combined.json. ID provided in the "image" field as a list of two values. The first value is the image and the second is the audio/speech. The ID also indicates whether the sample comes from VGGSS or SpokenCOCO.
4. Audio-visual question answering (AVQA), Ego4D-QA: ego4d_qa.json. The video ID indicates the frame index.
5. Audio-visual question answering (AVQA), Presentation-QA: presentation_qa.json.
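The ID conventions described above can be illustrated with a minimal Python sketch. It is not part of the dataset or its official tooling. The helper names `parse_how2_id` and `split_pair` are hypothetical. The sketch assumes the start and end seconds are numeric and that the video ID itself may contain underscores. It also assumes the AVSSD "image" field is a two-element list like the AVM one, which the summary implies but does not state.

```python
from typing import NamedTuple, Sequence, Tuple


class How2Clip(NamedTuple):
    """Parts of a How2 clip ID (hypothetical container, not from the dataset)."""
    video_id: str
    start_second: float
    end_second: float
    extension: str


def parse_how2_id(clip_id: str) -> How2Clip:
    """Split '<video_id>_<start_second>_<end_second>.mp4' (or .wav) into parts."""
    stem, _, extension = clip_id.rpartition(".")
    if extension not in ("mp4", "wav"):
        raise ValueError(f"unexpected extension in {clip_id!r}")
    # Split from the right so underscores inside the video ID are preserved.
    parts = stem.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"expected <video_id>_<start>_<end> in {clip_id!r}")
    video_id, start, end = parts
    # Assumption: seconds are numeric; float() accepts both "10" and "10.5".
    return How2Clip(video_id, float(start), float(end), extension)


def split_pair(image_field: Sequence[str]) -> Tuple[str, str]:
    """Return (image_id, audio_id) from a two-value "image" field."""
    if isinstance(image_field, str) or len(image_field) != 2:
        raise ValueError('expected a list of two values in the "image" field')
    image_id, audio_id = image_field
    return image_id, audio_id
```

For example, with made-up IDs, `parse_how2_id("abc_def_10_25.mp4")` yields the video ID `abc_def` with a start of 10 seconds and an end of 25 seconds, and `split_pair(["img_001.jpg", "aud_001.wav"])` returns the image ID followed by the audio ID. Splitting from the right is a deliberate choice: it keeps the parse stable even if a video ID contains underscores.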
Provided by:
tsinghua-ee