Human-AGVQA

Name: Human-AGVQA
Creator: Shanghai Jiao Tong University
Published: 2024-11-26 01:58:43
License: not specified

arXiv; updated 2024-11-26; indexed 2024-11-27

Download link:

https://github.com/zczhang-sjtu/GHVQ

Overview:

The Human-AGVQA dataset was created by Shanghai Jiao Tong University. It contains 3,200 videos generated by 8 state-of-the-art text-to-video (T2V) models from 400 text prompts that describe diverse human activities. The dataset was built in three stages: selecting text prompts, generating videos with the T2V models, and rating the human appearance quality, action continuity, and overall video quality of the generated videos in subjective experiments. The dataset is primarily used to evaluate and optimize the quality of AI-generated human activity videos, and it aims to address the current limitations of T2V models in generating high-quality human activity videos.

Provider:

Shanghai Jiao Tong University

Created:

2024-11-26

Collected Summary

Dataset Introduction

Construction

The Human-AGVQA dataset was constructed to address the challenges of assessing the quality of AI-generated videos (AGVs) involving human activities. It comprises 3,200 AGVs generated by 8 popular text-to-video (T2V) models from 400 diverse text prompts describing various human activities. A subjective study evaluates the human appearance quality, action continuity quality, and overall video quality of the AGVs, and also identifies semantic issues in human body parts. The resulting labels provide a benchmark for assessing the quality of human activity AGVs.
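The counts above are consistent with every model rendering every prompt (8 × 400 = 3,200). A minimal sketch of how such an index might be represented follows. The field names, the placeholder model names, and the record layout are assumptions made for illustration; the actual file organization in the repository may differ.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Optional


@dataclass
class VideoRecord:
    """One generated video with its subjective labels (hypothetical layout)."""
    model: str                                  # T2V model that produced the video
    prompt_id: int                              # index of the text prompt
    human_appearance: Optional[float] = None    # subjective score, unset until rated
    action_continuity: Optional[float] = None   # subjective score, unset until rated
    overall: Optional[float] = None             # subjective score, unset until rated
    body_part_issues: List[str] = field(default_factory=list)  # flagged body parts


def build_index(models: List[str], num_prompts: int) -> List[VideoRecord]:
    """Create one empty record per (model, prompt) pair."""
    return [
        VideoRecord(model=m, prompt_id=p)
        for m, p in product(models, range(num_prompts))
    ]


# Placeholder names: the page does not list the eight models.
MODELS = [f"t2v_model_{i}" for i in range(8)]
records = build_index(MODELS, 400)
print(len(records))  # 3200
```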

Features

The Human-AGVQA dataset stands out for its diversity and richness in content, encompassing a wide range of human activities described by 400 text prompts. The dataset includes 3,200 AGVs generated by state-of-the-art T2V models, providing a comprehensive testbed for quality assessment metrics. The subjective study conducted on this dataset offers detailed quality labels, making it an invaluable resource for developing and validating quality metrics for human activity AGVs. Additionally, the dataset includes semantic distortion identification, which provides insights for further optimization of T2V models.

Usage

The Human-AGVQA dataset can be utilized to benchmark the performance of T2V models and to develop and validate objective evaluation metrics for human activity AGVs. Researchers can employ this dataset to train and test their quality assessment models, focusing on human appearance quality, action continuity quality, and overall video quality. The dataset's detailed quality labels and semantic distortion identification also enable fine-grained analysis and optimization of T2V models. Furthermore, the dataset can serve as a reference for subjective quality assessment studies, providing a standardized ground truth for comparison.
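One common way to validate an objective metric against subjective labels is to check how well the metric's ranking of videos agrees with the human ranking, for example with the Spearman rank correlation. The sketch below is an illustration under that assumption; the page does not state which evaluation criteria the authors use, and the scores shown are made-up toy values, not data from the dataset.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def spearman(predicted: Sequence[float], subjective: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(subjective))


# Toy values for illustration only.
subjective_scores = [3.1, 4.5, 2.0, 3.8]
predicted_scores = [0.52, 0.91, 0.30, 0.60]
print(spearman(predicted_scores, subjective_scores))
```

The same function can be applied separately to each labeled dimension (human appearance, action continuity, overall quality) to see where a metric agrees with human judgment and where it does not.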

Background and Challenges

Background Overview

Human-AGVQA is a pioneering dataset designed to address the critical need for assessing the quality of AI-generated videos (AGVs) involving human activities. Developed by researchers from Shanghai Jiao Tong University and Huawei Technologies, this dataset comprises 3,200 AGVs generated by eight popular text-to-video (T2V) models using 400 diverse text prompts describing various human activities. The dataset was constructed to benchmark the performance of T2V models and to develop an objective evaluation metric for human activity AGVs. The Human-AGVQA dataset and the accompanying AI-Generated Human activity Video Quality metric (GHVQ) aim to bridge the gap in quality assessment for AI-generated content, particularly in the context of human activities, which often exhibit substantial visual and semantic distortions.

Current Challenges

The primary challenge addressed by the Human-AGVQA dataset is the accurate assessment of the quality of AI-generated videos involving human activities, which covers visual quality, action continuity, and semantic distortions in human body parts. Four difficulties stand out: 1) Current T2V models struggle to generate realistic human figures and actions, which leads to significant visual and semantic distortions. 2) Existing quality assessment metrics cannot evaluate AGVs effectively; general image/video quality assessment (I/VQA) metrics perform poorly on them. 3) Constructing a comprehensive dataset that covers a wide range of human activities and accurately labels AGV quality is complex. 4) An objective evaluation metric must systematically extract human-focused quality features, AI-generated content-aware quality features, and temporal continuity features to provide a comprehensive and explainable quality assessment for human activity AGVs.

Common Scenarios

Classic Use Cases

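The most classic use of the Human-AGVQA dataset is evaluating and improving the quality of text-to-video generation models. With this dataset, researchers can analyze how different text-to-video (T2V) models perform when generating videos that contain human activities, with particular attention to human appearance quality, action continuity quality, and overall video quality. Such evaluation helps identify and quantify the strengths and weaknesses of models in generating high-quality human activity videos, thereby guiding model optimization and improvement.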
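A per-model comparison of this kind could be sketched as follows: average the subjective scores of each model on each dimension, then rank the models on the dimension of interest. The row layout, the model names, and the score values are hypothetical and only illustrate the computation.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

DIMENSIONS = ("human_appearance", "action_continuity", "overall")


def mean_scores_by_model(rows: List[dict]) -> Dict[str, Dict[str, float]]:
    """Average each quality dimension over all videos of each model."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for dim in DIMENSIONS:
            grouped[row["model"]][dim].append(row[dim])
    return {
        model: {dim: mean(vals) for dim, vals in dims.items()}
        for model, dims in grouped.items()
    }


def rank_models(summary: Dict[str, Dict[str, float]], dim: str = "overall") -> List[str]:
    """Return model names ordered from best to worst on one dimension."""
    return sorted(summary, key=lambda m: summary[m][dim], reverse=True)


# Toy rows for illustration only; not taken from the dataset.
rows = [
    {"model": "model_a", "human_appearance": 3.0, "action_continuity": 2.0, "overall": 2.5},
    {"model": "model_a", "human_appearance": 4.0, "action_continuity": 3.0, "overall": 3.5},
    {"model": "model_b", "human_appearance": 2.0, "action_continuity": 4.0, "overall": 3.0},
    {"model": "model_b", "human_appearance": 3.0, "action_continuity": 5.0, "overall": 4.0},
]
summary = mean_scores_by_model(rows)
print(rank_models(summary, "overall"))           # ['model_b', 'model_a']
print(rank_models(summary, "human_appearance"))  # ['model_a', 'model_b']
```

Ranking per dimension rather than only on the overall score makes it visible when a model produces smooth motion but poor human appearance, or the reverse.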

Academic Problems Addressed

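The Human-AGVQA dataset addresses a key academic problem in current text-to-video generation: how to objectively and accurately evaluate the quality of human activities in AI-generated videos (AGVs). Traditional image/video quality assessment (I/VQA) methods perform poorly on AGVs; Human-AGVQA fills this gap by providing a benchmark dataset of 3,200 AGVs together with an objective evaluation metric named GHVQ. This provides a basis for benchmarking T2V models and an experimental platform for developing more effective quality assessment methods, advancing research on AI-generated content (AIGC).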

Derived Related Work

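The release of the Human-AGVQA dataset and the GHVQ metric has stimulated a series of related studies. For example, some researchers have developed new text-to-video generation models based on the Human-AGVQA dataset that perform better when generating human activity videos. Other studies focus on improving existing quality assessment methods to better suit the characteristics of AI-generated videos. These derived works extend the application scope of Human-AGVQA and promote the development of the text-to-video generation and quality assessment fields as a whole.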

The content above was collected and summarized by 遇见数据集.
