KC-MMbench
Source: ModelScope community (updated 2025-11-27, indexed 2025-07-05)
Download link: https://modelscope.cn/datasets/Kwai-Keye/KC-MMbench
<font size=3><div align='center' > [[🍎 Home Page](https://kwai-keye.github.io/)] [[📖 Technical Report](https://huggingface.co/papers/2507.01949)] [[📊 Models](https://huggingface.co/Kwai-Keye)] [[🚀 Demo](https://huggingface.co/spaces/Kwai-Keye/Keye-VL-8B-Preview)] </div></font>
This repository contains **KC-MMBench**, a new benchmark dataset meticulously tailored for real-world short-video scenarios, as presented in the paper "[Kwai Keye-VL Technical Report](https://huggingface.co/papers/2507.01949)". Constructed from [Kuaishou](https://www.kuaishou.com/) short video data, KC-MMBench comprises 6 distinct datasets designed to evaluate the performance of Vision-Language Models (VLMs) like [**Kwai Keye-VL-8B**](https://huggingface.co/Kwai-Keye/Keye-VL-8B-Preview), Qwen2.5-VL, and InternVL in comprehending dynamic, information-dense short-form videos.
For the associated code, detailed documentation, and evaluation scripts, please refer to the official [Kwai Keye-VL GitHub repository](https://github.com/Kwai-Keye/Kwai-Keye-VL).
To use KC-MMBench, clone the dataset repository:
```bash
git clone https://huggingface.co/datasets/Kwai-Keye/KC-MMbench
```
## Tasks
| Task | Description |
| -------------- | --------------------------------------------------------------------------- |
| CPV | The task of predicting product attributes in e-commerce. |
| Hot_Videos_Aggregation | The task of determining whether multiple videos belong to the same topic. |
| Collection_Order | The task of determining the logical order between multiple videos with the same topic. |
| Pornographic_Comment | The task of determining whether short video comments contain pornographic content. |
| High_Like | A binary classification task to determine whether a short video has a high like rate. |
| SPU | The task of determining whether two items are the same product in e-commerce. |
## Performance
| Task | Qwen2.5-VL-3B | Qwen2.5-VL-7B | InternVL-3-8B | MiMo-VL-7B | Kwai Keye-VL-8B |
| -------------- | ------------- | ------------- | ------------- | ------- | ---- |
| CPV | 12.39 | 20.08 | 14.95 | 16.66 | 55.13 |
| Hot_Videos_Aggregation | 42.38 | 46.35 | 52.31 | 49.00 | 54.30 |
| Collection_Order | 36.88 | 59.83 | 64.75 | 78.68 | 84.43 |
| Pornographic_Comment | 56.61 | 56.08 | 57.14 | 68.25 | 71.96 |
| High_Like | 48.85 | 47.94 | 47.03 | 51.14 | 55.25 |
| SPU | 74.09 | 81.34 | 75.64 | 81.86 | 87.05 |
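To compare models with a single number, one option is the unweighted mean over the six tasks. The sketch below copies the scores from the table above; weighting all tasks equally is an assumption made here for illustration, not an aggregation rule stated by the benchmark, and the function names are hypothetical.

```python
from statistics import mean

# Scores copied from the Performance table (one row per task, one column per model).
MODELS = ["Qwen2.5-VL-3B", "Qwen2.5-VL-7B", "InternVL-3-8B", "MiMo-VL-7B", "Kwai Keye-VL-8B"]
SCORES = {
    "CPV": [12.39, 20.08, 14.95, 16.66, 55.13],
    "Hot_Videos_Aggregation": [42.38, 46.35, 52.31, 49.00, 54.30],
    "Collection_Order": [36.88, 59.83, 64.75, 78.68, 84.43],
    "Pornographic_Comment": [56.61, 56.08, 57.14, 68.25, 71.96],
    "High_Like": [48.85, 47.94, 47.03, 51.14, 55.25],
    "SPU": [74.09, 81.34, 75.64, 81.86, 87.05],
}


def average_per_model(scores, models):
    """Return the unweighted mean score across tasks for each model."""
    columns = zip(*scores.values())  # one tuple of task scores per model
    return {model: round(mean(col), 2) for model, col in zip(models, columns)}


def best_model_per_task(scores, models):
    """Return the top-scoring model for each task."""
    return {task: models[row.index(max(row))] for task, row in scores.items()}


if __name__ == "__main__":
    for model, avg in average_per_model(SCORES, MODELS).items():
        print(f"{model}: {avg:.2f}")
```

Under this equal-weight assumption, Kwai Keye-VL-8B has both the highest mean and the highest score on every individual task in the table.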
## Usage
This section provides a quick guide on how to interact with models using the `keye-vl-utils` library, which is essential for processing and integrating visual language information with Keye Series Models like Kwai Keye-VL-8B.
### Install `keye-vl-utils`
First, install the necessary utility library:
```bash
pip install keye-vl-utils
```
### Keye-VL Inference Example
Here's an example of performing inference with a Kwai Keye-VL model, demonstrating how to prepare inputs for both image and video scenarios.
```python
from transformers import AutoModel, AutoProcessor
from keye_vl_utils import process_vision_info

# Load the model. device_map="auto" already places the weights on the
# available device(s), so no extra .to("cuda") call is needed.
model_path = "Kwai-Keye/Keye-VL-8B-Preview"
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# A batch of independent single-turn conversations, one per input type.
messages = [
    # Image inputs: local file, URL, base64 data URI
    [{"role": "user", "content": [{"type": "image", "image": "file:///path/to/your/image.jpg"}, {"type": "text", "text": "Describe this image."}]}],
    [{"role": "user", "content": [{"type": "image", "image": "http://path/to/your/image.jpg"}, {"type": "text", "text": "Describe this image."}]}],
    [{"role": "user", "content": [{"type": "image", "image": "data:image;base64,/9j/..."}, {"type": "text", "text": "Describe this image."}]}],
    # Video inputs (most relevant for KC-MMBench): video file, list of
    # extracted frames, video file with explicit sampling and resize options
    [{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4"}, {"type": "text", "text": "Describe this video."}]}],
    [{"role": "user", "content": [{"type": "video", "video": ["file:///path/to/extracted_frame1.jpg", "file:///path/to/extracted_frame2.jpg", "file:///path/to/extracted_frame3.jpg"]}, {"type": "text", "text": "Describe this video."}]}],
    [{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4", "fps": 2.0, "resized_height": 280, "resized_width": 280}, {"type": "text", "text": "Describe this video."}]}],
]

# Render the prompts, load the visual inputs, and build the model inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos, video_kwargs = process_vision_info(messages, return_video_kwargs=True)
inputs = processor(text=text, images=images, videos=videos, padding=True, return_tensors="pt", **video_kwargs).to(model.device)

# Generate (max_new_tokens is an example cap; adjust as needed), drop the
# prompt tokens from each sequence, and decode the answers to text.
generated_ids = model.generate(**inputs, max_new_tokens=1024)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True))
```
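When running many benchmark videos through the model, it can help to build the message structure with a small helper instead of writing each dictionary by hand. The sketch below is a hypothetical helper (`build_video_message` is not part of `keye-vl-utils`); it only reuses the keys shown in the example above (`type`, `video`, `fps`, `resized_height`, `resized_width`) and assumes the videos are local files.

```python
from pathlib import Path


def build_video_message(video_path, question, fps=None, resized_height=None, resized_width=None):
    """Build one single-turn conversation for a local video file.

    Hypothetical helper: the dictionary keys mirror the inference example,
    and optional settings are only included when explicitly provided.
    """
    # Convert the local path to a file:// URI, as used in the example above.
    video_item = {"type": "video", "video": Path(video_path).absolute().as_uri()}
    optional = {"fps": fps, "resized_height": resized_height, "resized_width": resized_width}
    video_item.update({key: value for key, value in optional.items() if value is not None})
    return [{"role": "user", "content": [video_item, {"type": "text", "text": question}]}]


if __name__ == "__main__":
    # Example paths are placeholders; replace them with real video files.
    batch = [
        build_video_message("/path/to/video1.mp4", "Describe this video."),
        build_video_message("/path/to/video2.mp4", "Describe this video.", fps=2.0),
    ]
    print(batch[1][0]["content"][0])
```

A list of such conversations can be passed as `messages` to `processor.apply_chat_template` and `process_vision_info` in the same way as the hand-written batch above.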
### Evaluation
For detailed instructions on how to evaluate models using the KC-MMBench datasets, including setup and running evaluation scripts, please refer to the `evaluation/KC-MMBench/README.md` file in the official [Kwai Keye-VL GitHub repository](https://github.com/Kwai-Keye/Kwai-Keye-VL/tree/main/evaluation/KC-MMBench).
Below is an example configuration for evaluating VLMs on the KC-MMBench datasets:
```python
{
    "model": "...",  # Specify your model
    "data": {
        "CPV": {
            "class": "KwaiVQADataset",
            "dataset": "CPV"
        },
        "Hot_Videos_Aggregation": {
            "class": "KwaiVQADataset",
            "dataset": "Hot_Videos_Aggregation"
        },
        "Collection_Order": {
            "class": "KwaiVQADataset",
            "dataset": "Collection_Order"
        },
        "Pornographic_Comment": {
            "class": "KwaiYORNDataset",
            "dataset": "Pornographic_Comment"
        },
        "High_like": {
            "class": "KwaiYORNDataset",
            "dataset": "High_like"
        },
        "SPU": {
            "class": "KwaiYORNDataset",
            "dataset": "SPU"
        }
    }
}
```
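To evaluate on only a subset of tasks, the same configuration can be generated programmatically. The following is a minimal sketch: the task-to-class mapping is copied from the example configuration above, while `build_eval_config`, the `"my-model"` placeholder, and the choice to print the result as JSON are assumptions for illustration. Check the evaluation README for the format the scripts actually expect.

```python
import json

# Dataset-class assignment copied from the example configuration above.
DATASET_CLASSES = {
    "CPV": "KwaiVQADataset",
    "Hot_Videos_Aggregation": "KwaiVQADataset",
    "Collection_Order": "KwaiVQADataset",
    "Pornographic_Comment": "KwaiYORNDataset",
    "High_like": "KwaiYORNDataset",
    "SPU": "KwaiYORNDataset",
}


def build_eval_config(model, tasks=None):
    """Build an evaluation config dict for the selected tasks (default: all six)."""
    tasks = list(DATASET_CLASSES) if tasks is None else list(tasks)
    unknown = [task for task in tasks if task not in DATASET_CLASSES]
    if unknown:
        raise ValueError(f"Unknown task(s): {unknown}")
    return {
        "model": model,
        "data": {task: {"class": DATASET_CLASSES[task], "dataset": task} for task in tasks},
    }


if __name__ == "__main__":
    # "my-model" is a placeholder; use the model identifier your setup expects.
    print(json.dumps(build_eval_config("my-model", ["CPV", "SPU"]), indent=2))
```

Rejecting unknown task names up front surfaces typos such as `High_Like` versus `High_like` before an evaluation run starts.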
Provider: maas
Created: 2025-07-02



