KC-MMbench

Name: KC-MMbench
Creator: maas
Published: 2025-11-27 16:39:08
License: No description provided

ModelScope Community · Updated 2025-11-27 · Indexed 2025-07-05

Download link:

https://modelscope.cn/datasets/Kwai-Keye/KC-MMbench


Description:

<font size=3><div align='center'> [[🍎 Home Page](https://kwai-keye.github.io/)] [[📖 Technical Report](https://huggingface.co/papers/2507.01949)] [[📊 Models](https://huggingface.co/Kwai-Keye)] [[🚀 Demo](https://huggingface.co/spaces/Kwai-Keye/Keye-VL-8B-Preview)] </div></font>

This repository contains **KC-MMBench**, a benchmark dataset tailored to real-world short-video scenarios, as presented in the paper "[Kwai Keye-VL Technical Report](https://huggingface.co/papers/2507.01949)". Constructed from [Kuaishou](https://www.kuaishou.com/) short-video data, KC-MMBench comprises six datasets that evaluate how well Vision-Language Models (VLMs) such as [**Kwai Keye-VL-8B**](https://huggingface.co/Kwai-Keye/Keye-VL-8B-Preview), Qwen2.5-VL, and InternVL comprehend dynamic, information-dense short-form videos.

For the associated code, detailed documentation, and evaluation scripts, please refer to the official [Kwai Keye-VL GitHub repository](https://github.com/Kwai-Keye/Kwai-Keye-VL).

To use KC-MMBench, download it with:

```bash
git clone https://huggingface.co/datasets/Kwai-Keye/KC-MMbench
```

## Tasks

| Task | Description |
| ---------------------- | --------------------------------------------------------------------------- |
| CPV | Predicting product attributes in e-commerce. |
| Hot_Videos_Aggregation | Determining whether multiple videos belong to the same topic. |
| Collection_Order | Determining the logical order of multiple videos on the same topic. |
| Pornographic_Comment | Determining whether short-video comments contain pornographic content. |
| High_Like | A binary classification task to determine the like rate of a short video. |
| SPU | Determining whether two items are the same product in e-commerce. |

## Performance

| Task | Qwen2.5-VL-3B | Qwen2.5-VL-7B | InternVL-3-8B | MiMo-VL-7B | Kwai Keye-VL-8B |
| ---------------------- | ------------- | ------------- | ------------- | ---------- | --------------- |
| CPV | 12.39 | 20.08 | 14.95 | 16.66 | 55.13 |
| Hot_Videos_Aggregation | 42.38 | 46.35 | 52.31 | 49.00 | 54.30 |
| Collection_Order | 36.88 | 59.83 | 64.75 | 78.68 | 84.43 |
| Pornographic_Comment | 56.61 | 56.08 | 57.14 | 68.25 | 71.96 |
| High_Like | 48.85 | 47.94 | 47.03 | 51.14 | 55.25 |
| SPU | 74.09 | 81.34 | 75.64 | 81.86 | 87.05 |

## Usage

This section is a quick guide to running models with the `keye-vl-utils` library, which prepares image and video inputs for Keye series models such as Kwai Keye-VL-8B.

### Install `keye-vl-utils`

First, install the utility library:

```bash
pip install keye-vl-utils
```

### Keye-VL Inference Example

Here is an example of performing inference with a Kwai Keye-VL model, demonstrating how to prepare inputs for both image and video scenarios.
```python
from transformers import AutoModel, AutoProcessor
from keye_vl_utils import process_vision_info

# Load the model on the available device(s). device_map="auto" already places
# the weights, so no additional .to("cuda") call is needed on the model.
model_path = "Kwai-Keye/Keye-VL-8B-Preview"
model = AutoModel.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)

# Example messages demonstrating various input types (image, video).
# Each inner list is one single-turn conversation, so this is a batch of six.
messages = [
    # Image input examples: local file, URL, base64 data URI
    [{"role": "user", "content": [{"type": "image", "image": "file:///path/to/your/image.jpg"}, {"type": "text", "text": "Describe this image."}]}],
    [{"role": "user", "content": [{"type": "image", "image": "http://path/to/your/image.jpg"}, {"type": "text", "text": "Describe this image."}]}],
    [{"role": "user", "content": [{"type": "image", "image": "data:image;base64,/9j/..."}, {"type": "text", "text": "Describe this image."}]}],
    # Video input examples (most relevant for KC-MMBench):
    # a video file, a list of extracted frames, and a video file with sampling options
    [{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4"}, {"type": "text", "text": "Describe this video."}]}],
    [{"role": "user", "content": [{"type": "video", "video": ["file:///path/to/extracted_frame1.jpg", "file:///path/to/extracted_frame2.jpg", "file:///path/to/extracted_frame3.jpg"]}, {"type": "text", "text": "Describe this video."}]}],
    [{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4", "fps": 2.0, "resized_height": 280, "resized_width": 280}, {"type": "text", "text": "Describe this video."}]}],
]

# The model is already loaded above; only the processor is created here.
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos, video_kwargs = process_vision_info(messages, return_video_kwargs=True)
inputs = processor(
    text=text,
    images=images,
    videos=videos,
    padding=True,
    return_tensors="pt",
    **video_kwargs,
).to(model.device)

generated_ids = model.generate(**inputs)
print(generated_ids)  # raw token ids, not decoded text
```

### Evaluation

For detailed instructions on evaluating models with the KC-MMBench datasets, including setup and running the evaluation scripts, please refer to the `evaluation/KC-MMBench/README.md` file in the official [Kwai Keye-VL GitHub repository](https://github.com/Kwai-Keye/Kwai-Keye-VL/tree/main/evaluation/KC-MMBench).

Below is an example configuration for evaluating VLMs on our datasets:

```python
{
    "model": "...",  # Specify your model
    "data": {
        "CPV": {
            "class": "KwaiVQADataset",
            "dataset": "CPV"
        },
        "Hot_Videos_Aggregation": {
            "class": "KwaiVQADataset",
            "dataset": "Hot_Videos_Aggregation"
        },
        "Collection_Order": {
            "class": "KwaiVQADataset",
            "dataset": "Collection_Order"
        },
        "Pornographic_Comment": {
            "class": "KwaiYORNDataset",
            "dataset": "Pornographic_Comment"
        },
        "High_like": {
            "class": "KwaiYORNDataset",
            "dataset": "High_like"
        },
        "SPU": {
            "class": "KwaiYORNDataset",
            "dataset": "SPU"
        }
    }
}
```
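
When only a subset of the six tasks is needed, a configuration in the same shape can be generated rather than edited by hand. The following is a minimal sketch: `build_eval_config` and `TASK_CLASSES` are hypothetical names introduced for illustration, `"my-model"` is a placeholder, and the task-to-class mapping and key spelling are copied from the example configuration above. How the official evaluation scripts consume the configuration is described in their README, not here.

```python
import json

# Task name -> dataset class, copied verbatim from the example configuration
# (including the "High_like" spelling used there).
TASK_CLASSES = {
    "CPV": "KwaiVQADataset",
    "Hot_Videos_Aggregation": "KwaiVQADataset",
    "Collection_Order": "KwaiVQADataset",
    "Pornographic_Comment": "KwaiYORNDataset",
    "High_like": "KwaiYORNDataset",
    "SPU": "KwaiYORNDataset",
}


def build_eval_config(model, tasks=None):
    """Return a config dict shaped like the example (hypothetical helper)."""
    selected = list(TASK_CLASSES) if tasks is None else list(tasks)
    unknown = [t for t in selected if t not in TASK_CLASSES]
    if unknown:
        raise KeyError(f"Unknown KC-MMBench task(s): {unknown}")
    return {
        "model": model,
        "data": {t: {"class": TASK_CLASSES[t], "dataset": t} for t in selected},
    }


# "my-model" is a placeholder, not a real model identifier.
config = build_eval_config("my-model", tasks=["CPV", "SPU"])
print(json.dumps(config, indent=2))
```

Keeping the mapping in one dictionary means a misspelled task name fails early with a `KeyError` instead of surfacing later during evaluation.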

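The video messages in the Keye-VL inference example share one structure: a `video` entry (a path or a list of frame paths), optional sampling keys (`fps`, `resized_height`, `resized_width`), and a text prompt. A small helper can build them consistently. This is a sketch only: `video_message` is a hypothetical function, not part of `keye-vl-utils`, and it assumes the message layout shown in that example.

```python
def video_message(video, prompt, fps=None, resized_height=None, resized_width=None):
    """Build one single-turn conversation with a video and a text prompt.

    `video` is a path/URL string or a list of frame paths, matching the input
    forms in the inference example. Hypothetical helper for illustration.
    """
    video_part = {"type": "video", "video": video}
    optional = {
        "fps": fps,
        "resized_height": resized_height,
        "resized_width": resized_width,
    }
    # Only include sampling options that were explicitly provided.
    video_part.update({k: v for k, v in optional.items() if v is not None})
    return [{"role": "user", "content": [video_part, {"type": "text", "text": prompt}]}]


# A batch of two conversations; the paths are placeholders from the example.
messages = [
    video_message("file:///path/to/video1.mp4", "Describe this video."),
    video_message(
        "file:///path/to/video1.mp4",
        "Describe this video.",
        fps=2.0,
        resized_height=280,
        resized_width=280,
    ),
]
print(messages[1][0]["content"][0])
```

The resulting `messages` list has the same nesting as in the inference example, so it can be passed to `processor.apply_chat_template` and `process_vision_info` in the same way.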

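The inference example in the Usage section prints raw token ids. For decoder-only models in `transformers`, the output of `generate` normally starts with the prompt tokens, so a common follow-up step is to strip them before decoding. The sketch below shows that step on toy integer lists so it runs without a model; `strip_prompt_tokens` is a hypothetical helper, and the commented-out lines assume that the Keye processor exposes `batch_decode` as standard `transformers` processors do.

```python
def strip_prompt_tokens(input_ids, generated_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Toy ids standing in for `inputs.input_ids` and the output of `model.generate`.
prompt_ids = [[101, 7, 8], [101, 9, 10]]
output_ids = [[101, 7, 8, 42, 43], [101, 9, 10, 44, 2]]

new_tokens = strip_prompt_tokens(prompt_ids, output_ids)
print(new_tokens)  # [[42, 43], [44, 2]]

# With the objects from the inference example, the same step would look like
# this (assumption: the processor forwards batch_decode to its tokenizer):
# trimmed = strip_prompt_tokens(inputs.input_ids, generated_ids)
# print(processor.batch_decode(trimmed, skip_special_tokens=True))
```

The helper relies only on `len` and slicing, so it accepts both Python lists and batched tensors whose rows share one padded prompt length.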
Provider: maas
Created: 2025-07-02
