Artificial Intelligence Multilingual Multimodal Dataset
Source: Beijing Municipal Data Intellectual Property; updated 2024-01-05; indexed 2024-05-08
Download link:
https://webs.bjidex.com/sys-bsc-home/#/bscConsole/intellectualProperty/infoPublicity?action=1
Resource overview:
This dataset supports algorithm training for multilingual multimodal large models and for digital humans, as described below.
In the large-model field, large language models (LLMs) provide text understanding, and major AI companies are building multimodal large models on top of that capability. The main tasks of multimodal large models include text-to-image generation, image captioning, text-to-video generation, and video captioning, all of which require high-quality image-text and video-text pairs. This dataset contains image-text and video-text pairs covering multiple scenarios, description styles, and languages, and can support training and testing of both foundation and industry-specific multimodal large models on these tasks.
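As a rough illustration of how a single image-text or video-text pair can serve both directions of the tasks above (text-to-image/video generation and image/video captioning), the sketch below uses a hypothetical record type. The names `MediaTextPair`, `as_training_example`, `group_by_language`, `TASK_MEDIA`, and all sample values are assumptions for illustration only; this listing does not describe the dataset's actual file layout or field names.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical mapping from task name to the media type it consumes or produces.
TASK_MEDIA = {
    "text_to_image": "image",
    "image_captioning": "image",
    "text_to_video": "video",
    "video_captioning": "video",
}
# Tasks where text is the input and media is the target.
GENERATION_TASKS = {"text_to_image", "text_to_video"}


@dataclass(frozen=True)
class MediaTextPair:
    """One hypothetical image-text or video-text sample (field names are assumptions)."""
    media_path: str  # path to an image or video file
    media_type: str  # "image" or "video"
    caption: str     # text description of the media
    language: str    # caption language code, e.g. "en", "zh"


def as_training_example(pair: MediaTextPair, task: str) -> tuple[str, str]:
    """Return an (input, target) tuple for the given task direction."""
    expected = TASK_MEDIA.get(task)
    if expected is None:
        raise ValueError(f"unknown task: {task}")
    if pair.media_type != expected:
        raise ValueError(f"{task} needs {expected} data, got {pair.media_type}")
    if task in GENERATION_TASKS:
        return pair.caption, pair.media_path  # text in, media out
    return pair.media_path, pair.caption      # media in, text out


def group_by_language(pairs: list[MediaTextPair], media_type: str) -> dict[str, list[MediaTextPair]]:
    """Bucket pairs of one media type by caption language."""
    buckets: dict[str, list[MediaTextPair]] = defaultdict(list)
    for p in pairs:
        if p.media_type == media_type:
            buckets[p.language].append(p)
    return dict(buckets)


samples = [
    MediaTextPair("img/0001.jpg", "image", "A cat sitting on a windowsill.", "en"),
    MediaTextPair("img/0001.jpg", "image", "一只猫坐在窗台上。", "zh"),
    MediaTextPair("vid/0001.mp4", "video", "A person waves at the camera.", "en"),
]

print(as_training_example(samples[0], "text_to_image"))
for lang, items in sorted(group_by_language(samples, "image").items()):
    print(lang, len(items))
```

Keeping the caption language on each pair is one simple way to filter or balance languages when training a multilingual model; the same image can appear once per language, as `img/0001.jpg` does here.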
In the digital-human field, AI companies are developing digital human synthesis technologies for a range of scenarios, including but not limited to animation, games, virtual anchors, customer service, and companion robots. Digital human synthesis algorithms require large volumes of high-quality multimodal data in which text, speech, image, and video are available together. This dataset includes multimodal data covering different ethnic groups, styles, ages, emotions, and scenarios, in speech, text, image, and video modalities, and can support training and testing of digital human synthesis algorithms for the scenarios above.
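The requirement above, that each sample carry text, speech, image, and video together, can be sketched as a completeness check plus a count of complete samples per attribute value. This is a hypothetical illustration: the class `DigitalHumanSample`, its field names, the attribute keys (`emotion`, `scenario`), and the sample values are all assumptions, since the listing does not publish a schema.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitalHumanSample:
    """Hypothetical multimodal record; all field names are assumptions."""
    sample_id: str
    text: Optional[str] = None         # transcript or script
    speech_path: Optional[str] = None  # audio file
    image_path: Optional[str] = None   # still image of the speaker
    video_path: Optional[str] = None   # video clip
    attributes: dict = field(default_factory=dict)  # e.g. age group, emotion, scenario


def has_all_modalities(s: DigitalHumanSample) -> bool:
    """True only if text, speech, image, and video are all present."""
    return all(m is not None for m in (s.text, s.speech_path, s.image_path, s.video_path))


def coverage(samples: list[DigitalHumanSample], attribute: str) -> Counter:
    """Count complete samples per value of one attribute ("unknown" if missing)."""
    return Counter(
        s.attributes.get(attribute, "unknown") for s in samples if has_all_modalities(s)
    )


samples = [
    DigitalHumanSample("dh-001", "Hello, how can I help you?", "audio/001.wav",
                       "image/001.jpg", "video/001.mp4",
                       {"emotion": "happy", "scenario": "customer_service"}),
    DigitalHumanSample("dh-002", "Welcome back to the show.", "audio/002.wav",
                       "image/002.jpg", "video/002.mp4",
                       {"emotion": "neutral", "scenario": "anchor"}),
    # Missing image and video, so it is excluded from coverage counts.
    DigitalHumanSample("dh-003", text="Good night.", speech_path="audio/003.wav",
                       attributes={"emotion": "happy"}),
]

print(coverage(samples, "emotion"))
print(coverage(samples, "scenario"))
```

Counting only complete samples mirrors the point that synthesis algorithms need all four modalities at once; the same counter can flag under-represented emotions or scenarios before training.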
Provider:
Datatang (Beijing) Technology Co., Ltd. (数据堂(北京)科技股份有限公司)
Compiled Summary
Dataset Introduction

Background and Challenges
Background Overview
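This dataset is a high-quality resource designed for training multilingual multimodal large models and digital human algorithms. It contains image-text and video-text pairs covering multiple scenarios, description styles, and languages, supporting tasks such as text-to-image generation and video captioning. It also provides multimodal data spanning different ethnic groups, styles, ages, and emotions, in speech, text, image, and video, for developing and testing digital human synthesis technologies.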
The above content was collected and summarized by 遇见数据集.



