google/wit
Hugging Face · Updated 2022-07-04 · Added 2024-03-04
Download link: https://hf-mirror.com/datasets/google/wit
Resource description:
---
annotations_creators:
- machine-generated
language_creators:
- found
language:
- af
- ar
- ast
- azb
- be
- bg
- bn
- br
- ca
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- fy
- ga
- gl
- hr
- hu
- hy
- id
- it
- iw
- ja
- ka
- ko
- la
- lt
- lv
- mk
- ml
- ms
- nl
- nn
- 'no'
- pl
- pt
- ro
- ru
- sk
- sl
- sr
- sv
- th
- tr
- uk
- ur
- vi
- vo
- zh
license:
- cc-by-sa-3.0
multilinguality:
- multilingual
paperswithcode_id: wit
pretty_name: Wikipedia-based Image Text
size_categories:
- 10M<n<100M
source_datasets:
- original
- extended|wikipedia
task_categories:
- text-retrieval
- image-to-text
task_ids:
- text-retrieval-other-text-image-retrieval
- image-captioning
---
# Dataset Card for WIT
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Dataset Preprocessing](#dataset-preprocessing)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:** [WIT homepage](https://github.com/google-research-datasets/wit)
- **Repository:** [WIT repository](https://github.com/google-research-datasets/wit)
- **Paper:** [WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
](https://arxiv.org/abs/2103.01913)
- **Leaderboard:** [WIT leaderboard](https://www.kaggle.com/c/wikipedia-image-caption)
- **Point of Contact:** [WIT e-mail](mailto:wit-dataset@google.com)
### Dataset Summary
Wikipedia-based Image Text (WIT) Dataset is a large multimodal multilingual dataset. WIT is composed of a curated set of 37.6 million entity rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its size enables WIT to be used as a pretraining dataset for multimodal machine learning models.
A few unique advantages of WIT:
* The largest multimodal dataset (at the time of writing) by the number of image-text examples.
* The first massively multilingual image-text dataset, covering more than 100 languages.
* A diverse collection of concepts and real-world entities.
* Challenging real-world test sets.
### Dataset Preprocessing
This dataset doesn't download the images locally by default. Instead, it exposes URLs to the images. To fetch the images, use the following code:
```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
import io
import urllib.request
import PIL.Image
from datasets import load_dataset
from datasets.utils.file_utils import get_datasets_user_agent
def fetch_single_image(image_url, timeout=None, retries=0):
    # Try the download up to `retries + 1` times; return None on failure.
    image = None
    for _ in range(retries + 1):
        try:
            request = urllib.request.Request(
                image_url,
                data=None,
                headers={"user-agent": get_datasets_user_agent()},
            )
            with urllib.request.urlopen(request, timeout=timeout) as req:
                image = PIL.Image.open(io.BytesIO(req.read()))
            break
        except Exception:
            image = None
    return image
def fetch_images(batch, num_threads, timeout=None, retries=0):
fetch_single_image_with_args = partial(fetch_single_image, timeout=timeout, retries=retries)
with ThreadPoolExecutor(max_workers=num_threads) as executor:
batch["image"] = list(executor.map(fetch_single_image_with_args, batch["image_url"]))
return batch
num_threads = 20
dset = load_dataset("google/wit")
dset = dset.map(fetch_images, batched=True, batch_size=100, fn_kwargs={"num_threads": num_threads})
```
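Because `fetch_single_image` returns `None` when a download fails, a cleanup pass is usually needed before training. Below is a minimal sketch of that filtering on a plain column-oriented batch dict; `drop_failed_downloads` is an illustrative helper, not part of the official snippet (with `datasets`, a `filter` call plays the same role):

```python
def drop_failed_downloads(batch):
    """Drop rows whose image download failed (i.e. image is None)."""
    keep = [img is not None for img in batch["image"]]
    return {
        key: [value for value, keep_it in zip(values, keep) if keep_it]
        for key, values in batch.items()
    }

# A toy batch in the same column-oriented layout `map(batched=True)` uses.
batch = {
    "image": ["<pixels-0>", None, "<pixels-2>"],
    "image_url": ["u0", "u1", "u2"],
}
cleaned = drop_failed_downloads(batch)
print(cleaned["image_url"])  # → ['u0', 'u2']
```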
### Supported Tasks and Leaderboards
- `image-captioning`: This dataset can be used to train a model for image captioning where the goal is to predict a caption given the image.
- `text-retrieval`: The goal in this task is to build a model that retrieves the text closest to an image.
In these tasks, any combination of the `caption_reference_description`, `caption_attribution_description` and `caption_alt_text_description` fields can be used as the input text/caption.
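One simple combination strategy is to fall back across the three caption fields in a fixed priority order. The sketch below illustrates this; the priority used (reference, then alt text, then attribution) is an illustrative choice, not something prescribed by the dataset:

```python
def build_caption(example):
    """Return the first non-empty caption field, in a fixed priority order.

    The priority (reference > alt text > attribution) is an illustrative
    choice; any combination of the three fields is valid input text.
    """
    for field in (
        "caption_reference_description",
        "caption_alt_text_description",
        "caption_attribution_description",
    ):
        text = example.get(field)
        if text:  # skip None and empty strings
            return text
    return None

example = {
    "caption_reference_description": None,
    "caption_alt_text_description": None,
    "caption_attribution_description": "English: Mounted skeleton of Oxydactylus longipes.",
}
print(build_caption(example))  # → English: Mounted skeleton of Oxydactylus longipes.
```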
### Languages
The dataset contains examples from all Wikipedia languages, with the following stats:

| Image-text pairs | # Languages | Unique images | # Languages |
| ---------------- | ----------- | ------------- | ----------- |
| total > 1M       | 9           | images > 1M   | 6           |
| total > 500K     | 10          | images > 500K | 12          |
| total > 100K     | 36          | images > 100K | 35          |
| total > 50K      | 15          | images > 50K  | 17          |
| total > 14K      | 38          | images > 13K  | 38          |
## Dataset Structure
### Data Instances
```
{
'language': 'en',
'page_url': 'https://en.wikipedia.org/wiki/Oxydactylus',
'image_url': 'https://upload.wikimedia.org/wikipedia/commons/5/5f/Oxydactylus_longipes_fm.jpg',
'page_title': 'Oxydactylus',
'section_title': None,
'hierarchical_section_title': 'Oxydactylus',
'caption_reference_description': None,
'caption_attribution_description': 'English: Mounted skeleton of Oxydactylus longipes in the Field Museum of Natural History.',
'caption_alt_text_description': None,
'mime_type': 'image/jpeg',
'original_height': 3564,
'original_width': 2748,
'is_main_image': True,
'attribution_passes_lang_id': True,
'page_changed_recently': True,
'context_page_description': 'Oxydactylus is an extinct genus of camelid endemic to North America. It lived from the Late Oligocene to the Middle Miocene, existing for approximately 14 million years. The name is from the Ancient Greek οξύς and δάκτυλος.\nThey had very long legs and necks, and were probably adapted to eating high vegetation, much like modern giraffes. Unlike modern camelids, they had hooves, rather than tough sole-pads, and splayed toes.',
'context_section_description': 'Oxydactylus is an extinct genus of camelid endemic to North America. It lived from the Late Oligocene to the Middle Miocene (28.4–13.7 mya), existing for approximately 14 million years. The name is from the Ancient Greek οξύς (oxys, "sharp")and δάκτυλος (daktylos, "finger").\n \nThey had very long legs and necks, and were probably adapted to eating high vegetation, much like modern giraffes. Unlike modern camelids, they had hooves, rather than tough sole-pads, and splayed toes.'
}
```
### Data Fields
- `language`: Language code depicting wikipedia language of the page
- `page_url`: URL to wikipedia page
- `image_url`: URL to wikipedia image
- `page_title`: Wikipedia page's title
- `section_title`: Section's title
- `hierarchical_section_title`: Hierarchical section's title
- `caption_reference_description`: This is the caption that is visible on the wiki page directly below the image.
- `caption_attribution_description`: This is the text found on the Wikimedia page of the image. This text is common to all occurrences of that image across all Wikipedias and thus can be in a language different to the original page article.
- `caption_alt_text_description`: This is the “alt” text associated with the image. While not visible in general, it is commonly used for accessibility / screen readers.
- `mime_type`: Mime type associated to the image.
- `original_height`: Image height
- `original_width`: Image width
- `is_main_image`: Flag determining if the image is the first image of the page. Usually displayed on the top-right part of the page when using web browsers.
- `attribution_passes_lang_id`: Whether the `language` field matches the language of the attribution description (written as a prefix of that description).
- `page_changed_recently`: [More Information Needed]
- `context_page_description`: Page description corresponds to the short description of the page. It provides a concise explanation of the scope of the page.
- `context_section_description`: Text within the image's section.
<p align='center'>
<img width='75%' src='https://production-media.paperswithcode.com/datasets/Screenshot_2021-03-04_at_14.26.02.png' alt="WIT annotation example" /> <br />
<b>Figure: WIT annotation example. </b>
</p>
Details on the field content can be found directly in the [paper, figure 5 and table 12.](https://arxiv.org/abs/2103.01913)
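As the sample instance shows, `caption_attribution_description` is prefixed with the name of its language (e.g. `English: …`), which is what `attribution_passes_lang_id` is compared against. A hedged sketch of stripping that prefix (`split_attribution` is an illustrative helper that assumes the `Language: text` convention seen in the sample instance):

```python
def split_attribution(text):
    """Split 'English: Mounted skeleton ...' into (language_name, caption).

    Assumes the 'Language: text' prefix convention seen in the sample
    instance; returns (None, text) when no such prefix is present.
    """
    prefix, sep, rest = text.partition(": ")
    # Heuristic: treat a short, single-word prefix as a language name.
    if sep and " " not in prefix:
        return prefix, rest
    return None, text

print(split_attribution("English: Mounted skeleton of Oxydactylus longipes."))
# → ('English', 'Mounted skeleton of Oxydactylus longipes.')
```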
### Data Splits
All data is held in the `train` split, with a total of 37,046,386 rows.
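Since everything ships in a single `train` split, users who need a held-out set must carve one out themselves. One hedged sketch (not an official recipe) is a deterministic hash-based split keyed on `page_url`, which keeps every image-text pair from the same page on the same side:

```python
import hashlib

def assign_split(page_url, eval_fraction=0.05):
    """Deterministically assign a page to 'train' or 'validation'.

    Hashing page_url (rather than sampling per row) keeps all image-text
    pairs from one page in the same split. The 5% eval fraction is an
    illustrative choice.
    """
    digest = hashlib.sha256(page_url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "validation" if bucket < eval_fraction else "train"

urls = [f"https://en.wikipedia.org/wiki/Page_{i}" for i in range(1000)]
splits = [assign_split(u) for u in urls]
print(splits.count("validation"))  # roughly 50 out of 1000
```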
## Dataset Creation
### Curation Rationale
From the [repository](https://github.com/google-research-datasets/wit#motivation):
> Multimodal visio-linguistic models rely on a rich dataset to help them learn to model the relationship between images and texts. Having large image-text datasets can significantly improve performance, as shown by recent works. Furthermore the lack of language coverage in existing datasets (which are mostly only in English) also impedes research in the multilingual multimodal space – we consider this a lost opportunity given the potential shown in leveraging images (as a language-agnostic medium) to help improve our multilingual textual understanding.
>
> To address these challenges and advance research on multilingual, multimodal learning we created the Wikipedia-based Image Text (WIT) Dataset. WIT is created by extracting multiple different texts associated with an image (e.g., as shown in the above image) from Wikipedia articles and Wikimedia image links. This was accompanied by rigorous filtering to only retain high quality image-text sets.
>
> The resulting dataset contains over 37.6 million image-text sets – making WIT the largest multimodal dataset (publicly available at the time of this writing) with unparalleled multilingual coverage – with 12K+ examples in each of 108 languages (53 languages have 100K+ image-text pairs).
### Source Data
#### Initial Data Collection and Normalization
From the [paper, section 3.1](https://arxiv.org/abs/2103.01913):
> We started with all Wikipedia content pages (i.e., ignoring other pages that have discussions, comments and such). These number about ∼124M pages across 279 languages.
#### Who are the source language producers?
Text was extracted from Wikipedia.
### Annotations
#### Annotation process
WIT was constructed using an automatic process. However it was human-validated.
From the [paper, section 3.7](https://arxiv.org/abs/2103.01913):
> To further verify the quality of the WIT dataset we performed a study using (crowd-sourced) human annotators. As seen in Fig. 3, we asked raters to answer 3 questions. Given an image and the page title, raters first evaluate the quality of the attribution description and reference description in the first two questions (order randomized). The third question understands the contextual quality of these text descriptions given the page description and caption. Each response is on a 3-point scale: "Yes" if the text perfectly describes the image, "Maybe" if it is sufficiently explanatory and "No" if it is irrelevant or the image is inappropriate.
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
From the [paper, section 3.4](https://arxiv.org/abs/2103.01913):
> Lastly we found that certain image-text pairs occurred very frequently. These were often generic images that did not have much to do with the main article page. Common examples included flags, logos, maps, insignia and such. To prevent biasing the data, we heavily under-sampled all such images.
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
```bibtex
@article{srinivasan2021wit,
title={WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning},
author={Srinivasan, Krishna and Raman, Karthik and Chen, Jiecao and Bendersky, Michael and Najork, Marc},
journal={arXiv preprint arXiv:2103.01913},
year={2021}
}
```
### Contributions
Thanks to [@thomasw21](https://github.com/thomasw21), [@nateraw](https://github.com/nateraw) and [@hassiahk](https://github.com/hassiahk) for adding this dataset.
Provider: google
## Summary of the Original Information

### Dataset Overview

#### Dataset Name
- Name: Wikipedia-based Image Text (WIT)
- Alias: WIT

#### Basic Information
- Type: multimodal, multilingual dataset
- Scale: 37.6 million image-text pairs with 11.5 million unique images, covering 108 Wikipedia languages
- Languages: many languages, including but not limited to English, Chinese, and Arabic
- License: cc-by-sa-3.0

#### Characteristics
- Scale: the largest multimodal dataset at the time of its release
- Multilinguality: covers more than 100 languages, the first dataset of its kind
- Content diversity: includes a wide range of concepts and real-world entities
- Challenge: provides challenging real-world test sets

#### Dataset Structure
- Data instances: each instance includes the language, page URL, image URL, and other details
- Data fields: include the language, page URL, image URL, page title, and more
- Data splits: all data is stored in the `train` split, 37,046,386 rows in total

#### Dataset Creation
- Data source: extracted from Wikipedia content pages
- Annotation process: automatically generated, then human-validated

#### Supported Tasks
- Image captioning: train a model to predict a caption for a given image
- Text retrieval: build a model that retrieves the text closest to a given image

#### Usage Considerations
- Data bias: frequently occurring generic images were heavily under-sampled to avoid biasing the data

#### Additional Information
- Citation: see the BibTeX entry provided above
- Contributors: thanks to several GitHub users for adding this dataset
## Collected Summaries

### Dataset Introduction

#### Construction
The WIT dataset is built by extracting the various texts associated with each image from Wikipedia articles and Wikimedia image links. Data is first gathered from Wikipedia content pages and rigorously filtered to retain only high-quality image-text pairs. The dataset is then assembled through an automated pipeline and its quality is verified by humans: to ensure that text descriptions are highly relevant to the image content, the researchers designed a human-annotator evaluation to assess the quality of the descriptions.

#### Characteristics
WIT has several notable characteristics. First, it is the largest multimodal dataset of its time, containing over 37.6 million image-text examples across 108 Wikipedia languages. Second, it is the first massively multilingual dataset of this scale, with more than 12,000 examples in every language and over 100K image-text pairs in 53 of them. In addition, WIT covers a diverse collection of concepts and real-world entities and provides challenging real-world test sets.

#### Usage
Using WIT is relatively straightforward. By default the dataset does not download images; it exposes image URLs instead. Images can be fetched with Python's `urllib` and `PIL` libraries, and the `datasets` library can be used to load and manipulate the dataset, e.g. applying `map` to preprocess the data (such as downloading the images). Depending on the task, the appropriate data fields and splits can then be selected.
### Background and Challenges

#### Background
WIT (Wikipedia-based Image Text) is a large multimodal, multilingual dataset created by a Google research team and released in 2021. It contains 37.6 million entity-rich image-text examples across 108 Wikipedia languages, with 11.5 million unique images. WIT was created to address the limited language coverage of existing datasets and to advance research on multilingual multimodal learning. It supplies machine learning models with rich visual and textual information, helping them better model the relationship between images and text for tasks such as image captioning and text retrieval.

#### Current Challenges
Building WIT involved several challenges. Because the dataset spans many languages, language diversity must be handled. Its sheer scale demands efficient data processing and storage. The data may also contain culture- or region-specific biases that need to be mitigated through cleaning and balancing. Finally, the dataset requires ongoing updates and maintenance to preserve its quality and relevance.
### Common Use Cases

#### Typical Scenarios
In multimodal machine learning, WIT is widely used for model pretraining: its massive collection of image-text examples and broad language coverage make it well suited for training and evaluating multimodal models. WIT supports image captioning and text retrieval, with image captioning being the classic application. Researchers use WIT's rich text and image examples to train models that automatically generate descriptions matching image content, which matters for applications such as intelligent image recognition and automatic report generation.

#### Derived Work
Research on WIT has spawned notable follow-up work. For example, some researchers have used it to explore cross-lingual transfer in multimodal pretrained models, demonstrating its value for multilingual multimodal learning; others have used it to study how image-text correspondences are learned, informing more accurate image-captioning models.
### Recent Research

#### Current Directions
As one of the largest multimodal multilingual datasets available, WIT provides a rich resource for cross-lingual and cross-modal machine learning research, with broad application prospects in areas such as image captioning and text retrieval. In image captioning, researchers can train models on WIT to generate accurate textual descriptions of images; in text retrieval, it helps models better understand image content and retrieve matching text more accurately. Its multilingual nature also makes it valuable for cross-lingual information retrieval and machine translation research.

The above content was collected and summarized by 遇见数据集 (selectdataset.com).