GENEA Challenge 2023 Dataset Files
Download link:
https://zenodo.org/record/8199132
Description:
This Zenodo repository contains the main dataset for the GENEA Challenge 2023, which is based on the Talking With Hands 16.2M dataset.
Notation:
Please take note of the following nomenclature when reading this document:
"Main agent" refers to the speaker in the dyadic interaction for whom the systems generated motion.
"Interlocutor" refers to the other speaker, who faces the main agent.
Contents:
The "genea2023_trn" and "genea2023_val" zip files contain audio files (in WAV format), time-aligned transcriptions (in TSV format), and motion files (in BVH format) for the training and validation datasets, respectively.
The "genea2023_test" zip file contains audio files (in WAV format) and transcriptions (in TSV format) for the test set, but no motion. The corresponding test motion is available at:
https://zenodo.org/record/8146027
Each zip file also contains a "metadata.csv" file that lists, for every file, the speaker ID and whether or not the motion file contains finger motion.
Note that the speech audio in the data has sometimes been replaced by silence for the purpose of anonymisation.
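As a minimal sketch of how "metadata.csv" could be inspected, the Python snippet below loads the file with the standard library and tallies the values of one column. The column names are not stated in this description, so the code avoids hard-coding them; the helper names load_metadata and count_values are hypothetical.

```python
import csv
from collections import Counter


def load_metadata(path):
    """Read a metadata.csv file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def count_values(rows, column):
    """Count how often each value occurs in one column.

    The column name must be taken from the actual header of the file;
    it is not documented here, so nothing is assumed about it.
    """
    return Counter(row[column] for row in rows)
```

A typical use would be to call load_metadata on the extracted file, print the keys of the first row to see the real header, and then pass the speaker or finger-motion column to count_values.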
In the test set, files with indices from 0 to 40 correspond to "matched" interactions (the core test set), where main agent and interlocutor data come from the same conversation, whilst file indices from 41 to 69 correspond to "mismatched" interactions (the extended test set), where main agent and interlocutor data come from different conversations.
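The index ranges above can be written down as a small Python helper. This is only a sketch: the function name classify_test_index is hypothetical, and how the index is encoded in the file names is not specified here, so the function takes the integer index directly.

```python
CORE_INDICES = range(0, 41)       # 0..40 inclusive: "matched" interactions
EXTENDED_INDICES = range(41, 70)  # 41..69 inclusive: "mismatched" interactions


def classify_test_index(index):
    """Return "matched" or "mismatched" for a test-set file index."""
    if index in CORE_INDICES:
        return "matched"
    if index in EXTENDED_INDICES:
        return "mismatched"
    raise ValueError(f"test index out of range: {index}")


print(len(CORE_INDICES), "core files,", len(EXTENDED_INDICES), "extended files")
```

Under these ranges the core test set has 41 files and the extended test set has 29.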
Folder structure:
main-agent/ (main agent): Encapsulates BVH, TSV, WAV data subfolders for the main agent.
interloctr/ (interlocutor): Encapsulates BVH, TSV, WAV data subfolders for the interlocutor.
bvh/ (motion): Time-aligned 3D full-body motion-capture data in BVH format from a speaking and gesticulating actor. Each file contains the motion of a single person, but each data sample contains files for both the main agent and the interlocutor.
wav/ (audio): Recorded audio data in WAV format from a speaking and gesticulating actor with a close-talking microphone. Parts of the audio recordings have been muted to omit personally identifiable information.
tsv/ (text): Word-level time-aligned text transcriptions of the above audio recordings in TSV format (tab-separated values). For privacy reasons, the transcriptions do not include references to personally identifiable information, similar to the audio files.
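The word-level transcriptions can be read with a few lines of Python. The sketch below assumes that each row holds a start time, an end time (both in seconds), and a word, separated by tabs; that column layout is an assumption and should be checked against the actual files. The names Word, read_transcript, and words_between are hypothetical.

```python
import csv
import io
from typing import NamedTuple


class Word(NamedTuple):
    start: float  # assumed: start time in seconds
    end: float    # assumed: end time in seconds
    text: str


def read_transcript(handle):
    """Parse tab-separated rows of (start, end, word) from an open text file."""
    words = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        try:
            start, end = float(row[0]), float(row[1])
        except ValueError:
            continue  # skip a header row, if one is present
        words.append(Word(start, end, row[2]))
    return words


def words_between(words, t0, t1):
    """Return the words whose time span overlaps the window [t0, t1)."""
    return [w.text for w in words if w.end > t0 and w.start < t1]


# Inline sample standing in for a real file from a tsv/ folder.
sample = "0.0\t0.4\thello\n0.4\t0.9\tworld\n"
transcript = read_transcript(io.StringIO(sample))
print(words_between(transcript, 0.5, 1.0))
```

Selecting words by time window like this is one way to align text with a span of motion frames or audio samples.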
Data processing scripts:
We provide a number of optional scripts for encoding and processing the challenge data:
Audio: Scripts for extracting basic audio features, such as spectrograms, prosodic features, and mel-frequency cepstral coefficients (MFCCs), can be found at this link.
Text: A script to encode text transcriptions to word vectors using FastText is available here: tsv2wordvectors.py
Motion: If you wish to encode the joint angles from the BVH files to and from an exponential map representation, you can use scripts by Simon Alexanderson based on the PyMo library, which are available here:
bvh2features.py
features2bvh.py
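For readers unfamiliar with the exponential map, the following Python sketch converts one joint's Euler angles to an exponential-map vector (rotation axis scaled by rotation angle) and back to a rotation matrix. It is not the PyMo-based code in the scripts above. The channel order "ZXY" and the convention that rotations are composed in channel order are assumptions; the real order must be read from the CHANNELS lines of the BVH files.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def axis_rotation(axis, degrees):
    """Rotation matrix about a single coordinate axis."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    if axis == "X":
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    if axis == "Y":
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    if axis == "Z":
        return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    raise ValueError(f"unknown axis: {axis}")


def euler_to_matrix(angles_deg, order="ZXY"):
    """Compose per-axis rotations in channel order (assumed convention)."""
    rot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for axis, angle in zip(order, angles_deg):
        rot = matmul(rot, axis_rotation(axis, angle))
    return rot


def matrix_to_expmap(rot):
    """Rotation matrix -> exponential map. Unstable for angles near 180 degrees."""
    cos_theta = (rot[0][0] + rot[1][1] + rot[2][2] - 1.0) / 2.0
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    if theta < 1e-8:
        return [0.0, 0.0, 0.0]
    scale = theta / (2.0 * math.sin(theta))
    return [scale * (rot[2][1] - rot[1][2]),
            scale * (rot[0][2] - rot[2][0]),
            scale * (rot[1][0] - rot[0][1])]


def expmap_to_matrix(vec):
    """Exponential map -> rotation matrix via Rodrigues' formula."""
    theta = math.sqrt(sum(x * x for x in vec))
    if theta < 1e-8:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (component / theta for component in vec)
    k = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]
    k2 = matmul(k, k)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * k[i][j] + c * k2[i][j]
             for j in range(3)] for i in range(3)]
```

Unlike Euler angles, this representation has no dependence on channel order once computed, which is one reason it is a common input format for learned motion models.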
Attribution:
If you use this material, please cite our latest paper on the GENEA Challenge 2023. At the time of writing (2023-07-25) this is our ACM ICMI 2023 paper:
Taras Kucherenko, Rajmund Nagy, Youngwoo Yoon, Jieyeon Woo, Teodor Nikolov, Mihail Tsakov, and Gustav Eje Henter. 2023. The GENEA Challenge 2023: A large-scale evaluation of gesture generation models in monadic and dyadic settings. In Proceedings of the ACM International Conference on Multimodal Interaction (ICMI ’23). ACM.
Also, please cite the paper about the original dataset from Meta Research:
Gilwoo Lee, Zhiwei Deng, Shugao Ma, Takaaki Shiratori, Siddhartha S. Srinivasa, and Yaser Sheikh. 2019. Talking With Hands 16.2M: A large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV ’19). IEEE, 763–772.
The motion and audio files are based on the Talking With Hands 16.2M dataset at https://github.com/facebookresearch/TalkingWithHands32M/. The material is available under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, with the text provided in LICENSE.txt.
To find more GENEA Challenge 2023 material on the web, please see:
https://genea-workshop.github.io/2023/challenge/
If you have any questions or comments, please contact:
The GENEA Challenge organisers
Created:
2023-07-31



