MCSD 1.0 - Multimodal Chinese Sarcasm Dataset

Name: MCSD 1.0 - Multimodal Chinese Sarcasm Dataset
Creator: DataverseNL
Published: 2025-07-09 10:10:11
License: No description available

Source: DataCite Commons. Updated 2025-07-09; indexed 2026-04-25.

Download link:

https://dataverse.nl/citation?persistentId=doi:10.34894/A0NLTQ


Description:

This repository includes the full text file of the Multimodal Chinese Sarcasm Dataset (MCSD), a curated dataset for research on multimodal sarcasm detection in publicly broadcast Mandarin Chinese stand-up comedy. Each entry in the corpus contains:

- a unique utterance ID for each transcribed segment
- a manually verified transcription of the spoken utterance (in Mandarin)
- a pseudonymized speaker ID
- an annotated label (sarcastic / not sarcastic) for each transcription
- aligned start and end timestamps
- a reference to the original publicly available video

For the full dataset description and annotation guidelines, see: https://github.com/x-y-g/MCSD/wiki

Contributors and roles

- Xiyuan Gao (University of Groningen): PhD researcher. Responsible for dataset design, transcription processing, and annotation guidelines.
- Dr. Bruce Xiao Wang (Hong Kong Polytechnic University): Collaborator and linguistic expert. Contributed to the research framework, research methodology design, and Mandarin discourse insights.
- Meiling Zhang, Shuming Zhang, and Zhu Li: Carried out manual labeling of sarcasm in the transcribed data, following the developed annotation protocols.
- Dr. Matt Coler and Dr. Shekhar Nayak (University of Groningen): Supervisors. Provided research supervision and guidance on ethical compliance.
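The six per-utterance fields listed in the description can be modeled as a small record type. The sketch below is illustrative only: the field names, the tab-separated layout, the column order, and timestamps expressed in seconds are all assumptions, since the listing does not specify the file format. Consult the dataset wiki for the actual layout before adapting it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One MCSD-style record. Field names are hypothetical, not the dataset's own."""
    utterance_id: str
    transcription: str
    speaker_id: str      # pseudonymized speaker ID
    sarcastic: bool      # True for "sarcastic", False for "not sarcastic"
    start_sec: float     # assumed unit: seconds
    end_sec: float       # assumed unit: seconds
    video_ref: str       # reference to the original public video

    @property
    def duration_sec(self) -> float:
        return self.end_sec - self.start_sec


def parse_record(line: str) -> Utterance:
    """Parse one tab-separated line (assumed layout) into an Utterance."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    utt_id, text, speaker, label, start, end, video = fields
    label = label.strip().lower()
    if label not in {"sarcastic", "not sarcastic"}:
        raise ValueError(f"unknown label: {label!r}")
    return Utterance(
        utt_id, text, speaker, label == "sarcastic",
        float(start), float(end), video,
    )


# Placeholder line for illustration; it is not taken from the dataset.
example = parse_record(
    "utt_0001\t<transcription>\tspk_01\tsarcastic\t12.5\t15.0\t<video reference>"
)
print(example.sarcastic, example.duration_sec)
```

Rejecting unknown labels and wrong field counts early makes any mismatch between this assumed layout and the real file surface immediately, rather than producing silently mislabeled records.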

Provider:

DataverseNL

Created:

2025-06-06
