MCSD 1.0 - Multimodal Chinese Sarcasm Dataset
Source: DataCite Commons | Updated: 2025-07-09 | Indexed: 2026-04-25
Download link:
https://dataverse.nl/citation?persistentId=doi:10.34894/A0NLTQ
Description:
This repository includes the full text file of the Multimodal Chinese Sarcasm Dataset (MCSD), a curated dataset for research on multimodal sarcasm detection in publicly broadcast Mandarin Chinese stand-up comedy. Each entry in the corpus contains the following fields:<br><br>
<ul>
<li>unique utterance ID for each transcribed segment.</li>
<li>manually verified transcription of the spoken utterance (in Mandarin).</li>
<li>pseudonymized speaker ID.</li>
<li>annotated label (sarcastic / not sarcastic) for each transcription.</li>
<li>aligned start and end timestamps.</li>
<li>reference to the original publicly available video.</li>
</ul>
For full <em>dataset description and annotation guidelines</em>, please see: <a href="https://github.com/x-y-g/MCSD/wiki">Link</a><br><br>
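The fields listed above can be pictured as one record per utterance. The following is a minimal sketch only: the column names (`utterance_id`, `text`, `speaker_id`, `label`, `start`, `end`, `video_ref`), the tab-separated layout, and timestamps expressed in seconds are all assumptions made for illustration, and the sample rows are synthetic placeholders rather than MCSD data. Consult the wiki linked above for the actual file format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Utterance:
    """One transcribed segment; field names are hypothetical."""
    utterance_id: str
    text: str          # manually verified Mandarin transcription
    speaker_id: str    # pseudonymized speaker ID
    sarcastic: bool    # True if labeled "sarcastic"
    start: float       # assumed to be seconds
    end: float         # assumed to be seconds
    video_ref: str     # reference to the original public video


def load_utterances(handle):
    """Parse tab-separated rows with a header line into Utterance records."""
    reader = csv.DictReader(handle, delimiter="\t")
    records = []
    for row in reader:
        records.append(Utterance(
            utterance_id=row["utterance_id"],
            text=row["text"],
            speaker_id=row["speaker_id"],
            sarcastic=row["label"].strip().lower() == "sarcastic",
            start=float(row["start"]),
            end=float(row["end"]),
            video_ref=row["video_ref"],
        ))
    return records


# Synthetic placeholder rows, NOT taken from MCSD.
SAMPLE = (
    "utterance_id\ttext\tspeaker_id\tlabel\tstart\tend\tvideo_ref\n"
    "u001\t示例文本一\tspk01\tsarcastic\t0.0\t2.5\tvideo_a\n"
    "u002\t示例文本二\tspk01\tnot sarcastic\t2.5\t5.0\tvideo_a\n"
)

utterances = load_utterances(io.StringIO(SAMPLE))
# Share of utterances labeled sarcastic in the placeholder sample.
sarcastic_share = sum(u.sarcastic for u in utterances) / len(utterances)
```

To read a real file instead of the in-memory sample, pass an open file handle (opened with `encoding="utf-8"` and `newline=""`) to `load_utterances`, after adjusting the column names and delimiter to match the released file.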
<h4>Contributors and roles</h4>
<ul>
<li>Xiyuan Gao (University of Groningen) – PhD researcher. Responsible for dataset design, transcription processing, and annotation guidelines.</li>
<li>Dr. Bruce Xiao Wang (Hong Kong Polytechnic University) – Collaborator and linguistic expert. Contributed to the research framework, research methodology design, and Mandarin discourse insights.</li>
<li>Meiling Zhang, Shuming Zhang, and Zhu Li – Carried out manual sarcasm labeling of the transcribed data, following the annotation protocols developed for the project.</li>
<li>Dr. Matt Coler & Dr. Shekhar Nayak (University of Groningen) – Supervisors. Provided research supervision and guidance on ethical compliance.</li>
</ul>
Provider:
DataverseNL
Created:
2025-06-06



