Indonesian Stand-Up Comedy Transcription Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/85xgdr7cc7
Description:
This dataset contains transcriptions of 3,934 Indonesian stand-up comedy videos sourced from Kompas TV’s YouTube channel. Each entry includes the video title, URL, raw transcript, cleaned transcript, and the number of laughter events. Transcripts were preprocessed by removing timestamps, non-verbal tags (e.g., [Tawa], [Musik]), and formatting inconsistencies to produce NLP-ready text. The dataset consists of over 2.8 million words and 17,394 audience laughter annotations. It enables research in humor detection, sentiment analysis, speech emotion recognition, and cultural discourse analysis. Data are stored in Excel and can be filtered by metadata such as performer, title, and laughter count. This resource is particularly valuable for researchers working with low-resource languages and spoken entertainment content in Indonesian.
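The preprocessing described above (removing timestamps and non-verbal tags, and counting laughter events) might look roughly like the following. This is a minimal sketch and not the dataset authors' actual pipeline. The timestamp format (`0:05` or `1:02:03` at the start of a line), the function names, and the sample text are assumptions made for illustration. Only the bracketed tag style ([Tawa], [Musik]) comes from the description.

```python
import re

# Assumption: timestamps appear at the start of a line as M:SS or H:MM:SS.
TIMESTAMP = re.compile(r"^\s*\d{1,2}:\d{2}(?::\d{2})?\s*", re.MULTILINE)
# Non-verbal tags are bracketed, as in the examples [Tawa] and [Musik].
TAG = re.compile(r"\[[^\[\]]*\]")
# "Tawa" is Indonesian for laughter; match it case-insensitively.
LAUGHTER = re.compile(r"\[tawa\]", re.IGNORECASE)


def count_laughter(raw: str) -> int:
    """Count laughter tags in the raw transcript, before they are stripped."""
    return len(LAUGHTER.findall(raw))


def clean_transcript(raw: str) -> str:
    """Remove timestamps and bracketed tags, then collapse whitespace."""
    text = TIMESTAMP.sub("", raw)
    text = TAG.sub(" ", text)
    return " ".join(text.split())


# Hypothetical raw transcript, invented for this example.
sample = (
    "0:01 Selamat malam semuanya [Musik]\n"
    "0:05 Saya baru pulang kampung [Tawa]\n"
    "0:09 naik bus [Tawa]"
)

print(count_laughter(sample))
print(clean_transcript(sample))
```

Laughter is counted on the raw text because the cleaning step discards the tags. The two values printed here would correspond to the laughter count and cleaned transcript fields of one entry.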
Created:
2025-07-02



