HuNeBR: A Dataset of Annotated Humorous Transcriptions from YouTube Shorts by Northeastern Brazilian Comedians
Source: DataCite Commons. Updated: 2026-05-04. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.15473224
Description:
HuNeBR is a dataset of 475 humorous texts transcribed from YouTube Shorts featuring comedians from Brazil's Northeast, drawn from videos published between April 10, 2022, and September 9, 2024. The material was compiled by selecting well-known regional comedians, identified through their media exposure and public acclaim. YouTube Shorts were chosen as the source because their concise format allows quicker processing. Transcriptions were first produced with automated tools and then refined through manual editing. Each entry includes metadata such as the performance context (e.g., podcast, stand-up), the comedian's state of origin, notable cultural references, punchlines, and a multi-label classification across eight humor styles (fun, benevolent humor, nonsense, wit, irony, sarcasm, satire, and cynicism). Each text is also accompanied by an in-depth explanation of its comedic elements. The annotation followed a multi-phase protocol grounded in established academic frameworks. It involved a lead annotator and six independent reviewers working across three stages: initial annotation, cross-review, and final adjustments based on collective input. The process took place over three months (January–March 2025) and relied on structured cross-checking and expert evaluation to improve precision and consistency. The final dataset is a structured CSV file with 17 columns, intended as a foundation for linguistic, sociocultural, and computational studies of humor in Brazilian Portuguese.
The folders listed below correspond to the data collection, annotation, and review phases. The final folder contains the main dataset (brazilian_ne_annotated_humorous_texts.csv), along with a PDF document that describes the columns present in all stages.
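As a rough illustration of how the multi-label humor-style annotation might be tallied, the sketch below counts label occurrences from a CSV stream. It is not taken from the dataset documentation: the column name `humor_styles`, the `;` separator, and the in-memory sample rows are all hypothetical, and the real column names and label encoding should be read from the PDF that accompanies the main dataset.

```python
import csv
import io
from collections import Counter

# The eight humor styles named in the dataset description.
HUMOR_STYLES = ["fun", "benevolent humor", "nonsense", "wit",
                "irony", "sarcasm", "satire", "cynicism"]


def count_humor_styles(csv_file, label_column="humor_styles", separator=";"):
    """Count how often each humor style appears in a multi-label column.

    `label_column` and `separator` are hypothetical; check the PDF column
    description shipped with the dataset for the real names and encoding.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # A set avoids double-counting a label repeated within one row.
        labels = {part.strip().lower() for part in row[label_column].split(separator)}
        counts.update(label for label in labels if label in HUMOR_STYLES)
    return counts


# Tiny in-memory stand-in for the real file, invented for illustration only.
sample = io.StringIO(
    "text,humor_styles\n"
    "first joke,irony;sarcasm\n"
    "second joke,fun\n"
    "third joke,irony\n"
)
sample_counts = count_humor_styles(sample)
```

To run this against the real data, one would pass an open file handle for brazilian_ne_annotated_humorous_texts.csv instead of the sample, after adjusting the column name and separator to match the published column description.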
Provider:
Zenodo
Created:
2025-05-27



