GET-Tok
arXiv. Updated 2024-02-09. Indexed 2024-06-21.
Download link:
https://github.com/gabbypinto/GET-Tok-Peru-data
Overview:
The GET-Tok dataset, created at the University of Southern California, documents TikTok videos about the 2022 attempted coup in Peru. It comprises 43,697 videos spanning November 20, 2022 to March 1, 2023, a window that covers the key political events. The dataset combines the TikTok Research API with generative AI models: tools such as Whisper and GPT-4 were used to produce speech transcriptions and video descriptions, and the enriched records also capture the stance expressed in each video. It is intended to support in-depth analysis of non-English social media content, particularly online discussion during political crises. Application areas include content analysis in multilingual settings, multimodal analysis, and studies of how social media affects real-world events.
Provider:
University of Southern California
Created:
2024-02-09



