A categorized multimodal TikTok dataset

Source: DataCite Commons, updated 2024-10-02, indexed 2026-05-07
Download link:
https://www.weizenbaum-library.de/handle/id/420
Description:
This dataset comprises 11,242 entries covering 5,137 unique videos listed on the TikTok explore page (https://www.tiktok.com/explore) between the 31st of July and the 4th of August. The page was accessed from a German IP address without being logged in. The data was collected with the 4CAT Toolkit and the Zeeschuimer browser extension. For each video, the dataset contains the category and multimodal embeddings.

Intended Purpose: The dataset is primarily intended for proof-of-concept studies, as a toy dataset for teaching, or for student seminar papers. Because TikTok does not provide a clear definition for each category, such work might explore those definitions or focus on methods. The multimodal embeddings allow unsupervised and supervised machine learning techniques to be applied directly.

Contents: The dataset consists of four zipped .csv files: metadata.zip, text_embeddings.zip, audio_embeddings.zip, video_embedding.zip. For further details, please consult the Data Report (datenbericht_v2.pdf).
Provider:
Weizenbaum Institute
Created:
2024-10-02
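
Since the description lists four zipped .csv files, a minimal sketch of reading one of them with the Python standard library may be useful. The helper name `read_zipped_csv` is hypothetical, and the sketch assumes each archive holds one UTF-8 encoded .csv file with a header row; the actual column names and encoding should be checked against the Data Report.

```python
import csv
import io
import zipfile


def read_zipped_csv(source):
    """Return the rows of the first .csv member of a zip archive as dicts.

    `source` may be a file path (e.g. "metadata.zip") or a binary
    file-like object. Assumption: the file is UTF-8 with a header row.
    """
    with zipfile.ZipFile(source) as archive:
        # Pick out the .csv members; the archive layout is an assumption.
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("archive contains no .csv file")
        with archive.open(csv_names[0]) as raw:
            # Decode the byte stream so csv.DictReader can parse it.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

After downloading the files, a call such as `rows = read_zipped_csv("metadata.zip")` would return one dictionary per row, keyed by whatever header names the file actually uses. All values are returned as strings, so embedding columns would still need converting to numbers before any machine learning step.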