
eduhk-compling/Annotated_Food_Vlog_Dataset_GroupL

Source: Hugging Face. Updated 2026-04-30. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/eduhk-compling/Annotated_Food_Vlog_Dataset_GroupL
Description:
This project builds a multimodal corpus of the language strategies used in food exploration videos on Chinese social media. The dataset centers on videos by the well-known blogger Diao Yueshe Shi Yu Ji and contains approximately 1,000 entries covering a total of 90 minutes of transcribed video, stored in CSV format. Each entry records the original dialogue, synthetic text generated by large language models (LLMs), rhetorical strategy labels (such as suspense, anxiety construction, and ritual compensation), and sensory descriptors. Transcripts were produced automatically with tools such as Whisper and then proofread manually, with annotation consistency reaching 85%. The dataset uses a contrastive sampling strategy that splits the content into a high-traffic group and an ordinary group, with the aim of revealing how different discourse patterns influence audience engagement and consumer psychology. Although the corpus covers a single top blogger, that blogger's language style is described as an industry benchmark and as highly representative. Beyond intent recognition tasks in natural language processing (NLP), the resource can support interdisciplinary research in digital marketing, sociology, and corpus linguistics on how persuasive discourse shapes consumer behavior in the digital age.
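The contrastive comparison described above (rhetorical strategy labels in the high-traffic group versus the ordinary group) could be tallied along the following lines. This is only a minimal sketch: the column names `group`, `source`, `strategy`, and `text`, the label spellings, and the sample rows are all hypothetical placeholders, since the listing does not document the CSV schema.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows. The column names and values are assumptions
# made for illustration; they are not the dataset's documented schema.
SAMPLE_CSV = """group,source,strategy,text
high_traffic,original,suspense,placeholder line 1
high_traffic,original,anxiety_construction,placeholder line 2
high_traffic,llm_synthetic,suspense,placeholder line 3
ordinary,original,ritual_compensation,placeholder line 4
ordinary,original,suspense,placeholder line 5
"""


def strategy_counts_by_group(csv_text):
    """Count rhetorical strategy labels separately for each traffic group."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One Counter per group, keyed by the strategy label of each row.
        counts.setdefault(row["group"], Counter())[row["strategy"]] += 1
    return counts


counts = strategy_counts_by_group(SAMPLE_CSV)
for group, counter in sorted(counts.items()):
    print(group, dict(counter))
```

To run this against the real data, the sample string would be replaced with the contents of the downloaded CSV file and the column names adjusted to match its actual header.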
Provided by:
eduhk-compling