cryptexcode/MPST

Source: Hugging Face. Updated 2022-09-03. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/cryptexcode/MPST
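Resource summary:
This dataset contains movie plot synopses and tags, intended for movie tag generation and plot analysis. It was first released at LREC 2018 and later extended with user reviews at EMNLP 2020. Its goals are to support automatic movie tag generation, improve recommender systems, and help viewers learn what to expect from a movie in advance. It includes about 70 fine-grained tags that expose heterogeneous characteristics of movie plots, associated with about 14K movie plot synopses.
Provider: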
cryptexcode
Summary of original information

Dataset overview

Dataset name

MPST: A Corpus of Movie Plot Synopses with Tags

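Dataset contents

  • Plot synopses and associated tags for about 14,000 movies.
  • A fine-grained tag set covering a range of heterogeneous plot characteristics.
  • Supports multi-label association analysis for exploring correlations between movies and tags, as well as the flow of emotions.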
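
As a minimal sketch of what multi-label association analysis can look like, the snippet below encodes each movie's tags as a multi-hot vector and counts how often pairs of tags appear on the same movie. The records, field names (`title`, `tags`), and the comma-separated tag format are illustrative assumptions, not the dataset's confirmed schema.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: field names and tag values are illustrative
# assumptions, not the dataset's actual schema or contents.
records = [
    {"title": "Movie A", "tags": "murder, violence, revenge"},
    {"title": "Movie B", "tags": "romantic, comedy"},
    {"title": "Movie C", "tags": "murder, revenge"},
]


def parse_tags(raw):
    """Split a comma-separated tag string into a sorted list of unique tags."""
    return sorted({t.strip() for t in raw.split(",") if t.strip()})


def build_vocab(records):
    """Collect every tag seen in the records, in sorted order."""
    return sorted({t for r in records for t in parse_tags(r["tags"])})


def multi_hot(tags, vocab):
    """Encode a tag list as a 0/1 vector aligned with vocab."""
    present = set(tags)
    return [1 if t in present else 0 for t in vocab]


def cooccurrence(records):
    """Count how often each unordered tag pair appears on the same movie."""
    counts = Counter()
    for r in records:
        counts.update(combinations(parse_tags(r["tags"]), 2))
    return counts


vocab = build_vocab(records)
vectors = [multi_hot(parse_tags(r["tags"]), vocab) for r in records]
pairs = cooccurrence(records)

for record, vector in zip(records, vectors):
    print(record["title"], vector)
print(pairs.most_common(3))
```

Because each tag list is sorted before pairing, every pair is stored under a single key regardless of the order in which the tags were written.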

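Dataset uses

  • Building automatic movie tag generation systems that help recommendation engines improve movie retrieval.
  • Helping viewers learn about a movie's content in advance.
  • Analyzing narrative text and exploring the feasibility of inferring tags from plot synopses.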

Dataset release

  • First released at LREC 2018, held in Miyazaki, Japan.
  • Later enriched with user reviews at EMNLP 2020.

Keywords

  • Movie tag generation
  • Movie plot analysis
  • Multi-label dataset
  • Narrative text analysis

Citation information

@InProceedings{KAR18.332,
  author = {Sudipta Kar and Suraj Maharjan and A. Pastor López-Monroy and Thamar Solorio},
  title = {{MPST}: A Corpus of Movie Plot Synopses with Tags},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-00-9},
  language = {english}
}

@inproceedings{kar-etal-2020-multi,
  title = "Multi-view Story Characterization from Movie Plot Synopses and Reviews",
  author = "Kar, Sudipta and Aguilar, Gustavo and Lapata, Mirella and Solorio, Thamar",
  booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
  month = nov,
  year = "2020",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2020.emnlp-main.454",
  doi = "10.18653/v1/2020.emnlp-main.454",
  pages = "5629--5646",
  abstract = "This paper considers the problem of characterizing stories by inferring properties such as theme and style using written synopses and reviews of movies. We experiment with a multi-label dataset of movie synopses and a tagset representing various attributes of stories (e.g., genre, type of events). Our proposed multi-view model encodes the synopses and reviews using hierarchical attention and shows improvement over methods that only use synopses. Finally, we demonstrate how we can take advantage of such a model to extract a complementary set of story-attributes from reviews without direct supervision. We have made our dataset and source code publicly available at https://ritual.uh.edu/multiview-tag-2020.",
}
