cryptexcode/MPST
Dataset Overview
Dataset Name
MPST: A Corpus of Movie Plot Synopses with Tags
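Dataset Contents
- Contains plot synopses of roughly 14,000 movies, each paired with its associated tags.
- The tag set is fine-grained and covers a wide range of heterogeneous attributes of movie plots.
- Supports multi-label association analysis for exploring the correlations between movies and tags, as well as emotion flow.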
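The multi-label association analysis mentioned above can be sketched as a simple tag co-occurrence count. This is a minimal illustration, not the authors' method. The records, the field names `title` and `tags`, and the comma-separated tag format are assumptions made for the example, as are the helper names `parse_tags` and `tag_cooccurrence`.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records for illustration only; the real corpus has
# roughly 14,000 synopses. The comma-separated "tags" field is an assumption.
records = [
    {"title": "Movie A", "tags": "murder, violence, revenge"},
    {"title": "Movie B", "tags": "romantic, comedy"},
    {"title": "Movie C", "tags": "murder, revenge"},
    {"title": "Movie D", "tags": "violence, murder"},
]


def parse_tags(raw):
    """Split a comma-separated tag string into a sorted list of unique tags."""
    return sorted({t.strip() for t in raw.split(",") if t.strip()})


def tag_cooccurrence(rows):
    """Count single-tag frequencies and how often each tag pair shares a movie."""
    tag_counts = Counter()
    pair_counts = Counter()
    for row in rows:
        tags = parse_tags(row["tags"])
        tag_counts.update(tags)
        # Tags are sorted, so each unordered pair has one canonical key.
        pair_counts.update(combinations(tags, 2))
    return tag_counts, pair_counts


tag_counts, pair_counts = tag_cooccurrence(records)
for (first, second), count in pair_counts.most_common(3):
    print(f"{first} + {second}: {count}")
```

Sorting the tags before pairing keeps `("murder", "revenge")` and `("revenge", "murder")` from being counted as different pairs. Raw counts like these would typically be normalized (for example by the single-tag frequencies) before being read as correlations.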
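Dataset Uses
- Building automatic movie tag generation systems that help recommendation engines improve movie retrieval.
- Helping viewers learn what to expect from a movie in advance.
- Analyzing narrative text to explore whether tags can be inferred from plot synopses.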
Dataset Release
- First released at the LREC 2018 conference in Miyazaki, Japan.
- Later enriched with user reviews in work presented at EMNLP 2020.
Keywords
- Movie tag generation
- Movie plot analysis
- Multi-label dataset
- Narrative text analysis
Citation
@InProceedings{KAR18.332,
  author = {Sudipta Kar and Suraj Maharjan and A. Pastor López-Monroy and Thamar Solorio},
  title = {{MPST}: A Corpus of Movie Plot Synopses with Tags},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-00-9},
  language = {english}
}
@inproceedings{kar-etal-2020-multi,
  title = "Multi-view Story Characterization from Movie Plot Synopses and Reviews",
  author = "Kar, Sudipta and Aguilar, Gustavo and Lapata, Mirella and Solorio, Thamar",
  booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
  month = nov,
  year = "2020",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2020.emnlp-main.454",
  doi = "10.18653/v1/2020.emnlp-main.454",
  pages = "5629--5646",
  abstract = "This paper considers the problem of characterizing stories by inferring properties such as theme and style using written synopses and reviews of movies. We experiment with a multi-label dataset of movie synopses and a tagset representing various attributes of stories (e.g., genre, type of events). Our proposed multi-view model encodes the synopses and reviews using hierarchical attention and shows improvement over methods that only use synopses. Finally, we demonstrate how we can take advantage of such a model to extract a complementary set of story-attributes from reviews without direct supervision. We have made our dataset and source code publicly available at https://ritual.uh.edu/multiview-tag-2020.",
}



