
TweetAMovie Dataset

Source: NIAID Data Ecosystem, indexed 2026-05-10
Download link:
https://data.mendeley.com/datasets/n46hbnzhpy
Description:
TweetAMovie is a new dataset of movie ratings and temporal popularity collected from IMDb and Twitter. It contains 6015 average ratings, ranging from 1 to 9.4, on 6041 movies across 23 genres. Each movie belongs to at least one and at most three genres. We collected the list of most popular movies for six weeks. In total, TweetAMovie includes 368726 English tweets: 4370 well-structured tweets and 364356 unstructured tweets. The dataset comprises three files, MovieReg.csv, MovieTPop.csv, and Tweets.csv, which store the regular movie features, the temporal popularity of movies, and tweet-related features, respectively. We adopted the IMDb identifier as the movie id to facilitate additional metadata enrichment.

This dataset is associated with the research article: Alhijawi, Bushra, and Arafat Awajan. "Prediction of movie success using Twitter temporal mining." Proceedings of Sixth International Congress on Information and Communication Technology: ICICT 2021, London, Volume 1. Singapore: Springer Singapore, 2021.

MovieReg.csv was collected from IMDb on 2/12/2019. This file contains information about movies: Movie Id (tconst), Movie Title (primaryTitle), Release Year (releaseYear), Average Rating (rating), Number of Votes (numVotes), and Genres (genre).

MovieTPop.csv stores the Top-100 popular movie data. We collected the Top-100 popular movie list, which is updated weekly, from 2/12/2019 to 13/1/2020. Each row in this file, after the header row, represents one movie and has the following format: Movie Id (tconst), Top-100 Movie list on 2/12/2019 (p1), Top-100 Movie list on 16/12/2019 (p2), Top-100 Movie list on 23/12/2019 (p3), Top-100 Movie list on 30/12/2019 (p4), Top-100 Movie list on 6/1/2020 (p5), and Top-100 Movie list on 13/1/2020 (p6).

The data stored in Tweets.csv were collected from Twitter and include both well-structured and unstructured tweets. The structured tweets were obtained by querying with a series of regular expressions that include the phrase "I rated" and the hashtag "#IMDb" (e.g., "I rated Fast & Furious Presents: Hobbs & Shaw (2019) 8/10 #IMDb"). The unstructured tweets were extracted by querying the movie title. Querying by title was a challenge when the title contained words that could be interpreted in multiple ways, because unrelated tweets were retrieved. For example, for the movie "They Live", we obtained the tweet "they said this? IN 2020???? what universe do they live in???", which is unrelated to the movie. Therefore, we validated each tweet using the IBM Watson Natural Language Understanding (IBM-W-NLU) service to extract its category, ensuring that the TweetAMovie dataset includes only movie-related tweets.
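As an illustration of the well-structured tweet format, the sketch below parses the example tweet quoted in the description. The pattern and the function name parse_structured_tweet are assumptions inferred from that single example; the description does not give the authors' actual regular expressions, and real tweets may vary in ways this pattern does not cover.

```python
import re

# Hypothetical pattern inferred from the one example tweet in the description.
# It is not the authors' expression, which the description does not provide.
RATED_TWEET = re.compile(
    r"I rated (?P<title>.+) \((?P<year>\d{4})\) (?P<rating>\d{1,2})/10 #IMDb"
)


def parse_structured_tweet(text):
    """Return (title, year, rating) for a well-structured tweet, else None."""
    match = RATED_TWEET.search(text)
    if match is None:
        return None
    rating = int(match.group("rating"))
    # Assumption: ratings follow the 1-10 scale implied by the "8/10" example.
    if not 1 <= rating <= 10:
        return None
    return match.group("title"), int(match.group("year")), rating


example = "I rated Fast & Furious Presents: Hobbs & Shaw (2019) 8/10 #IMDb"
unrelated = "they said this? IN 2020???? what universe do they live in???"

print(parse_structured_tweet(example))
print(parse_structured_tweet(unrelated))
```

The unrelated "They Live" tweet does not match because it lacks the "I rated" and "#IMDb" markers. This is why structured tweets can be identified by pattern alone, while unstructured tweets retrieved by title needed the separate category check described above.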
Created:
2026-02-04