The Movies Dataset

Name: The Movies Dataset
Creator: www.kaggle.com
Published: 2017-11-10 00:00:00
License: No description available

www.kaggle.com · Updated 2017-11-10 · Indexed 2025-03-25

Download link:

https://www.kaggle.com/rounakbanik/the-movies-dataset

Resource description:

### Context

These files contain metadata for all 45,000 movies listed in the Full MovieLens Dataset. The dataset consists of movies released on or before July 2017. Data points include cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts and vote averages.

This dataset also has files containing 26 million ratings from 270,000 users for all 45,000 movies. Ratings are on a scale of 1-5 and have been obtained from the official GroupLens website.

### Content

This dataset consists of the following files:

- **movies_metadata.csv:** The main Movies Metadata file. Contains information on 45,000 movies featured in the Full MovieLens dataset. Features include posters, backdrops, budget, revenue, release dates, languages, production countries and companies.
- **keywords.csv:** Contains the movie plot keywords for our MovieLens movies. Available in the form of a stringified JSON Object.
- **credits.csv:** Consists of Cast and Crew Information for all our movies. Available in the form of a stringified JSON Object.
- **links.csv:** The file that contains the TMDB and IMDB IDs of all the movies featured in the Full MovieLens dataset.
- **links_small.csv:** Contains the TMDB and IMDB IDs of a small subset of 9,000 movies of the Full Dataset.
- **ratings_small.csv:** The subset of 100,000 ratings from 700 users on 9,000 movies.

The Full MovieLens Dataset consisting of 26 million ratings and 750,000 tag applications from 270,000 users on all the 45,000 movies in this dataset can be accessed [here](https://grouplens.org/datasets/movielens/latest/).

### Acknowledgements

This dataset is an ensemble of data collected from TMDB and GroupLens. The Movie Details, Credits and Keywords have been collected from the TMDB Open API. This product uses the TMDb API but is not endorsed or certified by TMDb. Their API also provides access to data on many additional movies, actors and actresses, crew members, and TV shows. You can try it for yourself [here](https://www.themoviedb.org/documentation/api).

The Movie Links and Ratings have been obtained from the Official GroupLens website. The files are a part of the dataset available [here](https://grouplens.org/datasets/movielens/latest/).

![](https://www.themoviedb.org/assets/static_cache/9b3f9c24d9fd5f297ae433eb33d93514/images/v4/logos/408x161-powered-by-rectangle-green.png)

### Inspiration

This dataset was assembled as part of my second Capstone Project for Springboard's [Data Science Career Track](https://www.springboard.com/workshops/data-science-career-track). I wanted to perform an extensive EDA on Movie Data to narrate the history and the story of Cinema and use this metadata in combination with MovieLens ratings to build various types of Recommender Systems.

Both my notebooks are available as kernels with this dataset: [The Story of Film](https://www.kaggle.com/rounakbanik/the-story-of-film) and [Movie Recommender Systems](https://www.kaggle.com/rounakbanik/movie-recommender-systems).

Some of the things you can do with this dataset:

- Predicting movie revenue and/or movie success based on a certain metric.
- What movies tend to get higher vote counts and vote averages on TMDB?
- Building Content Based and Collaborative Filtering Based Recommendation Engines.
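
The description says that keywords.csv and credits.csv store their nested fields as stringified JSON objects, so each cell has to be parsed before use. The sketch below shows one way to do that with the Python standard library. It rests on assumptions not stated above: that each cell holds a list of objects with a `name` field, and that some cells may be written in Python-literal style (single quotes) rather than strict JSON, which is why it falls back to `ast.literal_eval`. The helper names `parse_stringified` and `names` and the sample cells are made up for illustration.

```python
import ast
import json


def parse_stringified(cell):
    """Parse a stringified cell; return [] when it cannot be parsed."""
    if not isinstance(cell, str) or not cell.strip():
        return []
    try:
        # First attempt: strict JSON (double-quoted keys and strings).
        return json.loads(cell)
    except json.JSONDecodeError:
        pass
    try:
        # Fallback (assumption): Python-literal style text with single quotes.
        return ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        return []


def names(cell):
    """Return the 'name' value of every object in a parsed list cell."""
    parsed = parse_stringified(cell)
    if not isinstance(parsed, list):
        return []
    return [item["name"] for item in parsed
            if isinstance(item, dict) and "name" in item]


# Made-up sample cells, not copied from the dataset.
sample_json = '[{"id": 1, "name": "robot"}, {"id": 2, "name": "space"}]'
sample_literal = "[{'id': 3, 'name': 'heist'}]"

print(names(sample_json))
print(names(sample_literal))
print(names("not a list"))
```

Returning an empty list for unparseable cells keeps a column-wide transformation from failing on a single malformed row. Whether silently skipping such rows is acceptable depends on the analysis.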

Provider:

www.kaggle.com
