Blockbusting the Box Office
A Machine Learning Approach to Movie Revenue Prediction
Abstract
Using an XGBoost classifier, we were able to predict with 71.95% accuracy, prior to a movie's release date, whether it will make money at the box office. Through feature engineering, the model accounts for the popularity and star power of directors and actors, as well as the current trend toward movie franchises. The three most important features in the model are the number of theaters, the production budget, and whether or not a movie is part of a franchise. Rather than the absolute revenue generated by each movie, the model targets return on investment (ROI) as a measure of the efficiency and profitability of a studio's investment. The results indicate an opportunity for movie studios to focus on low- and mid-budget films with a high return on investment, as opposed to the traditional focus on high-budget films that carry more investment risk.
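The summary does not specify how the star-power features were engineered. One common approach, shown here as a hypothetical sketch rather than the authors' method, is to score each director by the mean ROI of their earlier releases, using only films released strictly before the target movie so the feature is available before release day. The `Movie` record, its field names, and the ROI formula `(revenue - budget) / budget` are assumptions; the same idea could be applied to actors or franchises.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from itertools import groupby
from typing import Dict, List, Optional


@dataclass
class Movie:
    # Hypothetical record layout; field names are illustrative only.
    title: str
    release: date
    director: str
    budget: float   # production budget
    revenue: float  # box office gross


def roi(movie: Movie) -> float:
    # Assumed ROI definition: profit as a multiple of the production budget.
    return (movie.revenue - movie.budget) / movie.budget


def director_prior_mean_roi(movies: List[Movie]) -> List[Optional[float]]:
    """Mean ROI of each movie's director over films released strictly earlier.

    Returns a list aligned with `movies`; an entry is None when the director
    has no earlier release in the data (e.g. a debut film).
    """
    history: Dict[str, List[float]] = defaultdict(list)
    features: List[Optional[float]] = [None] * len(movies)
    order = sorted(range(len(movies)), key=lambda i: movies[i].release)
    # Group same-day releases so they cannot "see" each other's results.
    for _, same_day in groupby(order, key=lambda i: movies[i].release):
        same_day = list(same_day)
        for i in same_day:
            past = history[movies[i].director]
            if past:
                features[i] = sum(past) / len(past)
        # Only after the whole day is scored do its results enter the history.
        for i in same_day:
            history[movies[i].director].append(roi(movies[i]))
    return features


# Toy data for illustration only (not from the study's dataset).
movies = [
    Movie("A", date(2010, 1, 1), "Director X", 10.0, 30.0),  # ROI 2.0
    Movie("B", date(2012, 1, 1), "Director X", 20.0, 10.0),  # ROI -0.5
    Movie("C", date(2015, 1, 1), "Director X", 5.0, 50.0),   # ROI 9.0
    Movie("D", date(2015, 1, 1), "Director Y", 8.0, 8.0),    # ROI 0.0
]
print(director_prior_mean_roi(movies))  # [None, 2.0, 0.75, None]
```

Restricting the history to earlier releases matters because the stated goal is a prediction made before release; averaging over a director's full filmography would leak the target movie's own outcome into its features.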
Introduction
The film industry is undoubtedly one of the most successful industries of the 21st century, having generated billions of dollars in box office revenue (movie ticket sales) for decades. In North America alone, over 1.3 billion tickets were sold last year, resulting in earnings of at least $11.8 billion [1]. Today, however, the rise of streaming platforms like Netflix and YouTube has given the film industry even more competition in the entertainment field. This places even greater importance on choosing the right movies for producers, studios, and stakeholders to invest in. Because producing a movie can cost hundreds of millions of dollars, a decision support tool would greatly help producers and directors make data-driven decisions and increase a movie's chance of box office success.
In this study, multiple machine learning algorithms are used and compared to predict a movie's profitability, specifically its return on investment (ROI), which is classified into three categories: (1) ROI < 0, meaning box office revenue was less than the production budget, so the movie lost money; (2) 0 < ROI < 3, meaning the movie made some money but is not considered a box office hit; or (3) ROI > 3, meaning the movie's profit exceeded three times its investment, so it is considered a blockbuster. Movie details such as the budget, actors, and directors, as well as whether the movie is part of a franchise, are used to build the prediction model. The model in turn identifies the top predictors, which indicate which factors or characteristics of a movie most strongly affect its success or failure.
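The three-category labeling can be sketched as follows, assuming ROI is computed as (revenue - budget) / budget, which is consistent with ROI < 0 meaning the movie lost money. The function names are hypothetical, and because the text gives only strict inequalities, placing ROI values of exactly 0 or 3 in the middle category is an assumption of this sketch.

```python
def roi(budget: float, revenue: float) -> float:
    """Assumed ROI definition: profit divided by the production budget."""
    if budget <= 0:
        raise ValueError("production budget must be positive")
    return (revenue - budget) / budget


def roi_category(budget: float, revenue: float) -> int:
    """Return 1 (lost money), 2 (made money, not a hit) or 3 (blockbuster).

    ROI values of exactly 0 or 3 are placed in category 2; this boundary
    handling is an assumption, since the text only gives strict inequalities.
    """
    r = roi(budget, revenue)
    if r < 0:
        return 1
    if r <= 3:
        return 2
    return 3


# Budgets and grosses in the same currency unit (e.g. millions of dollars).
for budget, revenue in [(100, 80), (100, 250), (100, 400), (100, 500)]:
    print(budget, revenue, roi(budget, revenue), roi_category(budget, revenue))
```

Under this definition a blockbuster must gross more than four times its budget (profit above three times the budget). The codes 1 to 3 mirror the numbered list above; recent XGBoost versions expect class labels starting at 0, so they would typically be shifted down by one before training.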
If you wish to have a copy of the technical paper, data, or the code used in this project, kindly contact us via e-mail or LinkedIn.