Research Methodology

In the previous section, we collected basic movie information from TMDb and cleaned it according to specific rules. We also collected the YouTube view counts related to the cleaned movies. In this section, we first analyze the correlation between different features and ROI, an indicator of box office performance. We then apply regression methods to predict ROI. After that, we recast the task as a classification problem and train classifiers to predict box office performance. Finally, we compare the different models to find a good prediction model.

ROI

ROI stands for Return on Investment. In this project, we calculate ROI as revenue divided by budget.

ROI is a central concept in our project because it reflects the box office performance of a movie. We do not use revenue alone because high revenue does not necessarily mean good performance: consider a movie with 1 billion dollars in revenue but a 10 billion dollar budget, whose ROI is only 0.1.
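The definition above can be sketched in a few lines of Python. This is only an illustration of the revenue-over-budget ratio, not the project's actual code: the function name `roi`, the guard against non-positive budgets, and the second movie's figures are assumptions made for the example.

```python
def roi(revenue: float, budget: float) -> float:
    """ROI as defined in this section: revenue divided by budget."""
    # Assumption: a zero or negative budget is treated as invalid input.
    if budget <= 0:
        raise ValueError("budget must be positive")
    return revenue / budget


# The movie from the text: 1 billion revenue, 10 billion budget.
high_revenue_movie = roi(revenue=1e9, budget=10e9)

# Hypothetical small movie (made-up figures): 50 million revenue, 10 million budget.
small_movie = roi(revenue=5e7, budget=1e7)

# Ranking by ROI puts the small movie ahead despite its much lower revenue.
print(f"high-revenue movie ROI: {high_revenue_movie:.2f}")
print(f"small movie ROI: {small_movie:.2f}")
```

Under this definition, an ROI below 1 means the movie earned less than it cost, which is why the high-revenue movie in the example still counts as a poor performer.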

[Figure: basic_pie_chart]