I wanted to test out and expand my R skills, so I downloaded this dataset from Kaggle (https://www.kaggle.com/datasets/ggtejas/tmdb-imdb-merged-movies-dataset) and used it to analyse how superhero films have changed and grown over time.

    first image: the number of superhero films through the years

    pretty self explanatory and as expected

    second image: the combined average rating from TMDb and IMDb (on a 0 to 10 scale, grouped as low, medium, or high) of superhero films through the decades

    the average rating has trended slightly upwards, backed up by a linear model showing a significant increase in average rating after the 1990s

    third image: factors affecting the profits of superhero films

    all three factors were identified as significant predictors of profit via a linear model

    star average career profit is the average profit of the films the first-listed (main) cast member appeared in before the current film, basically describing how expensive/popular/in demand the main actor is
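As a rough illustration of the "star average career profit" idea, here is a minimal sketch in Python (not the R code used for the analysis, which isn't shown in the post). The rows, the names, and the `star_career_profit` helper are all hypothetical, and it assumes a film's release year is enough to decide which films count as "prior".

```python
from collections import defaultdict

# Hypothetical rows: (title, release_year, lead_actor, profit in millions USD).
films = [
    ("Film A", 2001, "Actor X", 100.0),
    ("Film B", 2004, "Actor X", 300.0),
    ("Film C", 2008, "Actor X", -50.0),
    ("Film D", 2005, "Actor Y", 20.0),
]

def star_career_profit(rows):
    """Map each title to the lead actor's mean profit over earlier films only.

    Films with no earlier credit for that actor get None (an assumption;
    the post does not say how first-time leads are handled).
    """
    totals = defaultdict(float)  # running profit sum per actor
    counts = defaultdict(int)    # number of earlier films per actor
    result = {}
    # Process in release order so each film only sees the actor's past.
    for title, year, actor, profit in sorted(rows, key=lambda r: r[1]):
        result[title] = totals[actor] / counts[actor] if counts[actor] else None
        totals[actor] += profit
        counts[actor] += 1
    return result

career = star_career_profit(films)
```

Updating the running totals only after recording the value keeps the current film's own profit out of its predictor, which matters if the feature is later used to model that same film's profit.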

    fourth image: MDS plot taken from Random Forest model used to determine features that predict profit

    the red dots indicate films that achieved a high profit (above the median value) while the blue dots are films that achieved a low profit (below the median value)

    the labelled films are those that, according to the random forest model, had all the right variables for either a low or high profit, but for some reason achieved the opposite.

    for example, Mad Max had features correlated with a low profit but did very well and achieved a high profit, likely due to things like a low budget, short runtime, etc., while still performing well at the box office.

    in comparison, The Incredible Hulk had features correlated with a high profit but did poorly and made a low profit.
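The median split and the "surprise" films described above can be sketched roughly like this in Python (again, not the R code from the analysis). The titles, profits, and predicted classes are made up, and `label_by_median` is a hypothetical helper; the predicted class stands in for whatever the random forest model returned.

```python
from statistics import median

# Hypothetical data: title -> (profit in millions USD, class predicted by a model).
films = {
    "Film A": (500.0, "high"),
    "Film B": (20.0, "low"),
    "Film C": (350.0, "low"),   # predicted low, landed high
    "Film D": (5.0, "high"),    # predicted high, landed low
    "Film E": (100.0, "low"),
}

def label_by_median(profits):
    """Label each film 'high' if its profit is above the median, else 'low'.

    A film sitting exactly on the median is labelled 'low' here; the post
    does not say how that case was handled, so this is an assumption.
    """
    cutoff = median(profits.values())
    return {t: ("high" if p > cutoff else "low") for t, p in profits.items()}

profits = {title: profit for title, (profit, _) in films.items()}
actual = label_by_median(profits)

# Films whose predicted class disagrees with the class they actually achieved.
surprises = sorted(t for t, (_, pred) in films.items() if pred != actual[t])
```

With this toy data, `surprises` holds the two films that went against their predicted class, which is the same kind of list the labelled points in the fourth image represent.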

    this is purely a hobby project in an attempt to expand my skills, so any feedback is greatly appreciated. I am working on getting a GitHub repo set up for the code and also expanding the analysis to look at other niche genres, so any suggestions are welcome!

    by New-Software316


    4 Comments

    1. Organized-Konfusion on

      Can you compare the westerns and superhero movies charts side by side?

    2. In the fourth image, it compressed the space down to two dimensions. (1) do you have any intuition for what those dimensions are? (2) I’d love to see a one-dimensional or three-dimensional predictor
