



EDIT: forgot to mark it as [OC] so this is a repost
I wanted to test out and expand my R skills, so I downloaded this dataset from Kaggle (https://www.kaggle.com/datasets/ggtejas/tmdb-imdb-merged-movies-dataset) and used it to analyse how superhero films have changed and grown over time.
first image: the number of superhero films through the years
pretty self explanatory and as expected
second image: the combined average rating from TMDb and IMDb (on a 0 to 10 scale, grouped as low, medium, or high) of superhero films through the decades
has trended slightly upwards, backed up by a linear model showing a significant increase in average rating post-1990s
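a rough sketch of what "trending upwards" means in terms of a fitted line. This is Python rather than the R I actually used, the numbers are made-up toy values (not from the dataset), and it only fits a plain least-squares slope; it does not reproduce the significance test or the post-1990s comparison.

```python
# Ordinary least-squares fit of average rating against decade.
# NOTE: the data below is invented purely for illustration.

def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

decades = [1970, 1980, 1990, 2000, 2010]   # hypothetical decade starts
avg_rating = [5.0, 5.2, 5.4, 5.6, 5.8]     # hypothetical mean ratings (0 to 10)

slope, intercept = ols_fit(decades, avg_rating)
print(f"rating change per decade: {slope * 10:.2f}")
```

a positive slope is what "trended slightly upwards" looks like numerically; whether it counts as significant needs the standard errors that a proper model summary reports.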
third image: factors affecting the profits of superhero films
all three were identified as significant predictors of profit in a linear model
star average career profit is the average profit of the first-listed (main) cast member's films prior to the current film, basically describing how expensive/popular/in demand the main actor is
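to make the "prior to the current film" part concrete, here is a minimal Python sketch of that feature (again, not my actual R code). The field names `title`, `year`, `lead`, and `profit` are hypothetical, the films are placeholders, and it assumes only films where the actor was also first-listed count towards their history.

```python
from collections import defaultdict

def add_star_career_profit(films):
    """Attach 'star_avg_career_profit': the mean profit of the lead actor's
    earlier films. It is None when the actor has no earlier film.
    Films from the same year are processed in input order (an assumption)."""
    history = defaultdict(list)  # lead actor -> profits of films seen so far
    out = []
    for film in sorted(films, key=lambda f: f["year"]):
        past = history[film["lead"]]
        avg = sum(past) / len(past) if past else None
        out.append({**film, "star_avg_career_profit": avg})
        # Record this film only after computing the feature, so a film
        # never contributes to its own value.
        past.append(film["profit"])
    return out

# Placeholder data, not taken from the dataset.
films = [
    {"title": "Film 1", "year": 2000, "lead": "Actor A", "profit": 100.0},
    {"title": "Film 2", "year": 2005, "lead": "Actor A", "profit": 300.0},
    {"title": "Film 3", "year": 2010, "lead": "Actor A", "profit": 50.0},
    {"title": "Film 4", "year": 2008, "lead": "Actor B", "profit": 20.0},
]

result = add_star_career_profit(films)
for row in result:
    print(row["title"], row["star_avg_career_profit"])
```

the important detail is the ordering: the current film's profit is added to the history only after its own feature is computed, otherwise the predictor would leak the outcome.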
fourth image: MDS plot taken from Random Forest model used to determine features that predict profit
the red dots indicate films that achieved a high profit (above the median value) while the blue dots are films that achieved a low profit (below the median value)
the labelled films are those that, according to the random forest model, had all the right variables for either a low or high profit, but for some reason achieved the opposite.
for example, Mad Max had features the model associated with a low profit but actually achieved a high profit, likely due to things like its low budget and short runtime combined with a strong box office performance.
in comparison, The Incredible Hulk had features the model associated with a high profit but did poorly and made a low profit.
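the labelling logic boils down to something like this Python sketch (not my R code). The titles, profits, and predicted classes are placeholders, and the `predicted` list stands in for whatever the random forest outputs; the sketch only shows the median split and the mismatch check, not the model or the MDS plot.

```python
from statistics import median

def label_by_median(profits):
    """Label each profit 'high' if above the median, otherwise 'low'.
    Treating a profit exactly equal to the median as 'low' is an assumption."""
    cutoff = median(profits)
    return ["high" if p > cutoff else "low" for p in profits]

def find_surprises(titles, profits, predicted):
    """Return (title, predicted, actual) for films whose actual class
    differs from the class the model predicted."""
    actual = label_by_median(profits)
    return [
        (title, pred, act)
        for title, pred, act in zip(titles, predicted, actual)
        if pred != act
    ]

# Placeholder values for illustration only.
titles = ["Film A", "Film B", "Film C", "Film D"]
profits = [10.0, 200.0, 50.0, 400.0]
predicted = ["high", "high", "low", "low"]  # hypothetical model output

surprises = find_surprises(titles, profits, predicted)
for title, pred, act in surprises:
    print(f"{title}: predicted {pred}, actual {act}")
```

films that come out of `find_surprises` are the ones that would get a label on the plot.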
this is purely a hobby project in an attempt to expand my skills, so any feedback is greatly appreciated. I am working on getting a GitHub repo up for the code and also expanding the analysis to look at other niche genres and such, so any suggestions are welcome!
by New-Software316
1 Comment
Kamen Rider Build has only 180 votes on IMDb and it's shown as an outlier as if it means something. I feel like you could filter out some of these movies.
3rd and 4th are very unclear and I wouldn't say they're beautiful. no clue what half of it means. why would you even label them "dimension 1" and "dimension 2"?