
Movie Rater

One thing that never ceases to amaze me is how many bad movies are made. Well, being made is one thing, but they are also put in theaters and promoted heavily. I think that is odd, because in most cases you can tell pretty easily whether something is a winner or a complete dud.

If I go to the video store and pick out a movie, I pay attention to the actors, the writer and the director. Partly this is because I think they make the movie, but they also act as a filter: good actors will try to avoid playing in bad movies. There are exceptions of course, but you can trust some actors, and directors even more so.

I was thinking, if this works for me, I can write a program to do the same. The IMDB of course has a nice list of movies and actors, complete with ratings and who played in what. I was about to write a scraper when I discovered that they actually publish the data for download: http://www.imdb.com/interfaces

So I downloaded the files and wrote three Python scripts, see below. In order to use them, you’ll have to download actors.list, actresses.list, directors.list, ratings.list and writers.list, or you can take my word for it. filterMovies.py extracts a list of ratings for movies that have enough data. filterActors.py produces a file of all actors, writers and directors that have worked on enough movies to be relevant. rateActors.py does the hard work and produces a list of actors sorted by their impact on the movie.
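
To give an idea of what the two filter steps amount to, here is a minimal sketch. It is not the code from filterMovies.py or filterActors.py: the function names, the input shapes and both cutoffs (`MIN_VOTES`, `MIN_MOVIES`) are assumptions made up for illustration, and parsing the IMDB .list files is left out entirely.

```python
# Hypothetical cutoffs; the real scripts may use different ones.
MIN_VOTES = 1000   # a movie needs this many votes to count as "enough data"
MIN_MOVIES = 5     # a person needs this many rated movies to be "relevant"


def filter_movies(ratings, min_votes=MIN_VOTES):
    """ratings: iterable of (title, votes, rating) tuples.

    Returns {title: rating} for movies with at least min_votes votes.
    """
    return {title: rating
            for title, votes, rating in ratings
            if votes >= min_votes}


def filter_people(credits, rated_movies, min_movies=MIN_MOVIES):
    """credits: iterable of (person, title) pairs.

    Returns {person: [titles]} for people credited on at least
    min_movies of the movies in rated_movies.
    """
    by_person = {}
    for person, title in credits:
        if title in rated_movies:
            by_person.setdefault(person, []).append(title)
    return {person: titles
            for person, titles in by_person.items()
            if len(titles) >= min_movies}
```

Since the same person can show up as writer and as director, `person` would probably best be something like a (role, name) pair, so that the two roles get separate entries.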


It doesn’t necessarily mean that actors with a higher score are better, but the higher the score, the more likely it is that a movie they play in will be good. How well does it work? Surprisingly well. On average, the difference between the prediction and the actual score is .65 points, which seems good enough to distinguish between a winner and a loser (sucky movies on IMDB score around 4, winners higher than 7).
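
The average difference mentioned above can be read as a mean absolute error. A small sketch of that calculation, assuming predictions and actual ratings are both kept in dicts keyed by movie title (the function name and the dict layout are mine, not taken from the scripts):

```python
def mean_absolute_error(predicted, actual):
    """Average of |prediction - rating| over the movies present in both dicts."""
    common = predicted.keys() & actual.keys()
    if not common:
        raise ValueError("no movies in common")
    return sum(abs(predicted[t] - actual[t]) for t in common) / len(common)
```

With made-up numbers: predicting 7.0 and 4.5 for movies that actually score 8.0 and 4.0 gives errors of 1.0 and 0.5, so an average of 0.75.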


It works pretty simply. We predict the score of a movie by assuming it is the sum of the scores of the actors times the weight of each actor. As the weight of an actor I use the number of movies the actor appears in. This could surely be improved on, but it gives us something. For the score, I start with the average score of the movies the actor has appeared in. We then iterate, and in each iteration determine what the current set of scores predicts for our movies. If a movie’s prediction is too low, we increase the scores of the actors in it; otherwise we decrease them. It converges.
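
The loop described above could look roughly like the sketch below. This is not rateActors.py itself. Two things in it are my assumptions: the weighted sum is divided by the total weight so that predictions stay on the 1 to 10 scale, and each score is nudged by a fixed fraction (`learning_rate`) of the average error over that person's movies. The names `predict`, `rate_people`, `iterations` and `learning_rate` are made up for illustration.

```python
def predict(cast, score, weight):
    """Weighted average of the cast's scores (normalising is an assumption)."""
    total_weight = sum(weight[p] for p in cast)
    return sum(score[p] * weight[p] for p in cast) / total_weight


def rate_people(movie_ratings, movie_cast, iterations=50, learning_rate=0.1):
    """movie_ratings: {title: rating}; movie_cast: {title: [people]}.

    Every title in movie_cast must have a rating and a non-empty cast.
    Returns (score, weight), both dicts keyed by person.
    """
    # Collect the movies each person appears in.
    appearances = {}
    for title, cast in movie_cast.items():
        for person in cast:
            appearances.setdefault(person, []).append(title)

    # Weight: number of movies a person appears in.
    weight = {p: len(titles) for p, titles in appearances.items()}
    # Starting score: average rating of the person's movies.
    score = {p: sum(movie_ratings[t] for t in titles) / len(titles)
             for p, titles in appearances.items()}

    for _ in range(iterations):
        # Positive error means the prediction was too low.
        error = {t: movie_ratings[t] - predict(cast, score, weight)
                 for t, cast in movie_cast.items()}
        # Raise scores of people in under-predicted movies, lower the others.
        for person, titles in appearances.items():
            mean_error = sum(error[t] for t in titles) / len(titles)
            score[person] += learning_rate * mean_error
    return score, weight
```

Sorting `score` from high to low then gives a ranking of people by how much their presence pushes a prediction up. Whether this particular update rule settles as nicely on the full data set as the original script does is something I have not checked.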


So what does it come up with? Well, Madonna is the worst actress in the list. Spike Lee is the worst writer. That might seem strange, but if you look into it, it makes sense. See, if a movie is directed by Spike Lee, it has a very good chance of scoring well (a score of 9.8). But if you look at the movies that Spike Lee directed that didn’t do well, you’ll see that most of the time he wrote the movie too. The movies he didn’t write did much better, so him writing a movie is a bad signal.

Richard Donner is the director who most strongly indicates success, and Tracy Reiner is the best actress. Tracy doesn’t seem to play big roles, but maybe she reads them very well.

Downloadable files

filterActors.py
filterMovies.py
rateActors.py
actors.rated.txt: Actors, directors etc. sorted by their movie impact


(c) Douwe Osinga 2001-2005, douwe.webfeedback@gmail.com