Hollywood Budgets – A $5000 Data-Viz Challenge

Tuesday, January 10th, 2012

This might be our best dataset yet. A massive sheaf of numbers on every major Hollywood film since 2007. Their budgets, review scores, grosses, genres and profits. Just in time for the Oscars in February!

We’re challenging people to visualise this data – either in a design or an interactive piece. And, thanks to beloved sponsors Kantar, we’ve got $5000 to give away to the winners.

Best of all – you don’t need to be able to design. You can sketch your entry on a napkin.

» Check out the challenge at InformationIsBeautifulAwards.com
» Check out the data

The Matrix

I’m so excited about this dataset. We hand-compiled it over a year. It’s fully comprehensive and lifts the hood on Hollywood. Let me just go through it with you.

For every Hollywood & major US film of the last 5 years, it collates:

  • lead studio
  • reviews – Rotten Tomatoes metascore (all critics’ reviews combined into a single score) & audience score
  • story type (of the 22 potential types of plot – see this PDF for a summary)
  • genre
  • grosses – opening weekend, domestic gross, foreign gross, and worldwide (plus number of theatres in US opening weekend)
  • budget (very difficult figure to find for some movies, especially flops)

and best of all

profitability - what % of the production budget was recovered at the box office.


It always bugs me how Hollywood grades or broadcasts the success of a film by gross income. Profitability, or % of Budget Recovered, is a far better measure of a film’s success. Especially in America, where each film carries such high printing and advertising costs that it needs to recover about 250-300% of its budget to be deemed a true hit.
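As a minimal sketch, % of Budget Recovered is worldwide gross divided by budget, times 100 (the same formula a commenter below reads off the spreadsheet). The function names and the default 250% cut-off are assumptions for illustration; the post itself only gives the 250-300% range.

```python
# Hypothetical helpers; the 250% default is an assumed point in the post's 250-300% range.
def percent_budget_recovered(worldwide_gross: float, budget: float) -> float:
    """Worldwide box-office gross expressed as a percentage of the production budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return worldwide_gross / budget * 100


def is_true_hit(worldwide_gross: float, budget: float, threshold: float = 250.0) -> bool:
    """Assumed rule of thumb: a 'true hit' recovers at least `threshold` % of its budget."""
    return percent_budget_recovered(worldwide_gross, budget) >= threshold


if __name__ == "__main__":
    # Illustrative figures only (in $ millions), not taken from the dataset.
    gross, budget = 150.0, 50.0
    print(percent_budget_recovered(gross, budget))  # 300.0
    print(is_true_hit(gross, budget))               # True
```

Keeping the threshold as a parameter makes it easy to test how sensitive a "hit" list is to where you draw the line.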

In fact, using Profitability as an index changes the view considerably. Take 2007: the biggest-grossing film was Pirates of the Caribbean: At World’s End, but it only recovered 320% of its budget. The most profitable film of 2007 by far was…

Can you guess? Have a look at the data.
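Here is a small sketch of why the choice of index matters: the same list of films can produce a different "winner" depending on whether you sort by worldwide gross or by % of budget recovered. The `Film` class and the three films are invented for illustration; they are not from the dataset and say nothing about the answer to the 2007 puzzle.

```python
from dataclasses import dataclass


@dataclass
class Film:
    title: str
    worldwide_gross: float  # assumed unit: $ millions
    budget: float           # production budget, assumed $ millions

    @property
    def pct_recovered(self) -> float:
        """% of Budget Recovered = worldwide gross / budget * 100."""
        return self.worldwide_gross / self.budget * 100


def top_by_gross(films: list[Film]) -> Film:
    return max(films, key=lambda f: f.worldwide_gross)


def top_by_profitability(films: list[Film]) -> Film:
    return max(films, key=lambda f: f.pct_recovered)


if __name__ == "__main__":
    # Made-up films and numbers, purely to show that the two rankings can disagree.
    films = [
        Film("Blockbuster A", worldwide_gross=900.0, budget=300.0),  # 300%
        Film("Indie B", worldwide_gross=60.0, budget=2.0),           # 3000%
        Film("Mid-budget C", worldwide_gross=200.0, budget=100.0),   # 200%
    ]
    print(top_by_gross(films).title)          # Blockbuster A
    print(top_by_profitability(films).title)  # Indie B
```

A big-budget film can top the gross chart while sitting far down the profitability chart, because the budget divides everything.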

CONCEPT: David McCandless
SOURCES: The-Numbers.com, BoxOfficeMojo, IMDB, Wikipedia
DATA GATHERERS: Miriam Quick, Marley Whiteside, Dan Hampson, Pearl Doughty-White, Matt Hancock, Alexia Wdowski
SPECIAL THANKS TO Phil Hodges (sorry it didn’t work out, man!)


Comments

  • Nick

    Very cool! Looks like the profitability percentages on the 2008 and 2009 pages are actually just worldwide gross/budget though, missing the *100 or setting the type on the cell. One or two others scattered on other pages too.

  • http://clubneko.net/ nick

    The 2008 and 2009 % of BR figures are displayed as decimals, not as percentages like the others. :)

  • http://mattischneider.fr Matti Schneider

    “Especially in America, where each film has such high printing and advertising costs, that it needs to recover about 250-300% of its budget to be deemed a true hit.”

    Does that mean printing and advertising costs are not included in its budget? :-S

  • Mythreyi

    Oh, this sounds heaven-sent!

    Can anybody tell me why there is an “Inspired by KANTAR” line? Did the group run a competition along similar lines?

  • http://lab.zoho.co.uk Philip Hodges

    No worries!

    Thanks for the mention and sorry I didn’t have more time available to work on it. Looking forward to seeing the contributions.


  • http://alain-pilon.com Alain

    Super interesting. Seeing one of my favourite comedies from 2011, Your Highness, score so low confirms my initial perception that it was very badly marketed.

  • Jon

    Are the figures in real terms? And if so, for which year?

  • Tom

    Is there a way I can be notified if you edit or add to this dataset? That way I can copy it out to my own file for exploration without worrying about making a visualization with bad data. :)


  • gregthestopsign

    If you’re expecting the data to be visualised, why not have a sane spreadsheet of data?

    * the data should all be on one sheet, with ‘Year’ being a column
    * there should only be one header line, so that it can be sanely exported to CSV.

  • http://sixteencolors.net Doug

    FYI, Make a Copy is not available to anonymous users, at least on this document. I had to copy each sheet individually.

    • http://sixteencolors.net Doug

      I take that back, when I try to copy a sheet I get: “We’re sorry, a server error occurred. Please wait a bit and try reloading your spreadsheet.”

  • http://vislives.com Chris Pudney

    What’s the copyright on the data set?

    Is it OK to republish it in your own Google Doc or a Many Eyes data set?
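Two of the comments above point at the same cleanup: one tidy table with a Year column and a single header row, and profitability expressed consistently as a percentage. A minimal sketch, assuming each year's sheet has been exported to CSV text with one header row and hypothetical column names `Film`, `Worldwide Gross` and `Budget` holding plain numbers (the real spreadsheet's headers and layout may differ):

```python
import csv
import io

# Hypothetical output columns for the combined, one-header-row table.
FIELDS = ["Year", "Film", "Worldwide Gross", "Budget", "% Budget Recovered"]


def combine_years(sheets: dict[int, str]) -> list[dict]:
    """Merge per-year CSV text into one list of rows carrying a 'Year' column.

    Assumes each sheet has a single header row with the hypothetical columns
    'Film', 'Worldwide Gross' and 'Budget', containing plain numbers.
    """
    rows = []
    for year, text in sorted(sheets.items()):
        for row in csv.DictReader(io.StringIO(text)):
            gross = float(row["Worldwide Gross"])
            budget = float(row["Budget"])
            rows.append({
                "Year": year,
                "Film": row["Film"],
                "Worldwide Gross": gross,
                "Budget": budget,
                # Recompute rather than trust the stored cell, so every year is a
                # percentage instead of a mix of ratios and percentages.
                "% Budget Recovered": round(gross / budget * 100, 1),
            })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialise the combined rows with exactly one header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Invented example rows, not taken from the dataset.
    sheets = {
        2008: "Film,Worldwide Gross,Budget\nExample Film,120,40\n",
        2009: "Film,Worldwide Gross,Budget\nAnother Film,30,60\n",
    }
    print(to_csv(combine_years(sheets)))
```

Recomputing the percentage from gross and budget sidesteps the formatting question entirely, at the cost of trusting those two columns instead.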