A History of Film in Wikipedia (argteam.com)
47 points by jvrossb 1598 days ago | 25 comments

It's important to remember that these are the films that have been influential on English-speaking Wikipedia editors. 75% of editors are under 30, 90% are male. 76% of edits are to the English Wikipedia. I couldn't find any numbers but I believe there's a similar bias towards white Americans, simply based on the demographics of the US, UK, Australia, NZ, and Canada.

Influential films for white male Americans under 30 that also choose to edit Wikipedia? Sure.

To be clear, I thought this was an interesting technique and visualization. I like that it's influence by year instead of the top 100 influential films of all time, with Citizen Kane at the top.

I wonder how similar or different the numbers would be if the sample was a perfect representation of the world population?

I don't know what the situation is like everywhere in the world, but having visited Europe since the 90's, it was pretty clear Hollywood dominated movies/music (and obviously here in Canada too). So while we might all have our specific picks that are influential to us (whether it's a thought-provoking European movie or an American epic movie), it may end up being not so different in the end? Though the billions in Asia may sway it :)

This is probably an interesting read for those who care: http://isites.harvard.edu/fs/docs/icb.topic152447.files/rose...

Good point - this is a view of the English-speaking world as an aggregate. I'm hoping to re-run this experiment on non-US Wikipedia sites. If there are any interesting differences I'll try to put together a new visualization :)

Sounds cool. Non-Hollywood films would also be interesting.

True. For what it's worth, Wikipedia has project pages explicitly addressing this:



Even at best, though, an Internet project is inevitably going to be biased towards the POV of whoever uses the Internet. The solution is to expand Internet access, which quickly reduces to the problem of solving all of the social problems nobody's been able to solve yet. Piece of cake.

Just discovered (under Systemic bias) that the Dutch Wikipedia is the second biggest. That's pretty awesome of us Dutchies! (I also read once that the Dutch are 4th or something on torrent uploads. Brilliant.)

Thanks for pointing out those pages.

I think greater internet access would certainly help, but it's not a panacea as far as Wikipedia is concerned. It might be that no matter what happens, a specific majority demographic with respect to age, income, education, and gender will choose to edit an online NPOV collaborative encyclopedia.

Well done, in both data crunching and visualizing. I'd love to revisit this in 15 years and see what modern films held up. I doubt TDKR will last (even for being a decent movie). Avatar and TDK will probably continue to be notable films, as both were historic points in film.

The only one I was surprised to see on there was Juno. I've never seen it, is it really that notable?

Avatar and TDK historic points in film. AKA Dancing with Smurfs, and The Movie With More Plot Holes Than The Titanic. Hum, I beg to disagree.

Overall, I got a good laugh from the list. Seriously, this list should be called "things white US kids (of my age, granted) like". Rather than a history of cinema, it belongs more in the "comedy generated by computer" genre. Exhibit one: City Lights isn't there. Exhibit two: Modern Times isn't there. No Le Voyage dans la Lune, no Man with a Movie Camera, no Great Train Robbery, Rashomon isn't cool enough to make the list, Vertigo is too weird, and Battleship Potemkin is, like, too commie, you know. The final straws would be Independence Day and It's a Wonderful Life making the list.

I mean, On the Waterfront looks like it's very good (I haven't seen it), but, but, it was released in the same year as motherfucking Seven Samurai!

Seven Samurai is not an unambiguously better movie than On the Waterfront. I hate Elia Kazan as much as the next guy for what he did, but he was a historic talent directing historic talent, making snitching on the working man to the bosses seem positively heroic. Foreign things aren't always better.

You can make a guess for how things held up in the past by looking at some of the Google Books datasets, though that does have some problems. One problem is that it's completely unnormalized (the proportion of the dataset that is made up of different kinds of books changes over time), and another is that you can't really search for movie titles that are also names of anything else, or common English words, without getting other stuff in the results.

But with those caveats it can be fun to play around with: http://books.google.com/ngrams/graph?content=Birth+of+a+Nati...

> The only one I was surprised to see on there was Juno. I've never seen it, is it really that notable?

No. It was zeitgeisty in the way that some films are, and it got a lot of buzz among important demographics, but it's not "important".

I think that Juno is notable mostly as one of the most commercially successful "indie" films. It exposed a lot of mainstream audiences to films with a lower budget.

A suggestion: "Most culturally prominent" works better than "most influential". The only thing Wikipedia is going to tell you is how many other Wikipedia authors cite something. A movie can have terrific influence globally but in an indirect manner, influencing a generation of writers and directors, for instance. Likewise it could have influenced a lot of people, just not people who have anything to do with Wikipedia. Wikipedia is more like a popularity contest. "Birth of a Nation", for instance, had a huge impact all across North America -- but nobody is around anymore to cite it, so it's mostly lost to time.

I see you've changed the title. Cool.

Nice work!

"Birth of a Nation", for instance, had a huge impact all across North America -- but nobody is around anymore to cite it

That's an odd choice of counterexample, since it ranks pretty highly in this list, so apparently is actually cited a lot on Wikipedia. It's the "most influential" pick for 1915 in the year-by-year list (http://blog.argteam.com/coding/a-history-of-film-in-wikipedi...), and is #16 of all time in the all-time list (http://blog.argteam.com/golang/the-top-25-movies-of-all-time...).

You definitely aren't wrong, but it is worth noting that the PageRank is calculated based on all Wikipedia articles, and not just those of movies (so influence into other areas will hopefully bubble up). "Birth of a Nation" actually does show up quite prominently for the pre-1930s era.

Very few foreign films here, a large number of which have been decisively influential on even the US cinema. So, in the end, I think we understand Wikipedia better, rather than understanding film history better.

That's certainly a downside. I've been meaning to run this on non-English Wikipedias to see the similarities and differences. A subject for a future blog post :)

Avatar and The Social Network? Give me a break! That says a lot about... something... I think...

To me it says something about the general state of some things. Also, other stuff... Which is related to the first stuff. Maybe that's less relevant.

Wikipedia (or someone/thing related to Wikipedia) keeps track of page visits / edits per page.

For example: http://stats.grok.se/en/latest/The_Dark_Knight_Rises

Cool! I didn't know about this. The score I compute is actually a PageRank, which takes into account the quality of inbound links rather than just the raw number. I'd be interested to try to see how well this correlates to page visits!
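For anyone wondering what "quality of inbound links" means in practice, here is a minimal sketch of textbook power-iteration PageRank on a toy link graph. It is not the code from the post: the page names, the damping factor of 0.85, and the iteration count are all assumptions made for illustration. The two film pages each have exactly one inbound link, but one of those links comes from a page that is itself heavily linked.

```python
from collections import Counter


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outbound links."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outbound links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page splits its current rank equally among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "Hub" is linked from three portals, "Stub1" from nowhere.
links = {
    "Portal1": ["Hub"],
    "Portal2": ["Hub"],
    "Portal3": ["Hub"],
    "Hub": ["FilmA"],
    "Stub1": ["FilmB"],
    "FilmA": [],
    "FilmB": [],
}

# Raw inbound link counts: FilmA and FilmB are tied at one each.
in_degree = Counter(t for targets in links.values() for t in targets)
ranks = pagerank(links)

for film in ("FilmA", "FilmB"):
    print(film, "inbound links:", in_degree[film], "rank:", round(ranks[film], 4))
```

By raw count the two films are tied, but FilmA ends up with the higher rank because its single link comes from the well-linked Hub page.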

Following the first full-talkie "Lights of New York", the low apparent influence (using this method) of Warner's part-talkie 1928 "Singing Fool" overlooks that it starred Al Jolson, included several songs which became traditional, and made $4 million. Most studios moved to talkies within a year.

I'd guess that analyzing the stock market is easier than the endless contingencies driving WP.

Interesting manipulations of data, but a shoddy attempt at analysis. This guy needs to work on constructing meaningful variables for his interpretation. "Important" means nothing. Is he looking at popularity, and if so, what kind? How representative is the sample of Wikipedia's film pages against the total films produced? What factors led to the "influence" metrics?

My post wasn't really aimed at a technical audience. It was more of "look what cool stuff we can do without looping in a human". I've given a slightly more technical introduction in my post on ranking universities: http://blog.argteam.com/coding/university-ranking-wikipedia/

It really is just PageRank. The source code is available on GitHub: https://github.com/cosbynator/WikiRank
