
A History of Film in Wikipedia - jvrossb
http://blog.argteam.com/coding/a-history-of-film-in-wikipedia/
======
foobarbazqux
It's important to remember that these are the films that have been influential
on English-speaking Wikipedia editors. 75% of editors are under 30, 90% are
male. 76% of edits are to the English Wikipedia. I couldn't find any numbers
but I believe there's a similar bias towards white Americans, simply based on
the demographics of the US, UK, Australia, NZ, and Canada.

Influential films for white male Americans under 30 that also choose to edit
Wikipedia? Sure.

To be clear, I thought this was an interesting technique and visualization. I
like that it's influence by year instead of the top 100 influential films of
all time, with Citizen Kane at the top.

~~~
derleth
True. For what it's worth, Wikipedia has project pages explicitly addressing
this:

[http://en.wikipedia.org/wiki/Wikipedia:Systemic_bias](http://en.wikipedia.org/wiki/Wikipedia:Systemic_bias)

[http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Counterin...](http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Countering_systemic_bias)

Even at best, though, an Internet project is inevitably going to be biased
towards the POV of whoever uses the Internet. The solution is to expand
Internet access, which quickly reduces to the problem of solving all of the
social problems nobody's been able to solve yet. Piece of cake.

~~~
AlexanderDhoore
Just discovered (under Systemic bias) that the Dutch Wikipedia is the second
biggest. That's pretty awesome of us Dutchies! (I also read once that the
Dutch are 4th or something on torrent uploads. Brilliant.)

------
RKoutnik
Well done, in both data crunching and visualizing. I'd love to revisit this in
15 years and see which modern films held up. I doubt TDKR will last (even
though it's a decent movie). Avatar and TDK will probably continue to be notable
films, as both were historic points in film.

The only one I was surprised to see on there was Juno. I've never seen it, is
it really that notable?

~~~
lolcraft
Avatar and TDK historic points in film. AKA Dancing with Smurfs, and The Movie
With More Plot Holes Than The Titanic. Hum, I beg to disagree.

Overall, I got a good laugh from the list. Seriously, this list should be
called "things white US kids (of my age, granted) like". Rather than a history
of cinema, it belongs more in the "comedy generated by computer" genre.
Exhibit one: _City Lights_ isn't there. Exhibit two: Modern Times isn't there.
No Le Voyage dans la Lune, no Man with a Movie Camera, no Great Train Robbery,
Rashomon isn't cool enough to make the list, Vertigo is too weird, and
Battleship Potemkin is, like, too commie, you know. The final straws would be
Independence Day and It's a Wonderful Life making the list.

I mean, On the Waterfront looks like it's very good (I haven't seen it), but,
but, it was released in the same year as _motherfucking Seven Samurai_!

~~~
pessimizer
Seven Samurai is not an unambiguously better movie than On the Waterfront. I
hate Elia Kazan as much as the next guy for what he did, but he was a historic
talent directing historic talent, making snitching on the working man to the
bosses seem positively heroic. Foreign things aren't always better.

------
DanielBMarkham
A suggestion: "Most culturally prominent" works better than "most influential".
The only thing Wikipedia is going to tell you is how many other Wikipedia
authors cite something. A movie can have terrific influence globally but in an
indirect manner, influencing a generation of writers and directors, for
instance. Likewise it could have influenced a lot of people, just not people
who have anything to do with Wikipedia. Wikipedia is more like a popularity
contest. "Birth of a Nation", for instance, had a huge impact all across North
America -- but nobody is around anymore to cite it, so it's mostly lost to
time.

I see you've changed the title. Cool.

Nice work!

~~~
_delirium
_"Birth of a Nation", for instance, had a huge impact all across North
America -- but nobody is around anymore to cite it_

That's an odd choice of counterexample, since it ranks pretty highly in this
list, so apparently is actually cited a lot on Wikipedia. It's the "most
influential" pick for 1915 in the year-by-year list
([http://blog.argteam.com/coding/a-history-of-film-in-wikipedi...](http://blog.argteam.com/coding/a-history-of-film-in-wikipedia/)),
and is #16 of all time in the all-time list
([http://blog.argteam.com/golang/the-top-25-movies-of-all-time...](http://blog.argteam.com/golang/the-top-25-movies-of-all-time-chosen-by-an-algorithm/)).

------
jeremyswank
very few foreign films here, a large number of which have been decisively
influential on even the US cinema. so, in the end, i think we understand
wikipedia better, rather than understanding film history better.

~~~
cosbynator
That's certainly a downside. I've been meaning to run this on non-English
Wikipedias to see the similarities and differences. A subject for a future
blog post :)

------
crististm
Avatar and The Social Network? Give me a break! That says a lot about...
something... I think...

~~~
AlexanderDhoore
To me it says something about the general state of some things. Also, other
stuff... Which is related to the first stuff. Maybe that's less relevant.

------
swang
Wikipedia (or someone/thing related to Wikipedia) keeps track of page visits /
edits per page.

For example:
[http://stats.grok.se/en/latest/The_Dark_Knight_Rises](http://stats.grok.se/en/latest/The_Dark_Knight_Rises)

~~~
cosbynator
Cool! I didn't know about this. The score I compute is actually a PageRank,
which takes into account the quality of inbound links rather than just the raw
number. I'd be interested to see how well this correlates with page
visits!
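
One way such a comparison might be sketched is a Spearman rank correlation between the PageRank scores and the page-view counts. This is only an illustration: the function names are made up for this example, and the numbers are placeholders, not data from the post or from stats.grok.se.

```python
import math


def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Placeholder numbers for four hypothetical film pages (NOT real data).
pagerank_scores = [0.031, 0.012, 0.027, 0.004]
page_views = [52000, 9000, 61000, 1500]

rho = spearman(pagerank_scores, page_views)
print("Spearman correlation:", rho)
```

Ranks are used instead of raw values because PageRank scores and view counts live on very different scales; a value near 1 would suggest the two orderings mostly agree.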

------
teeja
Following the first full-talkie "Lights of New York", the low apparent
influence (using this method) of Warner's part-talkie 1928 "Singing Fool"
overlooks that it starred Al Jolson, included several songs which became
traditional, and made $4 million. Most studios moved to talkies within a year.

I'd guess that analyzing the stock market is easier than the endless
contingencies driving WP.

------
triplesec
Interesting manipulations of data, but a shoddy attempt at analysis. This guy
needs to work on constructing meaningful variables for his interpretation.
"Important" means nothing. Is he looking at popularity, and if so, what kind?
How representative is the sample of Wikipedia's film pages against the total
films produced? What factors led to the "influence" metrics?

~~~
cosbynator
My post wasn't really aimed at a technical audience. It was more of "look what
cool stuff we can do without looping in a human". I've given a slightly more
technical introduction in my post on ranking universities:
[http://blog.argteam.com/coding/university-ranking-wikipedia/](http://blog.argteam.com/coding/university-ranking-wikipedia/)

It really is just PageRank. The source code is available on GitHub:
[https://github.com/cosbynator/WikiRank](https://github.com/cosbynator/WikiRank)
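
For readers who haven't seen PageRank before, here is a minimal power-iteration sketch of the general technique. It is not the WikiRank code from the repository above; the function name, the damping factor of 0.85, and the toy link graph with its page names are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outbound links is spread evenly.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling / n for page in pages
        }
        # Every page splits its damped rank equally among its link targets.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "A" has one inbound link, but from a well-linked
# hub; "B" has two inbound links, both from pages nothing links to.
toy_links = {
    "Hub": ["A"],
    "P1": ["Hub"],
    "P2": ["Hub"],
    "P3": ["Hub"],
    "P4": ["Hub"],
    "P5": ["Hub"],
    "X": ["B"],
    "Y": ["B"],
}

ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph "A" ends up ranked above "B" despite having fewer inbound links, which is the "quality of inbound links rather than just the raw number" effect.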

