I love the use of different-sized circles, but I feel like you're comparing apples to oranges in a way. I think the circles should all be relative to one master size (that of #1), but only within their own category; rather than comparing Ron Conway to Google, his circle could be of equal size. This way, when searching for entities, their circles would make more sense because the user is aware of the consistent max circle size. By comparing Conway to Google, you're giving up a wider scale you could be using for the people's circles.
Still though, cool project. I found the 5-person startup I'm interning at this summer :) Makes for a great crash-course on who matters if someone were trying to study up on the startup scene.
We tried both options and liked using one scale for all categories. The idea is that you are comparing influence, or something like it, so that can be visually compared across categories. You do have the downside of having very small dots for the people after only a few pages, but since there are more than 100,000 people, most are going to have the minimum-sized dot anyway.
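For anyone who wants to see the trade-off between the two options concretely, here is a rough sketch. Everything in it is an assumption for illustration: the scores are made up, the function names are hypothetical, and scaling circle area with score is just one reasonable choice, not necessarily what the site does.

```python
import math

R_MIN, R_MAX = 2.0, 40.0  # assumed pixel limits, not the site's real values


def radii(scores):
    """Scale radii so circle area is proportional to score, relative to the top score."""
    top = max(scores.values())
    return {name: max(R_MIN, R_MAX * math.sqrt(s / top)) for name, s in scores.items()}


def global_scale(entities):
    """One master size shared by every category."""
    return radii({name: score for name, _, score in entities})


def per_category_scale(entities):
    """One master size per category, so each category's #1 gets the max radius."""
    result = {}
    for category in {cat for _, cat, _ in entities}:
        result.update(radii({n: s for n, c, s in entities if c == category}))
    return result


# Made-up scores, purely illustrative.
entities = [
    ("Google", "company", 100.0),
    ("Ron Conway", "person", 4.0),
    ("Someone Else", "person", 0.01),
]

shared = global_scale(entities)
separate = per_category_scale(entities)
```

With one shared scale, Ron Conway's dot is a fraction of Google's and the obscure person hits the minimum radius; with per-category scales, Conway's dot is as big as Google's but the two are no longer comparable.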
Dave McClure at #3, Paul Graham at #943; whereas 500 Startups is at #41, and Y Combinator is at #15?
Why is 500 Startups classified as a financial firm, whereas Y Combinator is classified as a company?
Also, I'm surprised that Andreessen Horowitz is ranked #67 of financial firms (given that Marc Andreessen is ranked #14), and Elon Musk is #119 for people. I would have thought they'd both be ranked much higher.
In the data source, TechCrunch's Crunchbase, the influence of Paul Graham, Jessica Livingston, et al. is captured in YC the company, and Marc Andreessen is split between his personal identity and AH. So you get a lot of odd things like that.
As for why YC got classified as a company, who knows? It's an accident of history. They should change it.
So TechCrunch is the single data source. Are there any other sources used?
The sooner the better; TechCrunch data is iffy (see http://www.flickr.com/photos/bootload/2913315731/), though useful as a starting point. Did you run a select on the companies to check for multiple listings?
The data from Crunchbase is very, very dirty, but I managed to clean it up a lot. Feel free to fork if you want pre-built NetworkX graphs.
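A check for multiple listings could be sketched roughly like this. It is not the cleanup that was actually done on the Crunchbase data; the sample names, the suffix list, and the `normalize` and `find_duplicates` helpers are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop common legal suffixes."""
    key = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(word for word in key.split() if word not in SUFFIXES)


def find_duplicates(names):
    """Group names that share a normalized key; keep only groups with more than one entry."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Made-up listing names for illustration.
listings = ["Y Combinator", "Y-Combinator", "Y Combinator, Inc.", "500 Startups"]
duplicates = find_duplicates(listings)
```

This only catches spelling variants of the same name. It would not merge a person with their firm, which is the other kind of split discussed above.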
I miss attending events when a prior startup was based there.
The 217,000 most important companies, financial firms, and people...
"...in the startup world..."
In fact, I am surprised to see a lot of non-startup companies like Nokia listed quite highly.
Very nice work!
But Yahoo ahead of Facebook, Amazon and Apple?
Myspace ahead of Apple?
Paul Graham is #943 among people?
Cool project, even with the eyebrow-raising results.
Good luck, useful.
One request: in the UI, where you have checkboxes, can you add an 'exclusive' select? For example, if I want to see ONLY 'acquisitions' made, right now I have to deselect everything else manually.
But my last start-up, which I shut down over a year ago (HearWhere, 19,883), is ranked as more influential than companies with significantly more traffic that are still operating (for example, AllRecipes, 23,087).
Nice to think I'm that influential, but I can assure you, I'm not (yet ;))
Any ranking will put importance on certain factors which are chosen from all the possible factors by the designer of the ranking. You have chosen to rank based (I presume) on incoming links on CrunchBase. However, this is intrinsically no more impartial than a ranking based on yearly revenue, or number of employees, or size of their offices, number of mentions in the New York Times, or any other of an unbounded number of metrics.
I would argue that PageRank, while it may have done a good job of ranking websites for the purposes of search, is a pretty poor choice here. It's highly susceptible to reporting bias, where some companies will be better-represented on CB. Empirically, you also get weird artifacts, like the ranking of MySpace ahead of Apple.
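To make the reporting-bias point concrete, here is a minimal PageRank sketch using plain power iteration on a made-up graph. The node names and edges are invented, the damping factor of 0.85 is only the conventional default, and this is not the project's actual computation (NetworkX's `pagerank` function would be the more likely choice there).

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed edge list; returns {node: score}."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    outgoing = {node: [] for node in nodes}
    for src, dst in edges:
        outgoing[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not outgoing[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            for dst in outgoing[src]:
                new_rank[dst] += damping * rank[src] / len(outgoing[src])
        rank = new_rank
    return rank


# Invented example: two companies that might matter equally in reality,
# but one has three recorded relationships and the other only one.
edges = [
    ("investor_a", "well_documented_co"),
    ("investor_b", "well_documented_co"),
    ("investor_c", "well_documented_co"),
    ("investor_d", "sparsely_documented_co"),
]
scores = pagerank(edges)
```

The better-documented company comes out ahead purely because more of its relationships were entered, which is the bias in question.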
Choosing a good metric requires a theoretical explanation of why that metric is important and why it helps answer your question (which you don't state, but I suspect is something like, "which are the most important startups", whatever that might mean). Just choosing something doesn't necessarily tell you anything useful.
[If you're interested in high-stakes debates about rankings, look into the hoopla that comes about every year when US News & World Report releases its college rankings. These numbers can have huge effects on colleges' prestige and, lower down the food chain, their bottom lines.]
Perhaps the better word, instead of 'impartial', would be 'disinterested'.
Try clicking on MySpace's blue dot and see what is connected to it.