Using logistic regression that way isn't going to do much, since each city has a single nonzero independent variable.
I'd try something like this, stats-wise. It is just 10 minutes of effort, but it accounts for the different sample sizes. I added one success and one failure to each city, i.e. Laplace smoothing. Then I computed a 95% confidence interval for each, using binom.test in R (e.g. binom.test(8,9)$conf.int). I sorted by the lower bound.
If the lower bound of one city is above the upper bound of another, then you can say it is a better place, using your metric.
City x+1 n+2 lb (x+1)/(n+2) ub
Mountain View, CA USA 144 209 0.621 0.689 0.751
South San Francisco, CA 33 44 0.597 0.750 0.868
Cambridge, MA USA 103 173 0.518 0.595 0.669
Chapel Hill, NC USA 8 9 0.518 0.889 0.997
Waltham, MA USA 61 101 0.502 0.604 0.700
Cupertino, CA USA 38 60 0.499 0.633 0.754
Foster City, CA USA 16 22 0.498 0.727 0.893
San Mateo, CA USA 88 159 0.473 0.553 0.632
San Jose, CA USA 112 208 0.468 0.538 0.608
New York, NY USA 399 809 0.458 0.493 0.528
San Bruno, CA USA 14 20 0.457 0.700 0.881
Palo Alto, CA USA 128 246 0.456 0.520 0.584
Itasca, IL USA 8 10 0.444 0.800 0.975
Westford, MA USA 8 10 0.444 0.800 0.975
Los Gatos, CA USA 14 21 0.430 0.667 0.854
Arlington, VA USA 15 23 0.427 0.652 0.836
Aliso Viejo, CA USA 16 25 0.425 0.640 0.820
Branford, CT USA 6 7 0.421 0.857 0.996
Milpitas, CA USA 22 37 0.421 0.595 0.752
Alameda, CA USA 10 14 0.419 0.714 0.916
Redwood City, CA USA 64 126 0.417 0.508 0.598
Austin, TX USA 115 242 0.411 0.475 0.540
Berlin, 16 DEU 16 27 0.388 0.593 0.776
Chelmsford, MA USA 9 13 0.386 0.692 0.909
Portland, OR USA 36 71 0.386 0.507 0.628
Tel Aviv, 5 ISR 38 76 0.383 0.500 0.617
Pasadena, CA USA 21 38 0.383 0.553 0.714
Fremont, CA USA 40 81 0.381 0.494 0.607
Boulder, CO USA 40 85 0.361 0.471 0.582
Morrisville, NC USA 20 38 0.358 0.526 0.690
Marlborough, MA USA 16 29 0.357 0.552 0.736
El Segundo, CA USA 10 16 0.354 0.625 0.848
Sterling, VA USA 9 14 0.351 0.643 0.872
Richardson, TX USA 18 34 0.351 0.529 0.702
Bethesda, MD USA 18 34 0.351 0.529 0.702
Burlington, ON CAN 6 8 0.349 0.750 0.968
Oak Brook, IL USA 6 8 0.349 0.750 0.968
Solana Beach, CA USA 6 8 0.349 0.750 0.968
Venice, CA USA 8 12 0.349 0.667 0.901
Belmont, CA USA 8 12 0.349 0.667 0.901
Vancouver, BC CAN 45 101 0.347 0.446 0.548
Atlanta, GA USA 58 140 0.332 0.414 0.501
Montreal, QC CAN 29 64 0.328 0.453 0.583
London, H9 GBR 131 358 0.316 0.366 0.418
Kfar Saba, 2 ISR 8 13 0.316 0.615 0.861
Ottawa, ON CAN 24 53 0.316 0.453 0.596
Irvine, CA USA 44 108 0.314 0.407 0.506
Mclean, VA USA 15 30 0.313 0.500 0.687
Minneapolis, MN USA 23 51 0.311 0.451 0.597
Beverly Hills, CA USA 10 18 0.308 0.556 0.785
Toronto, ON CAN 60 157 0.306 0.382 0.463
Surry Hills, 2 AUS 6 9 0.299 0.667 0.925
Kitchener, ON CAN 6 9 0.299 0.667 0.925
Doylestown, PA USA 5 7 0.290 0.714 0.963
Ames, IA USA 5 7 0.290 0.714 0.963
Salt Lake City, UT USA 27 66 0.290 0.409 0.537
Irving, TX USA 10 19 0.289 0.526 0.756
Gaithersburg, MD USA 10 19 0.289 0.526 0.756
Tokyo, 40 JPN 24 58 0.286 0.414 0.551
Gent, 8 BEL 4 5 0.284 0.800 0.995
Fuzhou Shi, 3 CHN 4 5 0.284 0.800 0.995
Lake Forest, IL USA 4 5 0.284 0.800 0.995
Sunrise, FL USA 4 5 0.284 0.800 0.995
Fredericton, NS CAN 4 5 0.284 0.800 0.995
Princeton, NJ USA 11 22 0.282 0.500 0.718
Baltimore, MD USA 15 33 0.281 0.455 0.636
Paris, A8 FRA 57 161 0.280 0.354 0.433
Wilmington, DE USA 9 17 0.278 0.529 0.770
San Antonio, TX USA 14 31 0.273 0.452 0.640
Beijing, 22 CHN 46 130 0.272 0.354 0.442
Oakland, CA USA 16 37 0.271 0.432 0.605
Hamburg, 4 DEU 12 26 0.266 0.462 0.666
Westminster, CO USA 8 15 0.266 0.533 0.787
Philadelphia, PA USA 20 50 0.264 0.400 0.548
Houston, TX USA 36 101 0.264 0.356 0.458
Cambridge, C3 GBR 18 44 0.263 0.409 0.568
Ann Arbor, MI USA 14 33 0.255 0.424 0.608
> I'd try something like this, stats wise. It is just 10 minutes of effort, but accounts for the different sample sizes. I added one success and failure to each city, i.e. Laplace smoothing. Then I computed a 95% percent confidence interval for each, using binom.test in R (i.e. binom.test(8,9)$conf.int). I sorted by the lower bound.
Could you enlighten the statistically-impaired on what this has accomplished, compared to the raw data?
This is called a binomial process. We want to estimate the true proportion of exits for a given city. A city has x exits out of n trials, so one estimate of this proportion is just x/n. However, if you have a bunch of cities like this list, you're going to have cities that by random chance end up with a fraction close to 1. That doesn't mean that startups there are guaranteed to succeed. It means that if you flip a coin 5 times, and replicate that 1000 times, you're going to have some runs of 5 heads, and some runs of 5 tails. If you kept flipping, you would find the proportion approaching 1/2 (for a coin) in all cases.
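The coin-flip argument is easy to check with a quick simulation. This is only a sketch: the function name and the fixed seed are my own choices for illustration, not anything from the original analysis.

```python
import random

def count_streaks(flips_per_run=5, runs=1000, seed=0):
    """Count how many runs of fair-coin flips come up all heads or all tails."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    all_heads = 0
    all_tails = 0
    for _ in range(runs):
        heads = sum(rng.random() < 0.5 for _ in range(flips_per_run))
        if heads == flips_per_run:
            all_heads += 1
        elif heads == 0:
            all_tails += 1
    return all_heads, all_tails

all_heads, all_tails = count_streaks()
print(f"runs with 5/5 heads: {all_heads}, runs with 0/5 heads: {all_tails}")
```

Each extreme has probability 1/32 per run, so you would expect roughly 30 of each in 1000 runs, even though the coin is perfectly fair.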
So the problem is how you compare a city with 5/6 exits like Branford, CT USA with 143/208 like Mountain View. Is Branford that much better because 5/6 > 143/208? Mostly all you know is that the error in your estimate is much larger for Branford than for Mountain View, because your value of n is 6 vs 208. You can't say with statistical confidence that Branford is better.
So one trick to punish the little-n locations is to do some smoothing. Laplace smoothing adds 1 to every outcome: one success added to x and one failure, meaning we add 2 to n. That also means that nothing gets exactly to 0.0 or 1.0. The odds in Saint Petersburg, Russia aren't really 0.0 just because there were 0/11. There is some chance you could succeed, so smoothing gives you a better estimate of "unseen events".
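As a minimal sketch of the smoothing step (the function name is hypothetical, and the 5-out-of-5 city is made up for illustration):

```python
from fractions import Fraction

def laplace_smooth(x, n):
    """Add one success and one failure: (x + 1) / (n + 2)."""
    return Fraction(x + 1, n + 2)

# Saint Petersburg: 0 exits out of 11 is no longer exactly zero.
print(laplace_smooth(0, 11), float(laplace_smooth(0, 11)))

# A hypothetical tiny city with 5 exits out of 5 is no longer exactly one.
print(laplace_smooth(5, 5), float(laplace_smooth(5, 5)))
```

The effect shrinks as n grows, which is the point: big cities barely move, tiny ones get pulled toward 1/2.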
The next thing you want to do is look at confidence intervals, rather than our point estimate of x/n or even (x+1)/(n+2). There are a number of formulas you can use; I used the one built into R, a statistical modelling language. This gives you a lower and upper bound on the true proportion. If the bounds are exact, then 95% of the time the interval will contain this true, unknown proportion.
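For anyone without R handy, here is a rough pure-Python sketch of the exact (Clopper-Pearson) interval, which is the kind that binom.test reports. All function names are my own, and the bisection is just a simple way to invert the binomial tail, not how R does it internally.

```python
from math import exp, lgamma, log

def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1.0 - p))
    return exp(log_pmf)

def upper_tail(x, n, p):
    """P(X >= x); this is increasing in p."""
    return sum(binom_pmf(i, n, p) for i in range(x, n + 1))

def solve_increasing(f, target, iters=60):
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exact_ci(x, n, conf=0.95):
    """Two-sided exact interval for a proportion with x successes in n trials."""
    alpha = 1.0 - conf
    # Lower bound: the p at which seeing x or more successes has probability alpha/2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: upper_tail(x, n, p), alpha / 2.0)
    # Upper bound: the p at which seeing x or fewer successes has probability
    # alpha/2, which is the same as P(X >= x + 1) = 1 - alpha/2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: upper_tail(x + 1, n, p), 1.0 - alpha / 2.0)
    return lower, upper

lb, ub = exact_ci(8, 9)  # smoothed counts from the Chapel Hill row
print(f"lb={lb:.3f} ub={ub:.3f}")
```

The result for (8, 9) should land close to the 0.518 and 0.997 in the Chapel Hill row of the table. Passing conf=0.90 or conf=0.80 gives the narrower intervals.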
The bounds on my smoothed counts are:
City x+1 n+2 lb (x+1)/(n+2) ub
Mountain View, CA USA 144 209 0.621 0.689 0.751
Branford, CT USA 6 7 0.421 0.857 0.996
Los Angeles, CA USA 56 180 0.244 0.311 0.384
So the true proportion for Mountain View is somewhere between 0.621 and 0.751, while Branford, CT is between 0.421 and 0.996. Since these intervals overlap, we can't really say one is better than the other. Also consider LA, which has a range of 0.244 to 0.384. Since 0.384 < 0.421, we could say that LA has a worse exit ratio than either Branford or Mountain View, with 95% confidence.
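The comparison rule is simple enough to write down. A small sketch, using the bounds from the three rows above; the Interval type and compare function are names I made up for illustration:

```python
from typing import NamedTuple

class Interval(NamedTuple):
    city: str
    lb: float  # lower bound of the 95% interval
    ub: float  # upper bound of the 95% interval

def compare(a, b):
    """Return 'better', 'worse', or 'overlap' for a relative to b."""
    if a.lb > b.ub:
        return "better"
    if a.ub < b.lb:
        return "worse"
    return "overlap"

mountain_view = Interval("Mountain View, CA USA", 0.621, 0.751)
branford = Interval("Branford, CT USA", 0.421, 0.996)
los_angeles = Interval("Los Angeles, CA USA", 0.244, 0.384)

print(compare(mountain_view, branford))     # overlap
print(compare(los_angeles, branford))       # worse
print(compare(mountain_view, los_angeles))  # better
```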
To sort, it is often good to be conservative and use the lower bound. I used a 95% interval, which is good for saying Branford is better than LA, but might be a bit wide for sorting. You could use a 90% or even 80% interval for that, if desired.
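To see what sorting by the lower bound buys you, here is a sketch that ranks a handful of rows from the table both ways. The values are copied from the table; the variable names are mine.

```python
# (city, lower bound, smoothed point estimate), copied from the table
rows = [
    ("Mountain View, CA USA", 0.621, 0.689),
    ("South San Francisco, CA", 0.597, 0.750),
    ("Cambridge, MA USA", 0.518, 0.595),
    ("Chapel Hill, NC USA", 0.518, 0.889),
    ("Branford, CT USA", 0.421, 0.857),
]

# Naive ranking: highest point estimate first.
by_point = [city for city, lb, est in sorted(rows, key=lambda r: r[2], reverse=True)]

# Conservative ranking: highest lower bound first.
by_lb = [city for city, lb, est in sorted(rows, key=lambda r: r[1], reverse=True)]

print("by point estimate:", by_point)
print("by lower bound:   ", by_lb)
```

The naive ranking puts the two tiny cities on top, while the lower-bound ranking puts Mountain View first and Branford last among these five.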
It is really crucial to take into account what you don't know when comparing fractions based on different values of n.
I'm very pleased to see Los Angeles beat Mi Wuk Village in this list. The LA vs. Mi Wuk startup rivalry is legendary and, I must admit, I was quite embarrassed about LA's performance when I saw the first list.