
Using logistic regression that way isn't going to do much, since each city has a single nonzero independent variable.

I'd try something like this, stats-wise. It is just 10 minutes of effort, but it accounts for the different sample sizes. I added one success and one failure to each city, i.e. Laplace smoothing. Then I computed a 95% confidence interval for each, using binom.test in R (i.e. binom.test(8,9)$conf.int). I sorted by the lower bound.

If the lower bound of one city is above the upper bound of another, then you can say it is a better place, using your metric.

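Columns: x+1 and n+2 are the smoothed counts, lb and ub are the interval bounds, and x1/n2 is the smoothed estimate (x+1)/(n+2).

                             x+1     n+2     lb      x1/n2   ub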
     Mountain View, CA USA   144     209     0.621   0.689   0.751
     South San Francisco, CA 33      44      0.597   0.750   0.868
     Cambridge, MA USA       103     173     0.518   0.595   0.669
     Chapel Hill, NC USA     8       9       0.518   0.889   0.997
     Waltham, MA USA         61      101     0.502   0.604   0.700
     Cupertino, CA USA       38      60      0.499   0.633   0.754
     Foster City, CA USA     16      22      0.498   0.727   0.893
     San Mateo, CA USA       88      159     0.473   0.553   0.632
     San Jose, CA USA        112     208     0.468   0.538   0.608
     New York, NY USA        399     809     0.458   0.493   0.528
     San Bruno, CA USA       14      20      0.457   0.700   0.881
     Palo Alto, CA USA       128     246     0.456   0.520   0.584
     Itasca, IL USA          8       10      0.444   0.800   0.975
     Westford, MA USA        8       10      0.444   0.800   0.975
     Los Gatos, CA USA       14      21      0.430   0.667   0.854
     Arlington, VA USA       15      23      0.427   0.652   0.836
     Aliso Viejo, CA USA     16      25      0.425   0.640   0.820
     Branford, CT USA        6       7       0.421   0.857   0.996
     Milpitas, CA USA        22      37      0.421   0.595   0.752
     Alameda, CA USA         10      14      0.419   0.714   0.916
     Redwood City, CA USA    64      126     0.417   0.508   0.598
     Austin, TX USA          115     242     0.411   0.475   0.540
     Berlin, 16 DEU          16      27      0.388   0.593   0.776
     Chelmsford, MA USA      9       13      0.386   0.692   0.909
     Portland, OR USA        36      71      0.386   0.507   0.628
     Tel Aviv, 5 ISR         38      76      0.383   0.500   0.617
     Pasadena, CA USA        21      38      0.383   0.553   0.714
     Fremont, CA USA         40      81      0.381   0.494   0.607
     Boulder, CO USA         40      85      0.361   0.471   0.582
     Morrisville, NC USA     20      38      0.358   0.526   0.690
     Marlborough, MA USA     16      29      0.357   0.552   0.736
     El Segundo, CA USA      10      16      0.354   0.625   0.848
     Sterling, VA USA        9       14      0.351   0.643   0.872
     Richardson, TX USA      18      34      0.351   0.529   0.702
     Bethesda, MD USA        18      34      0.351   0.529   0.702
     Burlington, ON CAN      6       8       0.349   0.750   0.968
     Oak Brook, IL USA       6       8       0.349   0.750   0.968
     Solana Beach, CA USA    6       8       0.349   0.750   0.968
     Venice, CA USA          8       12      0.349   0.667   0.901
     Belmont, CA USA         8       12      0.349   0.667   0.901
     Vancouver, BC CAN       45      101     0.347   0.446   0.548
     Atlanta, GA USA         58      140     0.332   0.414   0.501
     Montreal, QC CAN        29      64      0.328   0.453   0.583
     London, H9 GBR          131     358     0.316   0.366   0.418
     Kfar Saba, 2 ISR        8       13      0.316   0.615   0.861
     Ottawa, ON CAN          24      53      0.316   0.453   0.596
     Irvine, CA USA          44      108     0.314   0.407   0.506
     Mclean, VA USA          15      30      0.313   0.500   0.687
     Minneapolis, MN USA     23      51      0.311   0.451   0.597
     Beverly Hills, CA USA   10      18      0.308   0.556   0.785
     Toronto, ON CAN         60      157     0.306   0.382   0.463
     Surry Hills, 2 AUS      6       9       0.299   0.667   0.925
     Kitchener, ON CAN       6       9       0.299   0.667   0.925
     Doylestown, PA USA      5       7       0.290   0.714   0.963
     Ames, IA USA            5       7       0.290   0.714   0.963
     Salt Lake City, UT USA  27      66      0.290   0.409   0.537
     Irving, TX USA          10      19      0.289   0.526   0.756
     Gaithersburg, MD USA    10      19      0.289   0.526   0.756
     Tokyo, 40 JPN           24      58      0.286   0.414   0.551
     Gent, 8 BEL             4       5       0.284   0.800   0.995
     Fuzhou Shi, 3 CHN       4       5       0.284   0.800   0.995
     Lake Forest, IL USA     4       5       0.284   0.800   0.995
     Sunrise, FL USA         4       5       0.284   0.800   0.995
     Fredericton, NS CAN     4       5       0.284   0.800   0.995
     Princeton, NJ USA       11      22      0.282   0.500   0.718
     Baltimore, MD USA       15      33      0.281   0.455   0.636
     Paris, A8 FRA           57      161     0.280   0.354   0.433
     Wilmington, DE USA      9       17      0.278   0.529   0.770
     San Antonio, TX USA     14      31      0.273   0.452   0.640
     Beijing, 22 CHN         46      130     0.272   0.354   0.442
     Oakland, CA USA         16      37      0.271   0.432   0.605
     Hamburg, 4 DEU          12      26      0.266   0.462   0.666
     Westminster, CO USA     8       15      0.266   0.533   0.787
     Philadelphia, PA USA    20      50      0.264   0.400   0.548
     Houston, TX USA         36      101     0.264   0.356   0.458
     Cambridge, C3 GBR       18      44      0.263   0.409   0.568
     Ann Arbor, MI USA       14      33      0.255   0.424   0.608



> I'd try something like this, stats-wise. It is just 10 minutes of effort, but it accounts for the different sample sizes. I added one success and one failure to each city, i.e. Laplace smoothing. Then I computed a 95% confidence interval for each, using binom.test in R (i.e. binom.test(8,9)$conf.int). I sorted by the lower bound.

Could you enlighten the statistically-impaired on what this has accomplished, compared to the raw data?


This is called a binomial process. We want to estimate the true proportion of exits for a given city. A city has x exits out of n trials, so one estimate of this proportion is just x/n. However, if you have a bunch of cities like this list, you're going to have cities that by random chance end up with a fraction close to 1. That doesn't mean that startups there are guaranteed to succeed. It means that if you flip a coin 5 times, and replicate that 1000 times, you're going to have some runs of 5 heads, and some runs of 5 tails. If you kept flipping, you would find the proportion approaching 1/2 (for a coin) in all cases.
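
The coin-flip thought experiment is easy to try. Here is a minimal Python sketch, assuming a fair coin. The function name, seed, and run counts are illustrative and not part of the original analysis:

```python
import random

def simulate_runs(n_flips=5, n_runs=1000, p=0.5, seed=0):
    """Return the number of heads in each of n_runs runs of n_flips coin flips."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [sum(rng.random() < p for _ in range(n_flips)) for _ in range(n_runs)]

heads = simulate_runs()
all_heads = sum(h == 5 for h in heads)  # runs that look like a "100% success" city
all_tails = sum(h == 0 for h in heads)  # runs that look like a "0% success" city
overall = sum(heads) / (5 * 1000)       # pooled proportion across every flip

print(all_heads, all_tails, round(overall, 3))
```

With 1000 runs you would expect roughly 1000/32, or about 31, all-heads runs and about as many all-tails runs, even though every run uses the same coin. The pooled proportion should sit close to 0.5.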

So the problem is how you compare a city with 5/6 exits, like Branford, CT USA, with one at 143/208, like Mountain View. Is Branford that much better because 5/6 > 143/208? Mostly all you know is that the error in your estimate is much larger for Branford than for Mountain View, because your value of n is 6 vs 208. You can't say with statistical confidence that Branford is better.

So one trick to punish the small-n locations is to do some smoothing. Laplace smoothing adds 1 to every outcome: one success to x and one failure, meaning we add 2 to n. That also means that nothing gets exactly to 0.0 or 1.0. The odds in Saint Petersburg, Russia aren't really 0.0 just because they were 0/11. There is some chance you could succeed, so smoothing gives you a better estimate of "unseen events".
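
As a small illustration of the smoothing step, here is a Python sketch. The function names and the example counts are hypothetical, chosen only to show the effect at small and large n:

```python
def laplace_smooth(x, n):
    """Add one pseudo-success and one pseudo-failure: (x, n) -> (x + 1, n + 2)."""
    return x + 1, n + 2

def smoothed_rate(x, n):
    """Point estimate (x + 1) / (n + 2) after Laplace smoothing."""
    xs, ns = laplace_smooth(x, n)
    return xs / ns

# Hypothetical raw counts: no successes, all successes, and a large sample.
for x, n in [(0, 10), (5, 5), (500, 1000)]:
    print(f"{x}/{n}: raw={x / n:.3f} smoothed={smoothed_rate(x, n):.3f}")
```

The two small samples get pulled away from 0.0 and 1.0, while the large sample barely moves.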

The next thing you want to do is look at confidence intervals, rather than a point estimate like x/n or even (x+1)/(n+2). There are a number of formulas you can use; I used the one built into R, a statistical modelling language. This gives you a lower and upper bound on the true proportion. If the bounds are exact, then 95% of the time the interval will contain this true, unknown proportion.
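
For anyone without R handy, here is a rough Python sketch of the same kind of interval. It assumes the interval reported by binom.test is the Clopper-Pearson "exact" interval, and it finds the bounds by simple bisection instead of using a stats library. The helper names are made up for this sketch:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); fine for n up to roughly a thousand."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve(f, target):
    """Find p in [0, 1] with f(p) = target by bisection; f must increase with p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    """Two-sided exact interval for a binomial proportion with x successes in n."""
    alpha = 1 - conf
    # Lower bound: the p at which seeing x or more successes has probability alpha/2.
    lower = 0.0 if x == 0 else _solve(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which seeing x or fewer successes has probability alpha/2.
    upper = 1.0 if x == n else _solve(lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

# Smoothed counts (x+1, n+2) taken from the table rows.
for city, x1, n2 in [("Mountain View", 144, 209), ("Branford", 6, 7)]:
    lb, ub = clopper_pearson(x1, n2)
    print(f"{city:15s} lb={lb:.3f} est={x1 / n2:.3f} ub={ub:.3f}")
```

If the assumption about the interval type holds, the printed bounds should line up with the lb and ub columns in the table. Passing conf=0.90 or conf=0.80 gives narrower intervals.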

The bounds on my smoothed counts are:

                             x+1     n+2     lb      x1/n2   ub
     Mountain View, CA USA   144     209     0.621   0.689   0.751
     Branford, CT USA        6       7       0.421   0.857   0.996
     Los Angeles, CA USA     56      180     0.244   0.311   0.384

So the true proportion for Mountain View is somewhere between 0.621 and 0.751, while Branford, CT is between 0.421 and 0.996. Since these intervals overlap, we can't really say one is better than the other. Also consider LA, which has a range of 0.244 to 0.384. Since 0.384 < 0.421, we could say that LA has a worse exit ratio than either Branford or Mountain View, with 95% confidence.
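
The comparison rule amounts to a one-line check. Here is a small Python sketch using the bounds from the three rows above; the function name is made up for illustration:

```python
def clearly_better(a, b):
    """True if interval a = (lb, ub) lies entirely above interval b."""
    return a[0] > b[1]

# (lb, ub) pairs copied from the table rows.
intervals = {
    "Mountain View": (0.621, 0.751),
    "Branford": (0.421, 0.996),
    "Los Angeles": (0.244, 0.384),
}

for a in intervals:
    for b in intervals:
        if a != b and clearly_better(intervals[a], intervals[b]):
            print(f"{a} > {b}")
```

Only the two comparisons against Los Angeles pass. Mountain View and Branford overlap, so neither is reported as better than the other. Non-overlap is a conservative rule: two intervals can overlap slightly and a direct two-sample test may still find a difference.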

To sort, it is often good to be conservative and use the lower bound. I used a 95% interval, which is good for saying Branford is better than LA, but might be a bit wide for sorting. You could use a 90% or even 80% interval for that, if desired.
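
To see what sorting by the lower bound changes, here is a short Python sketch that ranks four rows from the full table two ways. The tuple layout and variable names are just for this example:

```python
# (city, lb, estimate) copied from the full table.
rows = [
    ("Mountain View", 0.621, 0.689),
    ("South San Francisco", 0.597, 0.750),
    ("Chapel Hill", 0.518, 0.889),
    ("Branford", 0.421, 0.857),
]

# Rank by the point estimate, then by the conservative lower bound.
by_estimate = [city for city, lb, est in sorted(rows, key=lambda r: r[2], reverse=True)]
by_lb = [city for city, lb, est in sorted(rows, key=lambda r: r[1], reverse=True)]

print("by estimate:   ", by_estimate)
print("by lower bound:", by_lb)
```

Ranked by point estimate, the two tiny samples (Chapel Hill and Branford) come out on top. Ranked by lower bound, they drop below the cities with far more data.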

It is really crucial to take into account what you don't know when comparing fractions based on different values of n.

Hope this helps...


(10 days later...)

It does. Thank you.


cont ....

     Washington, DC USA      20      52      0.253   0.385   0.530
     Orem, UT USA            7       13      0.251   0.538   0.808
     Dallas, TX USA          30      86      0.249   0.349   0.459
     Mississauga, ON CAN     8       16      0.247   0.500   0.753
     Kennesaw, GA USA        5       8       0.245   0.625   0.915
     Arlington Heights, IL   5       8       0.245   0.625   0.915
     West Hollywood, CA USA  9       19      0.244   0.474   0.711
     Los Angeles, CA USA     56      180     0.244   0.311   0.384
     Helsinki, 13 FIN        13      32      0.237   0.406   0.594
     Jacksonville, FL USA    10      23      0.232   0.435   0.655
     La Jolla, CA USA        10      23      0.232   0.435   0.655
     Charlotte, NC USA       14      36      0.231   0.389   0.565
     Oslo, 12 NOR            9       20      0.231   0.450   0.685
     Madrid, 29 ESP          16      43      0.230   0.372   0.533
     Plano, TX USA           14      37      0.225   0.378   0.552
     Zug, 24 CHE             4       6       0.223   0.667   0.957
     Pittsburgh, PA USA      27      85      0.221   0.318   0.428
     Bangalore, 19 IND       20      61      0.213   0.328   0.460
     Petaluma, CA USA        7       15      0.213   0.467   0.734
     Tampa, FL USA           14      39      0.212   0.359   0.528
     Delft, 11 NLD           5       9       0.212   0.556   0.863
     Costa Mesa, CA USA      5       9       0.212   0.556   0.863
     Newtown, PA USA         5       9       0.212   0.556   0.863
     Boxborough, MA USA      5       9       0.212   0.556   0.863
     Munchen, 2 DEU          6       12      0.211   0.500   0.789
     Copenhagen, 17 DNK      15      43      0.210   0.349   0.509
     Brooklyn, NY USA        15      43      0.210   0.349   0.509
     Newton, MA USA          9       22      0.207   0.409   0.636
     Stockholm, 26 SWE       17      52      0.203   0.327   0.471
     Berkeley, CA USA        10      26      0.202   0.385   0.594
     Buffalo, NY USA         7       16      0.198   0.438   0.701
     St Louis, MO USA        12      34      0.197   0.353   0.535
     Dublin, 7 IRL           22      74      0.197   0.297   0.415
     Odense, 21 DNK          3       4       0.194   0.750   0.994
     West Des Moines, IA USA 3       4       0.194   0.750   0.994
     Notting Hill, 7 AUS     3       4       0.194   0.750   0.994
     Mi Wuk Village, CA USA  3       4       0.194   0.750   0.994
     Seoul, 11 KOR           11      31      0.192   0.355   0.546
     Minnetonka, MN USA      6       13      0.192   0.462   0.749
     Abingdon, K2 GBR        6       13      0.192   0.462   0.749
     Zurich, 25 CHE          8       20      0.191   0.400   0.639
     Charleston, SC USA      5       10      0.187   0.500   0.813
     Roncade, 20 ITA         4       7       0.184   0.571   0.901
     Burnaby, BC CAN         8       21      0.181   0.381   0.616
     Annapolis, MD USA       6       14      0.177   0.429   0.711
     Gurgaon, 10 IND         7       18      0.173   0.389   0.643
     Raleigh, NC USA         13      44      0.168   0.295   0.452
     Halifax, NS CAN         5       11      0.167   0.455   0.766
     Orlando, FL USA         9       27      0.165   0.333   0.540
     Shenzhen, 30 CHN        11      36      0.163   0.306   0.481
     Dubai, 3 ARE            6       15      0.163   0.400   0.677
     New Haven, CT USA       6       15      0.163   0.400   0.677
     Indianapolis, IN USA    10      32      0.161   0.313   0.500
     Stuttgart, 1 DEU        5       12      0.152   0.417   0.723
     Addison, TX USA         5       12      0.152   0.417   0.723
     Blackrock, 7 IRL        3       5       0.147   0.600   0.947
     San Marcos, TX USA      3       5       0.147   0.600   0.947
     Pune, 16 IND            3       5       0.147   0.600   0.947
     Vienna, 9 AUT           8       26      0.143   0.308   0.518
     Newport Beach, CA USA   7       22      0.139   0.318   0.549
     Schaumburg, IL USA      4       9       0.137   0.444   0.788
     Istanbul, 34 TUR        5       14      0.128   0.357   0.649
     Wilmington, MA USA      5       14      0.128   0.357   0.649
     Sao Paulo, 2 BRA        10      40      0.127   0.250   0.412
     Prague, 52 CZE          4       10      0.122   0.400   0.738
     Espoo, 13 FIN           4       10      0.122   0.400   0.738
     Plymouth, MN USA        5       15      0.118   0.333   0.616
     Longmont, CO USA        5       15      0.118   0.333   0.616
     Cleveland, OH USA       10      43      0.118   0.233   0.386
     Rochester, NY USA       6       21      0.113   0.286   0.522
     Rio De Janeiro, 21 BRA  5       16      0.110   0.313   0.587
     Lucerne Valley, CA USA  5       17      0.103   0.294   0.560
     Utrecht, 9 NLD          3       7       0.099   0.429   0.816
     Allentown, PA USA       3       8       0.085   0.375   0.755
     Melbourne, 7 AUS        5       22      0.078   0.227   0.454
     Charlottesville, VA USA 4       15      0.078   0.267   0.551
     Clearwater, FL USA      3       9       0.075   0.333   0.701
     Columbus, OH USA        6       32      0.072   0.188   0.364
     Santa Ana, CA USA       4       17      0.068   0.235   0.499
     Edison, NJ USA          3       11      0.060   0.273   0.610
     Moscow, 48 RUS          14      137     0.057   0.102   0.166
     Netanya, 2 ISR          3       12      0.055   0.250   0.572
     Lod, 2 ISR              2       6       0.043   0.333   0.777
     Lawrenceville, GA USA   2       6       0.043   0.333   0.777
     Champaign, IL USA       2       7       0.037   0.286   0.710
     Blacksburg, VA USA      2       7       0.037   0.286   0.710
     Golden, CO USA          2       7       0.037   0.286   0.710
     Guangdong, 5 CHN        2       7       0.037   0.286   0.710
     Quebec, QC CAN          3       18      0.036   0.167   0.414
     Memphis, TN USA         3       18      0.036   0.167   0.414
     Liverpool, H8 GBR       2       8       0.032   0.250   0.651
     Burbank, CA USA         2       8       0.032   0.250   0.651
     Galway, 10 IRL          2       9       0.028   0.222   0.600
     Kista, 26 SWE           2       10      0.025   0.200   0.556
     Manchester, I2 GBR      2       10      0.025   0.200   0.556
     Orsay, A8 FRA           1       4       0.006   0.250   0.806
     Cherry Hill, NJ USA     1       4       0.006   0.250   0.806
     Mountain, WI USA        1       4       0.006   0.250   0.806
     Pittsburg, CA USA       1       4       0.006   0.250   0.806
     Eatontown, NJ USA       1       5       0.005   0.200   0.716
     Superior, WI USA        1       5       0.005   0.200   0.716
     Laguna Beach, CA USA    1       5       0.005   0.200   0.716
     Columbia, SC USA        1       5       0.005   0.200   0.716
     Gilbert, AZ USA         1       5       0.005   0.200   0.716
     City Of Industry, CA    1       6       0.004   0.167   0.641
     Tacoma, WA USA          1       6       0.004   0.167   0.641
     Toledo, OH USA          1       6       0.004   0.167   0.641
     Laval, QC CAN           1       6       0.004   0.167   0.641
     Napa, CA USA            1       6       0.004   0.167   0.641
     Owings Mills, MD USA    1       7       0.004   0.143   0.579
     Cedar Park, TX USA      1       7       0.004   0.143   0.579
     New Orleans, LA USA     1       7       0.004   0.143   0.579
     Morgan Hill, CA USA     1       7       0.004   0.143   0.579
     Newark, NJ USA          1       8       0.003   0.125   0.527
     Sausalito, CA USA       1       8       0.003   0.125   0.527
     Livermore, CA USA       1       9       0.003   0.111   0.482
     Centennial, CO USA      1       10      0.003   0.100   0.445
     Tallinn, 1 EST          1       10      0.003   0.100   0.445
     Little Rock, AR USA     1       11      0.002   0.091   0.413
     Jakarta, 4 IDN          1       11      0.002   0.091   0.413
     Porto Alegre, 23 BRA    1       11      0.002   0.091   0.413
     Saint Petersburg,  RUS  1       12      0.002   0.083   0.385


I'm very pleased to see Los Angeles beat Mi Wuk Village in this list. The LA vs. Mi Wuk startup rivalry is legendary and, I must admit, I was quite embarrassed about LA's performance when I saw the first list.


Thanks for this, your methodology makes a lot more statistical sense to me.



