Show HN: Use Meetup.com Data to Find Your Ideal City (cole-maclean.github.io)
104 points by ponderingHplus 203 days ago | 34 comments

This is cool and a nice idea / UI, but it seems to bubble up population centers. It needs to normalize for population somehow. (https://xkcd.com/1138/)

Maybe there should be a widget that biases for less dense areas or does it proportionally, but the results aren't necessarily wrong as-is. A 2M+ population city with 20 <special interest group> meetups on a given weekend doesn't have more proportional interest in the topic than a 200K+ city with 2 <special interest group> meetups, but it still has 10x the total amount of activities for a resident to go to. I guess it depends on if the person is trying to figure out the absolute amount of activities available to them or the extent to which that activity dominates the culture of the area. I'd personally want to use the tool to figure out the absolute value.
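To make the arithmetic in that example concrete, here is a minimal sketch using the same numbers (2M people with 20 meetups versus 200K people with 2). The city labels and the `per_100k` helper are made up for illustration; nothing here is taken from the app itself.

```python
# Same numbers as the example above: equal per-capita rate, 10x absolute count.
cities = {
    "big_city": {"population": 2_000_000, "meetups": 20},
    "small_city": {"population": 200_000, "meetups": 2},
}

def per_100k(meetups: int, population: int) -> float:
    """Meetups per 100,000 residents (hypothetical helper)."""
    return meetups * 100_000 / population

rates = {name: per_100k(c["meetups"], c["population"]) for name, c in cities.items()}
absolute_ratio = cities["big_city"]["meetups"] / cities["small_city"]["meetups"]

print(rates)           # per-capita view: both cities look the same
print(absolute_ratio)  # absolute view: the big city has 10x the events
```

Which of the two numbers matters depends on the question being asked, as the comment says.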

One solution would be to stratify the results by population density.

Yup, that's definitely true, population centers rank highest for most topic lists (especially New York, where meetup.com was founded).

The original use-case I envisioned for something like this is someone deciding which cities they might like to move to by choosing cities that have a lot of people with shared interests. So should the metric for that be total people that are part of a topic, or the % of total city population engaged in that topic? I've currently chosen the former, but I'm interested in what others might think.

Thanks for the feedback!


I agree with erex78. The former measurement doesn't tell me much; I already know what cities have the most people.

The latter measurement better indicates the probability of a chance encounter with someone who has your shared interests.

Yeah that's a good point. I like scribu's point that it can be somewhere in between. I'll have to think on that a little.

> So should the metric for that be total people that are part of a topic, or the % of total city population engaged in that topic?

Ideally, both. You want to have a community with at least N people, but cities with a high % should rank higher.
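One way to read "both" is as a two-step rule: drop cities whose community is below some minimum size N, then order the rest by share of population. The sketch below assumes that reading; the `CityTopic` type, the `rank_cities` function, the threshold, and all numbers are hypothetical.

```python
from typing import NamedTuple

class CityTopic(NamedTuple):
    city: str
    members: int      # people in groups for the topic (made-up numbers)
    population: int

def rank_cities(rows, min_members):
    """Keep cities with at least min_members, then sort by share of population."""
    eligible = [r for r in rows if r.members >= min_members]
    return sorted(eligible, key=lambda r: r.members / r.population, reverse=True)

rows = [
    CityTopic("A", 5_000, 8_000_000),  # large community, small share
    CityTopic("B", 900, 300_000),      # mid-size community, high share
    CityTopic("C", 40, 10_000),        # highest share, but too few people
]
ranked = [r.city for r in rank_cities(rows, min_members=500)]
print(ranked)
```

With these numbers, C is filtered out despite its high share, and B ranks above A.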

Could also account for the number of total meetup.com members in that place.

Hmm interesting, I'll give that some thought. Thanks!

You could have a toggle where you divide by the overall population of each city. So basically let the user switch between percentage and absolute.

Another ranking to consider is the h-index metric for academic papers (https://en.wikipedia.org/wiki/H-index), where you use the number of members of each meetup group in place of citations. So a city with a score h has h meetups each of which has at least h members.
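A small sketch of that h-index idea, assuming you already have the member count of every group in a city for the chosen topic. The function name and the sample sizes are illustrative only.

```python
def city_h_index(group_sizes):
    """Largest h such that the city has h groups with at least h members each."""
    sizes = sorted(group_sizes, reverse=True)
    h = 0
    for rank, size in enumerate(sizes, start=1):
        if size >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical city: five groups with these member counts.
print(city_h_index([120, 45, 8, 3, 2]))
```

Here three groups have at least 3 members, but there are not four groups with at least 4, so the score is 3. A single huge group cannot push the score above 1 on its own.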

I think meetup.com is a fairly poor data source for this kind of question - even if the data were normalized. Interested in coffee? Seattle is better than NY, but is way less likely to have meetups dedicated to it since coffee is just baked into the culture - so there's a higher probability that an unrelated meetup you attend will be hosted at a local coffee shop.

Agreed - the results don't make much sense. And some topics don't even show up. "Startup" or "Startups" doesn't get any results, even though there are many such Meetups. The results for Entrepreneurship are:

Top 10 Cities
1. Pinellas Park,FL,US
2. Gainesville,FL,US
3. Greenwich,CT,US
4. Piscataway,NJ,US
5. Vadodara,IN
6. Bern,CH
7. Liverpool,28,GB
8. San Marcos,CA,US
9. Chapel Hill,NC,US
10. El Paso,TX,US

Which might be relevant in terms of Meetups, but no way in heck I'm moving to ANY of those cities because of their entrepreneur communities.

This is probably one of those things where you only count positive results - presence of meetups on your topic is probably a good thing, but the lack of them doesn't necessarily mean anything.

Well, my point is that positive doesn't actually mean positive - there could be more meetups about coffee because there are fewer good coffee places to consume it. To illustrate this point: I would intuitively expect there to be more pro-life meetups in a liberal area and more pro-choice ones in a conservative one - since the norm is baked into society and people don't feel the need to meet up about what everyone agrees on. A much better approach would be to use the existence of businesses that support said aspect of a city.

Very cool. I haven't looked at what Meetup has available for an API but I've rarely ever seen its data used as a way to find out about a city, even though it's a great way to (manually) look up interests. You'd miss out on things that are on Eventbrite (but no reason why you couldn't combine their data...), but otherwise, it seems like a solid way to at least find hotspots for interests, particularly regional cities (i.e. not New York/SF/Chicago) that have, for whatever reason, an especially strong group.

I published a quick writeup to accompany the visualization here: http://cole-maclean.github.io/blog/Meetup.com%20City%20Finde...

I feel like this should be the link instead and you can have a fairly prominent link on the post pointing to the app

I'm curious what the interest would have been had I linked to the article over the graphic. Unfortunately I'm not sure how I could test that without having a time machine.

Thanks for reading the writeup!

I put in "fishing, hiking, & skiing" and the #1 recommendation is NYC.

Doesn't work very well for people who don't want to live in the US. For every topic there are exponentially more American meetup groups so even if I insert tons of topics related to my country and only my country the top picks are still in the US.

Actually that's part of what I do manually. However, in Berlin (my favorite city so far) meetup seems to be going downhill. Fewer events in general, fewer interesting ones, and more events where nobody shows up. In 2012 it was awesome, but in 2016 fun is hard to come by. I don't know, maybe they increased the payment in the meantime or something, or maybe I'm just getting too old.

Funnily enough, Berlin is at the top of my list. (And it's also my favorite city so far.)

If I double-click on the globe, it zooms in, and then I can't rotate the globe anymore. I also don't see how to zoom back out!

Cool! I've used meetup to meet people while traveling and it's gone really well. I've thought about making a tool for that. Did you scrape meetup, use their API, or is there a data dump available?

I used their API; it's actually quite well documented and their rate limits seemed generous to me. Here's a writeup I did for collecting the data: http://cole-maclean.github.io/blog/Meetup.com%20City%20Finde...

Looked up my home city of Dublin, Ireland. Apparently we're number one for "sober activities". Which both surprises and doesn't surprise me at the same time (plenty of people tired of Irish pub culture using it as an alternative place to find social events).

I can't tell if this is getting hammered right now or I'm not using it correctly. I put a topic in and nothing happens...

It did work for one topic after a while, but I'm not sure if I was just lucky or did something different.

Great idea though, I hope I can get it working later!

It's a pretty large dataset for d3 to be loading, so it can take quite a while. It also seems to struggle on Firefox. I have a lot to learn about web app optimizations and building for multiple browsers and devices. And generally anything frontend :p.

Try waiting quite a while before interacting with it. An example topic "data science" should initialize the map data eventually.

In my case, Mac El Cap, Firefox will briefly load a topic and immediately reset the page. Safari appears as a giant no-op.

If I put in anything mah jong related it removes all the previous topics and resets the page. I've tried with 'mahjongg' and 'mah jong'. Though 'mahjong' seems ok, it just doesn't pull anything.

I put in all my interests (and current meetup groups) and Brisbane was in the top 10 list. The funny thing is I moved FROM Brisbane to Sydney (for better jobs).

Usually (taken from how erf/error behaves with sample size), something grows in proportion to the square root of the sample size. If you're using raw sample sizes, you can take the square root of each one before using it, which moderates its impact.
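As a rough illustration of what the square root does to raw counts: a 100x gap in members becomes a 10x gap after the transform. The `damped` helper and the counts are hypothetical.

```python
import math

def damped(count):
    """Square-root transform to soften the effect of very large raw counts."""
    return math.sqrt(count)

big, small = 10_000, 100                     # made-up member counts
raw_ratio = big / small                      # gap between the raw counts
damped_ratio = damped(big) / damped(small)   # gap after the transform

print(raw_ratio, damped_ratio)
```

Large cities still score higher, just by a smaller margin.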

Also, perhaps you could do something like establish an overall average and standard deviation "percentage interested" in each topic, then compare the percentage in a given location with the expected. The farther the percentage from the expected (in either direction), the more that location gets pulled up or down. For example, maybe every location is equally good if you're interested in "breathing air", but then maybe one has a slightly higher concentration, making it more relevant.
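A minimal sketch of that comparison, assuming "expected" means the mean share across locations and using the population standard deviation. The `z_scores` function and the shares are made up for illustration.

```python
from statistics import mean, pstdev

def z_scores(share_by_city):
    """Standard deviations above/below the average share, per city."""
    values = list(share_by_city.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every location is identical (the "breathing air" case): no signal.
        return {city: 0.0 for city in share_by_city}
    return {city: (share - mu) / sigma for city, share in share_by_city.items()}

shares = {"A": 0.02, "B": 0.02, "C": 0.05}   # hypothetical share interested
scores = z_scores(shares)
print(scores)
```

City C comes out above the norm and A and B slightly below it, while a topic with the same share everywhere contributes nothing to the ranking.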

Also, since you are combining several interests, you are trying to maximize coverage and uniqueness: the maximum number of interests present in the maximum amount, with stronger interests given more weight so that locations having them get a boost (though you don't ask anyone to rank interests). That is, one location shouldn't dominate the rankings because it has a much higher likelihood of one interest while the other interests are only "averagely represented" or worse.

That is, maybe the proper way to combine the standard deviations for each place is through multiplication (take the absolute value, then "multiply in" if the standard deviation is positive or "divide in" if it is negative). This will ensure that below-average "satisfaction of interests" divides/lowers the ranking score and that positive "SOIs" multiply/increase it.

Also, it's good to pull the ranking score from a mean/center/expectation with each "interest score", rather than just blindly averaging/kludging them in. The standard deviation approach achieves this, but I'm mentioning it explicitly so that you can consider it in the event something other than standard deviations is used.
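Here is one possible sketch of the multiply/divide idea. It adds one assumption that is not in the comment: each factor is `1 + |z|` rather than `|z|` itself, so that a score of exactly average (z = 0) leaves the result unchanged instead of zeroing it out. The `combine` function is hypothetical.

```python
def combine(z_scores_for_city):
    """Multiply in above-average interests, divide in below-average ones."""
    score = 1.0
    for z in z_scores_for_city:
        factor = 1.0 + abs(z)  # assumption: +1 keeps z = 0 neutral
        if z >= 0:
            score *= factor
        else:
            score /= factor
    return score

lopsided = combine([3.0, 0.0])   # one very strong interest, one average
balanced = combine([1.5, 1.5])   # both interests moderately strong
print(lopsided, balanced)
```

Under this choice the balanced city scores higher than the lopsided one even though the z-scores sum to the same total, which is the behavior described for combined interests.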

There could also be something done to boost a city with meetups for a rare interest (the smaller the overall percentage, the more weight a location gets). For example, if 2% overall are interested in Cricket and 20% in design, then a city with an overall percentage of 3% cricket should be boosted more than a city with an overall percentage of 30% design, even though the proportion "above the norm" is the same, as Cricket is "hard to find" or a rarity.
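A sketch of the rarity boost using the cricket and design numbers from the example. The weighting by `log(1 / overall_share)` is an assumption borrowed from inverse-document-frequency style weighting, not something specified in the comment; `rarity_boosted` is a hypothetical name.

```python
import math

def rarity_boosted(city_share, overall_share):
    """Lift over the overall share, weighted up when the topic is rare overall."""
    lift = city_share / overall_share              # how far above the norm
    rarity_weight = math.log(1.0 / overall_share)  # assumption: IDF-like weight
    return lift * rarity_weight

cricket = rarity_boosted(city_share=0.03, overall_share=0.02)
design = rarity_boosted(city_share=0.30, overall_share=0.20)
print(cricket, design)
```

Both cities are 1.5x above the norm, but the cricket city ends up with the larger score because cricket is the rarer interest overall.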

Also, if you could factor in the area of a city, then that could enhance the scores further (5k out of 5 million people in New York means something entirely different than 5k out of 5 million in a sprawling suburbia, as population density makes it more likely the 5K in New York will be accessible).

See what I'm getting at?

Thanks for the detailed response! I think you and others on here are right, there's a better way to do the ranking so the results more closely represent what we're looking for. I think I need to better define what question we actually want to answer with this data and then establish the metric that best measures that, and your ideas definitely give some hints about the directions that measurement can go. I especially like the idea of "boosting" cities that have a rare topic the user is interested in. Thanks again for the interest!


amazing representations of the midwest, from that data
