Hacker News new | past | comments | ask | show | jobs | submit login
How to generate random geo-coordinates within a given radius (jordinl.com)
94 points by jordinl 29 days ago | hide | past | web | favorite | 52 comments

> On a previous project we needed to show markers on a map to indicate there was a property, but for privacy reasons we wanted to show the marker on the map close to the property but not at the exact location of the property. So we needed to generate random points within a fixed distance of the original point location

This raises a couple of interesting questions.

1. How do you stop someone from finding the actual property location by querying that map a bunch of times and averaging the positions returned?

2. Are there any negative consequences for the people whose property a displaced marker ends up on? For example, if someone is using the map lookup to find the home of a rival gang officer in order to do a drive by attack, and I happen to live down the street, I really don't want that marker showing up on my house.

Rather than a displaced marker, maybe divide the map into rectangular or hexagonal regions, and just highlight the region containing the property.

> 1. How do you stop someone from finding the actual property location by querying that map a bunch of times and averaging the positions returned?

Back when YikYak was a thing, I was scraping the data for university campus through their API and found that their location data was randomized upon access. This meant that popular posts (they survived for longer) would end up producing pretty accurate location data.

In one case, I confirmed this by walking into the computer lab in the building where the Geography courses were located and asked "who was asking for lab help on YikYak?" and one student sheepishly raised his hand. I helped him and then also gave him a lesson on real world applications of the central limit theorem.

> 2. Are there any negative consequences for the people whose property a displaced marker ends up on?

That's definitely plausible: https://splinternews.com/how-an-internet-mapping-glitch-turn...

To answer (1), if you seed the randomiser with a known static value—e.g. the co-ordinates, or a database record ID—then repeat queries will return the same mangled result.

Or an even simpler, more trivial solution would be to store mangled values next to the true values. Then all of your APIs and services can pick the appropriate value for the circumstance.

To make it more obvious that the data is imprecise, I would also ensure that the mangled geo co-ordinates returned with a minimum number of decimal places. Three decimal places gives you a precision of 50–100 meters, depending on latitude. Four gives you 5-10 meters, which is the most you'd ever need for an approximate house location.

The concern that comes to my mind is the size of the randomisation and whether they factor in the size of surrounding properties. The level of imprecision required in a dense city will have little to no effect on farmland properties. I would dynamically adjust the random amplitude so that the circle of confusion covered a minimum number of properties. At minimum you'd find a database that listed median property sizes for a given postal code/zip code.

We used a slightly modified version of the code in the article. Instead of using random numbers we used pseudo-random numbers so the positions didn't change across requests.

Is the pseudo-random number generator cryptographically secure?

Every set of coordinates uses a different seed, so I would say yes.

Not if there is more entropy in the coordinates than the seed...

On 2. the consequences can be real, as in the article linked below where people keep accusing a couple of stealing their phones. I believe in that case their house was located at the centroid of a zipcode and whatever tool was used to map it simply put a pin at the centroid. This suggests that just lowering accuracy is not sufficient to protect innocent parties if that lowered accuracy isn't reflected in the visualization, and an approach like highlighting a region is probably a better way to go than giving a precise but inaccurate location.




In those cases there were negative consequences because low-precision markers were not displaced.

Amusingly, this is the first time that I've seen the Law of Large Numbers being treated as a negative thing.

Any random distribution will be leaky when it is centered around the real location and multiple samples can be taken. Quantization to a desired lack of precision is the answer, nobody will be able to tell a point that happens to be rounded to itself from one that happens to be rounded from close to a neighboring cell.

If for some reason the data needs to appear random, you have to store a fixed random offset/alternative per point so that there cannot be multiple observations.

If for some reason the data needs to appear randomized between repeated lookups, you need to store a fixed offset/alternative per point and then apply an additional random offset per lookup. Repeated lookups will only leak the (badly) hidden permanently randomized stored alternative.

If the rationale is privacy this is probably not the algorithm you want to use, especially if it's possible to force the generation of multiple random points (e.g. by refreshing the map). If you're using a circle with a fixed radius just 3 points would be enough to exactly locate the center. A variable radius could take more, but since the distribution is uniform each additional point would significantly improve the precision.

If you used a fingerprinting library as the seed for random, it would be repeatable per user.

As I mentioned in a different comment, we used a slightly modified version of the code in the article. Instead of using random numbers we used pseudo-random numbers so the positions didn't change across requests.

I imagine its for something like AirBnb where the site doesn't want the users to the location the property so they only way to book it is via the site.

What's funny with Airbnb is that if you are looking at the map in search, it is approximate. However, if you save the property to your favorites and then go to your favorite list and look at the map, you get the exact location.

Some Airbnb engineer just got pinged...

Could not reproduce. I tested with different properties and while it shows as an exact location in the map, it is still being fuzzed.

I regularly use this tool (nodejs) for a list of random coordinates inside a country. Or at least inside any country to avoid the oceans. https://github.com/tsamaya/random-points-generator

Why can't you simply generate two random numbers with one between 0 and 2pi, and the second between the min and max of distance? Now you've got polar coordinates. I'm not sure if I'm missing something or if this is overengineered.

If people are interested I've included a link on how Tinder keeps your location private. It's really interesting.


This will give you a random distribution, but not a uniform one. Concrete description why: you're just as likely to end up 3 units from the center as 6 units from the center. However, there is twice as much space that is 6 units from the center than 3 units from the center.

I thought about this for a little bit, I might actually code up a demo. If you take a random number between the square of min and the square of max, then you make your polar r equal to square root of your random number.

Yup, this works in 2d.

Thanks, makes perfect sense and should have been obvious.

That might work if you then apply a translation to the center point in pixel-space. But what if you need to send latitude and longitude to your front-end, or other display system? How do you translate your 2d polar coordinates to a change in spherical polar coordinates (lat+lon)? For small distances near the equator, it may not matter. Otherwise, it might.

The article could have helped answer your question by providing some images to illustrate the flaws with other solutions that the author tried.

As others have noted, this doesn’t result in a uniform distribution. But what you can do, instead, is generate a uniform distribution over a square centered on the point of interest, and simply re-do points that land outside of the circle. For most software engineers, this is probably simpler to understand and harder to screw up than implementing the Haversine formula. :)

Which has the unfortunate property that it isn't assured to terminate. Although I believe that discard-and-reattempt strategy is used in real-life RNG libraries if you ask for a random integer in an awkward interval.

That doesn't create a uniform distribution over a disk. Heres a quick image (https://imgur.com/a/xHEpVB9) I generated using that method.

Buffer your point to create a polygon then use ST_PointOnSurface (or equivalent) in any competent spatially-enabled DBMS to create a point guaranteed to intersect the bounding polygon. Trivial in SQLite, PostGIS, etc.

how is that generated point random in any way?


GIS buffer polygons were magical when I first stumbled upon them.

Something wrong with them nowadays?

Leaflet.BlurredLocation https://github.com/publiclab/leaflet-blurred-location - A Leaflet-based HTML interface for selecting a "blurred" or low-resolution location, to preserve privacy.

H3 is similar and cool, but this builds a similar nested grid concept around latitude/longitude, which can map to existing coordinate systems and database storage.

Also note that population density affects how effective a given precision "blur" is for an area; we're also hoping to develop a web service or library to roughly return population density for any given region of the globe.

Thanks folks for the great links and articles related to the question - very helpful!

Converting lat/longs to Uber's H3 [1][2] system might prove useful here. Because H3 indexes are hierarchical, you can easily truncate them to obtain less precision. And then convert them back into lat/longs after truncation.

[1] https://github.com/uber/h3

[2] https://eng.uber.com/h3/

In practice, would it be easier to just randomly generate points within the bounding box of the radius, and then throw out any points that fall outside of the circle? Wouldn't work too well for the cases where you want points from some radius away, to some radius away, but would probably work well when you just want a random point inside of a given radius since only 1/4 of the points will have to be thrown out.

As far as I can tell, all the complexity comes about from the Haversine formula, which maps a circle to a spherical surface. If you were to treat that as a flat plane, I don't see why you couldn't just generate a radial angle and radial distance, then convert to an offset from your centre. That's trivial to perform.

rand(0,2*pi) for angle, rand(d1,d2) for distance, and then convert from polar coordinates to Cartesian.

Maybe I'm missing something fundamental but it seems more complex than need be, especially for small areas where the distortion of showing a sphere on a Mercator projection is minimal.

edit: Never mind, another comment answered why: This generates a random but not uniform (in area) distribution, because there's a larger circumference (and thus more swept area for a given radial angle) the larger the radius moves out.

A couple of us (marcinzm got there first) stumbled onto what might be a much simpler solution, where we just counterbalance that bias. Perhaps we're missing something, but if so... well, then I missed it :-P


That’s an elegant way to balance the issue, I like it. I guess it all really depends if you want an even distribution on area or in radial distance from the centre, so I can think of valid reasons to use both.

Glad you like it. I really think we're on the right track with it - I was surprised the linked article didn't come up with the same idea.

I don't see much point in an even distribution in radial distance from the centre. If you want to model something like the distance from the bullseye to an archer's arrows, you're presumably going to want a normal distribution on radial distance from centre, which is of course very different (though at a glance they might look similar).

This is a fine submission but can you please not put "Show HN" on blog posts? This is in the rules: https://news.ycombinator.com/showhn.html.

Ah, sorry I missed that...

Over engineered much?

> On a previous project we needed to show markers on a map to indicate there was a property, but for privacy reasons we wanted to show the marker on the map close to the property but not at the exact location of the property. So we needed to generate random points within a fixed distance of the original point location.

At those distances, you can treat the surface as a plane and calculate points on a circle. Sure, it won’t be accurate for large distances but that is not a requirement. Nor is accuracy for that matter since the whole point is to obfuscate the original point.

Seems like another option is to generate points inside a bounding square and throw them away if they are outside of the circle.

Yeah, and it doesn't even need to be within a certain radius - just make it within a certain latitude / longitude.

I was thinking the same thing. Just generate two random numbers in polar coordinates and then map back to a cartesian offset. Only trickiness is taking the square root of one of the numbers to generate the radius rather than the random number itself.

This is the solution I arrived at, too: explicitly balance out the bias, which seems to be as simple as just taking the square root. Surprised the whole thread isn't full of that idea.

Long rambling version:

The bias in the naive 'solution' is demonstrated in that if we consider a circle of radius 0.5, it has an area of 1/4 that of a circle with radius 1, not half. Which means that as we span our random parameter between 0 and 1, we 'spend too much time' in that small 'inner circle'.

(It's easiest to assume a circle of radius 1, centred about the origin. Nothing interesting arises from additional generality.)

So we don't want the displacement from the origin to rise linearly with the uniform-random parameter. Instead we want the area of our 'inner circle' to rise linearly. Which means that the displacement from the origin should be the square root of the random parameter. (The random parameter's interval will have to be adjusted appropriately of course.)

Our 'theta' has no bias trouble. Transforming from number (angle) into vector/coordinate form, is just routine trig. Nothing interesting to worry about there.

All that's left is to be careful that our theta span the interval [0, 2 * pi) (carefully excluding 2pi), whereas our area should span [0, pi * r^2] (inclusive, assuming that our 'circle' is meant to be an 'open ball' rather than a closed one).

Either we're missing something, or this Haversine business is a needlessly complex answer here.

I considered that another approach might be to define a spiral that 'winds through' all points in the circle, but I'm pretty sure that's not possible. If it were possible, it would give an injective space-filling curve, which is known not to exist. (If it were not injective, it would presumably be biased and so be a non-answer to our question.) Put another way, if your answer involves accepting just one random number, and not two, then your answer is necessarily wrong. (I think I'm getting all this right.) [0]

Another thought: the naive solution is biased despite being injective. Interesting. That seems to be the root of the failure of intuition that leads us to the naive (biased) 'solution'.

[0] https://en.wikipedia.org/w/index.php?title=Space-filling_cur...

Ah. I now see the blog post is about spheres, not 2D space. It could have been more clear about what problem it was setting out to solve.

I wrote an app where we needed to get a list of properties within a radius of a given property, using lat/long coordinates. Given the search radius would be at most 5 miles, and didn't need to be 100% precise, I just treated the surface as a plane. A trivial solution to a trivial problem-- but it is neat to know the details of the formula for doing it precisely should I ever need it.

>calculate points on a circle

This is a bad idea. If you don't randomly distribute points within the entire radius then you've just narrowed the search space for the real location from the entire area in the radius to just a circle around the point you give back.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact