The lat/lon floating point delusion (datafix.com.au)
149 points by eaguyhn 68 days ago | 149 comments

Yes, why don't people spend their time doing the extra work to figure out the exact number of meaningful digits in their measurements when the defaults work just fine for their non-scientific, non-metrological purposes?

A mystery for the ages.

Because the value, as presented, is wrong. Being used for "non-scientific, non-metrological purposes" doesn't change that it's the wrong value.

Imagine, for a moment, that someone uses these values to put up a fence. Is the fence on their property? Those crap values beyond the actual significant digits of the measurement may tell you that you are, when you're not. That will cost you time and money down the road.

> Because the value, as presented, is wrong.

That's not true nor is it the problem at all. The thesis was that the values, albeit correct, have too many significant digits, which in turn reflect differences which lie somewhere between having no practical use or being absurd.

> Imagine, for a moment, that someone uses these values to put up a fence.

That example is very poor. Any engineer can tell you that a measurement is meaningless without tolerances/margin of error, and the tolerance in effect when putting up a fence is not expressed in a microscopic scale.

> Any engineer can tell you that a measurement is meaningless without tolerances/margin of error, and the tolerance in effect when putting up a fence is not expressed in a microscopic scale.

But these locations are very precise, indicating high accuracy/low margin of error, but they don't have that high accuracy. In fact, the error could be in meters, rather than in the implied millimeters, so it definitely could be a problem for a fence.

Here is a college textbook chapter detailing types of precision and accuracy errors in GIS systems: https://www.e-education.psu.edu/geog469/book/export/html/252

"GIS users are not always aware of the difficult problems caused by error, inaccuracy, and imprecision. They often fall prey to False Precision and False Accuracy, that is they report their findings to a level of precision or accuracy that is impossible to achieve with their source materials. If locations on a GIS coverage are only measured within a hundred feet of their true position, it makes no sense to report predicted locations in a solution to a tenth of a foot. That is, just because computers can store numeric figures down many decimal places does not mean that all those decimal places are "significant." It is important for GIS solutions to be reported honestly, and only to the level of accuracy and precision they can support."

Or in other words, don't build a fence relying on naked coordinates of high precision but unknown accuracy.

> a measurement is meaningless without tolerances/margin of error

Reporting a value with too many significant digits is the same as reporting an incorrect margin of error, which (arguably) makes the value wrong.

That's certainly true in a scientific paper, where the concept of significant digits is relevant and widely recognized. In a "please remove dead animal" web request, the value is only wrong if it points to the wrong location (i.e. a location that does not contain a dead animal).

It's wrong if you ignore context -- but then, everything is.

The context is that it's in a database serving up business locations. How is the geographic area owned (or occupied?) by a business reduced to a point? They don't say, so it's clearly unreasonable to expect accuracy and precision beyond, say, property boundaries, and perhaps not even that.

It would be madness to infer that the extra digits denote precision beyond the crude bounds set by context.

It would be further madness to reverse engineer a property boundary -- an entire path of points -- from a single point in a database.

You can click at arbitrary positions on those maps and get 6+ sig-fig values for those positions. And since the map displays a satellite representation of the property, is it really that extraordinary that people might consider it to be sufficiently precise enough to put in a fence?

FWIW, the error margin for 6 significant digits is around 11 cm. Accurate enough to point at a computer mouse.
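For reference, a minimal sketch of where a figure like 11 cm comes from, reading "6 significant digits" as six digits after the decimal point. The constant of roughly 111.32 km per degree of latitude is an approximation assumed here, and the function name is made up for illustration.

```python
# Approximate length of one degree of latitude in meters (an assumed round
# figure, not a geodetic constant).
METERS_PER_DEGREE_LAT = 111_320.0

def meters_per_last_digit(decimal_places):
    """Ground distance (north-south) covered by one unit in the last decimal place."""
    return 10.0 ** -decimal_places * METERS_PER_DEGREE_LAT

for places in range(3, 9):
    print(f"{places} decimal places: about {meters_per_last_digit(places):.4g} m")
```

Six places comes out at about 0.11 m, eight places at about a millimeter.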

The 2D pixels in the projected satellite image will never map to a precise GPS lat/lng coordinate. These GPS coordinates are also just a rough approximation of the dead-reckoned coordinate systems that actually define the parcels.

You can easily try and overlay county parcel data into google earth and see how many features don’t align with the sat imagery.

No? Then why do they appear to? Sure, a professional should "know better", but will a lay person who has the official GPS coordinates of their property lines (obtained at the time of sale)?

That's why people get concerned about this; that's why posts like this are important.

If the page in question listed something that could be construed as a property boundary, sure.

It doesn't.

Every new home owner gets a "plat" that outlines the boundaries of their property, including GPS coordinates. Using a tool (like Google Maps) to overlay a coordinate with a ground feature isn't exactly absurd.

What does that have to do with the Melbourne Open Data Portal?

If you want to argue that Google Maps could do more to highlight the accuracy/precision limitations of its projected imagery, I'd completely agree -- but that's a different argument.

A lot of these kind of coordinates have no right answer anyway. They are just an abstract approximation of something else, like a parcel, address, or place to eat. And the lat/long will be completely useless for the vast majority of people.

In any serious application you should have known error values and probably be using a more appropriate projection. If you need to accurately locate something you need someone trained to do that with the right equipment.

They aren't approximations; you can verify this yourself. Pull up Google Maps, pick an arbitrary spot with no businesses, and click. You will get at least 6 significant digit coordinates.

No, I meant that it is an approximation of the thing in the real world that it represents. For example, consider a restaurant. Should the point be at the entrance to the building, in the middle of the building, at the door round the back where mail is delivered, or at the centroid of the parcel? In order to actually use this excessive precision you would need to know that. In the example cases the data is not formal enough for this to matter. It is an approximation of some real-world thing and was never meant to be used for highly precise work.

Then the coordinates presented should match that approximation. Anyone who has worked with plans knows that the number of significant digits presented on the plans has a meaning; it represents the margin of error.

Online maps aren't respecting this, they're giving decimeter precision values on maps that are lucky if they're accurate to within meters, if not tens of meters. That's the problem.

I think that makes sense on plans that are using a proper coordinate system and have sub-metre accuracy. But removing digits further just causes confusion. You are just using the grid ref as a proxy for something else, like a circle or square which represents the error. Except in these examples it doesn't matter. No one actually needs the data to be super accurate. The coordinate length is just a style thing.

Are you working with plans that show decimal degrees at a specific precision? I am curious what domain that is.

These values are as right as setting least significant digits to 0 would be. The sensor has limited precision and the other digits might as well be 0 or 584985948 and the value would be no more precise.

I would never assume the displayed digits are significant in any non-scientific context.

People use significant digits colloquially, though with a lot less rigour. For example, if I say I'm 6 feet tall, people will probably guess that I'm between 5 ft 10 in and 6 ft 2 in. If I say I'm 5 ft 11 in, people will probably guess that I'm between 5 ft 10.5 in and 5 ft 11.5 in.

Yeah, but "-37.80467681 144.9659498" has no meaning colloquially to almost anybody, because normal humans don't think in latitude and longitude in the first place. To me, those are just values to copy and paste into a maps app to see where something is, and if it's pointing to more or less the right location, I'm happy.

They're like the physical-world equivalent of a URL; I don't honestly care if there's an unused query string parameter tacked onto the end of it or if it redirects to https or something; all I care about is that the link I clicked on gets me to the correct cat video.

If you're given 8 significant digits, how many of those are correct? 3? 4? 5? The difference between those three is the difference between a position on a street block and the position of a person in a room (to quote XKCD). At least with 0's, you can approximate how accurate the value is. With garbage, all you can do is hope you've guessed correctly.

That is a misunderstanding of floating point.

The relationship between trailing non-zero digits in the decimal representation and trailing non-zero bits in the significand is not one-to-one.

One third in a trinary representation would be 0.1. In decimal, 0.33333333333...
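A small sketch of the same effect in binary floating point, assuming ordinary IEEE 754 doubles as used by Python's `float`. The longitude is the one quoted earlier in the thread. The point is only that a short decimal string and the long exact expansion of the stored double name the same value.

```python
from decimal import Decimal

x = 144.9659498  # parsed into the nearest IEEE 754 double

# repr() gives the shortest decimal string that round-trips to the same double.
print(repr(x))

# Decimal(x) writes out the exact binary value in decimal; it has many more
# digits, none of which say anything about measurement accuracy.
print(Decimal(x))

# 0.1 is the classic case: it has no finite binary expansion.
print(format(0.1, ".20f"))
```

The extra digits in the second and third lines are artifacts of base conversion, not of the measurement.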

Mostly fruitless overprocessing. Only time I'd consider doing this would be for an epic ASCII dump of coordinates.

> Is the fence on their property?

Even if the values were precise, the fence can never be precisely located. Land/parcel demarcations don't work that way. Also, GPS coordinates change over time. Or rather, the land underneath moves.

Fence or other land locations are always relative (to an official surveyor's point), not absolute.

Rewinding a little:

> Imagine, for a moment, that someone uses these values to put up a fence.

There's your first problem.

Me, I tend to just assume others have had the same talk in year 8 or whatever about accuracy vs. precision.

Significant-digits seem like schoolroom drudgery until, one day, there is enlightenment.

They're a useful tool, no more, no less.

Using a useful tool to do something useless doesn't make the useless thing useful.

Sometimes significant digits are too primitive a tool for error communication, and you might have to use the parameters of a normal distribution, some other distribution, or even a histogram. Other times significant digits are too cumbersome because it would take time to discover the precision in a situation where nobody actually cares beyond a crude threshold. This is one of those other times.

That day may be today, if you enlighten us...

I like your argument, but it would be nice to keep things civil.

This is the civil version, with mild sarcasm.

People have this idea that when you take a measurement, you have so-and-so number of significant figures that are probably correct and the rest are just pure noise. But that's not how measurement error works. In the real world, physical measurement errors are more-or-less normally distributed (we don't have to argue about the "more-or-less" part because my argument here holds for any distribution other than a uniform one). Let's say your measurement gives you a latitude of 45.73490534578° with a standard deviation of 0.00001° (that's about 1.1 meters). Those last few digits of your measurement are almost certain to be wrong. But does that make them meaningless? No! Because if your measurements are unbiased, then slightly more than half the time, 45.73490534578° is still going to be closer to the true value than 45.7349053457° is. By performing significant figure rounding, you aren't throwing away very much information, you may not be throwing away any information you care about, but you are nonetheless throwing away information.

The problem is that these lat/lon figures are not displayed with precision information at all, so we can't even know the std dev. Sig figs are a simple and clear way to communicate the precision of a measurement, but if you want to be more statistically accurate, you can use parentheses to indicate the standard deviation. In your example (if I mess this up please correct me, but I think) it would be 45.73490534578(1000000)°, which again is silly because there is no way to know the standard deviation to that level of precision. A more reasonable number would be 45.734905(10)°.

Is it difficult to include precision when reporting measurements? No

Is it sometimes valuable? Yes

Is it really too much to ask for? No

I've never seen the convention you're using and I don't think I understand it. Conventions I've seen include:

* Give an error bound like 45.73490534578° (±0.00002°) and indicate in prose that this is a 2σ bound.

* Put non-significant figures in parentheses, like 45.73490(534578)° (EDIT: possibly I've misinterpreted this one when I've seen it, see logfromblammo's reply)

* Put a bar over the last significant figure, like 45.73490̄534578 (hopefully this one renders properly when I post this... (EDIT: nope))

The value in the parentheses is the symmetric one-sigma bound.

If I say the atomic weight of F is 18.998403163(6) g/mol...

  mean     18.998403163 g/mol
  std.dev.  0.000000006 g/mol

If I say Planck time is 5.391245(60)e−44 s...

  mean     5.391245e−44 s
  std.dev. 0.000060e−44 s
The standard rules for rounding imply that whenever a measurement is given to a certain number of significant figures, you're leaving out "0(5)" from the end. So 1.2345 is 1.23450(5) in parenthetical notation.
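The parenthetical notation described above can be produced mechanically. This is a minimal sketch, not a standard library feature: the function name `parenthetical` and its `sig_digits` parameter are made up for illustration, and inputs are passed as strings so that `Decimal` sees exactly the digits written.

```python
from decimal import Decimal

def parenthetical(value, sigma, sig_digits=2):
    """Format value(uncertainty), keeping sig_digits significant digits of sigma."""
    v, s = Decimal(value), Decimal(sigma)
    # Decimal exponent of the last uncertainty digit we keep.
    exp = s.adjusted() - (sig_digits - 1)
    quantum = Decimal(1).scaleb(exp)
    s_q = s.quantize(quantum)          # uncertainty rounded to that place
    v_q = v.quantize(quantum)          # value rounded to the same place
    digits = int(s_q.scaleb(-exp))     # uncertainty as an integer count of last-place units
    return f"{v_q}({digits})"

print(parenthetical("45.73490534578", "0.00001"))               # 45.734905(10)
print(parenthetical("18.998403163", "0.000000006", sig_digits=1))  # 18.998403163(6)
```

The value is rounded to the same decimal place as the uncertainty, which is what drops the digits that carry almost no information.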

Significant figures rules give you a close-enough propagation of error, but in order to be more exact, you need to combine absolute uncertainties when adding or subtracting, and combine relative uncertainties when multiplying or dividing.
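A minimal sketch of that rule, under the added assumption that the two errors are independent, so they are combined in quadrature (root sum of squares). The comment above only says "combine"; quadrature is one common choice, and the function names are illustrative.

```python
import math

def sigma_add_sub(sigma_a, sigma_b):
    """Uncertainty of a + b or a - b: combine absolute uncertainties."""
    return math.hypot(sigma_a, sigma_b)

def sigma_mul_div(result, a, sigma_a, b, sigma_b):
    """Uncertainty of a * b or a / b: combine relative uncertainties."""
    relative = math.hypot(sigma_a / a, sigma_b / b)
    return abs(result) * relative

# 10.0 ± 0.3 times 5.0 ± 0.2: relative errors of 3% and 4% combine to 5%.
print(sigma_mul_div(10.0 * 5.0, 10.0, 0.3, 5.0, 0.2))
```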

The idea of using something like 45.734905° ± 0.000010° is that the information of the other digits is so small that it's better to ignore it. (Assuming that it is unbiased, and the other digits have some information at all.)

An intuitive way to understand this is considering how much work is necessary to extract information from the next digit. You can use statistics to extract more information. As a rule of thumb you need 100 times more measurements for each additional digit. So you need something like 100 (or 400) for the first digit, 10000 (or 40000) for the second digit and so on. (There is a constant here, I never remember the constant, perhaps it's 4, perhaps it's 1.)

To extract some information from the last 8 in 45.73490534578° you need 1000000000000 measurements! So it's better to just ignore the tiny amount of information in the 8.
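The arithmetic behind that count, as a sketch: the standard error of the mean of N independent measurements is sigma / sqrt(N), so reaching a target error needs N = (sigma / target)^2. The constant is taken as 1 here, which is an assumption; the comment above leaves it open. Exact fractions are used to avoid floating-point rounding in the ratio.

```python
import math
from fractions import Fraction

def measurements_needed(sigma, target):
    """Number of independent measurements so that sigma / sqrt(N) <= target."""
    ratio = Fraction(sigma) / Fraction(target)
    return math.ceil(ratio * ratio)

sigma = Fraction(1, 10**5)    # 0.00001 degrees, the standard deviation in the example
target = Fraction(1, 10**11)  # one unit in the 11th decimal place, where the last 8 sits
print(measurements_needed(sigma, target))  # 1000000000000
```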

In the lab in very controlled scenarios you can repeat a measurement very carefully a lot of times automatically and then use statistics. But if you have a handheld GPS, you can't repeat the measurement more than a few hundred of times.

A device that uses a somewhat similar process is the lock-in amplifier (https://en.wikipedia.org/wiki/Lock-in_amplifier). It is not exactly this statistical trick, but note that it needs a stable environment to repeat the measurements. Wikipedia says that it can detect a signal 1 million times smaller than the noise, but IIRC (the cheap ones?) can only detect a signal that is 1/100 or 1/1000 of the noise.

PS: Please never use 45.73490534578° (±0.00002°). If you use it in a university lab, the TA will get mad at you. You can try using 45.73490534578° (±0.00002000000°), which is bad but not so horribly bad; the TA will still get mad at you but you may survive.

The brackets in the table mean that different sources of the element have different proportions of isotopes, so the mean atomic weights for specific deposits may cover a range that differs from the overall mean for every deposit of the element ever measured.

I.e. if you measure carbon from the upper atmosphere, it's going to have more C-14 in it, from cosmic rays flipping protons in N-14 to neutrons. And if you measure carbon buried for thousands of years, it's going to have less C-14, from natural decay.

If you look at https://en.wikipedia.org/wiki/List_of_physical_constants you can see that the parentheses are omitted from defined constants, and included for measured constants.

The parentheses show something similar to your first, where this is the +- read from the right hand side of the number. So you could write yours:


The question isn't about mathematical rigor. It's about utility and distraction.

In much of the world, phone numbers work even if you add extra numbers that aren't needed. So I could give you my phone number plus 10 extra digits that change nothing. The end result would be the same utility to you (you can contact me) but with an increased cost in recording, memorizing, chance of error.

Using lat/lon to an unnecessary level of detail is the same thing. More digits are more chances to make mistakes, more cognitive load.

Exactly. Extraneous, useless, information is not no-value, it's negative-value, because of this.

> slightly more than half the time, 45.73490534578° is still going to be closer to the true value than 45.7349053457° is. By performing significant figure rounding,

That's not rounding, so I'll assume you meant "slightly more than half the time, 45.73490534578° is still going to be closer to the true value than 45.7349053458° is".

And that may be true, but if 50.0001% of the time one is closer to the true value than the other that's essentially meaningless.

> if your measurements are unbiased

This is a big IF which you can't actually know because it's beyond your level of precision, by definition.

This is completely knowable, by taking repeated measurements of a reference object, one which was either checked by a more precise instrument or is definitionally correct (e.g. the old reference kilogram or the Greenwich meridian).

In the time it takes you to note down those extra digits, you could have improved your actual measurement by the same minuscule fraction of a bit ten times over.

In contrast to his 4-digits-are-fine notions, let me offer a counterexample: http://scissor.com/transient/destination_heatmap/

I am working on a Twitter bot called @sfships, which monitors the comings and goings of large ships in the San Francisco Bay: https://twitter.com/sfships

As part of that, I generated the above map from AIS data. [1] It's basically where ships stop. If you zoom in on the San Francisco waterfront, you will see a grid of dots. That's because the AIS protocol stores lat/lon as minutes/10000 [2], throwing away more detailed information.
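A sketch of the quantization being described, assuming only what is stated above: positions are stored as integer counts of 1/10000 of a minute, i.e. 600000 units per degree. The function names and the sample coordinate are illustrative and are not taken from simpleais or from real AIS traffic.

```python
# 60 minutes per degree * 10000 units per minute.
AIS_UNITS_PER_DEGREE = 600_000

def ais_quantize(degrees):
    """Degrees -> integer count of 1/10000-minute units (nearest unit)."""
    return round(degrees * AIS_UNITS_PER_DEGREE)

def ais_to_degrees(units):
    """Integer 1/10000-minute units -> degrees."""
    return units / AIS_UNITS_PER_DEGREE

lat = 37.7955123456            # illustrative latitude near San Francisco
units = ais_quantize(lat)
print(units, ais_to_degrees(units))

# Every decoded position lands on a lattice with this spacing, in degrees.
# Taking one minute of latitude as 1852 m, that is about 0.185 m north-south.
print(1 / AIS_UNITS_PER_DEGREE)
```

Whatever finer detail the ship's receiver had is gone after the round trip, which is why stationary ships pile up on lattice points.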

This is adequate for its initial intended purpose, which is putting ships on radaresque displays so that ships don't hit one another, etc. But it produces all sorts of artifacts and issues when one tries to use the data more broadly.

And in case anybody is interested in playing with this, I have written a python parser for AIS data with a bunch of Unixy command line tools, including one that just turns the weird 90s protocol into more modern JSON: https://github.com/wpietri/simpleais

[1] https://en.wikipedia.org/wiki/Automatic_identification_syste...

[2] https://gpsd.gitlab.io/gpsd/AIVDM.html#_types_1_2_and_3_posi...

I used to have a job where I analyzed global ship tracking data (AIS and other less-open (i.e. more classified) data streams) for law enforcement and environmental conservation purposes, and learned one thing: man is AIS data a bitch.

This is especially true when you're dealing with potential bad actors who can spoof their transmissions, combined with incredibly poor vessel metadata databases and strange satellite AIS coverage issues.

If there are any data scientists out there looking for a real head-basher, start an AIS project.

That sounds really interesting. And if anybody is doing an AIS project, feel free to get in touch, whether or not you're using my library. I have soaked up all this esoteric knowledge that I'm eager to pass on.

Why do bad actors spoof transmissions? Smuggling?

Many reasons, and a significant percentage of all AIS transmissions are dodgy. Smuggling is just one reason.

Other common reasons include illegal fishing in protected waters, covert meetings (e.g. a couple yachts coming together in the middle of the ocean), spy ships by state actors trying to look like ordinary commercial traffic, and misrepresenting state of shipping assets for commercial leverage in contract negotiations.

Yes, these are all things we observed in our data.

I would love to read more about this. Any suggestions?

Much of this kind of thing is written up in pretty terribly-written documents that come from government/law enforcement. Here's [0] a report I helped to put together which circles around some of these issues.

[0]: https://ocean.csis.org/spotlights/illuminating-the-south-chi...

These are things you learn by doing in-depth analysis of AIS and related data sources. As far as I know, no one writes about it. Physical world data sources are full of interesting patterns and anomalies. Few people ever look because the data sources are challenging to work with and very different from Internet data, which is the only kind of data most people are used to.

Well at this point I have terabytes of AIS data from the last few years, so if anybody is looking to dig into this, I'm glad to share.

I'm sure Tony knows more, but one I've heard about is fishing. It's not legal to just fish anything anywhere. Poachers have an incentive to avoid monitoring and enforcement.

Another example: I was looking up my property details yesterday, and wanted to get exact coords for my property line. Their web app lets you click on the map to get coords, but it only displays degrees/minutes/seconds (with no decimal point), so it wasn't much use.

Are you certain that the data is accurate enough to support better precision? Where I live we have access to vectorized property maps, but the data source may be a digitized 200 y/o map.

My mom used to do a lot of work with property maps for oil and gas companies and at least in the US they were very precise. They had to be, as you don't want to build an expensive well and then later have to move it 3 feet. Or, worse, realize you didn't buy all the mineral rights and then have to pay somebody.

> If you zoom in on the San Francisco waterfront, you will see a grid of dots.

And yet, some of those dots are significantly further inland than rounding can account for. There's one faint point near 4th and King, for example, and a strong one on Spear between Howard and Mission. All signs point to the source data being imprecise enough that transmitting it at a higher precision wouldn't improve the results.

I wonder if introducing some randomized jitter would combat the regular/grid-like placement of the dots.
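One way that jitter could look, as a sketch: add uniform noise of at most half a quantization step to each coordinate before plotting. The step size assumes the 1/10000-minute storage mentioned upthread, and the function name is made up. This is purely cosmetic; it spreads the dots out but recovers none of the discarded position information.

```python
import random

STEP_DEG = 1 / 600_000  # one 1/10000-minute unit, in degrees (assumed from upthread)

def jitter(value_deg, step_deg=STEP_DEG, rng=random):
    """Return value_deg displaced uniformly within half a step either way."""
    return value_deg + rng.uniform(-step_deg / 2, step_deg / 2)

random.seed(0)  # fixed seed only so the demo is repeatable
print([jitter(37.7955) for _ in range(3)])
```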

I find it hard to get that worked up about this. The positions are accurate at least, and if you're really so hard up for space that saving a couple of digits is going to make all the difference then you can think about it for 5 mins and choose a precision/storage trade off that's appropriate for you.

The representation of the numbers doesn't convey any information about accuracy, and it's at least a good practice to store data in a way that conveys the precision it was gathered with.

The issue then is that there is no reasonable primitive format to store a “real number with a small decimal precision”. Arguably storing these as ints would be better but then everybody would need to agree on the denominator.

It is frustrating (to me, at least) that there is no common primitive data type that stores precision. But it isn't rocket surgery to store two floating-points (a value and precision) in a structure and format it properly for display.
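A minimal sketch of such a structure. The class name `Measured` and the choice to show two significant digits of the uncertainty are assumptions made for illustration, not an established convention of any library.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Measured:
    value: float
    sigma: float  # one standard deviation, in the same units as value

    def __str__(self):
        # Position of sigma's leading digit, e.g. -5 for 0.00001.
        leading = Decimal(str(self.sigma)).adjusted()
        # Show one digit past the leading digit of sigma, never fewer than 0 places.
        decimals = max(0, 1 - leading)
        return f"{self.value:.{decimals}f} ± {self.sigma:.{decimals}f}"

print(Measured(45.73490534578, 0.00001))  # 45.734905 ± 0.000010
```

The full-precision float is still there for calculations; only the display is trimmed to what the stored precision supports.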

Are you proposing doing away with IEEE 754? That's probably the origin of all these extra digits.

Not at all. I'm suggesting people think about what they are doing. Blindly sticking numbers into a format that is designed to be computationally efficient is not a great way to store data. That doesn't mean that a computationally efficient representation is useless.

And I'm suggesting that using the built-in floating point representation is a good thing overall, even if it doesn't allow you to specify a precision.

I'm confused by this reply. Like I said "Not at all.". But the point is, if you have some data that you want to communicate, it is also a good idea to store the precision.

It's actually very rare that you need to know the precision of a number, and it's thrown away as soon as you read the value into a standard floating point variable. Keeping the precision is usually more trouble than it's worth.

Not just lat/lon. I find myself amused when The Washington Post's education columnist Jay Matthews--quite a sharp guy from all I can tell--runs his high school "challenge index" (AP tests taken / size of graduating class) out to six decimal points for small schools. Somebody give that man a slide rule.

When I see those errors, I often try to work out the exact input figures that led to the stats reported.

At work, I recently saw a survey result from a small customer base reported as "31.3% of respondents" rather than the more communicative "5 out of 16..."

Quite. That way I was able to figure out how many students graduated from Washington International School that year.

For that matter, my wife taught a class one year at a local university. The class had 29 students, or at least 29 who bothered to fill out the evaluation, and the evaluation had a couple of places right of the decimal point. I found myself reckoning, "hmm, 27 of 29 thought that ...".

Here in the frozen north, newspapers^H^H^H^H^H^H sites insert metric conversions everywhere. “I must have lost ten pounds [4.5359237 kg] on that run!”

I absolutely love it when I read something like "The object was nearly 1 foot (30.48 cm) long."

“I wouldn’t touch that with a 3048 millimeter stick.” (A.k.a. a ten foot pole.)

I wonder when Reznor is going to make the next 22.86 Centimeter Nails album.

(Or is that Centimetre? Probably a metric/Imperial thing... )

That "index" is terrible. It claims to be a ranking of all schools:

> Schools ranked no. 220 or above are in the top 1 percent of America’s 22,000 high schools, no. 440 or above are in the top 2 percent and so on.

But he only has ~2,000 schools in his database, or less than 10% of all schools.

I didn't really get significant digits until I played with a slide rule.

It doesn't matter how precise the value is, as long as it's precise enough for your use-case and as long as machines are handling it. If the value is displayed in a UI and it's rounded off to a reasonable decimal there, I don't see the point why you would get so worked up about it.

I'm sure my thermostat internal temperature representation is way more precise than the 0.5 degrees precision it's showing on the display.

As the article discusses, much (though not all) of the issue he has is with the display value in UIs or published data.

The author also ignores the fact that the distance associated with a degree of longitude changes dramatically as you go from the equator (69 mi/111 km) to the poles (0), roughly as a cosine. So the number of decimal places of longitude should shrink as you get further from the equator.


"At 40 degrees north or south, the distance between a degree of longitude is 53 miles (85 kilometers)."

So while it's appropriate for Melbourne (37.8S) to use (very slightly) lower precision than Sydney (33.9S), they're both using too many digits.
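The cosine relationship can be checked with a few lines. This sketch assumes a spherical Earth and an equatorial figure of about 111.32 km per degree; both are approximations, and the function name is illustrative.

```python
import math

KM_PER_DEGREE_AT_EQUATOR = 111.32  # approximate, spherical-Earth assumption

def km_per_degree_longitude(latitude_deg):
    """Approximate east-west length of one degree of longitude at a given latitude."""
    return KM_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_deg))

for name, lat in [("equator", 0.0), ("Sydney", -33.9), ("Melbourne", -37.8), ("40 degrees", 40.0)]:
    print(f"{name}: {km_per_degree_longitude(lat):.1f} km per degree of longitude")
```

At 40 degrees this gives roughly 85 km, matching the quoted figure, and the difference between Sydney and Melbourne is only a few kilometers per degree.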

Yes, this is what makes it tricky. The units are very strange here. It's not a simple unit like meters or liters. It's not even one of the stranger units like Celsius (which is odd since zero C does not correspond to none of the quantity it measures, which is thermal motion). It's a relative unit, so it's hard to judge how many decimals you need to be accurate.

It's almost as if I were to record where an astronomical object was using only time. Assuming you have full knowledge of its past trajectory, depending on how fast that object moves, you would need different precision times for objects traveling at different velocities, to get the same precision in space.

This is why you just give up on the problem and report to a precision that always works, instead of using the standard scientific procedure of significant figures. I guess if GPS measurement devices took all this into account they could report their measurements with the correct number of digits (taking into account measurement error, and the absolute position on Earth). I guess 15 digits is always too many, but whatever.

Err, no, that fact is in the article.

Went back and re-read, still not seeing it there.

It is described in the linked Wikipedia article, though: https://en.wikipedia.org/wiki/Decimal_degrees, so it isn't completely ignored by the author.

I usually go with 6 decimal places.

The GeoJSON format advises it too:


> For geographic coordinates with units of degrees, 6 decimal places (a default common in, e.g., sprintf) amounts to about 10 centimeters, a precision well within that of current GPS systems. Implementations should consider the cost of using a greater precision than necessary.

If you actually read the RFC, it's giving a common example as an artifact of sprintf, and explicitly stating that implementations should consider how much precision they need...
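To see the sprintf default the quoted passage refers to, here is a short sketch using Python's printf-style formatting, which follows the same `%f` convention. The coordinates are the ones quoted earlier in the thread; the four-place variant is just one example of choosing a precision deliberately.

```python
lat, lon = -37.80467681, 144.9659498

# Plain %f uses 6 decimal places when no precision is given.
print("%f %f" % (lat, lon))        # -37.804677 144.965950

# Choosing the precision explicitly, here 4 places (roughly 11 m of latitude).
print("%.4f %.4f" % (lat, lon))    # -37.8047 144.9659
```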

Agreement on the decimal places. But honestly, why would you commit to using GPS as the end means of locating in Australia when you'll be noticeably off in 20 years and outright inaccurate in 100?


Using GPS/WGS84 to locate is fine, you just need to convert to a plate-local datum before long-term storage of those coordinates.

I wonder why they don’t mention that Australia’s new datum (GDA2020) will be rolled out next year.

Here are more details: https://www.google.com/amp/s/theconversation.com/amp/austral...

And on their official web site: http://www.icsm.gov.au/datum/gda2020-fact-sheets

What do you have in mind that is better than GPS?

Professional geospatial systems frequently use fixed point internally but export floating point for convenience. The Internet treats all geospatial as floating point but positional measurement systems and mapping base layers often come from fixed point data models.

Many of the underlying fixed point systems are designed for meter precision, with some newer systems going down to a centimeter. As a practical matter, repeatable positioning on the Earth's surface becomes difficult below 10 centimeters of precision, so centimeter precision is widely viewed as the physics floor.
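A sketch of what a fixed-point representation can look like. The scale of 10^7 units per degree (about 1.1 cm of latitude per unit) is a hypothetical choice for illustration, not a description of any particular professional system; as noted upthread, everyone exchanging such integers has to agree on the denominator.

```python
SCALE = 10_000_000  # hypothetical: units of 1e-7 degree

def to_fixed(degrees):
    """Degrees -> integer number of 1e-7 degree units (nearest unit)."""
    return round(degrees * SCALE)

def from_fixed(units):
    """Integer 1e-7 degree units -> degrees, e.g. for export as floating point."""
    return units / SCALE

units = to_fixed(-37.80467681)
print(units, from_fixed(units))

# The full longitude range still fits in a signed 32-bit integer at this scale.
print(to_fixed(180.0) < 2**31)
```

The resolution is fixed by the format itself, so a stored value cannot imply more precision than the chosen unit.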

There are large networks of GPS monitoring sites that regularly report their positions, including movement, down well below centimeters.

The point is about repeatability, not precision. Nominally fixed points on the Earth's surface are constantly in motion relative to GPS and relative to other fixed points. Many of the effects that cause this lack of repeatability can induce centimeters of measured deflection from the mean position over relatively short periods of time. There are models that can be employed to correct some of these effects (e.g. fluctuations in the gravity field, which is measured by satellites) but not all of them. Consequently, the measurement may be precise but there is never a reference frame in which the points are actually "fixed" over time, which is indistinguishable from reduced precision. Local changes in surface geometry are also easily detected with LIDAR.

For surveys that require maximum repeatable precision, they will often use the mean position as measured over e.g. 12 hours. The magnitude of the variance varies quite a bit depending on where you are on Earth.

I've done a fair amount of work with GPS and precise positioning, and I'm aware of all the dynamics of earth, orbit, and electron path that are involved. I've never before seen a claim of 1cm being some kind of physical limit on position.

There are both short and long-term motions of the earth. All are sufficiently measured and modeled such that repeatable measurements are very much achievable well below 1 cm.

In fact, it is GPS that is used to develop the models of earth movement. So I don't quite get what you are saying.

The centimeter limit is an engineering heuristic used by people combining high-precision GPS, LIDAR, remote sensing, and sometimes other sensors to build high-precision models of the world e.g. maps for autonomous systems. LIDAR alone can routinely detect changes in relative local surface geometry in excess of a centimeter across a day in practice. Autonomous driving typically uses an error bound of around 10 cm on absolute positioning even though the sensors used to determine that positioning often have sub-centimeter precision, for similar reasons. Real-time localization is an interesting problem because measurement with orthogonal sensor modalities can produce conflicting results with differences greater than the nominal precision of those positioning algorithms. Getting the last few centimeters of functional precision out of contradictory measurements is one of the things you need AI for.

I agree that parts of the world have repeatable measurements below 1cm if you can account for the myriad effects that cause displacement, but others do not. The models are also approximate, since some effects require real-time measurement of things we do not have real-time measurements for, or which are not practically available in contexts where local position needs to be determined.

Did the author think about the implementation? If we assign semantic meaning to the number of decimal places, we now also need to store the number of significant digits side by side with the actual number. We shouldn't round off internally because we might use it as an intermediate result in a calculation.

> We shouldn't round off internally because we might use it as an intermediate result in a calculation.

Only if the precision is meaningful. The author's point is that it isn't for geographical coordinates.

Error tends to grow as the calculation proceeds. 6th-decimal "insignificant rounding" done over and over as the calculation proceeds can grow to become arbitrarily large.
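A minimal sketch of the effect described above, under assumed numbers: the step size, the number of steps, and the helper name `accumulate` are illustrative and come from nowhere in the thread. It adds the same increment repeatedly, once rounding to 6 decimals after every addition and once rounding only at the end. Because the discarded part has the same sign on every step, the two totals drift apart roughly linearly with the number of steps.

```python
def accumulate(step, n, digits=None):
    """Add `step` to a running total `n` times.

    If `digits` is given, the running total is rounded to that many
    decimals after every addition (the "round over and over" case).
    """
    total = 0.0
    for _ in range(n):
        total += step
        if digits is not None:
            total = round(total, digits)
    return total


# Assumed values, chosen so that every rounding discards about 4e-7.
STEP = 0.1234564
N = 1000

rounded_every_step = accumulate(STEP, N, digits=6)
rounded_once = round(accumulate(STEP, N), 6)
drift = rounded_once - rounded_every_step

print("rounded every step:", rounded_every_step)
print("rounded once      :", rounded_once)
print("drift             :", drift)
```

With these assumed values the drift is a few hundred times larger than a single 6th-decimal rounding step. Whether real geographic calculations chain enough biased roundings for this to matter is a separate question.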

If you're doing calculations with geographical co-ordinates that are so unstable that a nanometre position difference in the original makes an arbitrarily large difference in the result, your results aren't sensible anyway.

What numerically unstable algorithms are ever performed with geolocations?

Moreover, given the author's point that real measurement errors exceed the false precision of published data, if such a calculation were performed and did provide "arbitrarily large" error, it would indicate that the result should in fact be nonsense.

Vincenty's formula has some instabilities/failure to converge at certain areas of the globe.

[1] https://en.wikipedia.org/wiki/Vincenty%27s_formulae

I'm thinking about the case where a calculation extends across many roundings.

That doesn't necessarily make the algorithm unstable. See for example:

  >>> import numpy as np
  >>> x0 = np.random.rand(1)
  >>> x = x0
  >>> x_rnd = np.round(100*x)/100.0
  >>> for i in range(1000):
  ...  x = np.mod(x+np.pi,1)
  ...  x_rnd = np.mod(x_rnd + np.pi,1)
  ...  x_rnd = np.round(100*x)/100
  >>> x_real = np.mod(x0+1000*np.pi,1)
  >>> print(x_real,x,x_rnd)
    (array([0.51013166]), array([0.51013166]), array([0.51]))
Note no loss of accuracy for all of the intermediate roundings. The accuracy is preserved because there's no mechanism here to amplify the error.

That's why I ask about the kind of algorithms typically applied to geolocated data. Off the top of my head, I can't think of anything that would be both useful and error-amplifying.

Which calculation on lat/lon coordinates would make an atom-scale error grow, say, a million times to millimeter-size?

"Build the gravitational wave observatory between the Thai restaurant and the railway station"

You joke but a lot of radar/radio observation is done with ECEF coordinates, which is a (lat, lon, z-axis) triple.

ECEF is a Cartesian (x,y,z) not spherical (𝞺, θ, 𝜑) coordinate system. Units in ECEF are in meters from (0,0,0).
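To make the distinction concrete, here is a rough sketch of the usual geodetic-to-ECEF conversion. It assumes the WGS84 ellipsoid constants, and the function name `geodetic_to_ecef` is only illustrative. The inputs are angles plus a height; the output is a Cartesian (x, y, z) in meters from the Earth's center.

```python
import math

# WGS84 ellipsoid parameters (assumed here; other datums use other values).
WGS84_A = 6378137.0              # semi-major axis in meters
WGS84_F = 1 / 298.257223563      # flattening


def geodetic_to_ecef(lat_deg, lon_deg, height_m=0.0):
    """Convert latitude/longitude (degrees) and ellipsoidal height (m)
    to ECEF Cartesian coordinates (x, y, z) in meters."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    e2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - e2) + height_m) * math.sin(lat)
    return x, y, z


# A point on the equator at the prime meridian lies on the +x axis.
print(geodetic_to_ecef(0.0, 0.0))
# The north pole lies on the +z axis at the semi-minor axis length.
print(geodetic_to_ecef(90.0, 0.0))
```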

Blah, shows how much I remember about it. We always converted to ENU locally to start with, hah.

You don't need to store anything extra if you are just filtering down to a standard number of digits on output.

I would consider storing (as part of the database) or defining (as part of the display code) the number of significant digits for coordinates part of good GIS design.

It would be nice to see a cloud of possibility around the map marker, similar to how your own "I am here" dot has when your phone doesn't have a good fix.

I recently ran into a weird error concerning Lat/Lngs and floating point numbers. Google has a service for retrieving imagery from Maps, and you can even request that imagery have certain paths drawn on it. You encode those paths as part of your request URI, but if you have a lot of points, you could end up exceeding their maximum URI length restriction. So they also define a hashing system for compressing those points into a single value, which is defined here: https://developers.google.com/maps/documentation/utilities/p...

One of the things they don't mention is that the rounding step does not work as they expect for single-precision floats. You have to use double-precision floats to get the same results that they are demonstrating in the example.

They are asking for 5 values to the right of decimal points, which, with the maximum of 3 digits to the left of the decimal point, means a total of 8 significant digits. Single-precision floats should be good to 9 digits, but the rounding step is off by one when using singles.

The solution was to cast singles to doubles before performing the rounding. Which seems absurd.
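One possible explanation, sketched with Python's `struct` module: a single-precision float has a 24-bit significand, which is closer to 7 reliable decimal digits than 9. The longitude below is an assumed sample value, and the helper names are illustrative. Near 145 degrees, adjacent float32 values are further apart than the 0.00001 step that a 5-decimal encoding needs, so the multiply-and-round step can land on a neighbouring integer.

```python
import struct


def as_float32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def float32_spacing(value):
    """Gap between a positive `value` (as float32) and the next larger float32."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - as_float32(value)


lon = 144.96595  # assumed sample longitude with 5 decimals

print("float32 spacing near lon:", float32_spacing(lon))
print("value stored as float32 :", as_float32(lon))
print("scaled from double      :", round(lon * 1e5))
print("scaled from float32     :", round(as_float32(lon) * 1e5))
```

If this is the cause, casting to double only helps when the value was never squeezed through a single in the first place. The digits lost at that point do not come back.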

Single-precision floats should be enough precision for lat/lngs anywhere on the Earth, for everything other than some applications in commercial-grade surveying. And if that is the job you're doing, you shouldn't be using Google Maps.

These kinds of issues are what we've solved at QALocate.

There are a number of different issues here, which I think are only partly explored through the article.

Let's take as given that you need to direct a person some place. In the article, they are directing someone to a restaurant. But this gets complicated fast. Is this person a patron rather than, say, a delivery driver there to pick up an order, or a plumber there to fix an appliance, or an inspector there to observe the kitchens, or ...?

By using a lat/long, or any geo-coordinate, we lose the human-value of context. Each of the folks I list above has a very different place they potentially need to navigate to. And even if their destination is the same, the routes they took may NOT have been. Where they park, or were dropped by ride share, and which doors they use are also influenced by their role.

Using a geo-coordinate drops the rich meaning that humans in all their roles require. A better solution is to use a real identity and then, when and where coordinates are needed, derive them based on the person's role, whether it be patron, plumber, or paramedic. As roles change, or as the structure itself changes, the directions change to match. I find this a much richer solution than just blindly telling my maps app to direct me to some GPS location.

Another issue hinted at but not deeply explored is that often what we want to specify is a region or volume, not a coordinate. Things in the real world consume volume. Coordinates are idealized points and do not. This may seem a trite observation, yet, the majority of our tools think in terms of points. With the rise of autonomous vehicles, especially drone delivery, we need to move towards representations of regions and/or volumes, depending on needs. I'm not convinced that geopolys are quite right for this task and instead we need something that is comfortable with the fuzzy and complicated boundaries humans have to deal with.

[1] http://www.qalocate.com

Is the Joe Pesci picture a joke or has your website been hacked?


HAHAHAHA! Having not met Joe I don't actually know, but I presume that is a legit picture.

I think the issue is that people don't have a good intuition for how many digits past the decimal point are relevant in latitude and longitude. "42 degrees north" is obviously imprecise. 42.123456 is precise down to a millionth of a degree; but if you wake me up in the middle of the night and ask me how long one millionth of a degree of latitude is at sea level, I wouldn't just be able to blurt out the answer.

Turns out it's about 11 cm.

A millionth of a degree is about 1.75E-8 radians. For small angles, sin(x) ≈ x, so we can just multiply that figure directly by the radius: ~ 6400000 m * 1.75E-8 = 0.112 m.
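The same arithmetic as a small sketch, extended to other decimal places. It assumes a spherical Earth with a mean radius of 6,371 km, slightly smaller than the rounded 6,400 km above, and the function name is illustrative. It applies to latitude; a step in longitude shrinks with the cosine of the latitude.

```python
import math

# Assumed mean Earth radius in meters (spherical approximation).
EARTH_RADIUS_M = 6_371_000.0


def latitude_step_in_meters(step_deg, radius_m=EARTH_RADIUS_M):
    """Arc length along a meridian covered by `step_deg` degrees of latitude."""
    return math.radians(step_deg) * radius_m


# Ground distance represented by the last digit at each number of decimals.
for decimals in range(1, 9):
    step = 10.0 ** -decimals
    print(f"{decimals} decimals: {latitude_step_in_meters(step):.6f} m")
```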

On the flip side - who cares? It's just a question of shunting data around - ok someone could add a format specifier somewhere and it would be neater. But there's no particular harm in it either.

Sending more data than you need is bad for two reasons.

1. It implies more precision, which in many cases is a flat out lie. Savvy users will know to truncate, round or something, but others will be led astray.

2. It takes more space to represent. If the engineers storing double-precision floating-point values had asked themselves if they needed to be sending these out at such a high resolution, they may have been able to cut the storage of these things down 50%.

Though yes, there are probably bigger issues to worry about. Doesn't mean we should outright dismiss these points.

From a graphic design/UX perspective, it looks subjectively worse for no benefit. As a developer, I would happily do the 5 minutes of work to determine how many digits were appropriate and display only those digits to minimize the visual noise.

The presentation is just lazy and javascript defaults to double. No mystery.

You can use digits to indicate accuracy, but if you don’t know whether the measurement stored as a double is accurate to 2, 4 or 12 decimals then you can’t really do much in presentation. The reader has to interpret the number. Usually the reader doesn’t care much about accuracy, and simply copies the numbers to another system. The coordinates with full double precision are then basically a machine-readable format.

> Popping those long coordinates into Google Earth, it looks like the georeferencing puts the point somewhere on ILI's back wall

Perhaps the Google Earth image is "wrong". Stitching aerial and satellite images together and aligning every pixel to the exact geo-location isn't as simple as it sounds -- especially given that the ground shifts over time.

And that the aerial images have perspective and optical distortion to deconvolve.

I could explain this. Internally all those coordinates are stored as double (not long double). The precision of a double is at most 15-17 digits. Usually it's very imprecise, though. But not too imprecise, because tiny errors could lead to very costly mistakes. My biggest cost ~40.000: wrong after the third digit, 30 cm. You might remember the financial stories about micro-differences from improper arithmetic. That's why one of the most important lessons I told my students was precision and its limits, especially when zooming in.

CAD/GIS systems round to the default vsprintf %f precision, which is 6. They could use a zoom-dependent number, but nobody would do that; they would laugh at you. It's not a delusion, it's industry practice. If you have better precision, you stick to it.

lat/lon kinda sucks because of the different notations (degrees with decimals? degrees and minutes with decimals? degrees and minutes and seconds with decimals?) and because of the way 0.1 degrees is a different amount of distance based on where you are on the earth.

UTM is better in every way, and already used on most modern hiking and topo maps.


It looks like a regular grid, but they made a special irregular 32V cell to cover the land part of Norway. For me this puts UTM in "not reasonable" land.

I've been working with AWS Textract, which scans image files and returns the coordinates of the scanned text. The coordinates are fractions of the width/height of the image, where one edge is 0.0 and the opposite edge is 1.0. The software returns numbers like "left = 0.3140530586242676", which means that the text's left coordinate is 31% from the edge. This is providing sub-pixel precision for 150 dpi documents.

Users don't usually care about raw lat/lon. If you want to display them as an approximate so you can tell them apart as some kind of logging output do something like "(%+07.4f,%+07.4f)".
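For illustration, the format string above happens to work unchanged with Python's `%` operator. The coordinates are the ones quoted elsewhere in this thread, and the wrapper name `format_latlon` is just an assumption for the sketch.

```python
def format_latlon(lat, lon):
    """Format a coordinate pair with an explicit sign and 4 decimals.

    Four decimals of a degree is roughly 11 m of latitude, which is
    usually enough to tell log entries apart.
    """
    return "(%+07.4f,%+07.4f)" % (lat, lon)


print(format_latlon(-37.80467681, 144.9659498))
# prints (-37.8047,+144.9659)
```

The `+` flag keeps the sign visible for both hemispheres and the `0` flag pads short values to a minimum width of 7, so small coordinates still line up in a log.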

What about future GIS programs that can handle these digits? Should we not futureproof anything that doesn't exactly utilize it today?

What would future proofing even look like?

For me I think we need to move away from thinking first/primarily in terms of coordinates. Yes, I know that sounds silly but hear me out.

Say someone is going out to meet a friend at a restaurant they haven't been to before, or perhaps they don't exactly recall the directions they took the last time -- how do they find directions? The first thing they most definitely do not do is say "hey maps app, tell me how to get to -37.80467681 144.9659498".

There is a level of indirection we go through via the names of businesses, places, etc. So we ask, "tell me how to get to I Love Istanbul". And we hop through a number of steps mapping that unstructured name to some coordinate.

But what if we used structured names instead of unstructured "names"? Then those names could map to a coordinate, or a street address, or, well, anything. What's more, we could attach metadata to the name.

This is what DNS does. Humans know the names. They are structured according to some simple to follow rules. And computers take care of mapping those names to IPs, as well as other metadata.

Why not do exactly the same, but with locations? Re-use the same system but map to a coordinate? Or a region? Or whatever you need for your use case.

If you did this you'd now have an L-NS instead of a D-NS. That's exactly the insight we've had at the startup I'm working at.

[1] https://www.qalocate.com/resources/

It's not about software that can't handle the precision (the software can). It's about the message the false precision conveys. By stating a value out to that many decimal places, they are stating that the location is good to a millimeter. But the location was almost certainly captured by a GPS device that can only locate you down to a 5 meter radius. They've provided a value that implies a point, when they should be providing a value that defines a circle.

Does anyone here know what's the actual precision that we can get by the available equipment?

Survey-grade GPS (using CORS and WAAS correction) using RTK will give you accuracy to ~10 cm within a few seconds. You can get that down to several millimetres if you can tolerate post-processing several hours worth of static data.

It's true. But survey-grade coordinates generally are used in a universal transverse Mercator, or in some locales a Lambert conic or some other locally useful projection. Those projections take into account such things as the deviation of the earth's shape from a perfect sphere.

Rule about lat/long precision: If somebody says the words "datum" or "projection" to you, and you say "huh?", then you only need four digits after the decimal point in your lat/long coordinates. On the other hand, if you're designing an outdoor parking lot and you don't want puddles, please use survey-grade coordinates.

Is accuracy at that level actually possible with GPS? Precision seems likely, but it is substantially more difficult to provide absolute accuracy at that level.

Yes, but not on its own, as far as I am aware. High-accuracy GPS is based on a combination of GPS satellite data itself with real-time or post-processed positional correction using reference stations which themselves are placed in a known surveyed location.

In the US, 60 cm accuracy is often possible with nothing more than WAAS corrections. 10 or even 1 cm accuracy is often possible in real time with access to an RTK network.

I'm by no means an expert, but yes, if you can use a modern realisation (e.g. ITRF2014) and have access to VLBI / DORIS / GNSS then sub-centimetre accuracy is possible, though not in real time.

Depends on the equipment. Sub centimeter with the right equipment and appropriate reference station. 3 meters, typically, with consumer equipment.

I know what I can get on a Nexus 6P and it is not a pleasant number. To make things worse the logic calculating the accuracy is also broken (e.g. a GPS point stating 5m accuracy may jump around by tens of meters). tl;dr cheap phone gps are not to be trusted without a lot of vetting

Ugh. It's better to use geohash almost everywhere. It solves this and many other problems (lat/lon ordering, etc.).
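For readers who have not met geohash, a minimal encoder sketch follows. It uses the standard geohash scheme of interleaving longitude and latitude bisection bits and emitting them in base 32. The function name and the default length are assumptions, and the sample point is the one commonly used in geohash documentation. The relevant property here is that precision is carried by the string length: a shorter hash is simply a prefix that names a larger cell.

```python
# Geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length=9):
    """Encode a latitude/longitude pair as a geohash of `length` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, length=6))
print(geohash_encode(57.64911, 10.40744, length=9))
```

Truncating a hash to drop precision is therefore a string operation rather than a numeric rounding decision.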

When you completely run out of things to complain about.

That's pretty much what I was thinking as well. It's funny and an enjoyable read, but at the same time it's a complete non-issue. Unless you have some application that chokes on that many decimal places AND can't do any sort of input sanitation, why worry about it?

I thought it was a measured opinion (no pun intended), and interesting too.

It reminds me of my high school science class, where the teacher complained we were copying too many digits from the calculator. Apparently significant figures were part of the curriculum. But then of course many people ended up rounding too early...

Edit: This is what I get for only skimming the article. Ignore

While obxkcd is usually good for a few upvotes, you're probably losing karma because the article included this link.

The xkcd is a good TL;DR for this article. Even provides more useful examples, less ranting.

As far as I know sig figs aren't a primitive type in most languages so the author's point is moot.
