A mystery for the ages.
Imagine, for a moment, that someone uses these values to put up a fence. Is the fence on their property? The junk digits beyond the measurement's actual significant figures may tell you that it is when it isn't. That will cost you time and money down the road.
That's not true, nor is it the problem at all. The thesis was that the values, albeit correct, have too many significant digits, which in turn reflect differences that lie somewhere between having no practical use and being absurd.
> Imagine, for a moment, that someone uses these values to put up a fence.
That example is very poor. Any engineer can tell you that a measurement is meaningless without tolerances/margin of error, and the tolerance in effect when putting up a fence is not expressed in a microscopic scale.
But these locations are given very precisely, which implies high accuracy/low margin of error, and they don't have that high accuracy. In fact, the error could be in meters rather than the implied millimeters, so it definitely could be a problem for a fence.
Here is a college textbook chapter detailing types of precision and accuracy errors in GIS systems: https://www.e-education.psu.edu/geog469/book/export/html/252
"GIS users are not always aware of the difficult problems caused by error, inaccuracy, and imprecision. They often fall prey to False Precision and False Accuracy, that is they report their findings to a level of precision or accuracy that is impossible to achieve with their source materials. If locations on a GIS coverage are only measured within a hundred feet of their true position, it makes no sense to report predicted locations in a solution to a tenth of a foot. That is, just because computers can store numeric figures down many decimal places does not mean that all those decimal places are "significant." It is important for GIS solutions to be reported honestly, and only to the level of accuracy and precision they can support."
Or in other words, don't build a fence relying on naked coordinates of high precision but unknown accuracy.
Reporting a value with too many significant digits is the same as reporting an incorrect margin of error, which (arguably) makes the value wrong.
The context is that it's in a database serving up business locations. How is the geographic area owned (or occupied?) by a business reduced to a point? They don't say, so it's clearly unreasonable to expect accuracy and precision beyond, say, property boundaries, and perhaps not even that.
It would be madness to infer that the extra digits denote precision beyond the crude bounds set by context.
It would be further madness to reverse engineer a property boundary -- an entire path of points -- from a single point in a database.
FWIW, the error margin for six decimal places of a degree is around 11 cm. Accurate enough to point at a computer mouse.
You can easily overlay county parcel data onto Google Earth and see how many features don't align with the satellite imagery.
That's why people get concerned about this; that's why posts like this are important.
If you want to argue that Google Maps could do more to highlight the accuracy/precision limitations of its projected imagery, I'd completely agree -- but that's a different argument.
In any serious application you should have known error values and probably be using a more appropriate projection. If you need to accurately locate something you need someone trained to do that with the right equipment.
Online maps aren't respecting this, they're giving decimeter precision values on maps that are lucky if they're accurate to within meters, if not tens of meters. That's the problem.
Are you working with plans that show decimal degrees at a specific precision? I am curious what domain that is.
They're like the physical-world equivalent of a URL; I don't honestly care if there's an unused query string parameter tacked onto the end of it or if it redirects to https or something; all I care about is that the link I clicked on gets me to the correct cat video.
The relationship between trailing non-zero digits in the decimal representation and trailing non-zero bits in the significand is not one-to-one.
One third in a trinary representation would be 0.1. In decimal, 0.33333333333...
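You can see the mirror image of this in any Python shell: 0.1 is short in decimal but repeating in binary, so the double actually stored is a long exact decimal that merely starts with 0.1. A quick illustration:

from decimal import Decimal

# the exact value of the double nearest to 0.1: the binary significand
# repeats, so the exact decimal expansion is long
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
print((0.1).hex())  # 0x1.999999999999ap-4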
Mostly fruitless overprocessing. Only time I'd consider doing this would be for an epic ASCII dump of coordinates.
Even if the values were precise, the fence can never be precisely located. Land/parcel demarcations don't work that way. Also, GPS coordinates change over time. Or rather, the land underneath moves.
Fence or other land locations are always relative (to an official surveyor's point), not absolute.
Rewinding a little:
There's your first problem.
Using a useful tool to do something useless doesn't make the useless thing useful.
Sometimes significant digits are too primitive a tool for error communication, and you might have to use the parameters of a normal distribution, some other distribution, or even a histogram. Other times significant digits are too cumbersome because it would take time to discover the precision in a situation where nobody actually cares beyond a crude threshold. This is one of those other times.
Is it difficult to include precision when reporting measurements? No
Is it sometimes valuable? Yes
Is it really too much to ask for? No
* Give an error bound like 45.73490534578° (±0.00002°) and indicate in prose that this is a 2σ bound.
* Put non-significant figures in parenthesis, like 45.73490(534578)° (EDIT: possibly I've misinterpreted this one when I've seen it, see logfromblammo's reply)
* Put a bar over the last significant figure, like 45.73490̄534578 (hopefully this one renders properly when I post this... (EDIT: nope))
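A minimal sketch of the first style in Python, rounding the value to match the bound (fmt_with_uncertainty is a made-up helper, not a standard function):

import math

def fmt_with_uncertainty(value: float, sigma: float) -> str:
    """Round value to the decimal place of its one-digit error bound."""
    exp = math.floor(math.log10(sigma))  # decimal position of sigma's leading digit
    digits = max(0, -exp)
    return f"{round(value, -exp):.{digits}f}° (±{round(sigma, -exp):.{digits}f}°)"

print(fmt_with_uncertainty(45.73490534578, 0.00002))
# 45.73491° (±0.00002°)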
If I say the atomic weight of F is 18.998403163(6) g/mol, that means:

mean 18.998403163 g/mol
std.dev. 0.000000006 g/mol

And a value reported as 5.391245(60)e−44 s means:

mean 5.391245e−44 s
std.dev. 0.000060e−44 s
Significant figures rules give you a close-enough propagation of error, but in order to be more exact, you need to combine absolute uncertainties when adding or subtracting, and combine relative uncertainties when multiplying or dividing.
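Those two rules, sketched for independent errors (combined in quadrature; the function names are mine):

import math

def add_u(a, sa, b, sb):
    """a + b: absolute uncertainties combine (in quadrature)."""
    return a + b, math.hypot(sa, sb)

def mul_u(a, sa, b, sb):
    """a * b: relative uncertainties combine (in quadrature)."""
    v = a * b
    return v, abs(v) * math.hypot(sa / a, sb / b)

print(add_u(10.0, 0.3, 5.0, 0.4))  # (15.0, 0.5)
print(mul_u(10.0, 0.3, 5.0, 0.4))  # (50.0, 4.27...)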
An intuitive way to understand this is to consider how much work is necessary to extract information from the next digit. You can use statistics to extract more information. As a rule of thumb, you need 100 times more measurements for each additional digit. So you need something like 100 (or 400) measurements for the first extra digit, 10,000 (or 40,000) for the second, and so on. (There is a constant here; I never remember it, perhaps it's 4, perhaps it's 1.)
To extract some information from the last 8 in 45.73490534578° you need 1000000000000 measurements! So it's better to just ignore the tiny amount of information in the 8.
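That rule of thumb is just the standard error of the mean falling as 1/sqrt(N): 10 times less error (one more digit) costs 100 times more measurements. A quick simulation, with a made-up noise level of 0.01°:

import numpy as np

rng = np.random.default_rng(1)
true_value = 45.7349
for n in (100, 10_000, 1_000_000):
    samples = true_value + rng.normal(0, 0.01, n)  # sigma = 0.01 degrees
    print(n, abs(samples.mean() - true_value))     # error shrinks ~10x per row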
In the lab, in very controlled scenarios, you can repeat a measurement very carefully a lot of times automatically and then use statistics. But if you have a handheld GPS, you can't repeat the measurement more than a few hundred times.
A device that uses a somewhat similar process is the lock-in amplifier: https://en.wikipedia.org/wiki/Lock-in_amplifier It's not exactly this statistical trick, but note that it needs a stable environment to repeat the measurements. Wikipedia says it can detect a signal 1 million times smaller than the noise, but IIRC (the cheap ones?) can only detect a signal that is 1/100 or 1/1000 of the noise.
PS: Please never write 45.73490534578° (±0.00002°). If you use that in a university lab, the TA will get mad at you. You can try 45.73490534578° (±0.00002000000°), which is bad but not so horribly bad; the TA will still get mad at you, but you may survive.
I.e. if you measure carbon from the upper atmosphere, it's going to have more C-14 in it, from cosmic rays flipping protons in N-14 to neutrons. And if you measure carbon buried for thousands of years, it's going to have less C-14, from natural decay.
If you look at https://en.wikipedia.org/wiki/List_of_physical_constants you can see that the parentheses are omitted from defined constants, and included for measured constants.
In much of the world, phone numbers work even if you add extra digits that aren't needed. So I could give you my phone number plus 10 extra digits that change nothing. The end result would be the same utility to you (you can contact me), but with increased cost in recording and memorizing, and a greater chance of error.
Using lat/lon to an unnecessary level of detail is the same thing. More digits are more chances to make mistakes, more cognitive load.
That's not rounding, so I'll assume you meant "slightly more than half the time, 45.73490534578° is still going to be closer to the true value than 45.7349053458° is".
And that may be true, but if 50.0001% of the time one is closer to the true value than the other that's essentially meaningless.
This is a big IF, which you can't actually know because it's beyond your level of precision, by definition.
I am working on a Twitter bot called @sfships, which monitors the comings and goings of large ships in the San Francisco Bay: https://twitter.com/sfships
As part of that, I generated the above map from AIS data. It's basically where ships stop. If you zoom in on the San Francisco waterfront, you will see a grid of dots. That's because the AIS protocol stores lat/lon in units of 1/10000 of a minute, throwing away more detailed information.
This is adequate for its initial intended purpose, which is putting ships on radaresque displays so that ships don't hit one another, etc. But it produces all sorts of artifacts and issues when one tries to use the data more broadly.
And in case anybody is interested in playing with this, I have written a python parser for AIS data with a bunch of Unixy command line tools, including one that just turns the weird 90s protocol into more modern JSON: https://github.com/wpietri/simpleais
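For reference, AIS positions are integers in units of 1/10000 of an arcminute, so the finest grid the wire format can express is roughly 0.19 m of latitude. A sketch of the round trip (not code from simpleais):

# AIS stores lat/lon as integers in 1/10000 of a minute
def ais_encode(deg: float) -> int:
    return round(deg * 60 * 10_000)

def ais_decode(raw: int) -> float:
    return raw / (60 * 10_000)

lat = 37.80803                      # a made-up point near the SF waterfront
print(ais_decode(ais_encode(lat)))  # 37.80803
print(111_320 / 600_000, "m")       # latitude grid spacing, ~0.186 m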
This is especially true when you're dealing with potential bad actors who can spoof their transmissions, combined with incredibly poor vessel metadata databases and strange satellite AIS coverage issues.
If there are any data scientists out there looking for a real head-basher, start an AIS project.
Other common reasons include illegal fishing in protected waters, covert meetings (e.g. a couple yachts coming together in the middle of the ocean), spy ships by state actors trying to look like ordinary commercial traffic, and misrepresenting state of shipping assets for commercial leverage in contract negotiations.
And yet, some of those dots are significantly further inland than rounding can account for. There's one faint point near 4th and King, for example, and a strong one on Spear between Howard and Mission. All signs point to the source data being imprecise enough that transmitting it at a higher precision wouldn't improve the results.
At work, I recently saw a survey result from a small customer base reported as "31.3% of respondents" rather than the more communicative "5 out of 16..."
For that matter, my wife taught a class one year at a local university. The class had 29 students, or at least 29 who bothered to fill out the evaluation, and the evaluation reported scores to a couple of places right of the decimal point. I found myself reckoning, "hmm, 27 of 29 thought that ...".
(Or is that Centimetre? Probably a metric/Imperial thing... )
> Schools ranked no. 220 or above are in the top 1 percent of America’s 22,000 high schools, no. 440 or above are in the top 2 percent and so on.
But he only has ~2,000 schools in his database, or less than 10% of all schools.
I'm sure my thermostat's internal temperature representation is way more precise than the 0.5-degree precision it shows on the display.
"At 40 degrees north or south, the distance between a degree of longitude is 53 miles (85 kilometers)."
So while it's appropriate for Melbourne (37.8S) to use (very slightly) lower precision than Sydney (33.9S), they're both using too many digits.
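The arithmetic, if you want to check other latitudes (one degree of longitude spans cos(latitude) times ~111.32 km):

import math

for place, lat in [("Sydney", -33.9), ("Melbourne", -37.8), ("40 degrees", 40.0)]:
    km = math.cos(math.radians(lat)) * 111.32
    print(f"{place}: one degree of longitude is ~{km:.1f} km")
# 40 degrees -> ~85.3 km, matching the quote above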
It's almost like recording where an astronomical object was using only time. Assuming you have full knowledge of its past trajectory, then depending on how fast that object moves, you would need different precision in time for objects traveling at different velocities to get the same precision in space.
This is why you just give up on the problem and report to a precision that always works, instead of using the standard scientific procedure of significant figures. I guess if GPS measurement devices took all this into account they could report their measurements with the correct number of digits (taking into account measurement error, and the absolute position on Earth). I guess 15 digits is always too many, but whatever.
It is described in the linked wikipedia article, though: https://en.wikipedia.org/wiki/Decimal_degrees, so it isn't completely ignored by the author.
The GeoJSON format advises this as well.
If you actually read the RFC, it's giving a common example as an artifact of sprintf, and explicitly stating that implementations should consider how much precision they need...
Here are more details: https://www.google.com/amp/s/theconversation.com/amp/austral...
And on their official web site: http://www.icsm.gov.au/datum/gda2020-fact-sheets
Many of the underlying fixed point systems are designed for meter precision, with some newer systems going down to a centimeter. As a practical matter, repeatable positioning on the Earth's surface becomes difficult below 10 centimeters of precision, so centimeter precision is widely viewed as the physics floor.
For surveys that require maximum repeatable precision, they will often use the mean position as measured over e.g. 12 hours. The magnitude of the variance varies quite a bit depending on where you are on Earth.
There are both short and long-term motions of the earth. All are sufficiently measured and modeled such that repeatable measurements are very much achievable well below 1 cm.
In fact, it is GPS that is used to develop the models of earth movement. So I don't quite get what you are saying.
I agree that parts of the world have repeatable measurements below 1cm if you can account for the myriad effects that cause displacement, but others do not. The models are also approximate, since some effects require real-time measurement of things we do not have real-time measurements for, or which are not practically available in contexts where local position needs to be determined.
Only if the precision is meaningful. The author's point is that it isn't for geographical coordinates.
Moreover, given the author's point that real measurement errors exceed the false precision of published data, if such a calculation were performed and did produce "arbitrarily large" error, it would indicate that the result is in fact nonsense.
>>> import numpy as np
>>> x = x0 = np.random.rand(1)            # the exact starting value
>>> x_rnd = np.round(100*x0)/100.0        # the same value rounded to 2 decimals
>>> for i in range(1000):
...     x = np.mod(x + np.pi, 1)          # iterate the exact value
...     x_rnd = np.mod(x_rnd + np.pi, 1)  # iterate the rounded value identically
...
>>> x_real = np.mod(x0 + 1000*np.pi, 1)   # the same result computed in one step
>>> x, x_real, np.round(100*x_rnd)/100    # rounding error is preserved, not amplified
(array([0.51013166]), array([0.51013166]), array([0.51]))
That's why I ask about the kind of algorithms typically applied to geolocated data. Off the top of my head, I can't think of anything that would be both useful and error-amplifying.
One of the things they don't mention is that the rounding step does not work as they expect for single-precision floats. You have to use double-precision floats to get the same results that they are demonstrating in the example.
They are asking for 5 digits to the right of the decimal point, which, with a maximum of 3 digits to the left of the decimal point, means a total of 8 significant digits. Single-precision floats only carry about 7 significant decimal digits (9 are needed to round-trip one, but that's not the same thing), so the rounding step ends up off by one in the last place when using singles.
The solution was to cast singles to doubles before performing the rounding. Which seems absurd.
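A quick way to see the effect yourself (a sketch; the exact count depends on the random draw):

import numpy as np

# round a million random latitudes to 5 decimal places in float64 and in
# float32, then count how often they land on a different 5th-decimal value
rng = np.random.default_rng(0)
lats = rng.uniform(-90, 90, 1_000_000)

buckets64 = np.round(lats * 1e5)                     # double precision
buckets32 = np.round(lats.astype(np.float32) * 1e5)  # single precision

print(np.count_nonzero(buckets64 != buckets32), "of 1,000,000 differ")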
Single-precision floats should be enough precision for lat/lngs anywhere on the Earth, for everything other than some applications in commercial-grade surveying. And if that is the job you're doing, you shouldn't be using Google Maps.
There are a number of different issues here, which I think are only partly explored through the article.
Let's take as given that you need to direct a person some place. In the article, they are directing someone to a restaurant. But this gets complicated fast. Is this person a patron rather than, say, a delivery driver there to pick up an order, or a plumber there to fix an appliance, or an inspector there to observe the kitchens, or ...?
By using a lat/long, or any geo-coordinate, we lose the human value of context. Each of the folks I list above has a very different place they potentially need to navigate to. And even if their destination is the same, the routes they take may NOT be. Where they park, or are dropped off by a ride share, and which doors they use are also influenced by their role.
Using a geo-coordinate drops the rich meaning that humans in all their roles require. A better solution is to use a real identity and then, when and where coordinates are needed, derive them based on the person's role, whether it be patron, plumber, or paramedic. As roles change, or as the structure itself changes, the directions change to match. I find this a much richer solution than just blindly telling my maps app to direct me to some GPS location.
Another issue hinted at but not deeply explored is that often what we want to specify is a region or volume, not a coordinate. Things in the real world consume volume. Coordinates are idealized points and do not. This may seem a trite observation, yet the majority of our tools think in terms of points. With the rise of autonomous vehicles, especially drone delivery, we need to move towards representations of regions and/or volumes, depending on needs. I'm not convinced that geopolys are quite right for this task; instead we need something that is comfortable with the fuzzy and complicated boundaries humans have to deal with.
Turns out it's about 11 cm.
A millionth of a degree is about 1.75E-8 radians. For small angles, sin(x) ≈ x, so chord and arc agree and we can just multiply that figure directly by the Earth's radius: ~6400000 m * 1.75E-8 ≈ 0.112 m.
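Or, in a couple of lines:

import math

# one millionth of a degree of arc on the Earth's surface (radius ~6.4e6 m)
print(math.radians(1e-6) * 6.4e6, "m")  # ~0.112 m, i.e. about 11 cm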
1. It implies more precision, which in many cases is a flat-out lie. Savvy users will know to truncate or round, but others will be led astray.
2. It takes more space to represent. If the engineers storing double-precision floating-point values had asked themselves whether these needed to be served at such high resolution, they might have been able to cut the storage cost by 50%.
Though yes, there are probably bigger issues to worry about. Doesn't mean we should outright dismiss these points.
You can use digits to indicate accuracy, but if you don't know whether the measurement stored as a double is accurate to 2, 4, or 12 decimals, then you can't really do much in presentation. The reader has to interpret the number. Usually the reader doesn't care much about accuracy and simply copies the numbers to another system. Coordinates at full double precision are then basically a machine-readable format.
Perhaps the Google Earth image is "wrong". Stitching aerial and satellite images together and aligning every pixel to the exact geo-location isn't as simple as it sounds -- especially given that the ground shifts over time.
CAD/GIS systems round to the default vsprintf %f precision, which is 6 digits. They could use a zoom-dependent number, but nobody would do that; they would laugh at you. It's not a delusion, it's industry practice. If you have better precision, you stick to it.
UTM is better in every way, and already used on most modern hiking and topo maps.
For me, I think we need to move away from thinking first and primarily in terms of coordinates. Yes, I know that sounds silly, but hear me out.
Say someone is going out to meet a friend at a restaurant they haven't been to before, or perhaps they don't exactly recall the directions they took the last time -- how do they find directions? The first thing they most definitely do not do is say "hey maps app, tell me how to get to -37.80467681 144.9659498".
There is a level of indirection we go through via the names of businesses, places, etc. So we ask, "tell me how to get to I Love Istanbul". And we hop through a number of steps mapping that unstructured name to some coordinate.
But what if we used structured names instead of unstructured "names"? Then those names could map to a coordinate, or a street address, or, well, anything. What's more, we could attach metadata to the name.
This is what DNS does. Humans know the names. They are structured according to some simple to follow rules. And computers take care of mapping those names to IPs, as well as other metadata.
Why not do exactly the same, but with locations? Re-use the same system but map to a coordinate? Or a region? Or whatever you need for your use case.
If you did this you'd now have an L-NS instead of a D-NS. That's exactly the insight we've had at the startup I'm working at.
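To make the idea concrete, here's a purely hypothetical toy of such an L-NS lookup; every name, field, and coordinate below is invented for illustration:

from dataclasses import dataclass

@dataclass
class LocRecord:
    kind: str       # "entrance", "region", "parking", ...
    payload: tuple  # here a lat/lon pair; could be a polygon or street address

# hypothetical registry: structured name -> role -> record
REGISTRY = {
    "i-love-istanbul.restaurants.melbourne.au": {
        "patron":   LocRecord("entrance", (-37.80468, 144.96595)),
        "delivery": LocRecord("entrance", (-37.80475, 144.96601)),
    },
}

def resolve(name: str, role: str = "patron") -> LocRecord:
    return REGISTRY[name][role]

print(resolve("i-love-istanbul.restaurants.melbourne.au", "delivery"))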
Rule about lat/long precision: if somebody says the words "datum" or "projection" to you, and you say "huh?", then you only need four digits after the decimal point in your lat/long coordinates. On the other hand, if you're designing an outdoor parking lot and you don't want puddles, please use survey-grade coordinates.
In the US, 60 cm accuracy is often possible with nothing more than WAAS corrections. 10 or even 1 cm accuracy is often possible in real time with access to an RTK network.