If you need a library to do this, check out GeographicLib. It's written by C. F. F. Karney the same guy who developed an algorithm and wrote a scientific paper about improving Vincenty's formulae for performance and fixing issues at nearly antipodal points.
It has good implementations in C++, C# and Javascript.
Minor point but I know c# has a GeoCoordinate class into the framework Dlls. It also contains methods to calculate distance between points. Many .net Devs I've worked with didn't know about it. I think it came out first in .net 4.0.
I just derived the spherical law of cosines that's the foundation of this formula using quaternions and Sympy. The quaternions were modelled as 4x4 matrices.
The philosophy is that you can treat trigonometry as a triangle construction problem. You give me the specs of a triangle, and I can use physical tools like compasses, protractors and rulers to construct your triangle. This is similar to using trigonometry, except without any algebra involved, just construction tools. You can also use this philosophy to derive the algebraic formulas. When it came to formula derivation, I used the quaternions to simulate the physical construction, and I got out the formulas.
I wish trigonometry in secondary school had been taught more like this (geometrically) because it would have been good preparation for vector calculus, among other things.
There are a lot of interesting edge cases regarding routes that cross the dateline, the North Pole, etc. Most of the code was taken from Movable Type's incredible library of lat/long code:
I learned the hard way the unit of measurement you use for R when calculating distance between two lat,long coordinates is what you get out. So if R is in meters, you aren't calculating the distance in miles. Hehe...should have paid attention in math class (to be fair to myself, American math education is horrible).
I had to implement the Haversine distance for a data mining home work as well. We were given a list of the worlds capitals and we had to use openstreetmap api to get the latitudes and longitudes.
Then we had to implement hierarchical agglomerative clustering by hand, only using the dendrogram function from scikit-learn.
When you're evaluating datastructures to quickly get the nearest neighbor in constant time, and you're dealing with actual longitude and latitude, you really need to be aware of this so that you don't accidentally pick something that only works for euclidean distances.
If you just want nearest neighbor, I would recommend storing floating point (x, y, z) coordinates, and using Euclidean distance in 3-space. (Or rather, the square of distance Δx² + Δy² + Δz²; skip the square root.)
If you want to save storage space / bandwidth, you can use the stereographic projection onto a plane and then cut the precision of the floating-point numbers. Stereographic projection / inverse stereographic projection only requires 1 division per point.
Latitude/longitude is quite an inefficient and cumbersome way of representing points on a sphere, though it’s convenient if navigating using the stars.
I think this is misleading or misstated. Haversine distance of points on a sphere is a monotonic function of euclidean distance, so one's nearest neighbor is the same in both. You must be warning people not to just take the sum of square differences on the actual latitude and longitude values.
I use it in my Captain's Log[0] application for Elite: Dangerous players, to show the distance your ship is to a target Lat/Lon location, amongst other handy navigation features.
I remember googling for this formula for doing a data mining homework in which I was given the start and end points of a taxi in geo-spatial coordinates.
Unless you're taking a taxi across the continental US, there is very little difference between Haversine and Euclidean distance calculations. Haversine is more computationally expensive. Haversine isn't even the actual distance travelled on gridded roads. Haversine is probably not the answer to what you are looking for.
If you do this, you still need to do a one-time correction for the length of a degree longitude at a given latitude, or you get weird behavior even at small distances (and it gets weirder the closer you are to a pole). We learned this the hard way when our "k-nearest places" query was returning point clouds which looked like ovals!
Also, the straight line distance between two points can often be misleading when used to estimate driving distances depending on the topography of the area. It just depends on how important accurate driving distance is for the use case.
Also, another reference for both methods with calculators! Vicenty: http://www.movable-type.co.uk/scripts/latlong-vincenty.html Haversine: http://www.movable-type.co.uk/scripts/latlong.html