
Geoparsing, Batch geocoding and geocoding Europe - eruci
http://geocode.xyz
======
eruci
Geoparsing means the data need not be in any specific format for geocoding to
work.

For instance, on maps.google, "Department of Food Safety and Zoonoses (FOS)
World Health Organization Avenue Appia 20 Geneva" doesn't work.

On Geocode.XYZ it works:
[http://geocode.xyz/Department%20of%20Food%20Safety%20and%20Z...](http://geocode.xyz/Department%20of%20Food%20Safety%20and%20Zoonoses%20\(FOS\)%20World%20Health%20Organization%20Avenue%20Appia%2020%20Geneva%20CH)

46.23384,6.13204 » Geneva, CH » 20 Avenue Appia, Geneva, CH
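
As a rough illustration of how such a free-form query could be turned into a
request URL, here is a minimal Python sketch. It assumes only the URL shape
visible in the links in this thread (the text percent-encoded into the path);
the helper name `build_query_url` is hypothetical and not part of any
geocode.xyz client.

```python
from urllib.parse import quote

BASE_URL = "http://geocode.xyz"  # host as it appears in the thread


def build_query_url(free_text):
    """Percent-encode free-form text into a URL path (hypothetical helper).

    Commas are left unescaped to match the links posted in the thread;
    spaces become %20 and non-ASCII characters are encoded as UTF-8 bytes.
    """
    return f"{BASE_URL}/{quote(free_text, safe=',')}"


print(build_query_url("Forsta Langgatan 10, Goteborg"))
# prints: http://geocode.xyz/Forsta%20Langgatan%2010,%20Goteborg
```

No particular field order or separator is imposed on the input; the text is
passed through as typed.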

~~~
dalke
It's a challenging project. Here are some problems I spotted in a few of the
addresses I tried out.

It doesn't understand diacritics correctly.

With the address "Första Långgatan 10, Göteborg" it shows the name with messed
up characters and capitalization, as in "Example: F�rsta L�nggatan 10,
GöTeborg, SE".

The locations also appear to be off by a few tens of meters. "The Old School |
Temple Road | Oxford OX4 2EP" gives an address which is across the street from
where it should be (which is where Google says it is). "Avenyen, Göteborg" is
also shifted. And "Storgatan 1, Stockholm" is in the middle of the street
named "Storgatan", and not at the address, which is at the end of the street.

Less acceptable, the Louvre, at "Musée du Louvre, 75001 Paris, France", is off
by more than two blocks.

Oh, I tried "Aveyn, Göteborg", which is a typo. It pointed me to somewhere in
the middle of a block, with no street named "Avenyn", or anything like that,
close by. I don't know how I ended up there.

~~~
eruci
Första Långgatan 10, Göteborg is mainly a UTF-8 encoding problem. (I'm working
on it; it's a challenge, especially with Russian, Hebrew or Armenian.) It
should work with ASCII values too, e.g.
[http://geocode.xyz/Forsta%20Langgatan%2010,%20Goteborg](http://geocode.xyz/Forsta%20Langgatan%2010,%20Goteborg)
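
One way a caller might produce that ASCII form is to decompose each character
and drop the combining marks. This is only a sketch using Python's standard
`unicodedata` module, not a description of how geocode.xyz handles input; the
function name `strip_diacritics` is made up for illustration.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks after NFKD decomposition (illustrative helper)."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining mark (accent, ring, umlaut).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Första Långgatan 10, Göteborg"))
# prints: Forsta Langgatan 10, Goteborg
```

This only helps with Latin letters that decompose into a base letter plus a
mark. Letters such as "ø" or "ß" have no such decomposition, and Russian,
Hebrew or Armenian text passes through unchanged, so it is no substitute for
proper UTF-8 handling.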

As to the typo, I have not implemented fuzzy search yet.

~~~
dalke
My apologies, I see the ambiguity now in what I wrote.

Regarding the typo, I expected to get a "not found" answer, as "not found" is
a better answer than being pointed to somewhere meaningless. I did not expect
typo correction, though that would certainly be nice for some cases.

I see now that the search gives the same location as "Göteborg", so it may be
the default location for the city. If so, that's odd because it's not near the
center of the city.

Nor is it the default address for "Göteborg", as "dalkegatan, Göteborg" goes
to some place near Göteborgsvägen on the east side of town. For that matter,
"dalkegatan Göteborg" (without the comma) goes to some place on the west side
of town.

Given these variations, it's hard for me to trust when an answer is correct,
other than by double-checking the results manually.

~~~
eruci
In this case it is a city match:
[http://geocode.xyz/Aveyn,%20G%C3%B6teborg?geoit=xml](http://geocode.xyz/Aveyn,%20G%C3%B6teborg?geoit=xml)
<geodata>
  <latt>57.67100</latt>
  <longt>11.97980</longt>
  <standard>
    <addresst/>
    <prov>SE</prov>
    <city>GöTeborg</city>
    <postal/>
    <confidence>0.70</confidence>
  </standard>
</geodata>

The geoparser also looks for city names, and by checking the response you can
see that only the city name was matched.
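
A client could make that check programmatically. The sketch below parses the
response shown above with Python's standard library and reports which level
matched. It assumes, based only on this one sample, that an empty `<addresst/>`
with a filled `<city>` means a city-only match; the function name
`match_level` and the returned labels are illustrative.

```python
import xml.etree.ElementTree as ET

# The response quoted in this thread, used here as sample input.
SAMPLE = """<geodata> <latt>57.67100</latt> <longt>11.97980</longt> <standard>
<addresst/> <prov>SE</prov> <city>GöTeborg</city> <postal/>
<confidence>0.70</confidence> </standard> </geodata>"""


def match_level(xml_text):
    """Return 'address', 'city' or 'none' (assumed interpretation of the fields)."""
    root = ET.fromstring(xml_text)
    standard = root.find("standard")
    if standard is None:
        return "none"
    # findtext may give None or an empty string for missing or empty elements.
    street = (standard.findtext("addresst") or "").strip()
    city = (standard.findtext("city") or "").strip()
    if street:
        return "address"
    if city:
        return "city"
    return "none"


print(match_level(SAMPLE))
```

A caller that needs street-level results could treat anything other than
"address" as "not found" rather than plotting the city's default point.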

Having said that, it is also true that we have less data coverage for Sweden
(compared to some other countries that provide open address data). Our
responses will be about as good as the data we have, and OpenStreetMap does
not have 100% coverage in Sweden either.

~~~
dalke
How is it that "dalkegatan, Göteborg" and "dalkegatan Göteborg" give very
different locations, neither of which is the city match location?

~~~
eruci
Thanks for pointing this out. It was a bug which is now fixed.

