Hacker News
Australian Geocoded national address data to be made openly available (data.gov.au)
201 points by zspitzer on Dec 7, 2015 | 51 comments

This is amazing. I was astonished that even zip code data is proprietary data of the USPS in the United States. It's possible to "reverse engineer" the data from publicly available census data (zip code tabulation areas) which is what I do and publish for free on ZipLocate[0]. Comparable datasets cost $499.

Canada's situation is even worse. Geocoder.ca has been sued by Canada Post[1] to take down their data (which Canada Post was selling for $5000).

The whole address data situation is really terrible. Glad to see Australia opening up the data.

[0] http://ziplocate.us/

[1] http://geocoder.ca/?sued=1
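For anyone wanting to try the ZCTA route, here is a minimal Python sketch (not ZipLocate's actual code) that builds a ZIP-to-coordinate lookup. It assumes the tab-delimited layout of the Census Bureau's ZCTA Gazetteer file, with GEOID, INTPTLAT and INTPTLONG columns found by header name. The two sample rows are made-up placeholders, not real data. Also keep in mind that ZCTAs only approximate ZIP codes, and some ZIP codes (e.g. PO-box-only ones) have no ZCTA at all.

```python
import csv
import io


def load_zcta_points(text):
    """Map each ZCTA (5-digit string) to its (lat, lon) internal point."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Strip header names: the last column header may carry trailing spaces.
    header = [name.strip() for name in next(reader)]
    geoid_i, lat_i, lon_i = (header.index(c) for c in ("GEOID", "INTPTLAT", "INTPTLONG"))
    points = {}
    for row in reader:
        if not row:
            continue
        # Keep GEOID as a string so leading zeros (e.g. "02139") survive.
        points[row[geoid_i].strip()] = (float(row[lat_i]), float(row[lon_i]))
    return points


# Hypothetical placeholder rows in the assumed Gazetteer layout (not real data).
SAMPLE = (
    "GEOID\tALAND\tAWATER\tALAND_SQMI\tAWATER_SQMI\tINTPTLAT\tINTPTLONG   \n"
    "00001\t0\t0\t0.0\t0.0\t40.000000\t-75.000000\n"
    "00002\t0\t0\t0.0\t0.0\t41.500000\t-73.250000\n"
)

points = load_zcta_points(SAMPLE)
print(points["00001"])
```

Looking columns up by name rather than position is a small hedge against the layout differing between Gazetteer years.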

US zip codes are authorized and assigned by the USPS, so by definition they are proprietary. However, the data itself (once it is obtained) is freely redistributable (in raw form).

The barrier is that the official source of zip code data, National Five-Digit ZIP Code and Post Office Directory, is copyrighted[0], and there is work in extracting the data from it.

The nice thing is that US copyright law limits the protection for "facts" [1]. The selection and arrangement of facts can be copyrighted, but not the underlying facts themselves. So it's perfectly legal in the US to take the zip code data and distribute it, just not in the original form from the National Five-Digit ZIP Code and Post Office Directory.

As for countries other than the US, it would depend on their local laws.

[0] - http://pe.usps.com/Archive/HTML/DMMArchive0810/G013.htm

[1] - https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St...

For anyone else wondering why the National Five-Digit ZIP Code and Post Office Directory isn't in the public domain like every other work of the US government:

>Works by certain independent agencies, corporations and federal subsidiaries may not be considered "government works" and may, therefore, be copyrightable. For instance, material produced by the United States Postal Service are typically subject to normal copyright.


Thank you for running ZipLocate for free, I admire this.

On a similar note, do you (or anyone else) know how to get data about parcel/property sizes in the US (e.g. I want to check whether address X is over 10k square feet)? Are there any open datasets from which it would be possible to compile this, or is the only way to go to Maponics etc. and pay (but then, where does Maponics get this data from)? I needed this for my last project. At the time I used the Zillow API (http://www.zillow.com/howto/api/GetDeepSearchResults.htm), but it was rate limited and only had data for about 50% of the addresses I queried.

I don't think PAF (which is Australia's postal data) is included in this announcement.

No one should be using PAF to geocode, anyway.

GNAF includes addresses derived from the PAF. The vast majority (over 90%) are matched to jurisdictional addresses and just increase the confidence value. Where they don't match and there are more than 5 of them, they are included as street-level addresses. Only jurisdictional addresses have cadastral-level geocodes.

Thanks, learning something new every day.

Heh, this'll reduce the slightly-underground trading of recent-ish versions of the Australia Post postcode-to-lat/long database that pretty much anybody doing ecommerce web dev in Australia has likely engaged in for the last 15 years ;-)

(On a more cynical note: I bet this is only happening because geocoding via GoogleMaps has become "good enough" that nobody is paying the outrageous prices they used to ask for this data...)

You may also want to check out Data Tools' Express Capture - http://expresscapture.datatools.com.au/

That is great: now, with some relatively simple infrastructure, you won't have to get ripped off ever again.

This is fantastic news. As an Australian and someone who has worked on personal projects requiring a geocoded address database, and had to jump through a million hoops to come close to one, this is invaluable.

Here's a related thread I've been following for some time: https://datagovau.ideascale.com/a/dtd/Free-the-G-NAF-Address...

What were your alternatives? Google geocoding gets expensive quickly.

Reverse engineering data in commercial products. Of course this is not a viable solution for anything but private personal stuff.

For anyone in the UK who is interested in this type of data:


The data is free, but it's in Easting/Northing format rather than Latitude/Longitude. I would recommend converting it to Latitude/Longitude in the WGS84 coordinate system (if that means nothing to you, it's basically the standard format for coordinate data used on websites, so you can use the data with Google Maps, OSM, etc.).
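If you'd rather do the Easting/Northing conversion yourself, the inverse Transverse Mercator formulas from Ordnance Survey's "A guide to coordinate systems in Great Britain" fit in a few dozen lines. This is a minimal Python sketch assuming OSGB36 National Grid input (EPSG:27700). Note that it returns latitude/longitude on the OSGB36 datum (Airy 1830 ellipsoid), not WGS84; the remaining datum shift is on the order of 100 m and needs a separate transformation step.

```python
import math

# OSGB36 / British National Grid constants (Airy 1830 ellipsoid), as published
# by Ordnance Survey.
A, B = 6377563.396, 6356256.909                      # semi-major/minor axes (m)
F0 = 0.9996012717                                    # scale on central meridian
PHI0, LAM0 = math.radians(49.0), math.radians(-2.0)  # true origin: 49N, 2W
E0, N0 = 400000.0, -100000.0                         # grid coords of true origin
E2 = 1 - (B * B) / (A * A)                           # eccentricity squared
NR = (A - B) / (A + B)                               # "n" in the OS formulas


def meridional_arc(phi):
    """Developed meridional arc M from the true origin's latitude to phi."""
    dp, sp = phi - PHI0, phi + PHI0
    return B * F0 * (
        (1 + NR + 1.25 * NR**2 + 1.25 * NR**3) * dp
        - (3 * NR + 3 * NR**2 + 2.625 * NR**3) * math.sin(dp) * math.cos(sp)
        + (1.875 * NR**2 + 1.875 * NR**3) * math.sin(2 * dp) * math.cos(2 * sp)
        - (35 / 24) * NR**3 * math.sin(3 * dp) * math.cos(3 * sp)
    )


def grid_to_latlon(easting, northing):
    """National Grid easting/northing (m) -> OSGB36 latitude/longitude (deg)."""
    # Iterate for the latitude phi' whose meridional arc matches the northing.
    phi, m = PHI0, 0.0
    while True:
        phi += (northing - N0 - m) / (A * F0)
        m = meridional_arc(phi)
        if abs(northing - N0 - m) < 1e-5:            # 0.01 mm
            break

    sin_p, tan_p = math.sin(phi), math.tan(phi)
    sec_p = 1 / math.cos(phi)
    nu = A * F0 / math.sqrt(1 - E2 * sin_p**2)            # transverse radius
    rho = A * F0 * (1 - E2) / (1 - E2 * sin_p**2) ** 1.5  # meridional radius
    eta2 = nu / rho - 1
    t2 = tan_p**2

    vii = tan_p / (2 * rho * nu)
    viii = tan_p / (24 * rho * nu**3) * (5 + 3 * t2 + eta2 - 9 * t2 * eta2)
    ix = tan_p / (720 * rho * nu**5) * (61 + 90 * t2 + 45 * t2**2)
    x = sec_p / nu
    xi = sec_p / (6 * nu**3) * (nu / rho + 2 * t2)
    xii = sec_p / (120 * nu**5) * (5 + 28 * t2 + 24 * t2**2)
    xiia = sec_p / (5040 * nu**7) * (61 + 662 * t2 + 1320 * t2**2 + 720 * t2**3)

    de = easting - E0
    lat = phi - vii * de**2 + viii * de**4 - ix * de**6
    lon = LAM0 + x * de - xi * de**3 + xii * de**5 - xiia * de**7
    return math.degrees(lat), math.degrees(lon)


# Worked example from the OS guide; the result is still on the OSGB36 datum.
print(grid_to_latlon(651409.903, 313177.270))
```

With the guide's worked example (E 651409.903, N 313177.270) this should land at roughly 52.6576°N, 1.7179°E on OSGB36.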

On the other hand, grid refs are a bit more human friendly, and can be used much more easily on OS maps than lat/long. They clearly wouldn't work on a global scale, but generations of Scouts would be lost without them!

For a neat visualisation: http://www.npemap.org.uk/postcodeine/

Breasal is a handy tool for doing the conversion.


Thanks for the tip.

I'm planning on doing this conversion soon, I intended to use the following C# library as a SQL Server CLR function, but I probably would've used breasal (or something similar) if I was using Linux:


For anyone who wants to use this C# library, worth reading this:


Another good source of UK postcode data is doogal.co.uk. The author (Chris Bell) combines the Codepoint Open dataset with other ones from the ONS; that gives a more comprehensive list of postcodes (including ones that are no longer in use and codes in Northern Ireland and the Isle of Man).

In his datasets he's already done the conversion to regular lat/lon.


Conversion between any two arbitrary spatial reference systems (e.g. to/from lat/long) is straightforward with any GIS library. PostGIS can do it with "ST_Transform(ST_SetSRID(ST_Point(X, Y), ORIGINAL_SRS), 4326)"
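For a sense of what the library does in the UK case, the OSGB36 to WGS84 step is a datum transformation. Here is a minimal Python sketch of a 7-parameter Helmert transformation, assuming OSGB36 latitude/longitude as input. It uses the commonly quoted Ordnance Survey WGS84-to-OSGB36 parameters with every sign reversed. This is approximate (a few metres); libraries such as PROJ can use the OSTN15 grid instead when better accuracy is needed.

```python
import math

# Ellipsoids: Airy 1830 (OSGB36) and WGS84, semi-major/semi-minor axes in metres.
AIRY = (6377563.396, 6356256.909)
WGS84 = (6378137.000, 6356752.3142)

# OSGB36 -> WGS84: OS's WGS84 -> OSGB36 parameters with signs reversed.
TX, TY, TZ = 446.448, -125.157, 542.060   # translations (m)
S = -20.4894e-6                           # scale change (-20.4894 ppm)
RX, RY, RZ = (math.radians(sec / 3600) for sec in (0.1502, 0.2470, 0.8421))


def to_cartesian(lat, lon, ellipsoid, h=0.0):
    """Geodetic lat/lon (deg) on an ellipsoid -> Earth-centred x, y, z (m)."""
    a, b = ellipsoid
    e2 = 1 - (b * b) / (a * a)
    phi, lam = math.radians(lat), math.radians(lon)
    nu = a / math.sqrt(1 - e2 * math.sin(phi) ** 2)
    return ((nu + h) * math.cos(phi) * math.cos(lam),
            (nu + h) * math.cos(phi) * math.sin(lam),
            ((1 - e2) * nu + h) * math.sin(phi))


def to_geodetic(x, y, z, ellipsoid):
    """Earth-centred x, y, z (m) -> geodetic lat/lon (deg), iteratively."""
    a, b = ellipsoid
    e2 = 1 - (b * b) / (a * a)
    p = math.hypot(x, y)
    phi = math.atan2(z, p * (1 - e2))        # first approximation
    for _ in range(10):                      # converges in a few iterations
        nu = a / math.sqrt(1 - e2 * math.sin(phi) ** 2)
        phi = math.atan2(z + e2 * nu * math.sin(phi), p)
    return math.degrees(phi), math.degrees(math.atan2(y, x))


def helmert(x, y, z):
    """Small-angle 7-parameter Helmert transformation (OS sign convention)."""
    return (TX + (1 + S) * x - RZ * y + RY * z,
            TY + RZ * x + (1 + S) * y - RX * z,
            TZ - RY * x + RX * y + (1 + S) * z)


def osgb36_to_wgs84(lat, lon):
    return to_geodetic(*helmert(*to_cartesian(lat, lon, AIRY)), WGS84)


# Near Greenwich the WGS84 longitude comes out slightly negative (west of 0).
print(osgb36_to_wgs84(51.4778, 0.0))
```

Chaining this after an inverse National Grid projection gives the full Easting/Northing to WGS84 path that ST_Transform handles in one call.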

This is big news: PSMA was basically set up as a for-profit, government-owned business which sold this data at great cost. I'm really glad the government has seen how valuable this data is to the Australian people and businesses. Great to see the government making genuine improvements to foster innovation in Australia today, really impressive stuff.

This looks very promising!

Hopefully this means the data can be incorporated into projects like OpenStreetMap, though I have no idea whether that is appropriate (I guess it would be) or whether it's already incorporated (I would guess not, based on the article).

This should go into OpenAddresses - http://openaddresses.io

OpenAddresses was set up by some OSM developers as a better way of parsing, tracking and combining these kinds of address datasets rather than storing them directly in the OSM database.

There are already a lot of addresses in there in Queensland and Victoria, so it may be some work to combine them.

I'm the guy who imported the Queensland and Victoria addresses, and I've been in touch with members of the government in Canberra who are leading out this effort. The new release should supersede the existing Australian datasources in OpenAddresses.io; my expectation is that we'll deprecate them in favor of this one.

PSMA publish GNAF every three months. Need to think about a production process and metadata :-). There are about 100k "new" addresses each time and a lot of addresses improve their location. Might be worth asking for a change file.

Nice work with OpenAddresses.io, BTW.
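On the change-file idea: if each release can be reduced to a mapping from a stable address identifier to a (lat, lon) geocode, a change file is essentially a diff of two such mappings. Here is a minimal Python sketch under that assumption. The identifiers, the 1 m "moved" threshold and the sample coordinates are all hypothetical; this is not PSMA's format.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371008.8 * math.asin(math.sqrt(h))   # mean Earth radius (m)


def change_file(old, new, moved_threshold_m=1.0):
    """Diff two releases, each a dict of stable address ID -> (lat, lon)."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "retired": sorted(old.keys() - new.keys()),
        "moved": sorted(pid for pid in old.keys() & new.keys()
                        if haversine_m(old[pid], new[pid]) > moved_threshold_m),
    }


# Hypothetical IDs and coordinates, purely for illustration.
previous = {"ADDR1": (-35.0, 149.0), "ADDR2": (-33.0, 151.0)}
current = {"ADDR1": (-35.0, 149.001), "ADDR3": (-37.0, 145.0)}
print(change_file(previous, current))
```

Improved geocodes would show up under "moved"; a real change file would presumably also need to cover attribute changes such as street renames.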

This would be classed as an import for OSM, and imports can get tricky. One problem is that the OSM community may already have lots of addresses, and trying to "merge" the two datasets into one can be difficult.

There are sometimes social problems with imports. The long-term effect of imports can be negative (just look at the USA).
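To make the merge problem concrete, here is a deliberately naive Python sketch. It treats an incoming address as a duplicate when its normalised text matches an existing one and the two points lie within 50 m. The abbreviation table, the 50 m threshold and the sample addresses are hypothetical. Real address matching needs far more rules (unit numbers, "ST" meaning Saint, misspellings), which is a big part of why imports are hard.

```python
import math
import re

# Hypothetical, deliberately tiny abbreviation table.
ABBREVIATIONS = {"ST": "STREET", "RD": "ROAD", "AVE": "AVENUE", "AV": "AVENUE"}


def normalise(address):
    """Uppercase, drop punctuation, and expand a few street-type abbreviations."""
    tokens = re.findall(r"[A-Z0-9]+", address.upper())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def approx_distance_m(p, q):
    """Equirectangular approximation; adequate for the short distances here."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    return 6371008.8 * math.hypot(x, lat2 - lat1)


def merge(existing, incoming, max_distance_m=50.0):
    """Return the incoming (address, (lat, lon)) pairs that aren't duplicates."""
    by_key = {}
    for addr, pos in existing:
        by_key.setdefault(normalise(addr), []).append(pos)
    fresh = []
    for addr, pos in incoming:
        candidates = by_key.get(normalise(addr), [])
        if not any(approx_distance_m(pos, c) <= max_distance_m for c in candidates):
            fresh.append((addr, pos))
    return fresh


existing = [("12 Smith St, Carlton", (-37.8000, 144.9700))]
incoming = [("12 SMITH STREET CARLTON", (-37.8001, 144.9701)),
            ("14 Smith St, Carlton", (-37.8002, 144.9702))]
print(merge(existing, incoming))
```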

Can you elaborate what social problems you're referring to and how it's affected USA?

Quite early in the OSM project (2007), the free road data from the US Government (aka TIGER) was imported[1]. However, the quality of the data was, in retrospect, poor. Even now, the data is of poor quality in the USA, especially compared to other developed countries.[2]

It also makes the map "look done", and the USA, despite its population size, doesn't have as active a community of mappers. The USA has an OSM community of about the same size as the UK's or Germany's, despite having roughly 5 times the population.

One theory is that since the map for the USA "looked done", that people thought it was done, and hence didn't start mapping and that hindered the growth of the community there.

To see more, zoom in a lot in OSM in the UK or Germany, and compare similar-sized towns/cities to the USA. The USA map will usually just have the roads (with names), whereas in the UK or Germany you'll find many more shops, schools, footpaths, car parks, parks etc. mapped.

[1] Details of the import: http://wiki.openstreetmap.org/wiki/TIGER

[2] A long-time OSM mapper points out some of the problems with the USA's data in 2015: http://www.openstreetmap.org/user/Richard/diary/34290 or this: http://www.openstreetmap.org/user/Richard/diary/26099

Thanks. I'm actually one of those people who was under the impression that USA data was fairly complete.

The OSM data might look complete, but lots of roads (especially those tagged as residential) are mistagged. cycle.travel, a bicycle routing website, ignores "residential" roads in the USA that haven't been surveyed.[1] You can help improve the map by fixing them up.

The other missing thing is POIs (Points of Interest). Find an area you're familiar with, and add pubs, shops, cafes, vending machines, bus stops etc.

[1] https://lists.openstreetmap.org/pipermail/talk-us/2015-June/...

A great tool for cleaning up the bad TIGER data in OSM is http://osmlab.github.io/to-fix/#/task/tigerdelta

In general the OSM community errs on the side of vetoing new initiatives -- doing an import properly means raising it on the mailing lists, which invariably attracts vastly more criticism than assistance. Even coordinated (non-automated) remote mapping attracts considerable criticism these days -- most ludicrously, by people suggesting that it's better to leave an area unmapped so that it might one day attract local mappers (rather than remote contributors who will work on it immediately).

There's also the question of the license. Without opening a can of worms about the usefulness of sharealike provisions in general, I think it's safe to say that making a geocoding result trigger sharealike implications in a database is clearly problematic (consider geocoding a database of customer addresses, then being obliged to share the rest of the table!). Unfortunately OSM hasn't yet reached agreement on a geocoding guidance. Consequently a couple dozen of us working on OpenAddresses have gotten the project to over 200 million addresses in less than 2 years. OSM is now a decade old, has millions of registered mappers, and contains less than 60 million addresses.

I don't mean to be all doom and gloom, though. I would love to improve OSM as a home for address data. And I urge those of you who care about this incredibly important resource to join me -- hop on the talk and legal-talk lists and help make the case for a geocoding guidance that makes sense.

> In general the OSM community errs on the side of vetoing new initiatives

"New initiatives" are fine. "new bulk data imports" are a different thing. There are many social and technical problems with importing data. De-duplicating data is hard.

OSM, unlike OpenAddresses, wants to have one licence for all the data, rather than lots of little licences for each different region. OSM also tries to have one hierarchical address data format for the whole world, rather than a collection of different formats for each region.

> OSM is now a decade old, has millions of registered mappers, and contains less than 60 million addresses.

OSM is more than just addresses.

This might stimulate some progress in the UK, which is under a legal stranglehold from the privatised post office's address file. The licensing of UK postcodes infects all derived sources, meaning there is no clear path to constructing a freely licensed set of geocoded postcodes. The ODI published a report describing the obstacles in detail: http://theodi.org/case-studies/open-addresses-the-story-to-d...

This is part of the big federal innovation push today. Looks like more open data will be a focus, thankfully.


This is even more interesting! I wonder if this means RMS would have to open their street data (e.g. maps, sign locations, speed limits, turn restrictions, etc).

Seriously, if this is true, as an Australian who has worked with this data in government, and who has gone through the gamut of trying to find such data wrt my own projects, this will be fantastic.

One of the biggest barriers that made me leave out various geographical/address routines in my own work was the legal minefield regarding who owns and can use this kind of data, and that I'd put myself at too much risk even if I did get some details together "legitimately".

Very promising...

So test/test gets you into the web system, but unfortunately getting the actual data from the SFTP server needs a different password... I'll just have to wait until February then, I guess (no release date has been announced; that's the next scheduled release of data). Oh well.

PSMA usually release in the last week of the month. I suspect it will be the 22nd of Feb 2016.

As an Australian and as someone who attempted to obtain G-NAF data for an address validation service this is fantastic news. Address data in general and geocodes in particular are a precious commodity from a licensing point of view. They help CRMs maintain good address data and are also very helpful in driving location based marketing activities.

https://isgnaffreeyet.xyz - I couldn't resist.

> The G-NAF and Administrative Boundaries datasets will be published under an open data licence

Please let this be an actual open data licence, not a creative works licence (i.e. CC-anything but 0) applied to data, which has been the standard practice of the Australian government.

I agree with you that CC0 would be ideal from every point of view, but the 4.0 CC licenses have actually been built to work for databases too.

This is awesome - and long overdue!

Can't believe how expensive this data was.. at some point we discovered that the suburb boundary polygons could be hacked out of the census reports though!

The ABS has been providing shape files of their geographies for quite a while.

While they're not a perfect match for the official administrative boundaries, they're good enough for many use cases.

Among other things, they provide polygons for suburbs, LGAs, and postcodes.
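Once those boundaries are loaded (e.g. with a shapefile reader), assigning a geocoded address to a suburb, LGA or postcode area is a point-in-polygon test. Here is a minimal ray-casting sketch in Python. It assumes each region is a single outer ring of (lon, lat) vertices with no holes or multiple parts; the square "suburbs" in the example are hypothetical.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: count crossings of a ray heading east from the point."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]          # wrap around to close the ring
        if (y1 > lat) != (y2 > lat):        # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_region(lon, lat, regions):
    """Return the name of the first region containing the point, else None."""
    for name, ring in regions.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None


# Hypothetical unit-square "suburbs" side by side.
regions = {
    "Suburb A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "Suburb B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(find_region(1.5, 0.5, regions))
```

Points exactly on a shared boundary are ambiguous with this approach. For millions of addresses you'd want a spatial index (or just PostGIS's ST_Contains) rather than a linear scan over every region.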

FYI, these "Non ABS" geographies form part of the Australian Statistical Geography Standard, and the 2016 version will be formed from aggregates of Mesh Blocks. The alignment between the ABS versions and the official ones will be much, much closer.

Where can we get this?

FTA, data.gov.au, in February 2016

Another good reason I am happy this will be released: earlier last year I called three resellers of this data and none even bothered to call me back.

Sadly, they don't say which license they're going to use. This would be very important for an import into OpenStreetMap.

