

Google Patents: Missing Data Before 2000? - nbpoole
http://www.alexmbell.com/whats-on-google-patents/

======
pwg
From the blog:

> It’s unclear what sampling of patent applications they’re actually
providing; I wish they were more transparent about what data they’re
providing.

Did you download the data from here?
<http://www.google.com/googlebooks/uspto-patents.html>

If so, how much clearer do you need it to be?

> Patent Application Publication Full Text (2001 – present)

It covers applications published from 2001 to the present. Before 2001,
applications were not published, which is why it begins in 2001.

~~~
nbpoole
The blog post links to <http://www.cs.brown.edu/~ambell/data.html>, which in
turn links to <http://www.google.com/googlebooks/uspto-patents-pair.html> as
the source of the data. That page doesn't give a particular date range.

The dataset does include applications from before 2000; it just has
relatively fewer of them.
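One way to check a "relatively fewer" claim is to tally records by filing year and see what share falls before a cutoff. This is a minimal sketch, assuming the records have already been parsed into `datetime.date` filing dates; the parsing step and the helper names `applications_per_year` and `share_before` are hypothetical, and the sample dates are synthetic, not figures from the dataset.

```python
from collections import Counter
from datetime import date


def applications_per_year(filing_dates):
    """Count applications by filing year.

    `filing_dates` is assumed to be an iterable of datetime.date objects
    already extracted from the dataset (extraction is not shown here).
    """
    return Counter(d.year for d in filing_dates)


def share_before(counts, year):
    """Return the fraction of all counted applications filed before `year`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for y, n in counts.items() if y < year) / total


if __name__ == "__main__":
    # Synthetic dates for illustration only; not real dataset figures.
    sample = [date(1998, 5, 1), date(1999, 3, 2), date(2003, 7, 9),
              date(2004, 1, 15), date(2004, 8, 30)]
    counts = applications_per_year(sample)
    print(sorted(counts.items()))
    print(share_before(counts, 2000))
```

A small per-year share before 2000 would be consistent with "fewer," but it cannot by itself say whether that reflects the underlying filings or simply what was crawled.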

~~~
pwg
No, not a date range. But it does say this:

>No guarantees are made with respect to the completeness or accuracy of this
data.

And this:

>As of 2012-05-26, we have data for 1946194 patent applications, including
most of the published applications in the following ranges:

... long list of application serial numbers omitted ...

The serial numbers in the list map directly to filing dates, because serial
numbers are assigned sequentially as applications are filed. The page just
doesn't provide the serial-number-to-date mapping, for whatever reason.
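If you have a handful of known (serial number, filing date) pairs, the sequential assignment described above means you can roughly estimate a date for any serial number in between by linear interpolation. This is a sketch under stated assumptions: the anchor pairs must come from your own lookups (none are provided here), all anchors are assumed to belong to a single continuous numbering sequence, and `estimate_filing_date` is a hypothetical helper, not part of any Google or USPTO tool. Filing rates vary over time, so the result is an approximation.

```python
import bisect
from datetime import date


def estimate_filing_date(serial, anchors):
    """Estimate a filing date by linear interpolation between anchor points.

    `anchors` is a list of (serial_number, date) pairs sorted by serial
    number, all from one continuous numbering sequence (an assumption).
    Serials outside the anchor range are rejected rather than extrapolated.
    """
    if not anchors:
        raise ValueError("need at least one anchor")
    serials = [s for s, _ in anchors]
    if serial < serials[0] or serial > serials[-1]:
        raise ValueError("serial number outside the anchor range")
    # Index of the first anchor whose serial is >= the requested serial.
    i = bisect.bisect_left(serials, serial)
    if serials[i] == serial:
        return anchors[i][1]
    (s0, d0), (s1, d1) = anchors[i - 1], anchors[i]
    frac = (serial - s0) / (s1 - s0)
    days = round(frac * (d1.toordinal() - d0.toordinal()))
    return date.fromordinal(d0.toordinal() + days)


if __name__ == "__main__":
    # Hypothetical anchors for illustration; not real USPTO serial numbers.
    anchors = [(1000, date(2000, 1, 1)), (2000, date(2000, 1, 11))]
    print(estimate_filing_date(1500, anchors))
```

Rejecting out-of-range serials instead of extrapolating keeps the estimate from drifting past the dates you actually know.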

Further, the data was retrieved by a crawler. Note the first sentence:

>Google has begun crawling patent documents, including image file wrappers,
from the USPTO's public PAIR (Patent Application Information Retrieval) site.

So, unsurprisingly, it contains only what the crawler has retrieved. It is
not comprehensive, and it makes no claim to be.

