Getty's blog post about it: http://blogs.getty.edu/iris/open-content-an-idea-whose-time-...
An example page where you can download a 4264px × 3282px copy of Van Gogh's 'Irises' (28 MB .jpg): http://search.getty.edu/museum/records/musobject?objectid=94...
I'm wondering how long it will be before someone creates a .torrent with the entire collection...
EDIT: Scraper running. Should be done shortly (collecting small version, original version, XML data).
Not going to post a torrent though - mostly cuz you all can figure it out anyhow. The images are all of the form:
NNNNNN01.JPG on that S3 bucket.
NNNNNN is the object ID (the internal record ID of the work of art) zero-padded to 6 digits. The object ID is the same value that appears, for instance, in the URL on their online collection:
URL for Irises: http://www.getty.edu/art/gettyguide/artObjectDetails?artobj=...
URL for the full-rez image on S3: http://gettylargeimages.s3.amazonaws.com/00094701.jpg
You figure it out :)
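For anyone who'd rather not figure it out by hand, here's a minimal sketch of the convention described above (assuming the 6-digit zero-padding plus "01" suffix holds across the bucket):

```python
# Build the S3 URL for a given Getty object ID, assuming the
# NNNNNN01.JPG convention: object ID zero-padded to 6 digits,
# followed by an "01" suffix.
def getty_image_url(object_id):
    return "http://gettylargeimages.s3.amazonaws.com/%06d01.jpg" % object_id

# Object ID 947 reproduces the Irises URL above:
print(getty_image_url(947))
# http://gettylargeimages.s3.amazonaws.com/00094701.jpg
```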
I said my original comment because the idea was to have us learn the power of wget by redownloading all the files, which seemed pointless to me.
2) Iterate through objectids to fetch the following:
a) xml data
b) thumbnails (from getty servers)
c) original work on cloudfront
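A rough sketch of step (c), with the caveat that only the S3 naming pattern appears in this thread — the XML and thumbnail endpoints aren't given here, so they're deliberately left out rather than guessed, and the ID range is illustrative:

```python
# Fetch full-resolution originals by iterating candidate object IDs,
# assuming the NNNNNN01.JPG convention on the S3 bucket holds.
import urllib.error
import urllib.request

S3_PATTERN = "http://gettylargeimages.s3.amazonaws.com/%06d01.jpg"

def fetch_original(object_id, dest_dir="."):
    """Download the image for one object ID, or return None on a miss."""
    url = S3_PATTERN % object_id
    path = "%s/%06d01.jpg" % (dest_dir, object_id)
    try:
        urllib.request.urlretrieve(url, path)
        return path
    except urllib.error.HTTPError:
        return None  # not every ID maps to an open-content image

if __name__ == "__main__":
    # ~4,600 open objects; IDs are sparse, so adjust the range as needed.
    for object_id in range(1, 5000):
        fetch_original(object_id)
```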
You're assuming the convention holds across all ~4600 objects. Never assume. Follow the links.
That would make a convenient fizzbuzz style interview question.
There's also the more focused, but less functional, Getty Museum online collection, which has (more or less) the same museum objects, and also has links to download the images where available.
Note, this is ~5000 images out of a collection of hundreds of thousands of objects. This is the easy stuff - public domain objects, with no rights/contract issues, with available, high-quality imaging, fully owned by Getty.
This tells me there's something kind of broken about this rep system, and that it is easily gamed.
Are low effort pot shots really the most valued comments on HN?
A photograph of a painting, meant to re-present that painting as accurately as possible, is, I think, in the public domain when the painting itself is.
FYI, this is Copernicus https://en.wikipedia.org/wiki/Copernicus_(lunar_crater)
Observation... Copernicus is 93 km in diameter. The crater in the old print is approximately 9.3 cm in diameter. So in 1850, that astrophotographer printed a detailed 1:1,000,000 scale lunar map of that region.
Today, we can download a 1:1,000,000 scale lunar map of that region by printing the following doc at 30"x30" page size: http://planetarynames.wr.usgs.gov/images/Lunar/lac_58_wac.pd...
Then, for apples-to-apples, you can trim Copernicus out of the modern print in a 13 x 16.5 cm rectangle.
Voila, a 160-year A-B of knowledge and techniques. :)
(Someone fix my math if it's off.)
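A quick sanity check of the arithmetic above:

```python
# 1 km = 100,000 cm, so a 93 km crater printed at 9.3 cm
# implies a scale of 9,300,000 cm / 9.3 cm = 1:1,000,000.
crater_km = 93.0   # Copernicus diameter
print_cm = 9.3     # crater diameter on the 1850 print
scale = (crater_km * 100_000) / print_cm
print("1:%d" % round(scale))  # 1:1000000
```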
With a big enough computer the mapping data could be imported into Minecraft.
Sticking to one MC block = 1 metre would be tricky with 93 km. And there might be height limit problems too.
Are there copyright restrictions for the Getty's open content images?
No. The first release includes 4,600 images of works of art believed to be in the public domain—in other words, works not protected by copyright under U.S. law. The Getty does not claim copyright in digital images of public domain artworks.
Ever since Bridgeman (http://en.wikipedia.org/wiki/Bridgeman_Art_Library_v._Corel_...) most museums have accepted that no matter how much effort goes into the imaging, some imaging simply does not qualify as novel enough to warrant new copyright. This ruling is, as I understand it, not actually binding in other districts, but it's generally accepted.
Because imaging of flat art, without a frame, is inherently intended to add as little interpretation as possible, it's generally accepted that reproduction imaging of 2D art is not copyrightable. This is the Getty's own position (I worked there until a few months ago, from the same department that brought this project out).
I'll answer any other questions about this project that I can.
Disclosure: I work at the Metropolitan Museum of Art, and am trying my hardest to share all our data with the world, but have learned that what is legal and what is museum policy can and often do clash. It's super interesting.
If you want relatively low-resolution images of our collection, feel free to visit http://scrAPI.org , which just scrapes our live site. It's a workaround until our policy changes.
Roger, how was working for the Getty while you were there? Was this your main project? How supportive were those involved in sharing the data? If Hacker News isn't the most appropriate place to answer these questions, I'd be glad if you could take the time to go off-site for a conversation.
Check the sample pages. This is a (very good) reproduction of the original, you can read about the history of it online/in Amazon reviews.
Many other institutions have open APIs for their collection data, and often for images as well. See:
Cooper-Hewitt (data released on github)
Brooklyn Museum (API)
Powerhouse Museum (API)
Indianapolis Museum of Art (also data on github)
Europeana (a EU-wide consortium)
DPLA (similar to Europeana, but in the US)
Also, lots of other institutions have open access to their images - many much more comprehensive than The Getty (not to sound bitter, but Getty is getting a lot of press for this where others have led for years). It's really an imperative at this point - and for most institutions more a matter of when (since few have substantial technical staff) than if. The Getty is an exception, having a large, well-funded staff, but tends to be conservative in policy changes.
For other examples of cultural institutions sharing images widely:
National Gallery (high rez)
Los Angeles County Museum of Art
All this just off the top of my head -
For example, the National Gallery has high-resolution scans, but they aren't downloadable (perhaps you can scrape them, but that isn't accessible to most people). LACMA has only low-resolution images.
Of the rest of your list, the Rijksmuseum and the Brooklyn Museum have only low-resolution images, and Cooper-Hewitt and Europeana have almost nothing (only very low-resolution). I haven't checked everything.
Maybe you are right that it is becoming more common, but I can't see that myself. The Met is the only example I know of that compares. The Met does have decent resolution downloads available, but the Getty's downloadable scans are still much better.
And many of the others are similarly sized; almost all I've looked at are under 1000px in either direction.
My question is: why now? What took so long?