Getty Just Made 4,600 Images Public Domain (smithsonianmag.com)
198 points by sharmanaetor on Aug 16, 2013 | 65 comments

Here's a link to the actual collection: http://search.getty.edu/gateway/search?q=&cat=highlight&f=%2...

Getty's blog post about it: http://blogs.getty.edu/iris/open-content-an-idea-whose-time-...

An example page where you can download a 4264px × 3282px copy of Van Gogh's 'Irises' (28 MB .jpg): http://search.getty.edu/museum/records/musobject?objectid=94...

I'm wondering how long it will be before someone creates a .torrent with the entire collection...

I won't lie... I came here hoping for a link to the torrent of the entire collection...

Already working on it (putting together a package for the Internet Archive; might as well put a torrent up as well).

EDIT: Scraper running. Should be done shortly (collecting small version, original version, XML data).

Update: Still scraping, lots of sleep X to be gentle on Getty.

Do you mind sharing what tool you use to scrape?

I was rushing, so it was just curl, lynx, sed, awk, all wrapped with bash. Don't judge; I'd use python and a scraping library if I wasn't rushing.

Thank you for taking the initiative.

Please post the torrent link once it's done.

Putting the torrent up now. It's ~100GB of content, so give it a bit.

Yay! Link yet?

Doesn't look like it! Will be excited to see this though.

I will.

I'm not going to give up hope. I still believe.

Was at Burning Man, this totally fell off my plate. I'm looking for someplace to put the 100GB zip file up right now.

Amazon S3, requester pays.

Internet Archive?

They had the S3 bucket originally set to allow listing - I grabbed it all before I reported it to an engineer there.

Not going to post a torrent though - mostly cuz you all can figure it out anyhow. The images are all of the form:

NNNNNN01.JPG on that S3 bucket.

NNNNNN is the object ID (the internal record ID of the work of art) zero-padded to 6 digits. The object ID is the same value that appears, for instance, in the URL on their online collection:

URL for Irises: http://www.getty.edu/art/gettyguide/artObjectDetails?artobj=...

URL for the full-rez image on S3: http://gettylargeimages.s3.amazonaws.com/00094701.jpg

You figure it out :)
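For anyone who wants the naming rule spelled out, here is a minimal sketch, not anything official from the Getty. The function names are made up for illustration. The object ID 947 is only inferred from the example S3 URL above, reading the trailing "01" as an image sequence number is an assumption, and the lowercase ".jpg" follows the example URL rather than the "NNNNNN01.JPG" pattern.

```python
def getty_image_filename(object_id: int, sequence: int = 1) -> str:
    """Build a name of the form NNNNNN01.jpg from an object ID.

    Assumption: the trailing "01" is a per-object image sequence number;
    the thread only ever shows "01".
    """
    if not 0 <= object_id <= 999_999:
        raise ValueError("object ID must fit in 6 digits")
    # Zero-pad the object ID to 6 digits, then append the 2-digit suffix.
    return f"{object_id:06d}{sequence:02d}.jpg"


def getty_image_url(object_id: int) -> str:
    """Prefix the file name with the bucket host from the example URL."""
    host = "http://gettylargeimages.s3.amazonaws.com"
    return f"{host}/{getty_image_filename(object_id)}"


# 947 is inferred from the example URL, not stated anywhere in the thread.
print(getty_image_url(947))
```

This only reproduces the one example given here; whether the convention holds for every object is not something the sketch can confirm.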

I don't understand... wouldn't offering a torrent be more friendly than us all downloading it again?

That's pretty sweet actually, definitely +1.

I made my original comment because the idea was to have us learn the power of wget by redownloading all the files, which seemed pointless to me.

This is exactly how I plan to serve up the compiled archive.

1) Iterate through all the search results pages (1-94) to get objectids.

2) Iterate through objectids to fetch the following: a) xml data b) thumbnails (from getty servers) c) original work on cloudfront
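A rough sketch of step 1, assuming the result pages link to records with an "objectid=" query parameter like the record URLs quoted in this thread. The names extract_object_ids, collect_object_ids, fetch_page and delay_seconds are all hypothetical, and the real search URL is left to the caller because it is truncated here. The 2-second default pause is a guess at being "gentle", not a published limit.

```python
import re
import time
from typing import Callable, Iterable

# Assumption: record links look like ...musobject?objectid=12345
OBJECTID_RE = re.compile(r"objectid=(\d+)")


def extract_object_ids(html: str) -> list[int]:
    """Return the distinct object IDs linked from one page, in page order."""
    seen: dict[int, None] = {}
    for match in OBJECTID_RE.finditer(html):
        seen.setdefault(int(match.group(1)), None)
    return list(seen)


def collect_object_ids(
    fetch_page: Callable[[int], str],
    pages: Iterable[int] = range(1, 95),  # result pages 1-94
    delay_seconds: float = 2.0,
) -> list[int]:
    """Walk the result pages, reading IDs from the links on each one."""
    ids: dict[int, None] = {}
    for page in pages:
        for object_id in extract_object_ids(fetch_page(page)):
            ids.setdefault(object_id, None)
        time.sleep(delay_seconds)  # pause between requests
    return list(ids)


# Demo with canned HTML instead of real requests.
fake_pages = {
    1: '<a href="musobject?objectid=12">one</a>',
    2: '<a href="musobject?objectid=12">one</a> <a href="musobject?objectid=34">two</a>',
}
print(collect_object_ids(fake_pages.__getitem__, pages=[1, 2], delay_seconds=0))
```

Passing the fetcher in as a function keeps the parsing testable offline; a real run would hand in something that performs the HTTP request for a given page number.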

You're assuming the convention holds across all ~4600 objects. Never assume. Follow the links.


That would make a convenient fizzbuzz style interview question.

search.getty.edu is a Solr-based search index of resources across the Getty, including library catalogs and much beyond the Getty Museum's own collection - just FYI. You can filter for the collection, as well as for objects with images (and objects with open access images).

There's also the more focused, but less functional, Getty Museum online collection, which has (more or less) the same museum objects, and also has links to download the images where available.

Note, this is ~5000 images out of a collection of hundreds of thousands of objects. This is the easy stuff - public domain objects, with no rights/contract issues, with available, high-quality imaging, fully owned by Getty.

Might want to qualify that as Getty Museum. I thought the title referred to Getty Images, the stock photo site.

Same here. I can't imagine Getty Images releasing anything other than a cease and desist letter.

Thought I should point out that my post here above earned me the most rep points of any of my comments (47 points, where normally I only get 4-5 points on a good day). This is surprising given that it is not particularly thoughtful or insightful, it's just a low effort criticism.

This tells me there's something kind of broken about this rep system, and that it is easily gamed.

Are low effort pot shots really the most valued comments on HN?

They are not the same org?

No. Not sure they were even the same member of the family.

Getty Images was started by Mark Getty, grandson of J. Paul Getty who started the art museum. So, not the same member of the family.

This is only news because of the absurd US copyright system.


According to a comment on the page, photos of old paintings or photos (whose copyrights have already expired) are considered public domain in the US.

My understanding is hazy, but is it possible that particular photos of three-dimensional objects might not be in the public domain, since artistic decisions were made in capturing the object?

A photograph of a painting, meant to re-present that painting as accurately as possible, is, I think, in the public domain when the painting itself is.

Wow, from a quick browse, I think the moon crater image from the 1850s is what I find most technically impressive because it intersects telescope optics, negative chemistry, and printing chemistry. A 3683x2920 pixel image can be downloaded. http://search.getty.edu/museum/records/musobject?objectid=44...

Considering the technology available at the time, the detail is exquisite.

FYI, this is Copernicus https://en.wikipedia.org/wiki/Copernicus_(lunar_crater)

Thanks for the ID on the crater.

Observation... Copernicus is 93 km in diameter. The crater in the old print is approximately 9.3 cm in diameter. So in 1850, that astrophotographer printed a detailed 1:1,000,000 scale lunar map of that region.

Today, we can download a 1:1,000,000 scale lunar map of that region by printing the following doc at 30"x30" page size: http://planetarynames.wr.usgs.gov/images/Lunar/lac_58_wac.pd...

Then, for apples-to-apples, you can trim Copernicus out of the modern print in a 13 x 16.5 cm rectangle.

Voila, a 160-year A-B of knowledge and techniques. :)

(Someone fix my math if it's off.)
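A quick check of the arithmetic, using only the figures quoted above (93 km crater, 9.3 cm on the print, 13 x 16.5 cm crop). Everything is converted to whole millimetres so the division is exact; the variable names are just for illustration.

```python
MM_PER_KM = 1_000_000

crater_real_mm = 93 * MM_PER_KM  # 93 km diameter, as quoted
crater_print_mm = 93             # 9.3 cm on the old print

# Map scale = real size / printed size.
scale = crater_real_mm // crater_print_mm
print(f"1:{scale:,}")

# Ground covered by a 13 x 16.5 cm crop at that scale, in km.
crop_mm = (130, 165)
crop_km = tuple(side * scale // MM_PER_KM for side in crop_mm)
print(crop_km)
```

So the 1:1,000,000 figure follows from the quoted numbers, and at that scale the 13 x 16.5 cm crop spans roughly 130 x 165 km. This says nothing about whether the 9.3 cm measurement itself is right.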

> Copernicus is 93 km in diameter.

With a big enough computer the mapping data could be imported into Minecraft.


Sticking to one MC block = 1 metre would be tricky with 93 km. And there might be height limit problems too.

No idea who might've done this? It's absolutely incredible.

The name on the image is this bloke: http://en.wikipedia.org/wiki/John_Herschel

I found the one simply entitled 'after' amusing... http://www.getty.edu/art/collections/images/enlarge/00076101...

They've not made the images Public Domain. It is part of an "Open Content Program" and usage must be credited.


No, you are incorrect, quoting from that page:

Are there copyright restrictions for the Getty's open content images?

No. The first release includes 4,600 images of works of art believed to be in the public domain—in other words, works not protected by copyright under U.S. law. The Getty does not claim copyright in digital images of public domain artworks.

Hmm, is the "request" for attribution mandatory? To me it seems that attribution might be optional. As "The Getty does not claim copyright in digital images of public domain artworks." (your link), I would assume that crediting Getty is a nice thing to do but optional.

Many of them look like they are out of copyright anyway, although not all.

The original might be out of copyright but whoever scanned/photographed them has a new copyright. Pretty silly.

Not (necessarily) true, though a little unsettled.

Ever since Bridgeman (http://en.wikipedia.org/wiki/Bridgeman_Art_Library_v._Corel_...) most museums have accepted that no matter how much effort goes into the imaging, some imaging simply does not qualify as novel enough to warrant new copyright. This ruling is, as I understand it, not actually binding in other districts, but it's generally accepted.

Because imaging of flat art, without a frame, is inherently intended to add as little interpretation as possible, it's generally accepted that reproduction imaging of 2D art is not copyrightable. This is the Getty's own position (I worked there until a few months ago, in the same department that brought this project out).

I'll answer any other questions I can about this project.

Yeah, and it gets even weirder when you start learning about representations of 3D art. Photos of non-flat art do renew that copyright, right? But what precedent exists re: 3D scans of objects in a collection? We share what we can, which right now is limited to 123D Catch generated models, up on http://thingiverse.com/met .

Disclosure: I work at the Metropolitan Museum of Art, and am trying my hardest to share all our data with the world, but have learned that what is legal and what is museum policy can and often do clash. It's super interesting.

If you want relatively low resolution images of our collection, feel free to visit http://scrAPI.org , which just scrapes our live site. It's a workaround until our policy changes.

Roger, how was working for Getty while you were there? Was this your main project? How supportive were those involved in sharing the data? If Hacker News isn't the most appropriate place to answer these questions, I'd be glad if you could take the time to go off-site for a conversation.

No they don't. See Bridgeman Art Library v. Corel Corp. - scans/photographs that primarily seek to reproduce the original work don't qualify for copyright.

And this is the real reason why museums don't allow photography.

Those manuscript pages (like http://search.getty.edu/museum/records/musobject?objectid=30... and http://search.getty.edu/museum/records/musobject?objectid=48... and http://search.getty.edu/museum/records/musobject?objectid=59...) are some of the most beautiful pages I've ever seen. I wish I could read those letters, and old French.

This is a stretch, but this might amuse you (or anyone looking at this thread): http://www.amazon.com/Six-Books-Euclid-Werner-Oechslin/dp/38... (not an affiliate link) Some more information: http://www.taschen.com/pages/en/catalogue/classics/all/06724...

Check the sample pages. This is a (very good) reproduction of the original; you can read about its history online and in the Amazon reviews.

Here's a book that a martial arts instructor I occasionally train with translated. I combined the translation with the original images on the condition it then be CC licensed:


Tried to find an API to access these images but can't seem to find one. Anyone know if there is one?

No API - I worked on one last year, but twas never released. Hoping to see something within the year - not just for images, but also collection data (object records, for instance).

Many other institutions have open APIs for their collection data, and often for images as well. See:

Cooper-Hewitt (data released on GitHub)
Brooklyn Museum (API)
Powerhouse Museum (API)
Rijksmuseum
Indianapolis Museum of Art (also data on GitHub)
Europeana (an EU-wide consortium)
DPLA (similar to Europeana, but in the US)

Also, lots of other institutions have open access to their images - many much more comprehensive than The Getty (not to sound bitter, but Getty is getting a lot of press for this where others have led for years). It's really an imperative at this point - and for most institutions more a matter of when (since few have substantial technical staff) than if. The Getty is an exception, having a large, well-funded staff, but tends to be conservative in policy changes.

For other examples of cultural institutions sharing images widely:

National Gallery (high rez)
Los Angeles County Museum of Art
The Met

All this just off the top of my head -

From what I can see, it is still pretty rare to release downloadable images at a decent resolution, so it is excellent that Getty is getting good press for this.

For example, the National Gallery has high-resolution scans, but they aren't downloadable (perhaps you can scrape them, but that isn't accessible to most people). LACMA has only low-resolution images.

Off the rest of your list, Rijksmuseum and the Brooklyn Museum have only low-resolution images, Cooper-Hewitt and Europeana have almost nothing (only very low-resolution). I haven't checked everything.

Maybe you are right that it is becoming more common, but I can't see that myself. The Met is the only example I know of that compares. The Met does have decent resolution downloads available, but the Getty's downloadable scans are still much better.

Thanks for all the pointers - you should make a top level comment with everything you know on this. To someone from outside the field of curation this is really heartening - to see museums moving toward open data and open access to digital reproductions.

Like others have already mentioned, I'll wait for a torrent link and bulk D/L all of them.

These seem to be very low resolution (~500px). Are high-resolution versions available? I tried finding one for Van Gogh's 'Irises' and couldn't.

There is a high resolution download link on each item page directly under the thumbnail. In the case of Irises, the image is 4264 x 3282.

Thank you! Wow, on a Retina display this looks just amazing.

Yeah, the download of Irises is only 998x768 px.

And many of the others are similarly sized, almost all I've looked at under 1000px in either direction.

It's surprising how many waves this announcement has made over the last few days.

My question is: why now? What took so long?
