
The Museum of Modern Art Research Dataset - danso
https://github.com/MuseumofModernArt/collection/
======
kjell
The Cooper Hewitt (to my knowledge) was the first museum to post its
collection metadata on GitHub:

[https://github.com/cooperhewitt/collection](https://github.com/cooperhewitt/collection)
[http://labs.cooperhewitt.org/2012/releasing-collection-
githu...](http://labs.cooperhewitt.org/2012/releasing-collection-github/)

There are a few other museums that have done so since:
[https://github.com/tategallery/collection](https://github.com/tategallery/collection)
[https://github.com/artsmia/collection](https://github.com/artsmia/collection)
are the two I can think of right now

The Rijksmuseum and the Walters both have CC0 metadata + images, but through a
queryable API instead of downloadable csv/json.

[https://www.rijksmuseum.nl/en/api](https://www.rijksmuseum.nl/en/api)
[http://api.thewalters.org/](http://api.thewalters.org/)

~~~
wjnc
It would be awesome if someone would use datasets like these to create a
valuation of those musea.

Did you know musea do not know / publish the valuation of all of their works?
And that they only ever sell works, even those they will never show to the
public, to buy new works? And that by selling a few percent of their assets
they could basically provide free access, with no loss to what works they show
to the public? All these gems, and more, from a great article and podcast [1],
[2]. (Totally not affiliated, but major recent eye-openers for me.)

[1] [http://www.democracyjournal.org/36/museums-can-change-
will-t...](http://www.democracyjournal.org/36/museums-can-change-will-
they.php?page=all) [2]
[http://www.econtalk.org/archives/2015/05/michael_ohare_o.htm...](http://www.econtalk.org/archives/2015/05/michael_ohare_o.html)

~~~
zz1
[It might not seem so, but this is a constructive criticism]

You clearly have no idea whatsoever about what museums are, how they work,
and what they do.

No, museums don't sell their works. Only some US museums do so, and they are
harshly criticized by all other museums all over the world.

No, works in storage aren't "works that the public won't see", but works
essential to ensuring the longevity of the pieces usually shown (which are
periodically rested through a planned rotation). No, storerooms aren't a
trove of unexploited treasures: they hold some important gems and a lot of
rubbish.

A museum is not a shop, but a conservation institution: it works on a
hundred-year perspective. Sure, sell a couple of works to grant free access to
100k people over 10 years. And what do we do for the next 390 years?

Do you know anything about the art market? Clearly not: there is no value,
just trend and fancy. Do you know something about economics? The simple fact
that a work is kept in a museum (and thus is outside the market) changes the
prices of similar works available on the market (if such a thing ever existed:
we are talking about unique pieces, none of which is ever "similar" to
another). Thus you can't use the market prices of an artist to estimate the
value of an artwork (putting another one on the market will dilute the value
of the other ones). Still, you should know about history and how provenance
can affect the market price of an artwork.

Do you know something about art history? Because that would teach you a
thing or two about how museums exist to keep works and transmit a legacy
that outlasts small periods of time, like a lifespan. A museum isn't there to
sell works, because it isn't there to follow present trends (you might want to
find out how praised Caravaggio was in the XIXth century, or Georges de La
Tour around the same time).

And please, please, tell me: how would you evaluate museums? The one with most
value is the best one? The one with the most items? Well, make an inventory of
a museum, first, and then let me know if you changed your mind once you find
out that "item" has absolutely no meaning.

Also: your Latin purism is plain silly and clearly makes it a bit harder for
your reader to understand you (I had to think it over a little before
realizing that it wasn't a typo). Neither the Greeks nor the Romans (Latins)
had museums. The first one was established in Rome in the XVth century: the
word "musea" was never a thing.

------
d--b
Ha! The data is completely useless. None of it is uniform. Some artists
appear twice under different names. The dates are totally unusable. The
author field may contain one or more authors, etc.
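
To make the duplicate-artist complaint concrete, here is a minimal sketch of one way to surface name variants. It assumes the variants differ only in case, accents, or spacing, which may not hold for this dataset; `name_key` and `group_variants` are hypothetical helpers, not part of any MoMA tooling.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def name_key(name: str) -> str:
    """Build a comparison key: strip accents, fold case, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def group_variants(names: Iterable[str]) -> Dict[str, List[str]]:
    """Return only the keys that map to more than one distinct spelling."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}
```

The output is best treated as a list of candidates for a human to review rather than as automatic merges, since two spellings can be deliberate.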

MoMA wanted to look tech-savvy and trendy by publishing this data to GitHub.
But it just makes them look like an old institution that doesn't know how to
keep their data clean and meaningfully structured. This is a bit sad. Plus
GitHub isn't really meant for publishing content. Of course you can, but it's
weird.

Yes, art is hard to classify, but come on... we're talking about one of the
richest museums in the world, and they can't properly manage 200k of data? I
would be pretty afraid of lending them anything...

~~~
zachrose
I suspect art cataloging has a very specific taxonomy, or even deliberate
examples of flouting taxonomy for artistic reasons.

Normalization risks equating Prince and the artist formerly known as Prince,
which is a worse picture of reality for a lot of purposes.

~~~
d--b
I understand it is impossible to make these things fit in a proper relational
model. For instance, modeling dates when artworks are made circa a date,
between 2 dates, started by someone and finished by someone else, is very
complex. That said, they should at least try. Being able to sort artworks by
date seems pretty fundamental to me!
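
As a rough illustration of what "at least try" could look like, the sketch below turns a free-text date into a sortable year range plus a circa flag. The accepted formats are assumptions drawn from a handful of examples rather than from the full dataset, and `DateRange` and `parse_date` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class DateRange(NamedTuple):
    start: int   # first year of the range
    end: int     # last year of the range (same as start for a single year)
    circa: bool  # True when the source string was marked as approximate


# A four-digit year, optionally followed by a dash and a 2- or 4-digit end year.
_YEARS = re.compile(r"(\d{4})(?:\s*[-–]\s*(\d{4}|\d{2})(?!\d))?")


def parse_date(raw: str) -> Optional[DateRange]:
    """Parse strings such as '1896', '1976-77' or 'c.1937'; None if no year."""
    text = raw.strip().lower()
    circa = text.startswith(("c.", "c ", "ca.", "circa"))
    match = _YEARS.search(text)
    if match is None:
        return None
    start = int(match.group(1))
    end_raw = match.group(2)
    if end_raw is None:
        end = start
    elif len(end_raw) == 4:
        end = int(end_raw)
    else:
        # Abbreviated end year: borrow the century from the start year.
        end = int(match.group(1)[:2] + end_raw)
    return DateRange(start, end, circa)
```

Sorting then becomes `sorted(parsed, key=lambda d: (d.start, d.end))` over the rows that parsed. Works started by one person and finished by another would still need a richer model than this.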

Look at the data released by the Cooper-Hewitt that someone posted below. They
actually have some kind of structure!

------
minimaxir
I took a look at the data. The data schema is disorganized to the point that a
_lot_ of janitorial work would be necessary to get it useable and perform any
analysis or visualization.

For example, some works have a date of "1896" and others have a date of
"1976-77" or "c.1937"; artist bios can have nationality, year-of-birth and
year-of-death but not necessarily all 3; dimensions can be "12 1/2 × 12 1/4"
(31.8 × 31.1 cm)" or "204 x 48 x 48 inches (variable)", etc.
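
For the dimensions field, one possible starting point is to read only the metric values when a parenthesized centimetre group is present and leave everything else for manual review. This is a sketch based on the two examples above, not on the whole file; `parse_cm` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# A parenthesized group that ends in "cm", e.g. "(31.8 × 31.1 cm)".
_CM_GROUP = re.compile(r"\(([^()]*?)\s*cm\)")
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_cm(raw: str) -> Optional[Tuple[float, ...]]:
    """Return the centimetre values from a dimensions string, or None."""
    match = _CM_GROUP.search(raw)
    if match is None:
        return None
    values = tuple(float(n) for n in _NUMBER.findall(match.group(1)))
    return values or None
```

Strings without a centimetre group, such as the "inches (variable)" example, come back as `None` instead of being guessed at.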

~~~
_pmf_
> I took a look at the data. The data schema is disorganized to the point that
> a lot of janitorial work would be necessary to get it useable and perform
> any analysis or visualization.

In other words, it is real world data.

------
mrspeaker
This is so cool! (but I can't help but be a little disappointed that there are
no image resources included - even some low-res thumbnails would be great!)

~~~
danso
I've only briefly perused the dataset...but there is a URL field, and many of
the works have this filled in. The MoMA has always had one of the better
structured websites, so it wouldn't be hard to write a scraper to grab the
associated image.

e.g.

[http://www.moma.org/collection/works/101730](http://www.moma.org/collection/works/101730)

[http://www.moma.org/media/W1siZiIsIjIxNzgzOCJdLFsicCIsImNvbn...](http://www.moma.org/media/W1siZiIsIjIxNzgzOCJdLFsicCIsImNvbnZlcnQiLCItcmVzaXplIDIwMDB4MjAwMD4iXV0?sha=ecf15b9af177b4ed)

(I guess one thing the MoMA could improve on is moving their image assets to a
CDN...that thing took a while to load)
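
A minimal sketch of the parsing half of such a scraper, using only the standard library. It assumes the work page embeds its image in an `<img>` tag whose `src` contains `/media/`, as the example URL above suggests; the real page markup may differ, and `MediaImageParser` and `extract_media_images` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class MediaImageParser(HTMLParser):
    """Collect <img> sources that look like media assets (assumed pattern)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.image_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src and "/media/" in src:
            # Resolve relative paths against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_media_images(html: str, base_url: str) -> List[str]:
    """Return absolute URLs of media images found in an HTML document."""
    parser = MediaImageParser(base_url)
    parser.feed(html)
    return parser.image_urls
```

The HTML would come from fetching each URL in the dataset, for instance with `urllib.request.urlopen`, ideally with a delay between requests and after checking the site's terms of use.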

------
ZoeZoeBee
It's absolutely amazing how vast the collections inside museum storage are;
most items will hardly ever see display.

~~~
ilzmastr
It's not just museums that do this; a lot of collectors I've read about do
something like it: [http://bit.ly/1N49UaR](http://bit.ly/1N49UaR)

------
ilzmastr
I used to work here and think it's an amazing API:
[http://developers.artsy.net](http://developers.artsy.net)

The playground is very cool.

