Smithsonian Releases 2.8M Images into Public Domain

blackbrokkoli · on Feb 25, 2020

Aw, man!

I really don't have anything substantial to say, but this is incredibly refreshing: No dark patterns, no marketing ploy, no signup, no bullshit. No licenses, no traps, no business interests!

JavaScript not required, but smooth UI & UX! Just pure knowledge. API access; millions of files! Works right down to Lynx, and has a 3D previewer which is smooth and gentle on my mid-range laptop.

I think this is gonna be my new example if someone says that commerce & ads are the backbone of the internet. If any of the responsible people happen to read HN, please continue to do good in the world!

zacharycohn · on Feb 25, 2020

This is a great example of tax dollars at work. The Smithsonian is part of the United States government.

kbutler · on Feb 25, 2020

The Smithsonian is unique.

If interested in the history of how it came to be, Mike Rowe tells the story well in "The Illegitimate Son-of-a-Smith" podcast: http://thewayiheardit.rsvmedia.com/episode-135-the-illegitim...

The Institution is 62% federally funded (a combination of the congressional appropriation and federal grants and contracts). ... the Smithsonian has trust or non-federal funds, which include contributions from private sources (endowments; donations from individuals, corporations and foundations; and memberships) and revenues from the Smithsonian Enterprises https://www.si.edu/newsdesk/factsheets/smithsonian-instituti...

From https://www.si.edu/ogc/legalhistory

created by Congress in 1846 to exercise the authority of the United States in carrying out the responsibilities Congress undertook when it accepted the bequest of James Smithson "to found at Washington, under the name of the Smithsonian Institution, an establishment for the increase and diffusion of knowledge among men."...

...the Smithsonian is not an agency or authority of the Government...

...the Smithsonian is so "closely connected" to the federal government that it shares the immunity of the United States from state and local regulation...

wpasc · on Feb 25, 2020

It's a 501c3 which is ~half funded by taxpayers and half funded by philanthropy. Almost like it forces you to recognize the benefits of both types of funding.

xxpor · on Feb 25, 2020

I think it's more complicated than a pure 501c3. Their vehicles get US Gov licence plates from the GSA.

TkTech · on Feb 26, 2020

Indeed, the Smithsonian is completely unique under US law[1].

https://www.si.edu/ogc/legalhistory

klingonopera · on Feb 26, 2020

> commerce & ads are the backbone of the internet

I'm sorry, what? Everyone uses Wikipedia, and probably so much, that they don't consciously realize it...

oska · on Feb 26, 2020

I think what the GP is referring to is the dominant narrative over the last 15+ years that every website has to be 'monetised' with varying degrees of advertising based user abuse. Wikipedia has obviously been a conspicuous counter-example to that but it's been the large exception that proves the rule (in terms of narrative). It's always refreshing to see new examples of the web being used as a medium to freely disseminate information and culture without any form of user abuse attached. Commerce is fine but I personally see any form of push advertising as user abuse (even without the user tracking that nearly always comes with it).

Edit to add: I see that another reply (from jimmaswell) is still running the advertising 'business model' narrative.

klingonopera · on Feb 26, 2020

...the dominant narrative of people who want to sell you advertising technology, yes.

A shopping website will want to do two things: provide the customer with the most comfortable shopping experience while cutting down the cost of providing it. Advertising is a zero-sum game in this case (it cuts down costs by adding additional revenue, but makes the user experience worse, thus leading to fewer customers), so it only makes sense if that's all you're selling.

How the internet nonetheless became "infested" with ads is obviously a mystery to me...

jimmaswell · on Feb 26, 2020

Categorizing ads as "user abuse" is hyperbolic and I doubt any significant fraction of users feel they're being abused when they see a banner ad for potato chips. Bringing customers and companies together is a basic and necessary aspect of living in a market economy. I've seen commercials for new food offered at restaurants, wanted it, gotten it, and felt good about the whole chain of events; have I been abused?

Further, if "tracking" makes ads more relevant, then all the better for everyone involved. Some electrons in a server in a rack on the other side of the country, which you'd never know existed if people didn't stir up hysteria over them, not even directly correlated to your real identity or containing anything more important than a preference for high-end silverware, can't hurt you and have no just cause to be demonized.

oska · on Feb 26, 2020

> Categorizing ads as "user abuse" is hyperbolic

Push advertising is deliberate mind pollution. You are pushing information onto people that they are not seeking and doing it in highly manipulative ways. It's fine to market your services or products in places that people go to to seek out that information; it's not fine to push it on them.

> and I doubt any significan fraction of users feel they're being abused when they see a banner ad for potato chips.

Victims (I use the term loosely) of the obesity epidemic may disagree. How dominant would junk foods like potato chips be in our diet today without the extensive advertising campaigns over the last 70+ years?

With regards to your second paragraph on user tracking, the fact that you use the word 'hysteria' to discount concerns indicates to me that it's not even worth engaging with you there.

briandear · on Feb 26, 2020

> Victims (I use the term loosely) of the obesity epidemic may disagree.

Come on. Victims? They were forced to eat too much? Do people not have any agency at all? We still have junk food advertising and we have people that aren’t obese. So are they immune to the messages?

Or perhaps, is the “epidemic” one of a sedentary lifestyle? Perhaps the rise of the information economy and the lack of having to go plow a field?

Classing fat people as victims is ridiculous unless they were attacked by a spoon.

henriquemaia · on Feb 26, 2020

OP was careful to frame the term. It's not a later edit, as you quote the framing too.

So the use of the term there is clearly rhetorical; the point is not that, but that advertising has managed to manipulate people against their better judgment. In that sense, yes, they are victims.

Besides, claiming that people suffering from an ill that is destroying their lives are free to choose (when reality clearly shows they aren't) is callous.

Eating disorders are very complicated. Eating is such a low-level activity that it easily supersedes any rationalisations. That's why it's a good example of how bad advertising can be. So again, OP is right in calling those people victims.

fauigerzigerk · on Feb 26, 2020

There's clearly a dilemma. Yes, we know that there are messages that, on average, create a bias in people towards particular outcomes. So, collectively, you could say that we can be manipulated.

At the same time, we have to assume that most individuals are capable of breaking this spell if necessary, for instance by making a conscious decision to read up on what unhealthy food does to us.

Otherwise, you would have a justification to take away any and all freedoms from everybody.

Commercial advertising is not the only form or motivation for trying to make people do things. "Manipulation" exists in personal relationships, it exists in religion, it exists in political messaging. We even call it dog whistle politics. And the motivations are not necessarily any more noble than those for running chocolate chip adverts.

We don't have an external all-knowing referee to decide what is good for us, what our priorities should be, and what we only do or believe because we are being manipulated. So if we put too much emphasis on this manipulation and victimisation narrative, we create a huge incentive for someone to assume that role.

wsy · on Feb 26, 2020

There is no dilemma. We agree that people can be manipulated to some degree. We also agree that - if they don't suffer from an illness - that manipulation only goes so far. Finally, we agree that we shouldn't force a lifestyle on anybody.

So now you can manipulate (or nudge) people either towards behavior which is considered as beneficial (regular exercise, healthy food, etc.), or you can manipulate them towards unhealthy habits (e.g., smoking).

It is a good idea to limit the ability of actors to nudge people towards unhealthy habits. That will help everybody except a few people who will earn less money. It is also a good idea to foster nudging good behaviors, even if there is no commercial value behind it.

The dilemma only starts to occur if we try to put all things into either the good or the bad box. Probably most ads are in the large grey area in between these extremes. However, ads fostering smoking addiction or obesity should be called out as bad for all of us. There is no downside to this call-out.

jimmaswell · on Feb 26, 2020

We're inundated with health activism constantly, some of it government funded. The only instance informed choice doesn't come in is where people have no access to/ability to afford healthier food.

klingonopera · on Feb 26, 2020

I agree that not all ads are "abuse", but some either are or border on it, depending on your definition.

If you're genuinely interested in the advertising industry's development over the last 100 years, or have a few hours to spare, check out the 4-part documentary "The Century of the Self", made by the BBC in the early 2000s: https://www.youtube.com/watch?v=DnPmg0R1M04

blackbrokkoli · on Feb 26, 2020

You are absolutely right, it is not all bleak out there, of course. Most people also use a lot of open source software without realizing it, to give another example. I merely wanted to express how delighted I was when I clicked on this particular HN link, which did stand out as remarkably bullshit-free, at least for me.

klingonopera · on Feb 27, 2020

I agree, the website looks nice, the content is great, thank you American tax dollars for this contribution to human history.

I just didn't want to leave that statement about "commerce & ads" there and have people take it to heart without consciously realizing that Wikipedia is probably our most popular counterexample.

Depending on your definition of 'backbone'... it's maybe not a truth humanity is particularly proud of, but I have this figure in my mind that 75% of the web's bandwidth is used by porn?

Was that ever true, is that still true, are those numbers accurate? Or is that just a common myth?

jimmaswell · on Feb 26, 2020

A handful of websites (probably already the maximum that the model supports) can get by on donations, and then they have their hands tied as to how to use the money, potentially preventing expansion or major changes. A website that's nothing but "<primarily offline company>'s website" doesn't need ads or anything in the first place, because it's just for information on the main business (though if they'd profit more from running some ads on it, I wouldn't blame them; optimally that would lower prices, but in reality it would just go to the CEO's bonuses). These do not disprove that the ad/subscription models are the most viable way for most internet businesses to operate.

mopsi · on Feb 26, 2020

The internet is more than just internet businesses. It costs me 70€ per year (paid in small monthly installments) to run a site to share my thoughts and info & tips related to hobbies. I don't need ads & tracking, donations nor any other form of monetization to run the site.

klingonopera · on Feb 27, 2020

...and in some countries, those costs (assuming your hobbies have any alignment with your work skills) can be claimed as a tax deduction as well.

Apart from that, self-hosting is even cheaper, and easier, than ever.

saberworks · on Feb 25, 2020

Are we on different websites? I used the search box to search "cat" and the "enter" key doesn't work to perform a search, I have to mouse over and click the magnifying glass to get it to work. It takes about 45 seconds to load about 2 screens of cat pictures. I have to scroll down and then click "Show more" to get some more. There's no indication of how many times I have to click "Show more" to get all the cat pictures. Once I click "Show more" it appears that nothing is happening, but about 30-45 seconds later the screen starts blipping and then another couple of screens of cats start popping into view. Clicking on an image takes another 30 seconds and then it brings you to a slightly bigger view, and then clicking it again shows another slightly bigger view but trapped into some weird-scrolling viewer with 3 layers of vertical scrollbars. The back button is also broken, having to click it multiple times to escape the single-image screen.

I think a simple pager with a fixed number of images on each page would have been a much better experience. It would also let me link to specific pages so I could share the page I'm viewing.

EDIT: I'm leaving the post above, but right after submitting it I felt regret because it sounds really harsh and I didn't mean it that way. I love the fact that the images were released and I hope government and private institutions continue to do this. I have very minor gripes about the UI, but that shouldn't take away from the accomplishment.

crazygringo · on Feb 25, 2020

I kinda think you are? Zero of that is true for me, and I mean zero.

Hitting enter searches for me. It shows results instantaneously. It says clearly at the top in big letters "7,636 results for cat" so you know that you can pretty much hit "show more" endlessly. Which instantly loads. And when I click on something yes there is a 2nd scrollbar (not 3) but it's only for the "object details" sidebar... it doesn't apply to the main page viewer part. And the back button works perfectly.

Honestly it sounds like you just have a terrible network connection maybe, or they had temporary traffic congestion? Or what browser are you using... perhaps it's one they haven't tested against? (I'm using Chrome on a MacBook.)

saberworks · on Feb 25, 2020

I don't really want to put more focus on this but...

I was using Firefox on a new Mac, with some extensions like uBlock installed. I just tried again with a different browser (latest Vivaldi, no extensions) and it behaves the same (the search bar problem only occurs if you have "collection images" highlighted; the enter key works as expected if you have "all" highlighted). I am still experiencing the slowness and image "popping" when using the show more link. I'm still getting 3 layers of scroll bars on the image viewer. The back button is a little bit better, but of course, since it's some pseudo-infinite-scroll, when I click back I land at the top of the list with all my "show more" clicks undone (so I lose my place in the results list, as is common with infinite scroll and SPAs).

I can't dispute the possibility of network issues, could be the site is being slammed, but speedtest.net says my connection is >300Mbps.

naibafo · on Feb 26, 2020

I use Firefox on a MacBook as well, also with uBlock.

The site is definitely slow at the moment, but enter key worked for me and on the top of the page I see the number of results that were found...

CaptainZapp · on Feb 26, 2020

I have tracker-blocking extensions and ad blockers to the hilt, which makes some sites almost unusable. For example: I simply don't see reCAPTCHAs with how the browser is configured.

The easy solution (if you trust the site) is to restart Firefox without add-ons:

Hamburger Menu > Help > Restart with add-ons disabled

You then get a dialog box where you can restart in safe mode, or clean up your Firefox installation.

Don't forget to restart with add-ons re-enabled (can also be done from the help menu).

lifeisstillgood · on Feb 25, 2020

On iPhone, type "cat", hit enter key, see 2657 cat photos (possibly the world's smallest online collection of cat photos, but still...)

This is cool, and reminds me of the book illustrations and images released last month. There seems to be a corner turned in opening up collections and, as GP implies, knocking holes in the walls of the gardens.

I am impressed by this too. Kudos to the big museum in DC.

As for your experience, well perhaps a cat was sitting on your keyboard while you searched? They can be annoying like that

yorwba · on Feb 25, 2020

> On iphone, type "cat", hit enter key, see 2657 cat photos

Did you actually see all of them? Using Firefox Preview on Android, I got told that there were this many results, but I was only shown a handful before I had to tap on "view more".

blackbrokkoli · on Feb 25, 2020

There is no solution for displaying 2657 cat photos in a client browser in 2020 which does not include one of the following:

* some sort of pagination, like an iterating "See more" button or the classic "Next page" stuff

* fancy JS in the background simulating that process making the page much heavier, ruining accessibility and most likely the back button

* instantly draining the client's monthly data, crashing the device, timing out the server and making the infrastructure team cry, roughly in that order
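The first option, classic fixed-size pages with the page number kept in the URL (as suggested upthread, so a page can be linked and the back button keeps your place), takes only a few lines. This is a minimal sketch under assumptions: the results are already an in-memory list, and the `q`/`page` query parameters and the `example.org` URL are hypothetical, not the Smithsonian site's actual scheme.

```python
import math
from urllib.parse import urlencode


def paginate(items, page, page_size=40):
    """Return (items_on_page, total_pages) for a 1-based page number.

    Out-of-range page numbers are clamped, so a stale link still
    lands on a valid page instead of erroring.
    """
    total_pages = max(1, math.ceil(len(items) / page_size))
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return items[start:start + page_size], total_pages


def page_url(base, query, page):
    """Build a shareable URL for one results page (parameter names are hypothetical)."""
    return f"{base}?{urlencode({'q': query, 'page': page})}"


if __name__ == "__main__":
    # Stand-in for the 2657 "cat" hits mentioned above.
    results = [f"cat-{i}" for i in range(2657)]
    chunk, pages = paginate(results, page=3)
    print(pages, chunk[0], page_url("https://example.org/search", "cat", 3))
```

Because each page is just a URL, the browser's history does the work that infinite scroll has to reimplement in JavaScript.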

blackbrokkoli · on Feb 25, 2020

None of this mirrors my experience. Sounds like a network problem to be honest, either on your side or server hiccup? Loading any images will have some inherent network load, sure. But I'm having no problems despite the decently shitty connection I have. Scrollbars are also working as expected...the 'show more' button has to be there in the end, you can't load all 3 million images at the same time (which would be the case using '' as search term otherwise). Sure you could do classic pagination as well, but again the real pain sounds like connection issues...

shanecleveland · on Feb 26, 2020

It's great that a government website is receiving praise, but it's not really fair to compare it to the needs of a for-profit enterprise that needs to generate revenue.

m463 · on Feb 26, 2020

Normal people not on Lynx will see DoubleClick, AddThis, ForeSee, Google Analytics, etc. on the pages. Just saying...

hesarenu · on Feb 26, 2020

Is it smooth though? Very slow to load, but I am on a different continent.

gambler · on Feb 26, 2020

Reminder to myself: donate to them when I get home.

carapace · on Feb 25, 2020

https://www.si.edu/openaccess The actual thing.

> Welcome to Smithsonian Open Access, where you can download, share, and reuse millions of the Smithsonian’s images—right now, without asking. With new platforms and tools, you have easier access to nearly 3 million 2D and 3D digital items from our collections—with many more to come. This includes images and data from across the Smithsonian’s 19 museums, nine research centers, libraries, archives, and the National Zoo.

nessunodoro · on Feb 25, 2020

If you're into natural history, search the 3D models. The in-browser viewer works great.

https://www.si.edu/object/3d/490b6301-3869-455a-ba71-a89f5b6...

dTal · on Feb 25, 2020

Holy crap. "Works great" doesn't do it justice. I'm on mobile and I was not expecting a buttery-smooth, photorealistic, flawlessly ergonomic multitouch experience with an abundance of advanced power tools like section analysis. That thing would be one of the most impressive apps I've ever seen, never mind Web Apps!

I want this as a Jupyter plugin, very badly.

(where do I find more examples? The viewer is a work of art but did I miss the hyperlink to "more cool stuff"?)

(edit: https://3d.si.edu/ )

akiselev · on Feb 26, 2020

The 3d model viewer is published at https://github.com/Smithsonian/dpo-voyager

dTal · on Feb 26, 2020

Excellent find!

yorwba · on Feb 25, 2020

Looking at the source, it seems like their viewer is based on OpenSeadragon: https://openseadragon.github.io/

framefactory · on Feb 27, 2020

Voyager, the 3D viewer, uses Three.js for WebGL rendering and a custom entity/component architecture for both the scene tree and the general architecture of the application. The UI is built using custom Web Components with the help of LitElement and LitHtml. The viewer itself is a web component and super easy to embed.

The tool suite also provides an authoring environment for annotating and editing scenes. Documentation: https://smithsonian.github.io/dpo-voyager. Contributions are very welcome!

Full disclosure: I'm the developer of the Voyager 3D suite.

dTal · on Feb 25, 2020

Then they've added an astonishing amount of value. OpenSeadragon appears to be nothing more than a zooming 2D image viewer that consumes image pyramids? I'm not sure why you would even start with that, when your goal is to render hi-res 3D meshes in real time. Is there even an overlap in functionality?

sunsetMurk · on Feb 25, 2020

Oh my. I'm in love. This is going to divert some of my attention for a side project... Hopefully for the better!

This is so packed full of useful features, and at first glance a very thoughtful implementation and healthy ecosystem.

dflock · on Feb 25, 2020

https://www.si.edu/openaccess

freepor · on Feb 26, 2020

It’s even buttery smooth on an iPhone 6s.

johannes1234321 · on Feb 26, 2020

Now we have to combine this with Wikipedia, so that the article text directly references the 3D model, or specific parts of it. Text is good, but exploring visuals helps understanding.

tzfld · on Feb 25, 2020

How long until Getty starts sending copyright notices on these?

Spoom · on Feb 25, 2020

Probably immediately since it's likely driven by a bot. They likely don't actually claim to own the copyright, just that you haven't negotiated a license for the image with them (they probably avoid mentioning that you don't need to).

raister · on Feb 26, 2020

What if someone created a service like "do I need a license for using this image?" The system would search by image somewhere and return its public domain status (perhaps someone already thought of this though, it's too obvious...)

ShorsHammer · on Feb 26, 2020

Not universal, but here they seem to be using IIIF: you can simply add info.json after the identifier path and get the licensing info.
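For the curious, here is a rough sketch of that lookup. It assumes you already have an IIIF Image API service base URL (the one used below is made up), skips the network fetch so it runs offline, and just reads the licence fields the spec defines: `license` (string or list) in Image API 2.x and `rights` in 3.0. Whether a given Smithsonian record actually populates those fields is an assumption, not something confirmed here.

```python
import json


def info_json_url(image_service_base):
    """IIIF Image API: the image information document lives at {base}/info.json."""
    return image_service_base.rstrip("/") + "/info.json"


def licensing_info(info_doc):
    """Collect licence-related values from a parsed info.json dict.

    Image API 3.0 uses "rights" (a single URI); 2.x used "license",
    which may be a string or a list of strings. Returns all found values.
    """
    found = []
    for key in ("rights", "license"):
        value = info_doc.get(key)
        if isinstance(value, str):
            found.append(value)
        elif isinstance(value, list):
            found.extend(v for v in value if isinstance(v, str))
    return found


if __name__ == "__main__":
    # Hypothetical 2.x-style response, as you might get from fetching info_json_url(...).
    sample = json.loads("""{
      "@context": "http://iiif.io/api/image/2/context.json",
      "@id": "https://iiif.example.org/images/abc123",
      "license": "https://creativecommons.org/publicdomain/zero/1.0/"
    }""")
    print(info_json_url("https://iiif.example.org/images/abc123/"))
    print(licensing_info(sample))
```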

arvinsim · on Feb 26, 2020

This was my first thought after reading this post.

I hope everyone is informed that they don't need a license to use the images.

wst_ · on Feb 26, 2020

This reminds me of the Polish National Library website. They did the same thing some years ago. They went to great effort to scan and digitally publish a huge amount of books, publications, postcards, images, etc. Some books are centuries old. A lot of Polish content, but not only. I have to admit, looking at old prints is a magical experience to me.

https://polona.pl/ (change language on the bottom)

Edit: I can see now that both the Smithsonian and the National Library of Poland are part of the IIIF Consortium.

anigbrowl · on Feb 25, 2020

Getty's litigation department is gonna be so busy

kkotak · on Feb 26, 2020

This.

lootsauce · on Feb 25, 2020

Is there a comprehensive list of all such large collections that have been made available? I recall the Metropolitan Museum of Art released hundreds of thousands of images a while back.

lootsauce · on Feb 25, 2020

Ah here is prior discussion on the Met release https://news.ycombinator.com/item?id=7781846

CaptainZapp · on Feb 26, 2020

Maybe not quite in the same league and arguably of more local interest. But the ETH in Zurich makes a library of 100'000s of pictures, graphics, historical artifacts, architecture and photography publicly available.

All[1] of it in hi-res as TIFFs:

https://www.e-pics.ethz.ch/en/home_en/

[1] Disclaimer: I didn't check them all, but for the ones I was involved with (the archive of historical buildings in Zurich), that's definitely the case.

ShorsHammer · on Feb 26, 2020

You could try here: https://iiif.io/community/#participating-institutions

Everyone releasing large collections tends to use the protocol. You'll find all the biggest in that list plus many more.

captn3m0 · on Feb 26, 2020

Paris museums, recently: http://www.openculture.com/2020/01/14-paris-museums-put-3000...

Vinnl · on Feb 26, 2020

Europeana has lots, including high-res scans of the works of my favourite museum: https://www.europeana.eu/

simonsarris · on Feb 26, 2020

This is great.

I keep a list of 'High Quality Collections of Digitized Art and Archival Finds' and will add this one.

https://simonsarris.com/art-collections

iza · on Feb 26, 2020

There's also the Getty Open Content Program:

- https://www.getty.edu/about/whatwedo/opencontent.html

- https://search.getty.edu/gateway/search?cat=highlight&dir=s&...

lootsauce · on Feb 26, 2020

Here is a great collection of maps you might want to add.

https://www.davidrumsey.com/

License there is CC BY-NC-SA 3.0

simonsarris · on Feb 26, 2020

Thanks, this is very cool.

arbitrage · on Feb 26, 2020

Thank you sooooo much. This is such great work, the wife and I were just discussing the need for something like this mere days ago.

You are a true scholar.

acd10j · on Feb 26, 2020

How can we prevent sites like Getty Images from claiming copyright on these images and suing users who use them?

gowld · on Feb 25, 2020

"Releasing to the public domain" isn't something that can affirmatively be done. The best you can do, which the Smithsonian did, is CC0.

https://creativecommons.org/publicdomain/zero/1.0/

Thanks to the Creative Commons team for teaching the government how to do its job.

chipsa · on Feb 25, 2020

Unless you're the US government, in which case everything is public domain. A copyright cannot be claimed on USG work product.

zozbot234 · on Feb 25, 2020

Assuming that you actually control the copyright, of course. This used to be (and may still be) a significant issue wrt. the Smithsonian itself; they host all sorts of content on their websites for which the actual copyright-ownership situation was apparently entirely unknown (to the Smithsonian themselves and anyone else).

billfruit · on Feb 26, 2020

Looks like one needs to use a search box to view images. Though it is being provided free of cost, and one shouldn't look a gift horse in the mouth, it would have been better if there were some type of catalogue to see what exactly is in the dataset.

stfwn · on Feb 26, 2020

This is awesome!

I typed in 'jazz' and got a page on Charlie Parker's saxophone, a King Super 20. It has 8 very high quality and nicely composed photos that you can download in TIFF format, some up to ~300MB in size. Each photo has a detailed description in text, as well as the dimensions of what's seen in the shot, info on when/where it was made and by whom and lots more! What a great project.

https://www.si.edu/object/nmaahc_2019.10.1a-g

E: One of the images has already been added to the Wikipedia entry for Charlie Parker a couple of hours ago. How cool is this data set.

nomercy400 · on Feb 26, 2020

..and Getty gains 2.8M images they can use to send copyright infringement letters...

reaperducer · on Feb 25, 2020

"CC0 only applies to copyright so you may still need someone else’s permission to use a CC0-designated digital asset."

So, not actually "public domain," but about as close as we get in 2020.

kibwen · on Feb 25, 2020

IANAL, but as I understand it CC0 is equivalent to public domain in jurisdictions where the concept of "public domain" exists. It's an abstraction layer over the fact that not all countries have such a thing.

zozbot234 · on Feb 25, 2020

Typically, this refers to relatively uncommon issues like trademarks, or privacy/publicity rights wrt. depictions of (contemporary, not historical) people. It's good that Smithsonian is pointing out these issues, but sophisticated reusers are well aware of them.

colonwqbang · on Feb 25, 2020

No, that's not how I read it at all. A museum doesn't originate anything; they get donations. "You may need someone else's permission" because the Smithsonian doesn't guarantee that no one else in the world has a (good or bad) IP claim to one of the donations.

That's the kind of guarantee that you will never get unless you pay for it.

nl · on Feb 25, 2020

> "You may need someone else's permission" because the Smithsonian doesn't guarantee that no one else in the world has a (good or bad) IP claim to one of the donations.

That's not the intention of CC0:

In contrast to CC’s licenses that allow copyright holders to choose from a range of permissions while retaining their copyright, CC0 empowers yet another choice altogether – the choice to opt out of copyright and database protection, and the exclusive rights automatically granted to creators – the “no rights reserved” alternative to our licenses. ... Dedicating works to the public domain is difficult if not impossible for those wanting to contribute their works for public use before applicable copyright or database protection terms expire ... CC0 helps solve this problem by giving creators a way to waive all their copyright and related rights in their works to the fullest extent allowed by law.

In this case Smithsonian seems to have done the work to make sure they are out of copyright, and now they don't reserve any rights to the work.

https://creativecommons.org/share-your-work/public-domain/cc...

mikekchar · on Feb 26, 2020

I think it's likely that "may need someone else's permission" refers to things like waivers for using a person's likeness, or so-called moral rights. These are not usually included in the bucket of things most people call "intellectual property". It's more like, if I have a picture of a person and use it to sell something, some people may conclude that the person is endorsing that good. There are many countries that have separate laws concerning what you can and can't do with images specifically.

riazrizvi · on Feb 25, 2020

Tax dollars, invested into an institution that creates a public good, that is then made available to the public, where any entrepreneur can recycle it for profit and create more tax revenue. Beautiful fair capitalism at its finest!

mc3 · on Feb 26, 2020

A bit like free education.

oblio · on Feb 26, 2020

What's "fair capitalism"?

riazrizvi · on Feb 26, 2020

Systems that encourage money to move into the hands of people who earn it, as opposed to lucky people who were born into it. Key examples of fair capitalism boosters (albeit for limited groups of people): 1217 England (Magna Carta for English lords), 1618 North America (when the Virginia Company introduced profit sharing for white settlers in the failing Jamestown), 1688 England (the Glorious Revolution for property owners), 1776 North America (where the new USA codified equitable rules for European men).

In all these cases Royal monopoly power was curtailed, allowing more people the opportunity to profit from their efforts. And there were relative economic booms as a result. But we strangle growth when we reinforce monopoly power, or when we reinforce laws that keep money in the hands of people that didn’t earn it (and also strengthen the political power that money alone has).

When monopoly power is a temporary reward for innovators, it works. Anyone that seeks to extend the time limits of monopoly powers beyond the sweet spot is trying to restore one of those unfair capital systems that kill economic development. They generally don’t care because for them, it is about enriching themselves whatever the cost to society.

bluedino · on Feb 25, 2020

Torrent?

toomuchtodo · on Feb 25, 2020

I asked for a data dump via email, waiting to hear back. Ripping it all out is going to cost someone a pretty penny since it's hosted in AWS, better to ask someone to fill and ship some SATA drives. Then off to the Internet Archive!

zorm · on Feb 25, 2020

I saw a companion AWS blog on this: https://aws.amazon.com/blogs/publicsector/smithsonian-3-mill...

Looks like it's part of the public dataset program, so you can probably just ask for the bucket name and get full free access to everything.

peatmoss · on Feb 27, 2020

I can confirm that this dataset is indeed part of the AWS Public Dataset Program. We were finalizing some details, but this dataset is now listed in the Registry of Open Data:

https://registry.opendata.aws/smithsonian-open-access/

Enjoy! I’m personally very excited about this dataset, and couldn’t be more impressed with the people, mission, and, well, everything at the Smithsonian.

Source: I work on the AWS Open Data team / had a small role in this (normal lawyerspeak caveats apply that my views and opinions are my own)

toomuchtodo · on Feb 28, 2020

Thank you!!

Reelin · on Feb 26, 2020

This is weird. The Smithsonian page [1] says "Data hosting provided by AWS Public Dataset Program", and there's the Amazon blog post you linked, but the data set seems to be missing from the registry Amazon publishes. [2] I guess no one submitted a pull request yet? [3]

[1] https://www.si.edu/OpenAccess

[2] https://registry.opendata.aws

[3] https://github.com/awslabs/open-data-registry

toomuchtodo · on Feb 25, 2020

> just ask for the bucket name and get full free access to everything

Jed or jeffbarr, what say ye?

toomuchtodo · on Feb 28, 2020

Disregard! Pete’s comment is sufficient for bucket enumeration.

ChrisArchitect · on Feb 26, 2020

How much data do you think it encompasses? Approaching or surpassing petabytes?

toomuchtodo · on Feb 26, 2020

Couldn't even hazard a guess. I can send a Backblaze-like chassis if needed, but if there's no cost to pull from the AWS public S3 bucket, I'll just spin up some VMs and enumerate everything into items in the Internet Archive. Trying to find the balance between time, cost, and inconvenience for all parties involved.

ChrisArchitect · on Feb 27, 2020

Still wondering how much data has accumulated through the open data digitization process. Based on the open data sessions over the last few years I'm gleaning that it might be multiple petabytes, but I don't know...

ChrisArchitect · on Feb 28, 2020

Some of the people interested in the data and a lead at AWS got back to me on it:

An AWS query on the open dataset comes back with:

Total Objects: 4649789

Total Size: 312.5 TiB!
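Those totals look like the summary the AWS CLI prints for something like `aws s3 ls s3://<bucket> --recursive --summarize --human-readable --no-sign-request`. For anyone who wants to reproduce the tally without the CLI, here is a minimal standard-library sketch that pages through an S3 ListObjectsV2 listing anonymously and sums object counts and sizes. It assumes the bucket permits unsigned listing; `BUCKET` is a hypothetical placeholder (take the real name from the registry entry linked above), and `parse_page`, `tally_bucket` and `to_tib` are illustrative names.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Default XML namespace of S3 REST API responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}
# Hypothetical placeholder: look up the real bucket name in the registry entry.
BUCKET = "example-open-data-bucket"


def parse_page(xml_bytes):
    """Return (object_count, total_bytes, next_token) for one ListObjectsV2 page."""
    root = ET.fromstring(xml_bytes)
    sizes = [int(el.text) for el in root.findall("s3:Contents/s3:Size", S3_NS)]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS) == "true"
    # Only a truncated page carries a token pointing at the next page.
    token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS) if truncated else None
    return len(sizes), sum(sizes), token


def tally_bucket(bucket=BUCKET):
    """Page through the whole bucket without signing requests, summing counts and sizes."""
    count, total, token = 0, 0, None
    while True:
        params = {"list-type": "2"}  # selects ListObjectsV2
        if token:
            params["continuation-token"] = token
        url = f"https://{bucket}.s3.amazonaws.com/?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url) as resp:
            n, size, token = parse_page(resp.read())
        count += n
        total += size
        if token is None:
            return count, total


def to_tib(num_bytes):
    """Convert bytes to tebibytes (2**40 bytes), the unit in the totals above."""
    return num_bytes / 2**40
```

ListObjectsV2 returns at most 1,000 keys per page, so around 4.6 million objects means several thousand sequential requests; a full tally takes a while.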

projproj · on Feb 25, 2020

Oh, cool. They have a search API. I'll plan to add this as a source to canweimage.com.

nessunodoro · on Feb 25, 2020

OT: Neat! Have you considered incorporating non-Wikimedia Commons images in your public-domain-only search? Flickr in particular has a surprising amount of quality highres images that the authors have released into the PD.

projproj · on Feb 25, 2020

I hadn't looked into any other image sources. I'll check out Flickr too though.

zozbot234 · on Feb 25, 2020

> Flickr in particular has a surprising amount of quality highres images that the authors have released into the PD.

Arguably, such images should be mirrored onto Commons ASAP (if they plausibly have educational value and are PD in most of the world). It is unwise to rely on a single profit-oriented company like Flickr to keep hosting this sort of stuff for the foreseeable future, whereas Commons is independently supported and specifically intended as a host for permissively-licensed media content.

carlinmack · on March 1, 2020

There are bots that patrol Flickr, upload permissively licensed content with the correct attribution, and link back to the source. See more here: https://commons.wikimedia.org/wiki/Commons:Flickr2Commons

fsflyer · on Feb 26, 2020

Nice project. One request: can you add some control over which metadata fields are searched? I searched for ‘aardvark’ and got a bunch of images of buildings and trees taken by folks with aardvark in their usernames.

projproj · on Feb 26, 2020

Yes, I can look into that.
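A minimal sketch of that kind of control, assuming each result is a dict of metadata fields: matching is restricted to a chosen set of fields, so text in an uploader's username no longer counts as a hit. The record layout and the field names (`title`, `description`, `tags`, `author`) are hypothetical, not canweimage.com's actual schema.

```python
def search(records, query, fields=("title", "description", "tags")):
    """Return records whose chosen fields contain the query (case-insensitive).

    Fields not listed (for example a hypothetical "author" field) are ignored,
    so a username containing the query no longer produces a match.
    """
    q = query.casefold()
    hits = []
    for rec in records:
        for field in fields:
            value = rec.get(field, "")
            # Tag-style fields may be lists; join them into one searchable string.
            if isinstance(value, (list, tuple)):
                value = " ".join(value)
            if q in value.casefold():
                hits.append(rec)
                break  # one matching field is enough; avoid duplicate hits
    return hits
```

Exposing `fields` as a user-facing option (checkboxes, say) would let people opt back in to author matching when they actually want it.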

purerandomness · on Feb 26, 2020

Nice one, great project!

I'd love it if the URL changed after searching so I could share search result URLs.

projproj · on Feb 26, 2020

That's a great suggestion. Thanks for chiming in.
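One common way to make searches shareable is to keep the search state in the URL's query string and rebuild it from there when the page loads. A sketch in Python, where `BASE` and the `q`/`page`/`source` parameter names are hypothetical; in a browser front end the same idea would typically use the History API (`history.pushState`) to update the address bar without a reload.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://example.com/search"  # hypothetical base URL


def state_to_url(query, page=1, source="all"):
    """Serialize the search state into a shareable URL."""
    return f"{BASE}?{urlencode({'q': query, 'page': page, 'source': source})}"


def url_to_state(url):
    """Rebuild the search state from a shared URL, falling back to defaults."""
    params = parse_qs(urlsplit(url).query)
    return {
        "query": params.get("q", [""])[0],
        "page": int(params.get("page", ["1"])[0]),
        "source": params.get("source", ["all"])[0],
    }
```

Round-tripping is the property that matters: `url_to_state(state_to_url(...))` should give back the same state.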

karateka · on Feb 26, 2020

This is awesome, thanks for building it

cenkozan · on Feb 25, 2020

While you're there, consider subscribing to their mailing list as well! Unbelievably good!

gxnxcxcx · on Feb 26, 2020

The si.edu list or the smithsonianmag.com one? Maybe I'll try both just to be sure. :)

ilamont · on Feb 26, 2020

Using these images may not be so easy on established media platforms. Amazon, for instance, screens ebooks for PD content.

peter303 · on Feb 26, 2020

How many 'Library of Congress' datasets is that?

1 LOC = 15 TB, according to Google
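Answering with the thread's own figures: the AWS total is reported in tebibytes (binary, 2**40 bytes), while the 15 TB estimate is presumably in decimal terabytes (10**12 bytes), so it is worth converting before dividing. `loc_units` is just an illustrative helper name.

```python
TIB = 2**40  # bytes in a tebibyte (binary unit, as in "312.5 TiB")
TB = 10**12  # bytes in a terabyte (decimal unit, assumed for the 15 TB figure)


def loc_units(size_tib, loc_tb=15):
    """How many 'Library of Congress' units a dataset of size_tib TiB is."""
    return size_tib * TIB / (loc_tb * TB)


print(f"{312.5 * TIB / TB:.1f} TB")   # -> 343.6 TB
print(f"{loc_units(312.5):.1f} LOC")  # -> 22.9 LOC
```

So roughly 23 Libraries of Congress, with the caveat that the 15 TB figure is itself a loose estimate.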

russfink · on Feb 26, 2020

Inverted Jenny ... flex much? (J/K - awesome gallery!)

markdown · on Feb 25, 2020

The search feature seems to be getting a hug of death.

lerie1982 · on Feb 25, 2020

The API is holding up well; give it a go.

ryan_lane · on Feb 25, 2020

CC0 license on assets, which is pretty great.

lerie1982 · on Feb 25, 2020

The API is great.

danem · on Feb 25, 2020

Link to the API docs? I can't find them.

adventured · on Feb 26, 2020

https://edan.si.edu/openaccess/apidocs/
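For anyone trying it, a minimal sketch of a search request using only the standard library. The endpoint `https://api.si.edu/openaccess/api/v1.0/search`, the parameter names `q`, `start`, `rows` and `api_key`, and the need for an api.data.gov key are assumptions here; verify them against the docs above before relying on this. `build_search_url` and `search_open_access` are illustrative names.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against the Open Access API docs.
API_BASE = "https://api.si.edu/openaccess/api/v1.0/search"


def build_search_url(query, api_key, start=0, rows=10):
    """Build a paged search URL (parameter names are assumptions)."""
    params = {"q": query, "start": start, "rows": rows, "api_key": api_key}
    return f"{API_BASE}?{urllib.parse.urlencode(params)}"


def search_open_access(query, api_key, **kwargs):
    """Fetch one page of results and decode the JSON body."""
    with urllib.request.urlopen(build_search_url(query, api_key, **kwargs)) as resp:
        return json.load(resp)
```

Paging would then be a matter of bumping `start` by `rows` until a page comes back short.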

alexcnwy · on Feb 26, 2020

Very cool

botwriter · on Feb 25, 2020

Until Getty starts suing people for using them!

Koshkin · on Feb 25, 2020

> for patrons

gjm11 · on Feb 25, 2020

I don't think that means anything like "for people who have physically visited the Smithsonian" or "for people who pay us some sort of subscription"; if you use their website then you're a patron in the relevant sense.

Their webpage says "Welcome to Smithsonian Open Access, where you can download, share, and reuse millions of the Smithsonian’s images—right now, without asking." And everything's CC0-licensed.

WalterBright · on Feb 25, 2020

> and forbidding commercialization

It's nice to see the Smithsonian get over their distaste for the market economy in the US.

thaumasiotes · on Feb 26, 2020

Well, not quite.

> the Smithsonian is featuring less than 2 percent of its total collections in this initial launch. Much of the rest may someday be headed for a similar fate. But Kapsalis stresses the existence of an important subset that won’t be candidates for the public domain in the foreseeable future, including location information on endangered species, exploitative images and artifacts from marginalized communities.

> “The way people have captured some cultures in the past has not always been respectful,” Kapsalis says. “We don’t feel we could ethically share [these items] as open access.”

If they don't trust you to think the right things about their artifacts, they're not letting you use those artifacts.

zozbot234 · on Feb 26, 2020

> If they don't trust you to think the right things about their artifacts

It's not about you or us, it's about anyone who might complain that the Smithsonian Institution is (as they see it) "endorsing" unethical practices by releasing that particular content (a minor part of what they currently hold, since "much" is in fact up for release) out in the open. Whether people "think the right things" about the Smithsonian's actions can have real consequences, so it makes sense for them to care quite a bit about that.