
The US Library of Congress has put 25M items free online - leephillips
http://www.sciencealert.com/the-us-library-of-congress-just-put-25-million-records-online-free-of-charge
======
xvilka
Meanwhile Elsevier, which is widely known for inhibiting scientific progress by
setting incredibly high prices even for government-funded research papers,
is making a move against SciHub [1] and LibGen [2] again [3].

[1] [https://sci-hub.cc/](https://sci-hub.cc/)

[2] [http://libgen.io/](http://libgen.io/)

[3] [https://torrentfreak.com/elsevier-wants-15-million-piracy-da...](https://torrentfreak.com/elsevier-wants-15-million-piracy-damages-from-sci-hub-and-libgen-170518/)

~~~
agumonkey
I find scihub/libgen a very important project actually. Mirrors should be done
(not an order, just a plan).

~~~
petepete
I've never heard of LibGen, but it's blocked by my ISP (in the UK).

~~~
user5994461
Indeed.

    
    
        Access to the websites listed on this page has been blocked pursuant to orders of the high court.

~~~
xvilka
SciHub has an onion site at scihub22266oqcxt.onion. I don't know about LibGen,
though.

------
lgierth
I added the raw .gz files to IPFS when Library of Congress announced this last
week:
[https://github.com/ipfs/archives/issues/152](https://github.com/ipfs/archives/issues/152)

It's slightly more than 100GB and here it is:
[https://ipfs.io/ipfs/QmWSzgkftVrkh2859bGT44ahzoqcGhFkjsrQUtH...](https://ipfs.io/ipfs/QmWSzgkftVrkh2859bGT44ahzoqcGhFkjsrQUtHen9hVw9/)

(Note that the filesizes in the directory listing are all wrong -- that's the
original index.html from loc.gov/cds/downloads/MDSConnect/)

This makes it a lot easier to use this dataset at e.g. hackathons, where a lot
of people would simultaneously pester that LoC server, which already seemed
pretty bandwidth-limited on its own when I downloaded the files.

~~~
toomuchtodo
Did you push them into the Internet Archive yet? If not, going to grab a beer
and start iterating through your IPFS objects.

~~~
lgierth
Go for it! :) I've never pushed anything to IA so far to be honest.

~~~
toomuchtodo
For future reference then!
[https://github.com/jjjake/internetarchive](https://github.com/jjjake/internetarchive)

~~~
voltagex_
If someone wants to pay for / donate 100GB worth of bandwidth on a VPS
somewhere, I'll do it.

~~~
tty7
scaleway: €2.99/m 200Mbit/s Unmetered bandwidth

~~~
voltagex_
Okay, so who's going to spot me 3 EUR? ;)

~~~
toomuchtodo
Could TransferWise you the $3.35USD. That work?

~~~
voltagex_
This thread is getting too long; how do I contact you?

~~~
tombrossman
This amount of usage falls well within most VPS companies' free trial / promo
codes offerings and should cost you nothing. Use a throwaway email account and
drop it after a month.

AWS will give you a whole year if you haven't tried them yet and the other
popular VPS companies (DO, Linode, etc.) all will give you at least $10
startup credit. This is probably simpler and faster than figuring out how to
receive <$4 from some random internet commenter.

~~~
voltagex_
Yes. This isn't the first time I've tried to contact toomuchtodo.

Anywho, the amount of data is actually ~19GB which is well within what I can
upload with my home connection. Unfortunately the ia tool is failing for me:
[https://github.com/jjjake/internetarchive/issues/176](https://github.com/jjjake/internetarchive/issues/176)

Also, it's not really about the $3, more that a tonne of "$3" projects really
add up over a year or so.

~~~
toomuchtodo
Email sent.

~~~
voltagex_
I have been advised that the data already exists on archive.org.

~~~
toomuchtodo
Email me back. Would still like to buy you a beer for your troubles.

------
themodelplumber
Just in case anyone else is wondering: This is, as I understand it, 25M pieces
of metadata, not 25M books, songs, movies, and treasures from the past.

~~~
gt_
It's something along those lines. It looks like the music is all "cover songs"
:( one of the heavier metadata types.

------
wordupmaking
What's the copyright? Would it be legal to unzip those and serve them
directly, so archive.org or anyone else can make them more inviting for
access?

I know you shouldn't look a gift horse in the mouth but there's not even an
index or a rough idea what something like "Name Authorities" might mean.
That's not what I call wide open doors; it seems more like doing some
legally required minimum.

~~~
pmoriarty
I wonder how much more extensive the release could have been were copyright
laws not in the way.

Then there's the old question of whether the works under copyright today will
ever go into the public domain, or if their copyright will be extended
forever by future changes in copyright law.

~~~
wonderous
The release is 25 million bibliographic index files and has nothing to do with
copyright, since none of the data was ever covered by copyright protection.

~~~
pmoriarty
They didn't have to limit their release just to bibliographic index files. If
they wanted to, they could have released manuscripts, letters, newsletters,
videos, or any other media they have. But they may have felt inhibited by
copyright laws.

So my question is, had copyright laws not been an issue, how much more would
they have released?

There is also the larger question of whether the value of copyright law
outweighs the value of not having it, so that everyone can benefit from this
treasure trove of knowledge.

~~~
AndrewUnmuted
I don't think these records meet the standard definition of 'media' anyway.
This is really just data that can be used for cataloguing purposes and other
media custodian/librarian applications.

Given that the LoC has made it their goal to archive at least one copy of
everything, I think they are not quite the right people to fall into your
anti-copyright cross hairs. However, I do strongly agree with your overall
premises.

~~~
gjjrfcbugxbhf
I don't think the OP has anything against the LoC. More like they are
lamenting that the LoC has its hands tied.

------
alphonsegaston
For those who haven't ever worked with MARC record data before, there's a
python library that's a pretty easy interface called pymarc:

[https://github.com/edsu/pymarc](https://github.com/edsu/pymarc)
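
To give a rough idea of what a library like pymarc takes care of, here is a minimal stdlib-only sketch of reading binary MARC (ISO 2709) records. It assumes the decompressed files are binary MARC with UTF-8 text, which I have not verified for this particular release. The function names `iter_marc_records` and `parse_marc_record` are made up for illustration and are not part of pymarc. For real work, use pymarc itself.

```python
# Sketch only: assumes binary MARC (ISO 2709) with UTF-8 encoded text.
# Function names are hypothetical and not taken from pymarc.

FIELD_END = b"\x1e"   # terminates the directory and every field
SUBFIELD = b"\x1f"    # precedes each subfield code in a data field


def iter_marc_records(blob: bytes):
    """Yield the raw bytes of each record in a concatenated MARC file."""
    pos = 0
    while pos + 5 <= len(blob):
        # Leader bytes 0-4 hold the total record length as ASCII digits.
        length = int(blob[pos:pos + 5].decode("ascii"))
        if length <= 0:
            break
        yield blob[pos:pos + length]
        pos += length


def parse_marc_record(raw: bytes) -> dict:
    """Return {tag: [value, ...]} for a single raw record.

    Control fields (00X) become strings; data fields become lists of
    (subfield_code, text) tuples. Indicators are skipped in this sketch.
    """
    leader = raw[:24]
    # Leader bytes 12-16 hold the offset where field data starts.
    base = int(leader[12:17].decode("ascii"))
    # The directory sits between the leader and the data, ending in FIELD_END.
    directory = raw[24:base - 1]
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7].decode("ascii"))
        start = int(entry[7:12].decode("ascii"))
        # Drop the trailing field terminator.
        data = raw[base + start:base + start + length - 1]
        if tag.startswith("00"):
            value = data.decode("utf-8")
        else:
            # The first two bytes are indicators; subfields follow.
            value = [
                (chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                for chunk in data[2:].split(SUBFIELD)
                if chunk
            ]
        fields.setdefault(tag, []).append(value)
    return fields
```

With something like this, the title of each record would be under tag `245`, subfield `a`, after reading a decompressed file's bytes and looping over `iter_marc_records`.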

------
Mathnerd314
Can someone change the title to match the article? s/items/records/

~~~
dogruck
Another nit -- paid for via taxes.

