

In memory of Aaron Swartz: a collection of PDFs from PDFtribute - inconditus
http://edward.io/pdftribute/index.php

======
houshuang
An important thing to remember is that many journals already permit
self-archiving of publications (i.e. uploading a pre-print to a personal server or
an institutional repository). In fact, about 70% of large publishers
automatically allow some form of self-archiving, and for the others, many authors
have succeeded in including a copyright addendum with the copyright-transfer
document, retaining some rights (<http://scholars.sciencecommons.org/>). FAQ
on self-archiving (<http://www.eprints.org/openaccess/self-faq/>).

At my university, we keep running workshops, and there are student staff in the
library willing to help upload articles to the repository if you just e-mail
them, but still, most academics won't take the five minutes to do this,
even if they have the right.

This doesn't mean that the academic publishing system shouldn't change; it
absolutely should. And there's also a lot of value in "liberating" academic
publications that would otherwise not be free. But I hope people would become
more aware of what is already possible, and legal!

~~~
streptomycin
Agreed, a lot of people don't know their current rights. One major reason is
that the information is typically buried in some unintuitive legalese deep
in some publisher's website. To work around that problem... this is a very
useful database that will allow you to easily check what a journal/publisher
allows you to do with your publications: <http://www.sherpa.ac.uk/romeo/>

Most in my field at least allow you to put postprints (the final version of
the paper, but not formatted by the journal's typesetters) online, although
there are a few stragglers who don't let you do anything.

------
houshuang
As long as these PDFs are exposed publicly (and linked to, which a tweet with
or without #pdftribute will take care of), they will mostly be indexed by
Google Scholar, which does a decent job of extracting metadata using
heuristics etc.

Of course, it would be much better if people started embedding machine-
readable metadata in PDFs (totally possible, see for example
<http://code.google.com/p/pdfmeat/>), and if there was some agreed-upon format
for bibliographic microformats, that could be embedded in websites listing
articles.
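
A minimal sketch of the second idea, embedding bibliographic metadata in a page that lists an article. There is no agreed-upon microformat, so this assumes the `citation_*` meta tag names that, as far as I know, Google Scholar's inclusion guidelines describe; the function name and the sample values are hypothetical.

```python
from html import escape


def citation_meta_tags(title, authors, date, pdf_url):
    """Return HTML <meta> tags describing one article.

    Tag names are an assumption (the citation_* convention); swap in
    whatever vocabulary your indexer of choice actually reads.
    """
    fields = [("citation_title", title)]
    # One citation_author tag per author, in author order.
    fields += [("citation_author", author) for author in authors]
    fields += [
        ("citation_publication_date", date),
        ("citation_pdf_url", pdf_url),
    ]
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in fields
    )


if __name__ == "__main__":
    # Hypothetical article, for illustration only.
    print(citation_meta_tags(
        "An Example Paper",
        ["Doe, Jane", "Roe, Richard"],
        "2013/01/14",
        "http://example.org/paper.pdf",
    ))
```

The output would go in the `<head>` of the page that links to the PDF, so a crawler does not have to guess the title and authors from the PDF itself.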

We also eventually need an open alternative to Google Scholar. GS is great,
and I use it every day (and love that you can output BibTeX for example), but
it has no API (and will never have one because of deals with publishers),
actively resists automatic access, is a black box in terms of how data is
gathered, etc. Think of "Open Scholar" to Google Scholar as analogous to OSM
vs GMaps. OSM might not look as pretty, or be as consistent in the beginning,
but it enables a whole range of applications that GMaps doesn't. (And at least
GMaps does have a fairly good API, even if it charges for overuse; GS has
nothing).

(These are just some thoughts I've had, as I've been experimenting with an
open scholar workflow, trying to share as much of the "byproduct" of the
research, including rich notes and summaries, my own bibliography with links
to OA pubs where they exist etc: <http://reganmian.net/wiki/researchr:start>).

Another thing I've found working on my project, where I try to expose OA links
to as many pubs as possible, and regularly rescan to see if they are still
available (and still OA), is how quickly documents disappear... Hosting on
private pages is convenient, but fragile. Ideally, people would upload papers
to university repositories, subject repositories like arXiv.org, etc.
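
A rough sketch of what such a rescan could look like, using only the Python standard library. It is not the code behind the project mentioned above; the function names are hypothetical, and it only checks that a URL still answers, not that the document is still OA.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (404, 403, ...).
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, ...
        return None


def rescan(urls, fetch=http_status):
    """Split urls into (alive, dead) lists based on what fetch() reports.

    fetch is injectable so the logic can be exercised without network access.
    """
    alive, dead = [], []
    for url in urls:
        status = fetch(url)
        if status is not None and 200 <= status < 400:
            alive.append(url)
        else:
            dead.append(url)
    return alive, dead
```

Calling `rescan(list_of_pdf_urls)` on a schedule and logging the `dead` list would at least show how fast links rot. Some servers reject HEAD requests, so a fallback to GET may be needed in practice.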

~~~
Vivtek
Thanks for contributing to this thread - I've been looking for something like
this for years!

~~~
houshuang
Let me know if you want to discuss further - I have lots of ideas, but I'm not
able to make most of them happen by myself. For example, here are a bunch of
(unfinished) notes about an open alternative to Google Scholar, and an open
social API to share citations/notes, etc.
<http://reganmian.net/wiki/ideas_for_scrobblr>

------
smogzer
Cool effort.

But ... it's a score to JSTOR. It's unorganized.

But ... science is full of noise and crappy publications these days anyway.
Lots of ways to do the same thing, unproven, that only exist because everybody
has to publish to stay relevant.

Now: How to really improve science? My suggestion: A big Python framework for
each field of study, with implementations of the real algorithms and
models for comparison and benchmarking and even real life implementation.

See as an example in the robotics field, ROS (Robot Operating System). ROS
is like a basic glue framework where universities and individuals can publish
their code. It's decentralized, and it has simulators so that scientists do not
need to own the physical robots and can even compare (diff) results and
algorithms in a very fast way.

The simulator can have an embedded browser + wiki + Quora that explains X.

evolution: physical paper -> PDF -> simulator.

~~~
jcitme
It's not meant to be a competitor to JSTOR so much as a statement in
honor of someone.

A framework like that would be awesome, but that has a different meaning from
the collection of personal pdf posts/uploads each individual on Twitter
contributed.

~~~
houshuang
For someone who has been working on OA and scholarly publications for a few
years, it's a bit tricky to enter these kinds of debates. On the one hand, I
want to respect Aaron's legacy, and am very touched by the spontaneous and
organized moves to honor him. On the other hand, I am very interested in these
issues, and love discussing them, seeing how we can do things better.

For smogzer, there is some interesting research on how knowledge from the
research literature can be better represented. For example using concept
mapping, see this great paper by Simon Buckingham-Shum (who has many others):
<http://oro.open.ac.uk/6463/1/kmi-04-28.pdf>. Anita de Waard has given many
presentations on semantic and executable papers, for example
<http://www.slideshare.net/anitawaard/executing-the-research-paper>.

------
dutchbrit
Cool stuff, have you seen this yet?

<http://pdftribute.net/>

~~~
jychang
I think this website is different from PDFtribute.net because it actually
collects and stores the PDFs rather than just having links to the Twitter
posts.

From the 'About' section of the website, you can see it uses PDFtribute.net to
help scrape links.

~~~
dutchbrit
I missed that part (the usage of pdftribute.net for scraping), thanks!

------
jychang
Looking through all the files that are uploaded, there are a lot more non-
English documents than I expected; I randomly clicked on 2. It's amazing how
there is support from around the world.

------
zopticity
It is a great loss to know such an entrepreneur has died because of legal
problems. I myself have been in a similar situation. I feel
that Aaron was a martyr for the open source of academic papers. Unfortunately
he will not see his impact on this modern and technology-dependent world.

R.I.P. Aaron Swartz!

------
houshuang
Nice short article: 10 things you can do to really support Open Access:
<http://phylogenomics.blogspot.de/2013/01/10-things-you-can-do-to-really-support.html>

------
wreckimnaked
Nice idea!

Also, some metadata aggregation (title, author, tags, date published)
capabilities wouldn't hurt anyone.

------
fgrt2
In memory of Swartz, 1 million ebooks for free download

<http://ebookoid.com>

