
Library-managed 'arXiv' spreads scientific advances rapidly and worldwide - tosh
http://ezramagazine.cornell.edu/FALL12/CoverStorySidebar2.html
======
CJefferson
Can I just make a general plea?

You should upload your paper to arXiv. When you do, please upload your source
(tex, or word I imagine), as well as a PDF.

For the blind, PDF is the worst possible format, and tex and word are the best
formats. Don't hide, or lose, the blind-accessible version of your paper.

~~~
hackuser
> For the blind, PDF is the worst possible format

I'm surprised. Nobody has created an accessibility solution for PDFs after all
these years of ubiquity? What's the story?

~~~
contravariant
Well, I don't know about PDF, but PostScript, which it's based on, is basically
just a programming language for drawing symbols in specific positions on a
page. Depending on how it was written (or more likely, generated) this could
be readable, or incredibly unreadable.

As an example, the following PostScript program from Wikipedia would
simply show the text "Hello world!":

    %!PS
    /Courier             % name the desired font
    20 selectfont        % choose the size in points and establish
                         % the font as the current one
    72 500 moveto        % position the current point at
                         % coordinates 72, 500 (the origin is at the
                         % lower-left corner of the page)
    (Hello world!) show  % stroke the text in parentheses
    showpage             % print all on the page

Now if you're lucky you can just extract all quoted text and read those in
order, but that's unlikely to work for all documents.
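
A minimal sketch of that "extract the quoted text" idea, in Python rather than
PostScript. The names `extract_ps_strings` and `SAMPLE` are made up for
illustration, and the escape handling is deliberately simplified, so treat it
as a toy rather than a real PostScript parser. It does show one trap: the
comments in the example above contain parentheses too, so a naive search for
"(...)" would pick up comment text.

```python
# Toy extractor for PostScript string literals, i.e. text in (parentheses).
# Assumption: plain-text PostScript source; hex strings, binary data and
# most backslash escapes are not handled properly.

SAMPLE = """%!PS
/Courier             % name the desired font
20 selectfont        % choose the size in points and establish
                     % the font as the current one
72 500 moveto        % position the current point at
                     % coordinates 72, 500 (the origin is at the
                     % lower-left corner of the page)
(Hello world!) show  % stroke the text in parentheses
showpage             % print all on the page
"""


def extract_ps_strings(source: str) -> list[str]:
    """Return the string literals found in `source`, in file order."""
    strings = []
    current = []
    depth = 0            # nesting depth of parentheses inside a string
    in_comment = False   # True between a '%' and the end of that line
    i = 0
    while i < len(source):
        ch = source[i]
        if depth == 0:
            if in_comment:
                if ch == "\n":
                    in_comment = False
            elif ch == "%":
                in_comment = True
            elif ch == "(":
                depth = 1
                current = []
        elif ch == "\\" and i + 1 < len(source):
            # Simplification: keep the escaped character literally.
            # Correct for \( \) and \\, wrong for escapes such as \n.
            current.append(source[i + 1])
            i += 1
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            depth -= 1
            if depth == 0:
                strings.append("".join(current))
            else:
                current.append(ch)
        else:
            current.append(ch)
        i += 1
    return strings


print(extract_ps_strings(SAMPLE))  # ['Hello world!']
```

Even when this works, all it gives you is the strings in the order they sit in
the file, which need not be the reading order on the page.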

~~~
hackuser
Almost all formats would similarly require content to be parsed from code; for
example, consider HTML, Word, or Excel.

It makes me wonder how screen readers work. Thinking out loud, it seems that
the screen readers should let the applications (e.g., Word) handle their own
parsing and presentation and obtain the data after that. Otherwise, the screen
reader would have to reinvent many wheels, interpreting the code for all
applications including all their versions, features, quirks, and platform
integration issues - such a daunting and difficult task that it seems
unlikely. But where do screen readers hook into the content? After it's output
from the application but before it's an image for the screen (which could
require OCR)? Ironically, I suppose PostScript or PDF could provide common
interfaces.

~~~
contravariant
Sure, all formats require some parsing to get to the content. However, TeX,
HTML, Word, Excel and languages like that were designed to allow people to
format text and other data into a document. Therefore they generally make text
and other information appear sequentially, and separate display logic from
content.

PostScript and PDF have no such separation; they _are_ the display logic. They
are fully fledged programs that list the position, size, font, colour, of
every symbol on every page, if you're lucky in a vaguely logical order, but
there's no reason it should be. If the files were written by a human you may
have some hope of extracting some of the content, but almost nobody writes PDF
or PostScript by hand anymore.

------
naftaliharris
This article misses one of the biggest value-adds of arXiv, at least in my
field (Statistics): since almost everyone posts to arXiv, you can almost
always find a free version of a published and potentially pay-walled paper. In
the past, publishing in a peer-reviewed journal would (1) improve the paper
through peer review, (2) signal the quality of the paper based on the prestige
of the journal, and (3) distribute the paper. With arXiv, publishing now only
does (1) and (2).

~~~
slacka
> you can almost always find a free version of a published and potentially
> pay-walled paper.

On personal research, I've used it for exactly this, but since what I've seen
was only preprints, I've often wondered about the final version. It looks like
I'm not alone.[1] Do many or any of the arXiv papers get updates with the
improvements that come from peer reviews? Is there a need for arXiv for finals
or do publishers demand exclusives on finals?

[1] [http://mathoverflow.net/questions/41141/should-i-not-cite-an...](http://mathoverflow.net/questions/41141/should-i-not-cite-an-arxiv-org-paper)

~~~
beevai142
Publishers (in this subfield at least) usually demand ownership only on the
final typeset manuscript PDFs. Those cannot be uploaded, but people are
usually free to update the arxiv manuscript by uploading their own "final"
version files, with content equivalent to the published one. In the corner
where I come from, I'd say this is done most of the time, especially if there
are major changes. In practice, people often read only the arxiv versions
anyway since publishers' web pages can be crappy.

Also, since you submit manuscripts to most journals in TeX, there's very
little extra work involved in uploading the updated files also to arxiv. You
maybe miss the copy editor's grammar corrections etc., but those are almost
without exception unimportant --- also, more often than not, the copyediting
by the publisher introduces errors not present in the original manuscript.

~~~
kkylin
Agree with everything. I also want to point out that the final published
version is not always better -- it represents compromises made with reviewers
/ editors to get papers through. Often these are positive, but not always.
Sometimes it's useful to be able to send people the preprint rather than the
final version.

------
ivanstegic
Ah, the original [http://xxx.lanl.gov/](http://xxx.lanl.gov/) that I knew and
loved in the 90's, when people thought we were surfing nudies in the Physics
department and not papers on differential geometry. I helped establish and run
the za.arxiv.org mirror at WITS University, mostly to learn how to configure
RedHat, Apache, rsync and other tools. I'm glad it still exists.

~~~
RMarcus
I've worked at Los Alamos for 6 years and I didn't know this existed. Pretty
cool.

------
divbit
ArXiv is incredibly useful for research, but I think people also use it for a
sort of "I posted it to arXiv first, therefore I solved it first" kind of
thing, which imo can be misleading at times, if not everyone follows that.
Also there is the eprint.iacr.org which seems to do the same thing, except for
cryptography (or is it cryptology?), so I'm not sure if every important
preprint in that topic gets to arXiv.

~~~
ajross
> I think people also use it for a sort of "I posted it to arXiv first,
> therefore I solved it first" kind of thing, which imo can be misleading at
> times

True, but I don't see that fights over precedence are unique to ArXiv either,
or even made worse by it, no? I mean, at least now there _is_ an unambiguous
date-stamped public place to cite in this kind of fight. And those fights
provide a built-in incentive to put stuff up there, which is good for all of
us.

Basically: who cares about spitballs as long as the papers end up on ArXiv?
Seems like a cost worth paying to me.

~~~
divbit
> True, but I don't see that fights over precedence are unique to ArXiv
> either, or even made worse by it, no?

I'm not familiar enough with other methods of preprint publishing besides
arXiv / eprint.iacr, but you may be right that it is not unique to arXiv.

My personal preference would be to have bits of research done through
something like git, so that work along the way can be seen, otherwise one may
solve a problem and then be 'out-arXived' by someone who spends an all-nighter
tex-ing your solution (this is a hyperbolic example, but I think the idea of
the potential flaw in the system should be clear).

~~~
Eridrus
Besides disputes involving patents, papers that are within a short time frame
of each other are usually understood to be cases of parallel invention. It's
happening quite frequently in deep learning atm since there are still a lot of
relatively low-hanging ideas, to the extent that people are commenting/joking
that they consider the risk of colliding with someone else when deciding what
to work on.

~~~
divbit
> within a short time frame of each other are usually understood to be cases
> of parallel invention

I see what you are saying, but I don't think it's that cut and dried, otherwise
I could just take someone else's work from yesterday (or whatever a short time
frame is), and re-solve it (easily, since now the tricky parts have been
revealed) and post it today - tada, I parallel invented it!

~~~
beevai142
Typically the work in a paper, if substantial, is done over a long time, so
even if the main destination ends up being the same, it's unlikely the route and
sidestops are the same. So often you can wriggle a little bit and expand the
paper sideways, so that it is still publishable work even if the other work is
given priority.

It's actually not that rare to have similar papers appear in arXiv one or two
weeks after you submit --- to me, it happened several times within the last
few years. In these cases, it is possible to see that the approach differs
enough (and moreover, often you know the people in question, or you know
someone who does).

------
pepon
I hope it is replaced with something better soon. You cannot see access
statistics concerning the papers you upload, and they provide this absurd
reason for not doing it:
[https://arxiv.org/help/faq/statfaq](https://arxiv.org/help/faq/statfaq) (it
seems they think arxiv users are idiots or something, so they have to take
care of us). Also getting the uploaded latex files to be compiled without
errors is a pain, and they don't let you just upload the pdf (this has pros
and cons, but I wish there was the freedom to choose... and I guess that
99.999% of the time people just download the pdf).

~~~
javajosh
Requiring error-free latex is almost certainly a reasonable proxy for real
curation effort.

~~~
lorenzhs
The issue is that their LaTeX installation is fairly old, so there's a real
chance of running into old bugs that have long since been fixed. It's a bit
tiresome to work around those. I've had issues with their pgfplots version and
had to resort to compiling the figures to pdf locally and including those.

------
tnecniv
Needs a [2012]

------
starshadowx2
Interesting to learn how to pronounce it correctly. I've always just said
arx-iv like it's spelled.

~~~
joeyo
It _is_ pronounced like it's spelled; the X is a chi.

~~~
dkbrk
Is it spelt that way, though?

X is LATIN CAPITAL LETTER X

Χ is GREEK CAPITAL LETTER CHI

Everywhere I have ever seen it spelt, it has been written with the Latin
letter. I have never seen it spelt as "arχiv" or "arΧiv", only ever as
"arXiv".

I can understand why, for example, having a Greek letter in the URL would be
undesirable, but if one is going to consider that letter to be Chi, then that
_should_ be the authoritative spelling and it should actually be spelt that
way where possible.
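
For anyone who wants to check which character a given page actually uses, here
is a small Python sketch using the standard `unicodedata` module. The helper
name `describe` is just something made up for this example.

```python
import unicodedata


def describe(chars: str) -> list[str]:
    """Return 'U+XXXX NAME' for each character in `chars`."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in chars]


# Latin X, Greek capital chi, Greek small chi
for line in describe("X\u03a7\u03c7"):
    print(line)
# U+0058 LATIN CAPITAL LETTER X
# U+03A7 GREEK CAPITAL LETTER CHI
# U+03C7 GREEK SMALL LETTER CHI
```

Pasting the "X" from a page title into `describe` shows whether it is the
Latin letter or one of the Greek ones, even where the glyphs look identical.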

~~~
robotresearcher
Given the "Archive" homophone, it is obviously intended to be pronounced
that way. The spelling is secondary to the common pronunciation, as usual in
English.

------
ckdarby
Is there any reason why a project like this wouldn't be open sourced?

Follow up question, how does a site like this have a $500k annual budget? I
was napkin calculating the costs of running this and couldn't get anywhere
close to $500k without having extensive staff salaries.

~~~
cooper12
Looking at it from a Cornell point-of-view, the most innocuous reason I can
think of is that they want a canonical library of papers that others can
mirror rather than researchers having to search each individual university's
arXiv. If they let others fork and set up their own servers it could lead to
interesting modifications/applications but it would no longer be in their
control and might make the preprint locations fragmented. (and the other
servers might not have the same moderating standards)

The other more greedy explanation is always money. Of course open source isn't
antithetical to profit, but as mentioned before you do lose control and maybe
Cornell doesn't want competition. Even if the project was started with the
best of intentions, they still need to make it self-sufficient and maybe even
profitable so they probably decided it's in their best interest. Of course
this is all just me speculating.

------
jessriedel
Here's a recent lengthy FAQ Ginsparg did on the arXiv (ironically behind a
journal paywall).

[http://onlinelibrary.wiley.com/doi/10.15252/embj.201695531/f...](http://onlinelibrary.wiley.com/doi/10.15252/embj.201695531/full)

Here's a discussion on HN of a blog post by me sparked by a conversation with
Ginsparg.

[https://news.ycombinator.com/item?id=9415985](https://news.ycombinator.com/item?id=9415985)

------
beezle
Am I the only one who still uses xxx.lanl.gov ?

~~~
Steuard
Probably not. :) (Is it a redirect now, or is it an actual mirror?)

My understanding is that they switched to the new domain after people noticed
the original was being blocked as porn by a bunch of automatic content
filters.

~~~
beezle
Yes they put the new masthead on instead of the skull n crossbones

------
science404
There's something I've always wondered about.. what do you do if you upload
your journal submission to arxiv but it's later rejected? That possibility has
always been a deterrent to submitting to arxiv for me. Seems to me this
discussion assumes arxiv uploads will be accepted to some journal eventually..

------
deepnotderp
Hooray for arxiv :)

Long live open science!

------
danjoc
>Eleven years ago Ginsparg joined the Cornell faculty, bringing what is now
known as arXiv.org with him. (Pronounce it "archive." The X represents the
Greek letter chi.)

Been pronouncing it "ar ziv" until now. :P

~~~
alanh
But, "archive dot org" is an entirely different and also noteworthy
organization!

~~~
danjoc
I think that's "ark ive" where this is "ar chive". I'm gonna call it ar chive
dot org now and get more stupid looks, aren't I? :/

------
kensai
But I bet not faster than Sci-Hub... har har har. :D

~~~
IanCal
Significantly faster than sci-hub. Sci-hub is, afaik, based on published work.
This is preprints, so well in advance of that.

~~~
kensai
Well, however, we can argue here that a pre-print is not of the same quality
as a published work. Not that published works cannot have errors (and be
retracted), but at least they have passed a first-level scrutiny of an
editorial board.

~~~
PaulHoule
It is pretty controversial. The value of peer review has never really been
demonstrated and despite peer review, the median scientific paper is wrong.

~~~
kensai
How do you want this value to be demonstrated? Peer-reviewing is exactly this:
peers evaluating the methodology and results of your experiment or idea. If
this is faulty, imagine how much faultier is an attempt to pre-print
without even passing that step, which mind you, is pretty important for major
journals such as Nature or Science.

Peer-reviewing has a lot of weak points, but saying that pre-print is the
answer to them all is plainly wrong.

~~~
ci5er
It doesn't seem that anyone is saying that we need to get rid of the
(admittedly valuable and proven) peer-review model of publishing in journals.
Or that this is uniformly better for all use-cases, esp. those that peer-
review excels at.

It's merely a supplement to peer-reviewed journals that has some nice
characteristics, for some use-cases, which has been beneficial, to some
researchers, in some fields.

------
baby
I hate arXiv, I can never figure out where is the PDF, if there is a PDF...
long live eprint.

~~~
lorenzhs
I'm not sure whether you're serious, but on any article's page there is a
"Download" section with a link to the PDF (labelled "PDF").

~~~
baby
Not always.

~~~
GFK_of_xmaspast
Can you link three examples?

~~~
greeneggs
Here's an example:
[https://arxiv.org/abs/1611.06999](https://arxiv.org/abs/1611.06999)

I believe the PDF link is only removed after a paper has been withdrawn. But
if you click on v1, you can still access the original paper (incorrect, in
this case).

~~~
lorenzhs
v2 is a 0kB file. As you note, v1 is still available, just as with any other
paper that was updated.

In any case I don't think that withdrawn papers support the OP's claim that
PDF links are hard to find for _normal papers_.

