
Do I really have to cite an arXiv paper? - aaronyy
http://approximatelycorrect.com/2017/08/01/do-i-have-to-cite-arxiv-paper/
======
CogitoCogito
I am astounded that people would question citing an arxiv paper. If you get an
idea from somewhere else, you cite it. This doesn't even have to do with the
arXiv. I have on multiple occasions read papers where the author wrote a
proof and then cited "private communications" with another mathematician as
the source. In a business where ideas are ideas, you should always cite the
source. Anything else is extremely dishonest.

edit: To be clear, I am personally entirely okay with not citing papers
containing ideas you were unaware of during your own formulation (though I
think if you become aware of it, you should probably point out that it was
previously independently discovered by someone else). Your paper may still
have merit even if its idea isn't "new" (especially if the first paper is
shit as is often the case).

edit2: I personally don't see much wrong with Yoav Goldberg's blog post linked
in this blog post. It's refreshing to hear his honest opinions out loud. As a
graduate student I always lost a little sanity each time I read a paper with
"great ideas", but terrible follow-through (i.e. explanation and proof of
those ideas). I personally think that clarity of exposition is at least as
important as (if not more important than) the novelty of an idea. However, you should still
cite the sources of your ideas. Feel free to point out the source's flaws, but
cite them nonetheless.

(Hopefully no more edits...)

~~~
jhbadger
At least in biology, a lot of journals have been cracking down on
"personal/private communication" citations lately -- the point of a citation
isn't just to give credit, but to tell the reader where to find the info, and
chatting with your friend isn't a realistic option for that. If it is
unpublished, and is important enough to be cited, then you should make the
friend a co-author and include the info in your paper. That is a different
case than an arxiv paper, though.

~~~
ajarmst
I'd argue that the latter purpose---allowing the reader to recover the
original sources and arguments to your thinking---is more important than the
'giving credit' one. If a conversation or email exchange with someone was
indeed critical to the development of your ideas, then you offer them
co-authorship, not a 'citation' to something not only unpublished but
undocumented. If it was only incidental to the thinking, then no citation is
necessary.

~~~
ajarmst
I will note that many a graduate supervisor or lab administrator has been
offered (or simply taken) co-authorship for a far more scant contribution than
a key conversation.

------
docdeek
Agree with this: if you take an idea from somewhere, cite it. There’s no cost
to you personally or professionally for doing so, and you’re giving credit
where it is due.

>A large number of seminal works have never been published. The greatest
mathematics paper of our lifetimes remains unpublished.

Is this a reference to a specific paper?

~~~
zackchase
Author here: Grisha Perelman's proof of the Poincaré conjecture has never been
(by him, to my knowledge) submitted to or published in any journal. He
decided, as is his right, that he could not care less about the professional
community of publishing mathematicians or their protocols. That does not invalidate
his achievement.

I should note that, since I am not (and do not expect to become) a
mathematician of Perelman's level, I have not actually read his proof. So I defer
to other superior mathematicians for this assessment and come by it as
hearsay. :)

~~~
docdeek
Thanks - I remember reading about Perelman before but I didn’t realise the
proof was unpublished.

~~~
acd10j
I think the logic that his work is unpublished is itself wrong. He
published/posted his work on arXiv for the world to see. He has very strong
opinions about the status quo of scientific publishing, and hence did not go
through the usual route of submitting to a journal, but he did publish/post on
arXiv. Who are we to decide that posting your work in a blog post, on arXiv, etc.
does not count as publishing?

~~~
qznc
It becomes a problem if the people with the money want it. Grant proposals
usually require you to list your important and relevant "published" papers.

There are actually two aspects of "published". One is archival, so people
can expect to access the work decades later (whether they have to pay for that is
another discussion). The second aspect is peer-review aka quality control.

Personally, I once submitted a paper to a workshop. After submission, peer
review, and acceptance, the workshop committee decided that they would not
publish proceedings. I could have submitted the paper elsewhere, which I find
weird. Instead I published it as a techreport. However, it is now unusable for
proposals, because a techreport is "not published" even if it is properly
archived and went through peer review.

~~~
matwood
> However, it is now unusable for proposals, because a techreport is "not
> published" even if it is properly archived and went through peer review.

IMO, the definition of 'published' is a huge issue. I have always read
published as archived, peer-reviewed research. Your paper is both, but remains
in an unpublished state, which I think is wrong and hinders future research.

------
a_e_k
I tend to view arXiv as mainly an aggregated repository of documents that
would ordinarily be tech reports. It is accepted practice to cite tech reports
when the paper author is aware of them and they are relevant. I don't really
see how an arXiv paper should be any different.

------
al2o3cr
Seems like this is a case of "whatever you measure will be gamed": counting
citations is an important part of how academics are evaluated at work, so we
get flag-planting behavior to maximize that metric with minimal effort.

There's a similar issue in journal publishing: counting "published works"
without regard for where they appear leads to journals that will publish literally
anything for cash.

------
eveningcoffee
>“Do I really have to cite arXiv papers?”, they whine. “Come on, they’re not
even published!,” they exclaim.

Is this really a thing?

I understood that today you even cite web pages (with a timestamp) and even
your dog.

~~~
glup
Yes, this is a thing. The existence of a citation format isn't a blanket
endorsement of its use in all ways in a scientific work. For example, in
linguistics I could cite a specific usage example from a (non-peer-reviewed)
webpage. On the other hand, I wouldn't want to cite a (non-peer-reviewed)
webpage that outlines a specific linguistic theory.

------
sytelus
The main problem is the giant publication latency. Typically there is a delay of at
least 6 months between submitting a paper, having it reviewed, and having it actually
published with a proper DOI etc. These days 6 months is a loooong time.

I wish there was some way to generate all the relevant bib information as soon as
a paper gets accepted, which could then be added on arxiv immediately. This would
allow folks to distinguish between peer reviewed papers vs those which are
submitted only for flag planting.

------
mindcrime
I don't see how this is even a question. A paper "published" to arXiv _is_
published, in the more general sense of the word. Just because it isn't
"journal published" doesn't change anything.

~~~
glup
Publishing in a (reputable) journal generally means some degree of peer
review. While noisy, this process generally means that really outrageous
methodological errors or theoretical claims get weeded out. For a good paper,
it means that other researchers have pressed them on specific aspects of the
work, which often produces stronger work (new methods, better baselines,
clearer argumentation, clearer math).

I adore arXiv but still believe it's a preprint. In my field (cognitive
science) it would be great if we had more methods to sidestep Elsevier and the
other commercial publishers and have an open stack with rigorous peer review
(PLoS being the main way currently).

~~~
mindcrime
_While noisy, this process generally means that really outrageous
methodological errors or theoretical claims get weeded out. For a good paper,
it means that other researchers have pressed them on specific aspects of the
work, which often produces stronger work (new methods, better baselines,
clearer argumentation, clearer math)._

I see that as all true, but irrelevant in this context. If you source material
from a pre-print on arXiv, then you should cite it. Seems totally obvious to
me. Of course you would _prefer_ the final, published paper if it's available.
But that wasn't the question at hand.

And even with all that said... I would argue that in some fields, (cs / ml /
etc.) we're getting close to a point where arXiv itself is becoming almost a
parallel publishing mechanism where people cite/publish completely within the
arXiv realm, with less regard for "traditional" journals and what-not in
general. Especially when you factor in papers from researchers who come from
industry, as opposed to academia, and care less about some of the normal
trappings of academic publishing.

_I adore arXiv but still believe it's a preprint._

Of course it's a pre-print. I didn't contend otherwise. I'm just saying that,
from my perspective, it's obvious that you should cite a pre-print if it's
relevant.

I will allow, though, that norms probably vary from field to field, and as a
non-academic, my take is likely different from, say, somebody who is deeply
immersed in academia, pursuing tenure, etc.

------
pishpash
If you were writing a blog post without stinking academic credentialism
concerns, would you cite it? If yes, then cite it.

------
vmarsy
It's a bit tougher when a paper hasn't been peer reviewed, although not too
different from a good paper published in a low quality or unknown conference.
I think you should cite any idea you pick from a paper when the said paper has
some of the following qualities:

- novelty

- technical correctness

- clarity

- good experimental evaluation

Novelty is the most important: if, without the citation, your paper looks like
the origin of the idea, that is plagiarism.

If the paper has big shortcomings that your paper addresses, it is fair to give
yourself the credit you deserve, of course, but it doesn't hurt to cite the
other paper; in fact it offers a way to provide some sort of peer review: _In [1],
Foo et al. attempted to explore <subject> but the experiments were
inconclusive/the technique sucked compared to the state of the art/they didn't
explain how they did it... In this paper we did this and that and it gives us
awesome results_ (said in a nicer way).

~~~
taneq
> I think you should cite any idea you pick from a paper when the said paper
> has some of the following qualities: [...]

I think you should cite any idea you pick from a paper, full stop. That's the
whole point of citing: To assign credit where credit is due.

------
chriskanan
"If you built on it, cite it" is necessary but not sufficient. Most papers
have a related work section that describes all work similar to your paper. In
hot areas, such as deep learning, some of the related work may have been done
in parallel with your work, so it did not inform it. If this related work is an
uncited arxiv paper that you find when you are almost done, and it had no influence
on your work, do you cite it? Reviewers sometimes demand this.

I've been told a rule of thumb: if a related unrefereed arxiv paper has
been cited six or more times, cite it. The justification is that once it has some
citations, it is somewhat well known.

~~~
greeneggs
Of course you cite it. You want to help the reader find the related work. It
doesn't matter whether it "had influence" on your work (a fuzzy criterion). If
you aren't happy that you are doing the same work as others, then find a more
original problem.

It definitely does not matter how many citations the other paper has. The
point isn't to avoid getting caught, it is to inform the reader. Your citation
is more useful the less well known the cited paper is.

------
SeanMacConMara
How is "flag-planting" even relevant to this? You only cite what you use, and
you should not be using vapid, hollow flag-planting sources from _anywhere_.
Someone needs a refresher course on Academia 101. Is it the reviewers?

~~~
mattkrause
The issue is that some people (appear to be) submitting very preliminary, and
arguably low quality work to arXiv to stake a claim to some area. Once that
work is "out there", people who were making a more serious effort to do things
more carefully are obligated to cite the original work, presumably as part of
the related work/background/etc.

I can see how this would be maddening, particularly if you started before the
flag-planting paper was even written.

------
anjc
Confused by this part:

> Yes, of course. Any time that our work follows, copies, or borrows ideas
> from other people, and when we can reasonably be expected to be aware of
> this, we ought to cite the related work.

> We should not have to cite nonsense. Many reviewers are abusing the system
> and asking for ridiculous comparison to recently-posted preprint papers.
> Bald-faced flag-planting should not be rewarded. And we should not be
> faulted by reviewers for failing to compare against 2-week old algorithms
> that may or may not work.

So what position is the author advocating? Citing or not citing?

 _edited_

~~~
detaro
Include the next sentence for the first quote: _Yes, of course. Any time that
our work follows, copies, or borrows ideas from other people, and when we can
reasonably be expected to be aware of this, we ought to cite the related
work._

Something being on arXiv or in a blog post or … doesn't excuse not citing it
if it influenced your work. It's important to document where your ideas and
data come from, both to give credit to the author and to allow others to
evaluate what you base your claims on for themselves.

The second quote is about stuff that didn't influence your work. While you are
expected to keep up with and document related developments, forcing authors to
constantly update references to new, not yet properly evaluated work just
because it makes some related claim doesn't make sense.

~~~
anjc
> The second quote is about stuff that didn't influence your work. While you
> are expected to keep up with and document related developments, forcing
> authors to constantly update references to new, not yet properly evaluated
> work just because it makes some related claim doesn't make sense.

It doesn't say that though, it just says that you shouldn't have to cite
nonsense on arXiv. This implies that you can read a paper, implement something
similar, and afterwards decide that the paper was nonsense, didn't influence
your work, and shouldn't be cited.

~~~
detaro
> _Many reviewers are abusing the system and asking for ridiculous comparison
> to recently-posted preprint papers. Bald-faced flag-planting should not be
> rewarded. And we should not be faulted by reviewers for failing to compare
> against 2-week old algorithms that may or may not work._

The context seems pretty clear to me. Your example clearly is covered by the
first case: if it influenced your work, you cite it, even if it is "nonsense".
You can't just "decide" something didn't have influence if it had. (These
rules do not prevent cheating; they are guides for people acting ethically.)

The second rule is to prevent the opposite case: You shouldn't be forced to
create the impression your work is based on or just a mere repeat of someone
else's "who had the idea first" when they have no good claim to that, or
that it is inferior to something that hasn't actually been shown to be better.

~~~
anjc
I don't get the distinction. Nobody expects you to cite something which you
didn't read, or which didn't influence your work.

The question here is whether you should cite something which you did read, and
does relate to your work, even if it's a shitty flag-planting paper.

If you think it's shitty, then you can cite and dismiss it in a sentence. You
can dismiss 30 papers in a single sentence if you like. There's no requirement
to wax lyrical for 3 paragraphs about a paper just because it was first. But
it strikes me as dishonest to advocate that sole researchers become arbiters
of a paper's merit, citing or not citing it at their personal discretion.

Also, if their paper was published first then that's the only claim necessary
to demonstrate that they were first out with the idea.

~~~
detaro
The point is exactly that it happens that reviewers demand comparisons with
other work you haven't read yet. And while missing an established publication
that influences your findings is a fault on your part and totally fair
critique, "missing" something that didn't exist when your work happened is
obviously not something you can control.

> _Also, if their paper was published first then that's the only claim
> necessary to demonstrate that they were first out with the idea._

To quote myself: _impression your work is based on or just a mere repeat of
someone else's_, not just being first. Ideally, everyone looking at you
referencing it would take note that it was published months after you started
work and your work was independent (or even earlier), but that easily gets
lost.

Should you be encouraged to throw out every idea and snippet to arXiv just so
you can claim "FIRST!" in case it turns out to be useful/true, over
"competing" works that spent more effort on quality and verification and are
now in peer-review forced to reference you as the pioneer (even if you maybe
had the idea months later, but rushed it out and got lucky with it holding
up)? That's what the "flag-planting" is about.

------
nthcolumn
I must be missing some nuance of the argument here. If it is nonsense then why
is it in his paper? If you are citing poor quality sources then that tells us
something about your paper and to not do so would be dishonest. We're not
seriously saying that you should be trawling for similar ideas to your own and
then citing them, just where you have used other work you must give credit.

------
sgt101
There is a massive disjunct between citation in the humanities and citation in
science. People seem to have forgotten this. Ideas are two a penny; they
literally do not matter _at all_, and they should not be cited. The person who
introduces a concept to science deserves no credit whatsoever. What deserves
credit is the provision of evidence or proof.

~~~
lou1306
What? Leibniz/Newton introduced the concept of calculus; should we not credit
them with this (amazing) concept just because it wasn't formalized very well
until Weierstrass came along?

~~~
analog31
In my view, Leibniz / Newton did enough heavy lifting to deserve credit for
developing the field. This is way different than just blurting out a "concept"
without even knowing if it's workable.

I'm in agreement that ideas are a dime a dozen. Sure, it's necessary to cite
the first known mention of an idea, and there are situations where the first
mention of an idea is important such as in the patent system.

"Planting" happens in my world all the time. Unfortunately, managers give a
lot more importance to "ideas" than they are really worth, because they
overvalue their own interventions in general. Somebody will blurt out an idea in a
meeting, wait until someone else has developed it, and then rush in to take
credit. If a manager does this, it's a blow to morale. I have my own rule of
thumb, which is "show your work" from math class. Just writing down the answer
doesn't get you full credit.

A couple of historical examples: The ancient Greeks are credited with the
atomic theory, but they had no concept of even turning it into a serious
hypothesis. Lots of ideas are anticipated in science fiction, but do those
authors really deserve credit?

~~~
lou1306
> Somebody will blurt out an idea in a meeting, wait until someone else has
> developed it, and then rush in to take credit.

I see what you mean, but this is mostly office politics, which is not very
related to the academic citation process. If an author makes a conjecture that
stimulates further work (even if it's just on arXiv), he deserves to be
cited.

> Lots of ideas are anticipated in science fiction, but do those authors
> really deserve credit?

Well, I guess it depends on how much the idea is fleshed out. Lucian was
probably the first author to conceive of space travel, but it was just "people
land on the moon" (still pretty far out for his times, though). Meanwhile,
Asimov's three laws of robotics depict a reasonable control scheme for e.g. an
autonomous vehicle, so if they are somehow implemented their author should be
credited.

~~~
analog31
Agreed, definitely cite the author.

In my view, using citations as a metric, and abusing that metric, is just an
academic version of office politics, writ large.

------
erik998
You definitely should cite... What if they are wrong... You can always blame
the citation...

Polywater is the perfect example of why you should cite. Just because
someone publishes something does not mean it's correct, scientific, or proven.

[https://en.wikipedia.org/wiki/Polywater](https://en.wikipedia.org/wiki/Polywater)

[http://science.sciencemag.org/content/167/3926/1715?sid=8b4e...](http://science.sciencemag.org/content/167/3926/1715?sid=8b4eadf1-7198-4b31-b0fe-e0b27d28b8cf)

Science takes time.

Write a good paper and cite, cite, cite! If your citations are ever
demonstrated to be incorrect or fraudulent other researchers can continue work
to disprove the citations and work on correcting the errors.

------
dekhn
In the future, the question will be "Do I really have to cite <paper in a
closed journal>"?

------
marchenko
Good article. We want to incentivize researchers to share useful ideas in a
timely manner, and allow others to trace the genealogy of these ideas to
stimulate their thinking and avoid dead ends. A healthy citation culture helps
to build the shoulders of giants.

------
avmich
What if somebody independently came to similar or worse results and only then
read about research done before that, possibly with more results?

Given the amount of information available, it could often be a case of
independent research into something which has been known and available for some
time. If one only learns about similar, and possibly greater, results after
making one's own, and wants to talk about the work done, should one cite the
other, possibly earlier, works?

------
lotsoflumens
Citing an arXiv paper should take precedence over citing anything else.

If you really believe in science then ONLY non-paywall papers should be cited.

------
vermilingua
Please use permanent links; this one directs to a tag and will be invalid as soon
as a newer _publishing_ article is posted.

[http://approximatelycorrect.com/2017/08/01/do-i-have-to-cite...](http://approximatelycorrect.com/2017/08/01/do-i-have-to-cite-arxiv-paper/)

~~~
devrandomguy
Please set up a redirect to the canonical URL. This is helpful to third party
systems that interact with your site, such as search engines and content
sharing sites. It also eliminates the issue you raised.

~~~
mintplant
Redirect what to the canonical URL? The entire tag? Why, just to fix this HN
submission? The proper solution here is to shoot an email to
hn@ycombinator.com (which I've done).

~~~
devrandomguy
Redirect the tag to the particular article that it currently references. Only
show users the "correct" url that you want them to use in the future (history,
bookmarks, sharing, etc).

[https://support.google.com/webmasters/answer/139066?hl=en](https://support.google.com/webmasters/answer/139066?hl=en)

~~~
arjie
Bruh, the tag is a page containing a list of articles that are tagged with
that tag. Redirecting to the latest post with the tag would defeat the entire
point of having that page.

~~~
devrandomguy
The original link took me straight to the article, not to a list of articles
with that tag. It was a single article, with the URL of a tag.

Anyway, I didn't mean to argue. Have a nice evening :)

------
kutkloon7
My master's thesis is based on a single arXiv paper. I didn't know that arXiv
had this questionable reputation, but it sure explains a lot.

The project is to make an FPGA implementation of a technique presented in an
arXiv paper. The paper had some big gaps, so I had to spend a lot of time
researching the technique, and I had very little time to spend on the actual
implementation.

~~~
skgoa
arXiv is just a repository for papers to make them available before they are
published/peer-reviewed. Just because a paper is on arXiv does not make it
bad, but it doesn't mean that it is high quality either. Typically the people
in the field will know which papers are important.

~~~
kutkloon7
It's not bad, there are just some gaps. I expect a follow-up paper which
explains everything thoroughly. I feel like my master's thesis should have
started after that paper.

------
lmm
There are two kinds of "cite" here. Citing a non-academic source is different
from citing a (published) paper; you should cite anything that precedes your
work in the second sense, whereas the first is only obligatory for work that
you actually took something from. If you work on calculus you're obliged to
cite Leibniz even if you didn't read him, but you are _not_ obliged to cite
Newton's unpublished work unless you read it. Unpublished arXiv papers fall in
the latter category.

~~~
nl
_Unpublished arXiv papers fall in the latter category._

In CS that almost certainly isn't true. I'm most familiar with the NLP field,
but there, if you have some kind of embedding of your
words/tokens/sentences/something you cite
[https://arxiv.org/pdf/1301.3781.pdf](https://arxiv.org/pdf/1301.3781.pdf)
(Word2Vec, Mikolov).

That paper says there is a follow-up paper published at NIPS 2013, but I don't
think I've ever seen that published.

The field just moves too fast to wait for conferences anymore.

