But the claim that nobody knows how it should be cited is just plain wrong. They should maybe contact their research library and get an introduction to citing. All citation styles I know of can handle web citations. Anonymous authors are nothing new either. And you have things like the Internet Archive or WebCite to make sure the web document doesn't disappear. Otherwise it could be cited as personal communication, which usually covers direct communication, but can also be used for non-archived discussion groups.
The reluctance to cite a source because it's not a peer-reviewed research paper is bordering on cargo cult science. As if going through the motions of scientific publishing should somehow elevate arguments to truth.
I have seen this in computational linguistics where people frequently don't cite their data sources, which often come from outside of the field or outside academia altogether. Instead they put a link to a webpage in a footnote, which does nothing for assigning credit.
Reviewers don't take anyone to task for it, because they do that too.
For a while, there was a collection of multilingual word counts over OpenSubtitles that everyone was using, and you would acquire it from the OneDrive of the pseudonymous blogger "Hermit Dave". Many people put the link to his blog in a footnote, but I think I was the only person to come up with a citation for (Dave 2011).
People don't even cite Google Books Ngrams, and that's got a real paper:
Lin, Y., Michel, J.-B., Aiden, E. L., Orwant, J., Brockman, W., and Petrov, S. (2012). Syntactic annotations for the Google Books Ngram Corpus. Proceedings of the ACL 2012 System Demonstrations, 169-174.
You can't really put arXiv and "traditionally prestigious sources" in the same sentence. Any crackpot can (and does) post on arXiv.
The reasons why I'd be more comfortable citing a paper on arXiv than a random web page (as a mathematician):
1. arXiv is the archive for papers. You put something there, it's not going away. You can't even remove your own paper once you make it public there. It's a reliable place I can refer people to.
Personal websites? Well, my own very reliable academic web page has been removed by sysadmins without notice after graduation because "we don't maintain accounts".
2. Unrelated content. arXiv won't have it - unlike, say, the 4chan thread in question. Do I really want to explain to people that I wasn't citing getting "14 * 14! derps out of the way"? Not to mention ads, memes and NSFW content, which are also there.
The other reasons are less important, but are still valid:
3. Presentation. Something on arXiv is probably formatted and structured like a readable paper; there are baseline expectations. Reading someone's scratchwork is borderline painful, and a proof on a webpage is usually just that.
Compare the original to a rewrite - which one is easier to understand at a glance?
4. Reputation. This is different from "traditional prestige". The reputation of arXiv is that most of the people who do serious work put it there. Most of the people who produce nonsense (possibly including yours truly) also put it there, but the probability of any work having been done by someone willing to put effort into it goes down dramatically if the work is not on arXiv.
5. Signaling. Since it's so easy to post there, any work not on arXiv is signaling "I don't care about my work being available to other people in the field". Why would one read something that the author doesn't want them to? If they did, they'd put it on arXiv.
6. Attribution. A result on arXiv makes it easy for me to attribute the result to someone. A thread on 4chan? Do I attribute it to everyone who participated? To a single anonymous author?
In short: people give credit to blackboard scribbles and napkin proofs that came out of discussions at conferences, but you won't see "4th beer coaster, such-and-such conference banquet, 2018" as something that people cite.
A proof on 4chan is somewhere in the same category.
The disclaimer here is that this applies mostly to new results in mathematics. Old results are not necessarily available online, and often we don't have a choice. Practices in other fields might be different.
What kind of work was that?
Why would a footnote be considered less credit than something in the list of references? It's pretty common in my field to use footnotes to give credit to other researchers for ideas that arose in an informal context, e.g., "I thank A. Zee for pointing this out". Indeed, the question of whether prose footnotes go at the bottom of the page or are displayed intermixed with citations to published sources (as in Physical Review Letters) is just a stylistic choice that varies by journal.
Obviously unpublished sources can't easily be counted into automated citation tracking systems for technical reasons, but if the author wanted that they could publish something (or, e.g., put it in a free repository like the arXiv).
I added the Google Books Ngrams example slightly later, but: People _do_ publish papers about their data, and their citation counts are often extremely low, because research users continue to use footnotes instead.
In the OP case, the 4chan author was anonymous.
> People _do_ publish papers about their data, and their citation counts are often extremely low, because research users continue to use footnotes instead.
Are you talking about researchers not giving credit for pure data sets? That is a very different topic.
Let's continue with the Google Ngrams example. All I had to do was search Google Scholar for "Google Ngrams" to find easy examples on the first page of results, like this one: http://www.aclweb.org/anthology/W14-1611
The paper is written by famous Stanford researchers, and they do not cite a single author of their data. Just URLs in footnotes.
This also happens to my research (ConceptNet) all the time. The fact that you haven't noticed it doesn't mean I'm making it up.
I don't see why you think researchers not giving credit for data is a different topic from what I brought up. What I said was, researchers in my field are in the habit of not giving credit for data, and instead they have adopted the convention of using half-assed footnotes with URLs in them, which is not appropriate credit. I speculate that one reason for this habit is that data often comes from inconveniently non-academic sources and would look bad in the bibliography. The habit is so pervasive that authors remain in the habit even when a paper exists.
For what it's worth, I agree with your criticism.
If they didn't, but they do have a web page on the thing, then give them credit in the currency of the web: links.
GP has a point: if the creators want scientific credit, they can easily get that via ArXiv (or, of course, a peer-reviewed publication). If not: increase their PageRank.
Is your field Quantum Field Theory?
Such is academia, I guess. It’s more important to get “prestige” than to properly cite someone else’s work; where it was published should not matter.
Unlikely. I can't speak for everybody, but I can speak for myself, and I would be extremely hesitant to cite a random webpage in a journal article for several reasons:
1) It can change. ArXiv has versioning, but even so, this is still a problem. You cite something, and months later you go back and find the thing you cited says something slightly different.
1b) It can disappear. Again, less of an issue for ArXiv, but a big problem for any other website. Linkrot is real.
2) There's probably a better source. Maybe in this example a 4chan thread is the original provenance of an idea, but 99 times out of 100, there's a better, original source for the idea.
3) There's no peer review. Peer review is not a prerequisite for publication, but if you don't have it, it just makes people more skeptical of your results.
4) It's perceived as lazy. This is closest to your point (i.e. prestige), but not the same. Like it or not, if you cite informal sources in a paper, people start to wonder about the rigor of everything else you do. It's an intangible, but when you're trying to get papers accepted, intangibles count.
Basically, citing a webpage is risky. You avoid it if you can.
Archive.org is not a solution to this problem. Nobody is going to go fishing in archive.org for a citation that has disappeared. Even if you have the URL, what is the correct date to use?
There are just practical problems with link citations, and you don't need to invent malevolent intent to understand why researchers are reluctant to use them.
What makes you say this? Citing anybody does little to increase your personal prestige. I think the issue has more to do with 4chan not being peer-reviewed, and the proof being so recent that few papers have yet been written.
On the contrary, citing a hot new thing is what academics do best.
The first paper I looked at for an example (at the top of today's "Recent" papers in PRD), cites three arXiv entries, out of 91 references. https://journals.aps.org/prd/references/10.1103/PhysRevD.98....
Successful peer review in an upstanding journal is a signal of relative reliability and importance, but it is not the arbiter of usefulness.
> "And you have things like the internet archive or Webcite to make sure the web document doesn't disappear."
And then you just web-archive the archive sites instead of 4chan directly :-)
There are many examples available of proofs which have been initially accepted but later discredited in the mathematical community. There are also many examples of proofs which have been polarizing from their first publication. Consider the controversy surrounding Mochizuki and his Inter-Universal Teichmuller Theory. Also consider that there have been credible (though ultimately incorrect) proofs published by credible people for several famous problems, such as P = NP. The academic community at large can't just eyeball a proof and know it to be true, and often even fundamentally flawed proofs require more than a week of careful attention to be declared incorrect.
If you want to argue that peer review is an imperfect system then that's one thing. But to say that a proof is demonstrably true or it isn't ignores the entire complexity of actually figuring out whether or not it's true. Some people can rely upon their own ability to do this, but for the most part the mathematical community is so heavily specialized that even most mathematicians cannot find errors in advanced proofs unless it's their particular subfield of research.
The bulk of your rebuttal is exactly my own point. Peer review doesn't actually prove anything one way or the other. The ultimate truthfulness of a cited proof is all that matters. Requiring cited proofs to be peer reviewed doesn't add intrinsic validity, it just fosters academic laziness.
"Boy, that's a fantastical claim that doesn't seem right; but all the citations are peer reviewed and I see nothing obviously wrong, so I guess it's true!" -OR- "Boy, that's a fantastical claim that doesn't seem right; I don't see anything obviously wrong here, so let's double-check the cited proofs they based this on."
What's the practical upside to only using peer-reviewed proofs besides speed at the cost of fidelity? What's the practical downside to using unreviewed proofs besides "more work"?
"The Melancholy of Haruhi Suzumiya" is a 14-episode anime adaptation which aired in 2006. Unusually, it broadcast its episodes not in chronological order but in a mixed order, for good theatrical reason; the series comprises a main plot arc and various self-contained episodes which chronologically occur after it. Rather than have the climax of the series come halfway through, the self-contained episodes are distributed in between the episodes of the main plot arc. It sounds mad, but it actually worked very well.
This idiosyncrasy was explicitly referenced by the episodes themselves: the "next episode" preview at the end of every episode has one of the characters say "The next episode is episode X", then another character says "Wrong! The next episode is episode Y", where X and Y are the numbers according to broadcast and chronological order.
Amusingly the story gets more complicated from here. After the series was unexpectedly popular in the anglosphere, it got licensed for an official release. After the contract was signed and it was too late to change it, the company which licensed it noticed the contract had a stock clause in it requiring the episodes to be released on DVD in chronological order. They were unable to get this changed, whereas the strong fan preference was for the broadcast order. As a partial workaround, I recall they released a special edition with the same episodes on two sets of discs: one in broadcast order and one in chronological order.
But the DVD "chronological order" isn't quite chronological either; one of the episodes is a show-within-a-show, a film made by the characters of the series. This happens chronologically late in the show, but it is the first episode in both the broadcast and DVD order (which makes for an extremely confusing introduction to the series).
So there are at least three different plausible ways to watch the series: broadcast order, chronological order, and DVD order (which is almost like chronological order but not quite). This absurd legacy makes the series rather the obvious choice for discussing permutations of episodes. (My own preference is for yet another permutation: broadcast order, but with the aforementioned first episode moved to before the 12th episode, which relates to it.)
That was only the first broadcast, to be precise. They later added another 14 episodes, but instead of calling it a new season, they mixed the new episodes in between the old episodes and rebroadcast it as one complete series.
As a special "joke", they also had a time-loop story spanning multiple episodes, with each episode telling the same story with slight changes and an individual art style, meaning each episode was produced anew, not just recycled from the first one. The "joke" with this story was that, of the 14 new episodes, the time-loop story filled 8(!) episodes, and it went under the fitting name "Endless Eight". I think someone got fired for this.
I personally didn't like it, it just seemed soooooo long. The only redeeming factor (in my opinion) is that Suzumiya Haruhi is outlandish enough for it to be the perfect vessel for such an experiment.
Besides, it's a little harder to skip all eight episodes in the original weekly broadcast not knowing when the loop will end, so viewers are more or less forced into the cycle.
Another good pick would be Firefly, famously aired in complete disorder by Fox: the broadcast order was (numbers referring to DVD/canonical order) 2, 3, 6, 7, 8, 4, 5, 9, 10, 14, 1 (which is in two parts), with the series cancelled before 11, 12 and 13 were aired.
Not only that, but other broadcasters opted to set up their own wacky broadcast order:
* in Mexico, MundoFOX took the Fox order but replaced 1 by 12
* in South Africa, SABC3 used the Fox order but did broadcast 11, 12 and 13 as well (in that order, after 1)
* in Portugal SIC used the Fox order followed by 13, 11 and 12
* Australia almost followed the intended order but swapped 1 and 2
* New Zealand followed the intended order until 10, then used 14, 13, 11, 12.
Also, someone should inform Greg Egan: http://www.gregegan.net/SCIENCE/Superpermutations/Superpermu...
Scientists often abhor citing from blogs, forums, and Wikipedia, as those publications "are not peer-reviewed by trustworthy professionals".
"I feel like you should be able (forced even!) to cite things that aren't journal articles- even blog posts- if they contributed to your body of work. But citing things that aren't prestigious reduces the perceived prestige of your own, which is a perverse incentive not to cite."
EDIT: I can't seem to figure out how to link to specific tweets, so this is the guy that said it: https://twitter.com/_delta_zero
He has a great point. If the proof is correct, you should cite the raw document. If you have to cite it through a proxy scholar, it feels like a form of gatekeeping.
Here's the tweet you referenced: https://twitter.com/_delta_zero/status/1054820799473836032
This only really matters on the receiving end of the citation. I.e., if lots of people are citing your blog or forum post, then it can be a good idea to take your content and put it in arxiv or similar. Just so that the citations "count" for you in the academic setting.
And similarly, it's a bit rude (or at least odd) to cite someone's blog post when there's an accompanying published paper that describes the same work.
But citing informal sources where no formal source exists is not (and afaik never has been) a problem.
Perhaps this is better seen as a substantial defect in the fact that the academic web-of-trust model implemented in publication is based on quality of source (e.g. journal's prestige, or the submitting scholar's prestige) rather than quality of scholarship (e.g. whether a critical mass of researchers, rather than 1-3 anonymous reviewers, find it compelling and 'sign off' on it being good work).
I hope (and suspect) that whatever emerges or is developed to replace the paid journal model of academic trust is designed to handle these situations gracefully.
For wikipedia, you should reference the specific version of the article, which will never change.
For blogs, or wikipedia articles you feel might get deleted, you can reference a specific version from archive.org.
In fact, I fear that the trend to publish supplemental material may lead to another crisis because a lot of that material is not managed and archived well and may disappear without warning. Some papers contain important details like long, involved and non-obvious calculations only in their supplements and are not really reproducible without them.
You see very similar things happen from time to time where there are "folklore" theorems in mathematics that someone eventually writes down 20 years later. Sometimes they're able to give an attribution, and sometimes not. But always they give a formal proof using modern notation and credit the idea with whatever fidelity they can.
> I used to think it was because as a quasiscientific community we highly value peer review but ML people are totally fine with citing papers on arxiv that haven't been accepted anywhere as long as you can replicate the results - I think perceived prestige nails it
Peer reviewed citations are excellent, and unstable citations like Wikipedia are obviously concerning. (At bare minimum, you'd want to link to the page history instead of the page.) But the willingness to cite arXiv content without hesitation definitely suggests that peer review is only part of what's going on there.
Which is sad, because most peer-reviewed material is not trustworthy either.
It's worrying that academia is slowly turning into a rigid "appeal to authority" church rather than a space for truth and inquiry.
It provides value, indeed most of my papers have been improved by going through peer review. But it's good to be conscious of what it can and can not do.
One thing that it does do is signal intent. It signals that you are willing to engage with the expert community, and specifically their feedback and criticism.
Finally, there is of course a fallacy in saying "not trustworthy either". If 99% of random blog posts are wrong, but 60% of published papers are wrong, then a random paper is 40 times more likely to be correct than a random blog post and it's perfectly sensible to be hesitant about citing the latter.
Finally, scientists have always cited outside of traditional sources and continue to do so*. I see no evidence for your assertion that "appeal to authority" has become more prevalent or important than it used to be. With all the ongoing open science initiatives, it's rather the opposite.
* Personal communication with various scientists, 2018
Ideally this is true. At worst, though, you'll end up with a peer reviewer who gets pedantic, or one who's a member of a citation cartel that demands you cite his/her friends' papers unnecessarily as a quid pro quo leading to publication. It's difficult to say how common this is in practice, since academia is generally terrible about self-reflection, but I observed concerning evidence of this when I was still in science. In some fields at least, there's ample evidence that peer review provides inadequate QC controls for methods sections, making it literally impossible to replicate results due to entire processing steps being omitted. This is why post-publication services such as PubPeer have become more and more relevant, since it has become obvious as time has elapsed how woefully inadequate the initial screening process is.
So while this definitely does happen (and to different degrees in different fields) there are some checks and balances, too.
I could be naive, but my impression was that there are almost too many degrees of freedom in the peer review process that could allow for this kind of chicanery, and that it would almost be preferable to have criteria for acceptance laid out in advance (i.e., exactly what steps, metrics, and citable articles were necessary) and use those the way an actuary would. In some ways I almost think that practices like preregistration serve as a better first-pass indicator of whether or not research is publishable.
I personally would like more use of pre-registration (there is something about that on the front page right now). But I also think de-anonymizing peer review and publishing the reviews along with the papers could be very interesting. Obvious drawbacks as well, of course, but I think it has potentially massive upsides.
Seeing as how academia literally started out as an accessory to the various Christian denominations for educating priests, I don't see how you really could characterize it as "a space for truth and inquiry". It's only been relatively recently that "appeal to authority" has given way to more secular/scientific ways. Even within the last couple of years, I would wager that academia is no more authoritarian than it has ever been- tenured professors have almost always had a disproportionate amount of influence over their trainees' futures, etc. The head of my father's department when he was getting his PhD in the 70s actually attempted to pull my father's stipend because of an avant garde project that would today be accepted as normal.
I've seen a trend towards the opposite, actually, with graduate students becoming more skeptical of the institution and more proactively addressing its shortcomings through grad student unions and advocacy of open science policies (which neither university administrations nor longstanding tenured professors are fans of in general).
Those papers check results in fields like medicine and psychology, which are plagued by the fact that they have to rely on various kinds of effects from random sampling of study participants and statistics. I am not surprised that a study with 10 participants leads to wrong results with significant probability.
But this does not generalize to science as a whole. Physics and chemistry do not suffer from this because their experiments have fundamentally different setups. So extrapolating from medicine to "most science is wrong" is just as wrong. You need to approach these fields differently when you want to know how many wrong results are published.
There's probably no great answer here, it just feels like authors and editors sometimes use "it was reputably published once" to shrug off disappearing sources.
a) Tell others where to go if they want more context.
b) Give credit for where you got the idea or for doing the legwork of putting it together.
c) Allow auditing of the ultimate origin of a claim and e.g. identify whether one data point is getting repeated like it's two data points. https://en.wikipedia.org/wiki/Information_cascade
Thinking about what I do: there's a difference between depending on the results of a survey and hearing a novel take on a market or whatever. In the former case, I'm dependent on the survey methodology being sound (and I'm going to trust some sources far more than others). In the latter case it's more "I never thought of things that way", but I'm going to do my own analysis.
Academia. And even that is error prone due to bias.
1) Even when you succeed you can't start the car.
2) Even given the best case scenario you end up with a Ford.
> Someone this year made a bot that did it in about 5 minutes.
It's also plausible that one can interrupt and resume the sequence (at the cost of 4 extra button presses per interruption).
I'd probably just do it for the novelty (who else can say they have a more unusual source??)
Reference: personal communication
Another great example is the Bitcoin paper.
Yes. Not because they need/want the credit, but because it's important to note that it's not your idea.
> If the concern is that people will think the person writing solved it, they can just write a sentence where it came from.
That is, basically, what a citation is.
* personal correspondence
* facebook profile pages
and my personal highlight
* google search queries (I kid you not)
All of these citations actually occurred in scientific journals. I assume Nicolas Bourbaki was also cited several times, without anyone caring about "academically correct" citation rules.
-Rob’s Facebook wall
-My friend in gchat
If this isn't credible enough for most academics, then I think they have their priorities wrong. I want to cite the best work, not just work that looks good superficially. I assume that the claim that this is the best known lower bound is true. In that case, if I am writing a relevant paper, it's my duty to cite the result regardless of where it came from as far as I am concerned.
Edit: Same sentiment in this post a few minutes before mine: https://news.ycombinator.com/item?id=18292438
You have an `n`-episode show, and you want to watch every episode in every possible order. However, if one series watch-through ends with episode `x` and the next watch-through in sequence begins with that same episode `x`, you don't have to watch episode `x` twice in a row. Given these constraints, how do you find the "ideal" way to watch through the entire series, in all possible orderings, in the smallest number of episodes?
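To make the goal concrete, here's a minimal sketch of the property being sought (the function name and digit alphabet are my own choices): a string is a superpermutation if every permutation of the n symbols appears in it as a contiguous substring.

```python
from itertools import permutations

def is_superpermutation(s, n):
    # Every permutation of the first n digits must appear
    # as a contiguous substring of s.
    perms = {''.join(p) for p in permutations('123456789'[:n])}
    return all(p in s for p in perms)

# '123121321' is a known minimal superpermutation for n = 3 (length 9).
print(is_superpermutation('123121321', 3))   # True
# Naive concatenation of all 6 permutations also works, but is length 18.
print(is_superpermutation('123132213231312321', 3))  # True
```

The puzzle is to exploit overlap between consecutive permutations so the total string is as short as possible.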
Is it possible to publish a result properly if one is not a mathematician and not affiliated with an institution?
I suppose hacker news is the most recent equivalent.
It looks like 4chan may be the most recent equivalent : )
Also, in the context of this thread, 4chan is an ephemeral site so there is no such thing as link to content.
Standard disclaimer applies about 4chan. If you're thin-skinned, you may find it not worth the trouble. Just don't engage with the trolls and you'll be fine.
It's funny: when a community pretends to be stupid for the funnies, it's not long before the actual stupid people move in.
The biggest problem I see on /diy/ is questionable humor, sort of like how blue collar occupations used to haze the apprentices a bit, you'll see the most ridiculous / comical suggestions sometimes.
I noticed that a paper written by different authors, rather than citing the StackOverflow post when discussing that solution, cited our paper above.
“The following proof was found in the 4chan web site (URL at the end), without any attribution...”
tl;dr: Iterate through each value v and spawn a process that waits v seconds before writing itself as output.
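That's the so-called sleep sort. A sketch in Python, using threads rather than processes, with a 0.05-second scale factor of my own choosing to keep the demo quick:

```python
import threading
import time

def sleep_sort(values):
    # Each value v sleeps proportionally to v, then appends itself.
    # Timing-based, so only reliable for small, well-separated inputs.
    result = []
    lock = threading.Lock()

    def worker(v):
        time.sleep(v * 0.05)
        with lock:
            result.append(v)

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

print(sleep_sort([3, 1, 2]))  # usually [1, 2, 3]
```

It "works" because the scheduler wakes the shorter sleeps first; it's a novelty, not something anyone should actually use.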
Edit: Earliest HN discussion: https://news.ycombinator.com/item?id=2657277
These should absolutely be cited if you as a researcher are convinced they are correct. There is no good reason not to.
Some undergrad or master's student should do some work to validate it and write a paper about the validation. Then they'll be cited by anyone who wants to use the 4chan thing.
Overlap between permutations is why it's not as simple as it first sounds. For example, to borrow the wiki example, "121" includes both "12" and "21" permutations.
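For n = 2 the wiki example can even be brute-forced to confirm that "121" is optimal (the function name is my own; this search is only feasible for tiny n):

```python
from itertools import permutations, product

def shortest_superperm(n):
    # Try strings of increasing length until one contains every
    # permutation of the first n digits as a contiguous substring.
    digits = '123456789'[:n]
    perms = {''.join(p) for p in permutations(digits)}
    length = n
    while True:
        for cand in product(digits, repeat=length):
            s = ''.join(cand)
            if all(p in s for p in perms):
                return s
        length += 1

print(shortest_superperm(2))  # '121' - one symbol shorter than '1221'
```

The candidate space grows as n^length, so anything beyond toy sizes needs the cleverer overlap-based constructions discussed in the article.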
* or at least had. I think that a lot of people left the site over the last few years as the site changed quite dramatically.
I think you're right about Nazis coming in, but I think that it was much more contained when Moot was there. I guess maybe that provided the setup for the 2016 elections.
With new users being added at such a rapid pace, it seems like it was impossible to on-board them in terms of generally accepted practice; an echo of the "Eternal September" of the Usenet days, which involved many of the same complaints.
A lot of people who use it simply like the lack of filters or reputation that anonymity provides. A post doesn't come with the baggage of a username that people can hunt through, so it can really only be judged by its content, not by the reputation (good or bad) of its author.
Just because 4chan shows up in the news when its users are stirring up trouble doesn't mean it's all a bunch of edgy teens. Even if that's how it began, an edgy teen from 4chan's inception is now old enough to be part of the adult workforce.
That's a goddamn myth. There are many ways to break anonymity (usernames, tripcodes, country flags, user IDs), and the very moment people have any ability to track your posts, they will dogpile on them. /int/ is (used to be? I don't go there anymore) the worst example, in that posters who aren't from North America or Northern Europe will get trashed quickly with stupid memes no matter what they say.
Even when it comes to truly anonymous content, this is just not true. If, for instance, you reply to multiple posts and in one of the replies you express the wrong political opinion (on 4chan that means 'not being alt-right'), every single person you replied to will trash you and call you a n* or whatever. People don't magically change their way of thinking just because they're on an anonymous platform; they're still full of biases and idiocy. And poison.
On one hand, there's /out/, which focuses on outdoor hobby/lifestyle discussions and only has a minor infestation of the Mora knife meme. Admittedly, I own two Moras and they are excellent outdoor knives, so the boundary between meme and reality is fuzzy. Other than that, /out/ is very civilized and contains extremely high quality content.
On the other hand, there's /fit/ which seems to consist entirely of lifting broscience despite being anonymous.
It is possible I'm missing the point, because this discrepancy reflects actual reality: there is a useful objective opinion on, for example, proper medical foot care for long-distance hiking or burn care, but there really is no one-size-fits-all magic recipe for powerlifting gains or "the" correct bodybuilding diet.
That's the whole thread