It is a fun little event that something posted on 4chan is used in a research paper.
But that nobody knows how it should be cited is just plain wrong. They should maybe contact their research library and get an introduction to citing. All citation styles I know of can handle web citations. Anonymous authors are nothing new either. And you have things like the Internet Archive or WebCite[0] to make sure the web document doesn't disappear. Otherwise it could be cited as personal communication, which usually covers direct communication, but can also be used for nonarchived discussion groups.
The reluctance to cite a source because it's not a peer-reviewed research paper borders on cargo cult science. As if going through the motions of scientific publishing should somehow elevate arguments to truth.
It's not that nobody knows how to cite it, it's that nobody wants to cite things from outside of academia, perhaps because it lowers their own perceived prestige.
I have seen this in computational linguistics where people frequently don't cite their data sources, which often come from outside of the field or outside academia altogether. Instead they put a link to a webpage in a footnote, which does nothing for assigning credit.
Reviewers don't take anyone to task for it, because they do that too.
For a while, there was a collection of multilingual word counts over OpenSubtitles that everyone was using, and you would acquire it from the OneDrive of the pseudonymous blogger "Hermit Dave". Many people put the link to his blog in a footnote, but I think I was the only person to come up with a citation for (Dave 2011).
People don't even cite Google Books Ngrams and that's got a real paper [1].
[1] Lin, Y., Michel, J.-B., Aiden, E. L., Orwant, J., Brockman, W., and Petrov, S. (2012). Syntactic annotations for the Google Books Ngram Corpus. Proceedings of the ACL 2012 system demonstrations, 169-174.
To give you a counterpoint, I've read plenty of computer science and mathematics papers where someone cited an arXiv paper by an obscure or unknown researcher. I've even seen blog posts and email correspondence cited. In fact, in one of my own papers I cited an unpublished manuscript which you can't even find on the internet - you have to personally contact the author, who hasn't touched the thing in the last 25 years. For all intents and purposes, this manuscript - while valuable - only exists in the tribal folklore of the research community. But it's still cited and most active people in the field probably have a personal copy of it on their hard drive.
I think the difference is that arXiv papers don't have the same perceptions of low prestige attached as a random blog or a 4chan thread does. The issue isn't that the work is (yet) unpublished or that it's obscure: it's that it is from outside academia or traditionally prestigious sources.
You can't really put ArXiV and "traditionally prestigious sources" in the same sentence. Any crackpot can (and does) post on ArXiV.
The reason why I'd be more comfortable citing a paper on ArXiV than a random web page (as a mathematician):
1. ArXiV is the archive for papers. You put something there, it's not going away. You can't even remove your own paper once you make it public there[1]. It's a reliable place I can refer people to.
Personal websites? Well, my own very reliable academic web page has been removed by sysadmins without notice after graduation because "we don't maintain accounts".
2. Unrelated content. ArXiV won't have it - unlike, say, the 4chan thread in question. Do I really want to explain to people that I wasn't citing getting "14 * 14! derps out of the way"? Not to mention ads, memes and NSFW content - which is also there[2][3].
----------
The other reasons are less important, but are still valid:
3. Presentation. Something on the ArXiV probably is formatted and structured like a readable paper; there are baseline expectations. Reading someone's scratchwork is borderline painful, and a proof on a webpage is usually just that.
Compare [3] to a rewrite[4] - which one is easier to understand at a glance?
4. Reputation. This is different from "traditional prestige". The reputation of ArXiV is that most of the people who do serious work put it there. Most of the people who produce nonsense (possibly including yours truly) also put it there, but the probability of any work having been done by someone who is willing to put effort into it goes down dramatically if the work is not on ArXiV.
5. Signaling. Since it's so easy to post there, any work not on ArXiV is signaling "I don't care about my work being available to other people in the field". Why would one read something that the author doesn't want them to? If they did, they'd put it on the ArXiV.
6. Attribution. A result on ArXiV makes it easy for me to attribute the result to someone. A thread on 4chan? Do I attribute it to everyone who participated? A single anonymous author?
----------
In short: people give credit to blackboard scribbles and napkin proofs that came out of discussions at conferences, but you won't see "4th beer coaster, such-and-such conference banquet, 2018" as something that people cite.
A proof on 4chan is somewhere in the same category.
----------
The disclaimer here is that this applies mostly to new results in mathematics. Old results are not necessarily available online, and often we don't have a choice. Practices in other fields might be different.
It's up to professors. I've had very important work taken by professors and published by them from my websites without attribution - even my own old professors! They think theft is fine if you aren't a fellow pirate, apparently. Arxiv is used by fellow pirates, so it's off limits, on this view.
That sounds mighty weird, and honestly the statement has certain characteristics that trigger my crackpot alarm (don’t take it personally)... If what you say is true, it’s not like you don’t have recourse: you can write to the publication’s editorial board, or report to your department or institution’s academic disciplinary body (of course, that’s after you have contacted whomever you’re in dispute with; it’s okay, and I’ve had people cold emailing me about my preprint, asking me to mention their only somewhat related and by no means appropriated work...) Also, if you produced whatever result when you were working for a professor, they can publish it under their name (it’s nice to at least have you on the author list, of course); you don’t revoke their right just by publishing it on your own website. Anyway, whining on HN is about the least useful thing you can do.
This is common stuff, and good luck on prompt action! No, I wasn't working for a prof - but note that you yourself just accept that work including ideas can be published by a prof without attribution to the discoverer! That tells us all where the academic mindset is. My noting is your whining, I gather. One can't tarry on life's path to upbraid all the assholes in the world, and legal redress is well out of reach of most of us. One can note it to warn others, and move on.
> Instead they put a link to a webpage in a footnote, which does nothing for assigning credit.
Why would a footnote be considered less credit than something in the list of references? It's pretty common in my field to use footnotes to give credit to other researchers for ideas that arose in an informal context, e.g., "I thank A. Zee for pointing this out". Indeed, the question of whether prose footnotes are at the bottom of the page rather than displayed intermixed with citations to published sources (as in Physical Review Letters) is just a stylistic choice that varies by journal.
Obviously unpublished sources can't easily be counted into automated citation tracking systems for technical reasons, but if the author wanted that they could publish something (or, e.g., put it in a free repository like the arXiv).
Crediting a web page in a footnote is almost no credit at all. The web page didn't compile the data. A person did. Sometimes the person is inconveniently pseudonymous, but other times they should be credited by name.
I added the Google Books Ngrams example slightly later, but: People _do_ publish papers about their data, and their citation counts are often extremely low, because research users continue to use footnotes instead.
Are you talking about someone crediting a webpage without attempting to attach the name of the author? I have never seen this in academic research, ever. Can you link to an example?
In the OP case, the 4chan author was anonymous.
EDIT:
> People _do_ publish papers about their data, and their citation counts are often extremely low, because research users continue to use footnotes instead.
Are you talking about researchers not giving credit for pure data sets? That is a really very different topic.
> Are you talking about someone crediting a webpage without attempting to attach the name of the author? I have never seen this in academic research, ever. Can you link to an example?
Let's continue with the Google Ngrams example. All I had to do was search Google Scholar for "Google Ngrams" to find easy examples on the first page of results, like this one: http://www.aclweb.org/anthology/W14-1611
The paper is written by famous Stanford researchers, and they do not cite a single author of their data. Just URLs in footnotes.
This also happens to my research (ConceptNet) all the time. The fact that you haven't noticed it doesn't mean I'm making it up.
I don't see why you think researchers not giving credit for data is a different topic from what I brought up. What I said was, researchers in my field are in the habit of not giving credit for data, and instead they have adopted the convention of using half-assed footnotes with URLs in them, which is not appropriate credit. I speculate that one reason for this habit is that data often comes from inconveniently non-academic sources and would look bad in the bibliography. The habit is so pervasive that authors remain in the habit even when a paper exists.
It's a different topic because your dispute (whether to give credit for data generation/creation as comprehensively as for data analysis) is logically distinct from the OP dispute (whether to give different amounts of credit to the same sort of contribution based on where/whether it was formally published). The fact that the people you criticize continue to not properly give credit even when a formal publication exists demonstrates this starkly.
If the persons who compiled a data set (or made a latex package or or or) published a paper on that thing, by all means, cite it.
If they didn't, but they do have a web page on the thing, then give them credits in the currency of the web: links.
GP has a point: if the creators want scientific credit, they can easily get that via ArXiv (or, of course, a peer-reviewed publication). If not: increase their PageRank.
> It's pretty common in my field to use footnotes to give credit to other researchers for ideas that arose in an informal context, e.g., "I thank A. Zee for pointing this out".
> It's not that nobody knows how to cite it, it's that nobody wants to cite things from outside of academia, perhaps because it lowers their own perceived prestige.
Such is academia, I guess. It’s more important to get “prestige” than to properly cite someone else’s work. Where it was published should not matter.
I don't see a problem with a footnote entry; it is a form of credit. The issue with bibliographic entries is that they are increasingly processed automatically, so publishers and editors try to avoid non-standard entries. On the other hand, if you're publishing something in a blog or webpage you're not looking for the kind of credit that is given by a bibliographic citation, so a footnote seems to be just fine.
"It's not that nobody knows how to cite it, it's that nobody wants to cite things from outside of academia, perhaps because it lowers their own perceived prestige."
Unlikely. I can't speak for everybody, but I can speak for myself, and I would be extremely hesitant to cite a random webpage in a journal article for several reasons:
1) It can change. ArXiv has versioning, but even so, this is still a problem. You cite something, and months later you go back and find the thing you cited says something slightly different.
1b) It can disappear. Again, less of an issue for ArXiv, but a big problem for any other website. Linkrot is real.
2) There's probably a better source. Maybe in this example a 4chan thread is the original provenance of an idea, but 99 times out of 100, there's a better, original source for the idea.
3) There's no peer review. Peer-review is not a pre-requisite for publication, but if you don't have it, it just makes people more skeptical of your results.
4) It's perceived as lazy. This is closest to your point (i.e. prestige), but not the same. Like it or not, if you cite informal sources in a paper, people start to wonder about the rigor of everything else you do. It's an intangible, but when you're trying to get papers accepted, intangibles count.
Basically, citing a webpage is risky. You avoid it if you can.
For a paper in mathematics, perhaps it would be best to quote the entire proof of interest, if it doesn't appear anywhere else in a peer-reviewed publication. That would avoid the link rot problem. Also make sure archive.org has a copy, if possible.
Sure, you can quote a short proof. But again, that's a niche solution to a niche problem, and it defeats the purpose of having citations. It also doesn't convince anyone else that they should believe the proof you've cited.
Archive.org is not a solution to this problem. Nobody is going to go fishing in archive.org for a citation that has disappeared. Even if you have the URL, what is the correct date to use?
There are just practical problems with link citations, and you don't need to invent malevolent intent to understand why researchers are reluctant to use them.
>It's not that nobody knows how to cite it, it's that nobody wants to cite things from outside of academia, perhaps because it lowers their own perceived prestige.
What makes you say this? Citing anybody does little to increase your personal prestige. I think the issue has more to do with 4chan not being peer-reviewed, and the proof being so recent that few papers have yet been written.
On the contrary, citing a hot new thing is what academics do best.
Somebody pointed out in the Twitter thread that academics often cite unreviewed papers on arXiv. If true (cannot confirm it myself, not in academia) then the peer-review-as-a-prerequisite-for-citation thing doesn't really hold water.
It depends on the field. In fast moving fields like machine learning it is common to cite unreviewed stuff on arXiv. In slower fields it is less common/accepted.
Not citing sources is plagiarism. Are you actually saying that it is the norm in computational linguistics to plagiarize? If so, this is extremely worrying.
You want reproducibility, the results will be affected a lot by your data source, and real citations enable tools like Google Scholar. Why not cite? I see only drawbacks in continuing with the footnote/no citation trend.
Well, I know that on 4chan each thread is ephemeral and, depending on how active a board is, the thread will be deleted. I think that is the challenge in citing it: even a day or two after finding it, it can be lost.
Does that even work on 4chan? On /b/ threads can vanish in less than an hour. There used to be specialized sites such as 4chanarchive to fill the gap, but last I checked most are dead.
I read this more that if they follow common web citation guidelines, there’s a strong probability that 4chan will replace the target with goatse, or something worse.
We cited a random person's blog in my master's thesis, and my supervisor didn't blink an eye (that person had the clearest and most concise formulation of the problem we were trying to solve).
That's a non-answer. Peer review is a thing in mathematics precisely because it's difficult to verify if a proof is valid. In a sense you can say something like, "A proof is valid or it isn't", but that's vacuous because you still need someone to check it.
There are many examples available of proofs which have been initially accepted but later discredited in the mathematical community. There are also many examples of proofs which have been polarizing from their first publication. Consider the controversy surrounding Mochizuki and his Inter-Universal Teichmuller Theory. Also consider that there have been credible (though ultimately incorrect) proofs published by credible people for several famous problems, such as P = NP. The academic community at large can't just eyeball a proof and know it to be true, and often even fundamentally flawed proofs require more than a week of careful attention to be declared incorrect.
If you want to argue that peer review is an imperfect system then that's one thing. But to say that a proof is demonstrably true or it isn't ignores the entire complexity of actually figuring out whether or not it's true. Some people can rely upon their own ability to do this, but for the most part the mathematical community is so heavily specialized that even most mathematicians cannot find errors in advanced proofs unless it's their particular subfield of research.
Yes, it was a mostly non-answer to a mostly non-question. It was in response to citing a proof in your own work, not a comment on the need for peer review of proofs in general.
The bulk of your rebuttal is exactly my own point. Peer review doesn't actually prove anything one way or the other. The ultimate truthfulness of a cited proof is all that matters. The idea that cited proofs should be peer reviewed doesn't add intrinsic validity, it just fosters academic laziness.
"Boy, that's a fantastical claim that doesn't seem right; but all the citations are peer reviewed and I see nothing obviously wrong so I guess it's true!" -OR- "Boy, that's a fantastical claim that doesn't seem right; I don't see anything obviously wrong here so let's double check the cited proofs they based this on."
What's the practical upside to only using peer-reviewed proofs besides speed at the cost of fidelity? What's the practical downside to using unreviewed proofs besides "more work"?
People do use unreviewed proofs. They just mark those results separately from the rest of the paper so that people (correctly) know not to rely on them, or at least to continue to treat them as tainted; they also usually try to avoid relying on such theorems for critical results. Which is a way better solution than what you're proposing, which is to just mix them in with proofs that people are reasonably confident in and hope for the best.
Not so fast. Yes, many proofs do follow from formal logic and maths, but new proofs often connect things or draw inferences that haven't been done previously, thus requiring human review to verify.
I agree. It doesn't seem like that big of a deal to cite a web article. What's causing trouble is the stereotype among research institutions of not trusting or valuing communities like 4chan. I am glad that this is being highlighted.
I don’t see any mention of Zenodo, but it is a potential solution for, in their words, the long tail of science. One could take some extract from the 4chan page, post it on Zenodo and have a DOI to work with.
For those not familiar with the series in question, there's a reason why, if you're going to choose a TV series to discuss permutations, you'd choose Haruhi.
"The Melancholy of Haruhi Suzumiya" is a 14-episode anime adaptation which aired in 2006. Unusually, it broadcast its episodes not in chronological order but in a mixed order, and for good theatrical reason: the series comprises a main plot arc, and various self-contained episodes which chronologically occur after it. Rather than have the climax of the series come halfway through, the self-contained episodes are distributed in between the episodes of the main plot arc. It sounds mad, but it actually worked very well.
This idiosyncrasy was explicitly referenced by the episodes themselves: the "next episode" preview at the end of every episode has one of the characters say "The next episode is episode X", then another character says "Wrong! The next episode is episode Y", where X and Y are the numbers according to broadcast and chronological order.
Amusingly the story gets more complicated from here. After the series was unexpectedly popular in the anglosphere, it got licensed for an official release. After the contract was signed and it was too late to change it, the company which licensed it noticed the contract had a stock clause in it requiring the episodes to be released on DVD in chronological order. They were unable to get this changed, whereas the strong fan preference was for the broadcast order. As a partial workaround, I recall they released a special edition with the same episodes on two sets of discs: one in broadcast order and one in chronological order.
But the DVD "chronological order" isn't quite chronological either; one of the episodes is a show-within-a-show, a film made by the characters of the series. This happens chronologically late in the show, but it is the first episode in both the broadcast and DVD order (which makes for an extremely confusing introduction to the series).
So there are at least three different plausible ways to watch the series: broadcast order, chronological order, and DVD order (which is almost like chronological order but not quite). This absurd legacy makes the series rather the obvious choice for discussing permutations of episodes. (My own preference is for yet another permutation: broadcast order, but with the aforementioned first episode moved to before the 12th episode, which relates to it.)
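To make the "yet another permutation" idea concrete, here is a minimal Python sketch. It assumes the 14 episodes are simply labelled 1 to 14 by broadcast position (the actual mapping between broadcast and chronological numbers is not given here), and `move_before` is a made-up helper name for this illustration.

```python
import math

# Assumption: episodes are labelled 1..14 by their broadcast position.
broadcast_order = list(range(1, 15))

def move_before(order, episode, target):
    """Return a copy of `order` with `episode` moved to just before `target`."""
    rest = [e for e in order if e != episode]
    idx = rest.index(target)
    return rest[:idx] + [episode] + rest[idx:]

# Broadcast order, but with the first episode moved to before the 12th.
custom_order = move_before(broadcast_order, 1, 12)
print(custom_order)

# Total number of ways to order 14 distinct episodes (14!).
print(math.factorial(14))
```

The custom order is still a permutation of the same 14 episodes; only the position of one element changes.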
> "The Melancholy of Haruhi Suzumiya" is a 14-episode anime adaptation which aired in 2006.
The first version, to be correct. They later added another 14 episodes, but instead of calling it a new season, they mixed the new episodes in between the old episodes and rebroadcast it as one complete series.
As a special "joke", they also had a time-loop story spanning multiple episodes, with each episode telling the same story with slight changes and in an individual art style, meaning each episode was newly produced and not just recycled from the first one. The "joke" with this story was that, of the 14 new episodes, the time-loop story filled 8(!) episodes and went under the funny name "Endless Eight". I think someone got himself fired for this.
The Endless Eight was widely regarded as a rite of passage in anime spheres. A lot of people would end up skipping the second half after they realized what was going on. Great show, and as far as I know the only one to have attempted something like this.
I personally didn't like it, it just seemed soooooo long. The only redeeming factor (in my opinion) is that Suzumiya Haruhi is outlandish enough for it to be the perfect vessel for such an experiment.
The act did have some thematic significance with the movie release in 2010, in which the time loop incident is revealed to be a major reason why the movie's events took place.
Besides, it's a little harder to skip all eight episodes in the original weekly broadcast not knowing when the loop will end, so viewers are more or less forced into the cycle.
> For those not familiar with the series in question, there's a reason why, if you're going to choose a TV series to discuss permutations, you'd choose Haruhi.
Another good pick would be Firefly, famously aired in complete disorder by Fox: the broadcast order was (numbers referring to DVD/canonical) 2, 3, 6, 7, 8, 4, 5, 9, 10, 14, 1 (which is two parts) with the series cancelled before 11, 12 and 13 were aired.
Not only that, but other broadcasters opted to set up their own wacky broadcast order:
* in Mexico, MundoFOX took the Fox order but replaced 1 by 12
* in SA, SABC3 used the Fox order but did broadcast 11, 12 and 13 as well (in that order, after 1)
* in Portugal SIC used the Fox order followed by 13, 11 and 12
* Australia almost followed the intended order but swapped 1 and 2
* NZ followed intended order until 10 then used 14, 13, 11, 12.
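The orders listed above can be written down as lists and checked for which ones are full permutations of the 14 canonical episodes. This is only a sketch: the MundoFOX and Australian lists are my reading of "replaced 1 by 12" and "swapped 1 and 2", and the helper name `is_full_permutation` is invented for the example.

```python
# Numbers are DVD/canonical episode numbers (1..14), as in the list above.
canonical = list(range(1, 15))
fox = [2, 3, 6, 7, 8, 4, 5, 9, 10, 14, 1]

orders = {
    "Fox": fox,
    # Assumption: "replaced 1 by 12" means episode 12 aired in episode 1's slot.
    "MundoFOX": [12 if e == 1 else e for e in fox],
    "SABC3": fox + [11, 12, 13],
    "SIC": fox + [13, 11, 12],
    # Assumption: "swapped 1 and 2" means 2, 1, then 3..14 in order.
    "Australia": [2, 1] + list(range(3, 15)),
    "NZ": list(range(1, 11)) + [14, 13, 11, 12],
}

def is_full_permutation(order):
    """True if `order` contains every canonical episode exactly once."""
    return sorted(order) == canonical

for name, order in orders.items():
    missing = sorted(set(canonical) - set(order))
    print(name, is_full_permutation(order), "missing:", missing)
```

Under these assumptions, only the broadcasters that aired all 14 episodes produce a true permutation; the others are partial orderings of a subset.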
Yea, this is also something that is not unique to the internet age. Academics have been (imperfectly) assigning credit to informal contributions since there have been citations (and before!). Footnotes like "I thank B. Johnson for this argument" have always been used.
An "academically correct" solution (i.e., not to taint the prestige of a scholar who cites the solution) would be to cite a real, proxy scholar who checks the solution, but gives all credit to the anonymous 4chan user.
Scientists often abhor citing from blogs, forums, and Wikipedia, as those publications "are not peer-reviewed by trustworthy professionals".
I actually feel that the response here is a great one:
"I feel like you should be able (forced even!) to cite things that aren't journal articles- even blog posts- if they contributed to your body of work. But citing things that aren't prestigious reduces the perceived prestige of your own, which is a perverse incentive not to cite."
EDIT: I can't seem to figure out how to link to specific tweets, so this is the guy that said it: https://twitter.com/_delta_zero
He has a great point. If the proof is correct, you should cite the raw document. If you have to cite it through a proxy scholar, it feels like a form of gate keeping.
Exactly: a correct source is a correct source. No need to reserve legitimacy exclusively for paid journals. Maybe the source could be cited through the Internet Archive or a similar service, so that the document is definitely the same version as the one seen at the time of citation.
Is that actually true? I have no experience in grad level math but in my field (security) people frequently cite all sorts of non academic content. I've personally done it a lot of times and never heard anything about it from reviewers.
No, it's not. People cite "informal" sources all the time.
This only really matters on the receiving end of the citation. I.e., if lots of people are citing your blog or forum post, then it can be a good idea to take your content and put it in arxiv or similar. Just so that the citations "count" for you in the academic setting.
And similarly, it's a bit rude (or at least odd) to cite someone's blog post when there's an accompanying published paper that describes the same work.
But citing informal sources where no formal source exists is not (and afaik never has been) a problem.
I agree that this is probably the cultural solution, provided the author can't be coerced to post on Arxiv, but it's an ugly one, which reeks of an ivory tower and gatekeeping ("nobody who isn't an established scholar in our tradition is truly capable of our work").
Perhaps this is better seen as a substantial defect in the fact that the academic web-of-trust model implemented in publication is based on quality of source (e.g. journal's prestige, or the submitting scholar's prestige) rather than quality of scholarship (e.g. whether a critical mass of researchers, rather than 1-3 anonymous reviewers, find it compelling and 'sign off' on it being good work).
I hope (and suspect) that whatever emerges or is developed to replace the paid journal model of academic trust is designed to handle these situations gracefully.
You can archive it through various services and save an exact screenshot/html snippet/text that can be cited. In the sources you add "Retrieved on.." and "Archived link... Secondary archived link..". Some journals are allowing artifacts these days too. Typically this is used for code or data sets, but it could also be used for copies of the anonymous source material.
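A rough sketch of what such a reference entry could look like when assembled programmatically. Everything here is hypothetical: the function name `format_web_citation`, the field layout, and all URLs are placeholders, not any citation style's official format or a real archive service.

```python
from datetime import date

def format_web_citation(author, title, url, retrieved, archived_urls):
    """Build a plain-text reference with a retrieval date and archive links."""
    parts = [
        f"{author}. {title}. {url}.",
        f"Retrieved on {retrieved.isoformat()}.",
    ]
    for i, archived in enumerate(archived_urls):
        label = "Archived link" if i == 0 else "Secondary archived link"
        parts.append(f"{label}: {archived}.")
    return " ".join(parts)

# All values below are placeholders, not real sources.
ref = format_web_citation(
    author="Anonymous",
    title="Example forum post",
    url="https://example.org/thread/123",
    retrieved=date(2018, 10, 24),
    archived_urls=["https://archive.example/abc", "https://archive.example/def"],
)
print(ref)
```

Recording the retrieval date alongside the archive links also answers the "which snapshot?" question, since the reader knows which dated copy to look for.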
You should be able to follow the references in a paper even in a hundred years or later. This obviously works quite well for print. Blogs, wikis and other websites are in general not permanent enough to be useful as references. Even when you archive them, the archive is likely to disappear much earlier than the paper for which it was made.
In fact, I fear that the trend to publish supplemental material may lead to another crisis because a lot of that material is not managed and archived well and may disappear without warning. Some papers contain important details like long, involved and non-obvious calculations only in their supplements and are not really reproducible without them.
Realistically, the proof would probably just be rewritten in a paper in standard notation, with an indication that the original proof (or the idea) was due to an anonymous author.
You see very similar things happen from time to time where there are "folklore" theorems in mathematics that someone eventually writes down 20 years later. Sometimes they're able to give an attribution, and sometimes not. But always they give a formal proof using modern notation and credit the idea with whatever fidelity they can.
The first reply from the thread's OP got me thinking about something I'd never really noticed:
> I used to think it was because as a quasiscientific community we highly value peer review but ML people are totally fine with citing papers on arxiv that haven't been accepted anywhere as long as you can replicate the results - I think perceived prestige nails it[1]
Peer reviewed citations are excellent, and unstable citations like Wikipedia are obviously concerning. (At bare minimum, you'd want to link to the page history instead of the page.) But the willingness to cite arXiv content without hesitation definitely suggests that peer review is only part of what's going on there.
The person writing that review is going to need to cite it somehow though so the question of how best to cite an ephemeral anonymous post that's only available through archives or secondary summaries (like the linked wiki) still stands.
Definitely true, and worth responding to. (For instance, with a specific Internet Archive source or a link to the Wikipedia version history instead of main page.) But it's also uncomfortably true that academic works have an ugly tendency to disappear post-citation. Less so in mathematics, I'm sure, but even in the hard sciences far too many papers out there reference things that weren't digitized, rely on data that wasn't published with the paper, or otherwise hinge on something you can't actually get to.
There's probably no great answer here, it just feels like authors and editors sometimes use "it was reputably published once" to shrug off disappearing sources.
It feels like there is, or should be, a distinction between "I'm basing my analysis on this other data/analysis/etc. that I'm working from the assumption it's correct" vs. "This idea/analysis/etc. gave me a good starting point. I'm not depending on it but I want to credit it."
Thinking about what I do: there's a difference between me depending on the results of a survey and hearing a novel take on a market or whatever. In the former case, I'm dependent on the survey methodology being sound (and I'm going to trust some sources far more than others). In the latter case it's more "I never thought of things that way" but I'm going to do my own analysis.
Peer review is a base level quality check, not a correctness check in most disciplines.
It provides value; indeed, most of my papers have been improved by going through peer review. But it's good to be conscious of what it can and cannot do.
One thing that it does do is signal intent. It signals that you are willing to engage with the expert community, and specifically their feedback and criticism.
Finally, there is of course a fallacy in saying "not trustworthy either". If 99% of random blog posts are wrong, but 60% of published papers are wrong, then a random paper is 40 times more likely to be correct than a random blog post and it's perfectly sensible to be hesitant about citing the latter.
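The arithmetic behind the "40 times" figure, spelled out with the hypothetical error rates from the paragraph above (they are illustrative numbers, not measured ones). Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Hypothetical error rates taken from the example above.
p_blog_wrong = Fraction(99, 100)
p_paper_wrong = Fraction(60, 100)

p_blog_correct = 1 - p_blog_wrong    # 1/100
p_paper_correct = 1 - p_paper_wrong  # 2/5

# How many times more likely a random paper is to be correct.
ratio = p_paper_correct / p_blog_correct
print(ratio)
```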
Finally, scientists have always cited outside of traditional sources and continue to do so [1]. I see no evidence for your assertion that "appeal to authority" has become more prevalent or important than it used to be. With all the ongoing open science initiatives it's rather the opposite.
[1] Personal communication with various scientists, 2018
Ideally this is true. At worst, though, you'll end up with a peer reviewer who gets pedantic, or one who's a member of a citation cartel [0] and demands that you cite their friends' papers unnecessarily as a quid pro quo for publication. It's difficult to say how common this is in practice since academia is generally terrible about self-reflection, but I observed concerning evidence of this when I was still in science. In some fields at least, there's ample evidence [1] that peer review provides inadequate QC for methods sections, making it literally impossible to replicate results because entire processing steps are omitted. This is why post-publication services such as PubPeer have become more and more relevant: it has become obvious over time how woefully inadequate the initial screening process is.
It's good to keep in mind that reviewers do not make the decision. They recommend the decision. The Editor, who knows the identity of the reviewer, can see if a reviewer is asking for unreasonable citations.
So while this definitely does happen (and to different degrees in different fields) there are some checks and balances, too.
True, and your overall point (that it is better than the alternative, i.e. viXra) stands. It's really one of those things where ymmv depending on the field. I just happened to notice a somewhat disheartening tendency for those sorts of horse-trading practices to creep in, which is ultimately one of the major reasons why I left science.
I could be naive, but my impression was that there are almost too many degrees of freedom in the peer review process that could allow for this kind of chicanery, and that it would almost be preferable to have criteria for acceptance laid out in advance (i.e., exactly what steps, metrics, and citable articles were necessary) and use those the way an actuary would. In some ways I almost think that practices like preregistration serve as a better first-pass indicator of whether or not research is publishable.
Completely agreed. I think the system is ripe for an overhaul, and has many problematic aspects. We'll see how it goes; after all, it doesn't look too bad with the open access revolution.
I personally would like more use of pre-registration (there is something about that on the front page right now). But I also think deanonymizing peer review and publishing the reviews along with the papers could be very interesting. There are obvious drawbacks as well, of course, but I think it has potentially massive upsides.
> It's worrying that academia is slowly turning into a rigid "appeal to authority" church
Seeing as how academia literally started out as an accessory to the various Christian denominations for educating priests, I don't see how you really could characterize it as "a space for truth and inquiry". It's only been relatively recently that "appeal to authority" has given way to more secular/scientific ways. Even within the last couple of years, I would wager that academia is no more authoritarian than it has ever been- tenured professors have almost always had a disproportionate amount of influence over their trainees' futures, etc. The head of my father's department when he was getting his PhD in the 70s actually attempted to pull my father's stipend because of an avant garde project that would today be accepted as normal.
I've seen a trend towards the opposite, actually, with graduate students becoming more skeptical of the institution and more proactively addressing its shortcomings through grad student unions and advocacy of open science policies (which neither university administrations nor longstanding tenured professors are fans of in general).
I am about to get really mad about statements like this one. It contains an egregious generalization within itself, rendering it essentially untrue.
The papers check results in fields like medicine and psychology, which are plagued by the fact that they have to rely on random sampling of study participants and on statistics. I am not surprised that a study with 10 participants leads to wrong results with a significant probability.
But this does not generalize to science in general. Physics and chemistry do not suffer from this because their experiments have fundamentally different setups. So extrapolating from medicine to "most science is wrong" is just as wrong. You need to approach these fields differently when you want to know how many wrong results are published.
The Haruhi problem reminds me of De Bruijn sequences, aka why I don't trust 5-button keyless car entry systems. Ford's keypads do not have return keys, delays, or attempt limits. A sequence of roughly 3125 button presses is guaranteed to unlock the door. It would take about 10-20 minutes for someone who has practiced to break in. Someone this year made a bot that did it in about 5 minutes.
Additionally, De Bruijn sequences are about generating all possible n-length strings from a fixed k-character set (where you can repeat characters). In this problem we only want to generate k-length permutations of a k-character set.
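To make the 3125 figure concrete, here is a rough Python sketch of the standard Lyndon-word construction of a De Bruijn sequence B(k, n). It assumes a keypad with k = 5 buttons and n = 5-digit codes (which is what 5^5 = 3125 implies); the function name `de_bruijn` and the variable names are my own.

```python
from __future__ import annotations


def de_bruijn(k: int, n: int) -> list[int]:
    """Cyclic De Bruijn sequence over symbols 0..k-1 containing every length-n string once."""
    a = [0] * (k * n)
    sequence: list[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            # Append the Lyndon word only if its length divides n.
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


k, n = 5, 5  # assumed: 5 buttons, 5-digit code
cyclic = de_bruijn(k, n)

# A keypad is typed linearly, so wrap the first n-1 symbols around to the end.
linear = cyclic + cyclic[:n - 1]

# Every window of n consecutive presses is one candidate code.
codes = {tuple(linear[i:i + n]) for i in range(len(linear) - n + 1)}

print(len(cyclic), len(linear), len(codes))
```

The cyclic sequence has 5^5 = 3125 symbols; typed out linearly it needs n - 1 = 4 more presses, so 3129 in total, versus 5 * 3125 = 15625 presses if each code were entered separately. As the reply points out, this covers all strings with repeats, which is a different object from the permutations-only sequence in the Haruhi problem.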
You're worried about someone standing next to your car for 10-20 minutes pushing buttons and not messing up a 3125 number combo on a keypad with almost no tactile feedback?
Yeah, this is nowhere near as bad as the proximity remote attacks where a simple amplifier is enough to trigger the key to open the door - or even to drive away with the car if you do a bit more serious engineering.
If you read down in the comment thread on the Hackaday article, they make it clear that Fords after about 2002~2003 have an attempt limit. I happen to know that my '05 did, at a bare minimum. It's still deeply not secure, but nothing about a car really is... It's intended to make my life convenient and keep the honest people honest.
Ethically do you even have to cite something properly when the person who solved it was intentionally anonymous? If the concern is that people will think the person writing solved it, they can just write a sentence where it came from.
Not to mention that a citation is there not only to show the world whose content you are referencing, but also so that a reader who doesn't believe you and wants to check can do so by following the citation.
Intentional academic dishonesty is an entirely separate issue. If you're going to plagiarize, it doesn't really matter if there's a formal way of citing the material you don't plan to cite.
Citing "anonymous" when it is unclear who the author is, is nothing new. I did that a long time ago when I cited something from the nmap man page in a paper for university. Back then it was unclear, at least to me, who the author was. This is a very common practice. I do not see where the novelty is.
What do you mean by hash? 4chan posts use an incrementing id per board, so if you put >>>/sci/3751197 you have a unique identifier for the post (sci being the board and 3751197 being the post number). For convenience you could link to an archive site as well, but they're known to die every few years.
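As a small illustration of the board-plus-number identifier, here is a Python sketch that splits a reference of the form `>>>/sci/3751197` into its two parts. The regular expression is an assumption based only on the format described in the comment (lowercase alphanumeric board name, decimal post number), not on any official specification, and `PostRef` and `parse_post_ref` are hypothetical names.

```python
import re
from typing import NamedTuple


class PostRef(NamedTuple):
    board: str    # e.g. "sci"
    post_id: int  # incrementing per-board post number


# Assumed format: ">>>/<board>/<number>", as written in the comment.
_REF_PATTERN = re.compile(r"^>>>/([a-z0-9]+)/(\d+)$")


def parse_post_ref(text: str) -> PostRef:
    """Parse a cross-board reference into (board, post_id)."""
    match = _REF_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a cross-board post reference: {text!r}")
    return PostRef(board=match.group(1), post_id=int(match.group(2)))


ref = parse_post_ref(">>>/sci/3751197")
print(ref.board, ref.post_id)  # sci 3751197
```

The pair (board, post number) is what makes the identifier unique; neither half is enough on its own, since post numbers are only incremented per board.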
The hashes assigned to your IP on the boards that have them are per-thread to my knowledge. All your posts in one thread will have a particular hash, but your post in another thread will have a different one.
Ah I forgot about that. It would certainly be a good idea to include it if available. Do you know if it is necessary for a unique id? I've never browsed any of the boards with user hashes.
You'd need a link for people to see the original content too so you'd want to include a link to one of the archive sites I guess (as a second best source for an ephemeral post). It's odd but there are citation patterns for websites, wonder if they include a standard for including archives of sources not available at time of writing.
This might be a ghost discussion. I have worked in scientific publishing for quite some time and have seen citations in the form of:
* personal correspondence
* emails
* facebook profile pages
and my personal highlight
* google search queries (I kid you not)
All of these citations actually occurred in scientific journals. I assume Nicolas Bourbaki was also cited several times, without anyone caring about "academically correct" citation rules.
I would cite the wiki with a note in the citation that the wiki was a copy of the 4chan thread. And I'd put a copy on a web archive service. WebCite was mentioned in the Twitter thread: http://www.webcitation.org/
If this isn't credible enough for most academics, then I think they have their priorities wrong. I want to cite the best work, not just work that looks good superficially. I assume that the claim that this is the best known lower bound is true. In that case, if I am writing a relevant paper, it's my duty to cite the result regardless of where it came from as far as I am concerned.
If anyone's trying to figure out what the "Haruhi Problem" is, logically, because it took me a while to grok:
You have an `n`-episode show, and you want to watch every episode in every possible order. However, if one series-watch-through ends with episode `x` and the next series-watch-through in sequence begins with that same episode `x`, you don't have to watch episode `x` twice in a row. Given these constraints, how do you find the "ideal" way to watch through the entire series, in all possible orders, in the smallest number of episodes?
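If it helps, here is a small Python sketch of what "seen every order" means: a viewing sequence counts if every permutation of the `n` episodes shows up as a run of `n` consecutive episodes somewhere in it. The function name `is_superpermutation` is my own, and the sketch assumes episodes are labelled with single digits 1..n (so n is at most 9).

```python
from itertools import permutations


def is_superpermutation(sequence: str, n: int) -> bool:
    """True if every ordering of episodes 1..n appears as a contiguous run in `sequence`."""
    if not 1 <= n <= 9:
        raise ValueError("this sketch only supports single-digit episode labels")
    episodes = "".join(str(i) for i in range(1, n + 1))
    return all("".join(order) in sequence for order in permutations(episodes))


# Watching all 6 orders of a 3-episode show back to back takes 18 episodes;
# this 9-episode sequence covers the same 6 orders by overlapping them.
print(is_superpermutation("123121321", 3))  # True
print(is_superpermutation("12312132", 3))   # False (the order 321 is missing)
```

This only checks a candidate sequence. Finding the shortest one is the hard part, and a brute-force search over candidates gets out of hand very quickly as `n` grows.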
Looking at this from the other side, if a non-mathematician "solves an interesting problem", is there a recommended way to make the solution public? Something better than posting to 4chan?
Is it possible to publish a result properly if one is not a mathematician and not affiliated with an institution?
You can send it to a journal as a private individual, but a better way would probably be to post a manuscript on arxiv.org and, once it generates some buzz (if the problem is interesting and well solved), either submit it to a journal yourself or partner with another mathematician to polish it and then submit.
To be able to submit to arXiv, one needs to be endorsed by someone in the domain, which could be a potential hurdle for someone not in academia. (and I'm not about to suggest viXra as a way to get around that)
Yeah, I actually used to be a math post-graduate and have a draft of a paper with an interesting result, so I wanted to publish it but I had no idea how to get "endorsed". This was like 10 years ago and I switched to a career in programming so I doubt I'll even understand my own proof now.
Back in the 90s there was the same discussion involving citing USENET news postings. I would suggest that some of the news postings were better peer reviewed than Journal articles. There was certainly a time when the top academics of certain fields were USENET junkies.
I suppose hacker news is the most recent equivalent.
The authors of FFTW published a paper about their computational method, an early example of "cache-oblivious" array traversal order. (In this method, uses of elements noodle around in small areas, getting a free locality boost. Typically they use a Hilbert or Peano path.) They refused to cite Todd Veldhuizen, who had done it (with plenty of publicity) a year before. They refused to discuss why, but it is apparent that it was because Veldhuizen had published in Dr. Dobb's Journal of Computer Orthodontia ("Running light without overbyte").
Something I somehow doubt too many of you would know but 4chan, despite being considered an unsavoury place because of certain subsections of it (which I'm not proud to admit I visit to learn what "the other side" is talking about), has a lot of great communities as well and I usually find the discussion quality there better than reddit.
Also, in the context of this thread, 4chan is an ephemeral site so there is no such thing as link to content.
If you are a video game enthusiast, particularly of a niche genre, /vg/ can be a good place. I'm really into roguelikes, so I visit the '/rlg/' thread around once a week to see what's going on and potentially get some advice if I'm at a tricky part in a run or need to make a decision about where to take my character.
Standard disclaimer applies about 4chan. If you're thin-skinned, you may find it not worth the trouble. Just don't engage with the trolls and you'll be fine.
/tg/ is truly excellent for non-video game discussion (board games, card games, w40k, MtG, etc). They are a bit opinionated much like /vg/ and I'd agree, thin skinned people will be happiest if they stay away. For example there is little love for Paizo/Pathfinder on /tg/ (although there are some fans...)
I spend a lot of time on /out/, as in outdoor hobbies, because I live in a "recreational state". No online community is 100% perfect but it's bizarre how /out/ is on average more civilized than heavily moderated communities.
I don't know what it's like now, but for a time /m/ was a major hub of the tokusatsu community, and most of the major fansubbing groups got their start on /m/.
/diy/ is really great. Generally post quality is inversely proportional to board popularity. Also avoid any board related to entertainment if you want rational discussion.
To provide a concrete example of why /diy/ is great, most "diy-ish" discussion boards I participate in devolve into "look how awesome I am" "look how much money/debt I can spend" "I am more of an authority than you" "my favorite authority could kick the butt of your favorite authority" and similar self-aggrandizement-posting. People on /diy/ actually talk about /diy/ topics, which is a refreshing change from most non-anon non-chan discussion boards. It's a relatively low-sophistry zone... emphasis on relatively...
The biggest problem I see on /diy/ is questionable humor, sort of like how blue collar occupations used to haze the apprentices a bit, you'll see the most ridiculous / comical suggestions sometimes.
I noticed that a paper written by different authors, rather than citing the StackOverflow post when discussing that solution, cited our paper above.
Isn't board name (/o/, /vg/, etc.) and post ID (the number people talk about when they say "trips", "dubs", and so forth) enough to uniquely identify a 4chan post?
Some people here talk of the difficulty of citing sources that may disappear. This can easily be solved by just reproducing the proof (and stating explicitly that that is what you're doing and providing references to the ephemeral sources as they exist when you write it). Of course you should clean up the proof as much as possible. This only needs to be done once, since others can cite your reproduction of the proof if they have similar misgivings.
In real-time rendering, a lot of innovation and state-of-the-art is published in non-peer-reviewed conference slides by companies, or blog posts by non-academic individuals.
These should absolutely be cited if you as a researcher are convinced they are correct. There is no good reason not to.
A surprising amount of effort goes into maintaining uniform citation formats for papers. BibTeX is one of the various tools built to manage them. Citations are effectively the original hyperlinks, on paper.
Some undergrad or master's student should do some work to validate the result and write a paper about the validation. Then they'll be cited by anyone who wants to use the 4chan thing.
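For a sense of what such a tool works with, here is a rough Python sketch that formats a BibTeX `@misc` entry, the entry type commonly used for web sources. The helper name `bibtex_misc` is hypothetical, and every field value below is a placeholder rather than real bibliographic data for the thread being discussed. Note that `url` is not handled by every bibliography style.

```python
from __future__ import annotations


def bibtex_misc(key: str, fields: dict[str, str]) -> str:
    """Format a BibTeX @misc entry from a citation key and a field mapping."""
    lines = [f"@misc{{{key},"]
    for name, value in fields.items():
        # Braces around the value keep BibTeX from re-casing or splitting it.
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder values only; fill in the real details of the source you cite.
entry = bibtex_misc(
    "anonymous-post",
    {
        "author": "{Anonymous}",  # extra braces stop "Anonymous" being parsed as a surname
        "title": "Placeholder title of the post",
        "howpublished": "Post on an anonymous message board",
        "year": "YYYY",
        "note": "Archived copy consulted; original post no longer available",
    },
)
print(entry)
```

The `note` field is one place to record that the citation points at an archived copy rather than the original.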
...who previously jumped out of the plane in the middle of the night over the forest with the blackmailed money, badly functioning parachute and survived, contrary to common sense. Because the posters there wouldn't try to troll. Yes, take that with... whatever.
You can always make use of the various archive sites. The thread referred to is from 2011, and is available on warosu[1]. It is worth noting that 4chan keeps its own 2 week archive now as well.
You're misunderstanding the problem. It's not a question of the number of permutations. It's a question of the total number of episodes that must be watched to have seen all possible permutations.
Not sure why you're grey; you are correct about why it's not introductory.
Overlap between permutations is why it's not as simple as it first sounds. For example, to borrow the wiki example, "121" includes both "12" and "21" permutations.
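The overlap can be sketched in a few lines of Python. The helper name `merge_with_overlap` is my own; it simply glues two strings together, reusing the longest suffix of the first that is also a prefix of the second.

```python
def merge_with_overlap(left: str, right: str) -> str:
    """Append `right` to `left`, sharing the longest suffix/prefix overlap."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return left + right  # no overlap at all


# The wiki example: "12" followed by "21" needs only 3 symbols, not 4.
print(merge_with_overlap("12", "21"))  # 121

# Three orders of a 3-episode show in 5 episodes instead of 9.
chain = "123"
for order in ("231", "312"):
    chain = merge_with_overlap(chain, order)
print(chain)  # 12312
```

Merging greedily like this is not guaranteed to give the shortest possible sequence. Which permutation you visit next determines how much overlap you get later, and that is where the difficulty of the problem lives.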
Not at all, 4chan has* quite a few very smart people who for various reasons might not have "made it" in life. This type of thing used to be called "weaponized autism" or "autism" on site. This usage of autism really means something closer to "extreme obsession".
* or at least had. I think that a lot of people left the site over the last few years as the site changed quite dramatically.
Under the veil of anonymity it is hard to know if some poster is an accomplished person IRL. For all we know, one of the anons who contributed to this problem is a Fields medalist who is also a closeted weeb.
That's true, and in this specific case it seems hard to tell. I guess my idea of the average 4chan user of old is a college dropout shutin. That might be because when I started browsing being a shutin/NEET was the "cool" thing. I think this quote sums it up nicely: "In my online persona I pretend that I am ironically pretending to be a NEET living in my parents’ basement, but I am one in actual fact."
Pretty much all my sort-of-nerd friends and I left the site some time between 2012 and 2015. I think it all went downhill when Obama got his second term and the 'libertarians' became dislodged by full-blown Nazis. There used to be quite a few prominent academics on /sci/'s early days.
Maybe? I've always attributed the fall to a combination of the fappening, Moot's departure, and the 2016 elections. The huge influx of new users combined with a change in moderation absolutely destroyed the site. My memory isn't so good, but I don't think there was much political content outside of /pol/ and /b/ until 2015 or so. Might be dependent on what boards you browsed though.
I think you're right about Nazis coming in, but I think that it was much more contained when Moot was there. I guess maybe that provided the setup for the 2016 elections.
For what it's worth, the increase in traffic from 2010 onward was largely mobile users, with desktop users staying around 50M, while total users grew to a peak of about 133M at the start of 2018.
Yeah, I still see people suggesting that phone-posters should be banned. I wonder what the site would be like if it had been implemented. I didn't know how large the numbers were though, that is nuts. Looking at the general quality of the site though, it makes sense.
It seems like it would be a great case study for someone interested in social psychology or Social Norms Theory or something similar. I remember when I first came across the site 13 year old me being worried if I posted something too stupid people would trace my IP and hack me.
With new users being added at such a rapid pace, it seems like it was impossible to on-board them in terms of generally accepted practice; an eternal-summer echo of the "Eternal September" of the Usenet days, which involved many of the same complaints.
I had a pretty similar experience, so I didn't post much at first. You used to see people getting called out for not following site norms, being told to "lurk for 2 years before posting" but you don't see that anymore. I think that type of self-moderation is really good for maintaining a quality community, but it selects for a certain type of user so most people don't like it.
I think you misunderstand the population of 4chan. Across its diverse boards, there are all kinds of people, and they don't fit into a single mold. For example, all kinds of skilled laborers and handy homeowners, many with families, frequent the /diy/ board. The technology board /g/ is host to a lot of talented, high-earning programmers mixed in with tinkerer types. /toy/ is full of all kinds of folks, from teenagers with a few anime figures to parents with children, the only commonality being liking some kind of toys.
A lot of people who use it simply like the lack of filters or reputation that anonymity provides. A post doesn't come with the baggage of a username that people can hunt through, so it can really only be judged by its content, not by the reputation (good or bad) of its author.
Just because 4chan shows up in the news when its users are stirring up trouble doesn't mean it's all a bunch of edgy teens. Even if that's how it began, an edgy teen from 4chan's inception is now old enough to be a part of the adult workforce.
>A post doesn't come with the baggage of a username that people can hunt through, so it can really only be judged by its content, not by the reputation (good or bad) of its author.
That's a goddamn myth. There are many ways to break anonymity (usernames, tripcodes, country flags, user ids) and the very moment people have any ability to track your posts they will dogpile on them. /int/ is (or used to be; I don't go there anymore) the worst example, in that posters who aren't from North America or Northern Europe will get trashed quickly with stupid memes no matter what they say.
Even when it comes to truly anonymous content, this is just not true. If, for instance, you reply to multiple posts and in one of the replies you express the wrong political opinion (on 4chan that means 'not being alt-right') every single person you replied to will trash you and call you a n* or whatever. People don't magically change their way of thinking just because they're on an anonymous platform, they're still full of biases and idiocy. And poison.
I would both agree with and disagree with the claim that anonymity provides higher quality conversation, free of virtue signalling.
On one hand, there's /out/ focusing on outdoor hobby/lifestyle discussions which only has a minor infestation of Mora knife meme. Admittedly I own two Moras and those are excellent outdoor knives so the boundary between meme and reality is fuzzy. Other than that, /out/ is very civilized and contains extremely high quality content.
On the other hand, there's /fit/ which seems to consist entirely of lifting broscience despite being anonymous.
It is possible I'm missing the point because this discrepancy reflects actual reality, in that there is a useful objective opinion of, for example, proper medical foot care for long-distance hiking or burn care, but there really is no one-size-fits-all magic recipe for powerlifting gains or "the" correct bodybuilding diet.
Yeah, what you said was true maybe five years ago. There used to be quirky geek types being ironic all the time and making cool stuff. When I left though it was all Nazis and people who somehow stand Nazis 'interjecting' every few comments about the IQ of black people or evil SJWs. This can drain even the kindest, strongest soul. Even moot got tired of it and sold it to that Japanese con man.
This tweet is misleadingly worded. The solution was not posted on 4chan, but rather by an anonymous user on a “wiki mainly devoted to anime” as the original tweet quoted by this tweet makes clear.
The original tweet is somewhat confused though too because the linked wiki is an archive/summary of a thread on /sci/ which is one of the 4chan message boards. It's copied there because of the ephemeral nature of posts on the actual boards so you can't provide a permanent link, thus linking to the next best source seems the best choice.
That makes more sense, thanks for the clarification. The tweet saying that it was originally posted on a "wiki" and then linking to a wiki made me think that that wiki was the actual source.
[0] https://www.webcitation.org/