List of scientific papers found in OpenJDK source code (lowlevelbits.org)
198 points by AlexDenisov on Nov 23, 2016 | 28 comments

Here's a Google BigQuery query that lists the most common PDFs referenced in the GitHub sample dataset, along with the top 100 results: https://gist.github.com/llimllib/3f1877eab06208958060f491cf3...

It's possible to run this query against the full github dataset but I couldn't figure out how to pay for it, so if somebody wants to do that it would be excellent.

Just a note: it's bizarre that I absolutely cannot find a way to determine (a) how much the query would cost to run or (b) how I would pay for it if I wanted to run it.

I changed it to query from [bigquery-public-data:github_repos.contents] instead, and before I execute the query it says "Valid: This query will process 1.68 TB when run.".

Queries are $5/TB [0].

So a bit less than 10 bucks. :)

Edit: brb, that's totally worth it.

[0]: https://cloud.google.com/bigquery/pricing
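As a rough check of the arithmetic above, here is a minimal sketch. It assumes a flat on-demand rate of $5 per TB scanned, as quoted in the comment, and ignores any free tier or minimum charge; the function name is made up for illustration.

```python
def query_cost_usd(terabytes_scanned: float, usd_per_tb: float = 5.0) -> float:
    """Estimate the cost of a query billed per terabyte scanned.

    Assumes a flat rate (the $5/TB figure quoted in the thread) with no
    free tier or rounding rules applied.
    """
    return terabytes_scanned * usd_per_tb


# 1.68 TB is the size reported by the query validator in the comment above.
print(f"Estimated cost: ${query_cost_usd(1.68):.2f}")
```

That comes to $8.40, which matches "a bit less than 10 bucks".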

OK, so why is the most common document something to do with the Turkish 2012 elections? (If the rough Google Translate is to be believed...)


> 3896 http://www.pdf


Yeah I didn't care to make the regexp perfect. The most common site is www.pdfsharp.com, then www.pdfparser.org, then www.pdflib.com, etc etc
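The regexp used in the query is not shown in this thread, so the following is only a guess at how a truncated result like "http://www.pdf" could come about. Both patterns are hypothetical: the loose one stops at the first occurrence of "pdf" anywhere in the URL, while the stricter one requires a ".pdf" extension that ends at a word boundary. The example.org URL is made up.

```python
import re

# Hypothetical loose pattern: lazily match up to the first "pdf" in the URL.
LOOSE = re.compile(r"https?://\S*?pdf")

# Hypothetical stricter pattern: require a ".pdf" extension ending at a word boundary.
STRICT = re.compile(r"https?://\S+?\.pdf\b")


def first_match(pattern, text):
    """Return the first substring matched by pattern, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None


comment = "// rendered with http://www.pdfsharp.com, see http://example.org/papers/deque.pdf"

print(first_match(LOOSE, comment))   # http://www.pdf
print(first_match(STRICT, comment))  # http://example.org/papers/deque.pdf
```

With the loose pattern, every library site whose hostname starts with "www.pdf" collapses into the same bucket, which would explain the high count.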

Weird! Mine just says "Quota exceeded..." without ever saying how big the query will be. Where do I find that info?

(http://i.imgur.com/3EkPYIY.png is what I see)

Since the Java source is open, it's all there to be peer-reviewed. If a paper it's based on isn't the best, you can make some noise about it. This is a good situation for Java.

Some more found by a quick grep for "et al.", "Proceedings", "Proc. ", "Symposium", "Conference", "Conf. ", "PPoPP" (a conference with an easy-to-grep-for name), and "acm.org":

hotspot/src/cpu/ppc/vm/ppc.ad: See J.M.Tendler et al. "Power4 system microarchitecture", IBM J. Res. & Dev., No. 1, Jan. 2002.

hotspot/src/cpu/x86/vm/crc32c.h: V. Gopal et al. / Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction April 2011 8

hotspot/src/share/vm/gc/shared/taskqueue.hpp: Le, N. M., Pop, A., Cohen A., and Nardell, F. Z.: Correct and efficient work-stealing for weak memory models Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP 2013), 69-80

jdk/src/java.base/share/classes/java/util/Arrays.java: Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993

jdk/src/jdk.crypto.ec/share/native/libsunec/impl/mpmontg.c: "A Cryptographic Library for the Motorola DSP56000" by Stephen R. Dusse' and Burton S. Kaliski Jr. published in "Advances in Cryptology: Proceedings of EUROCRYPT '90", LNCS volume 473, 1991, pg 230-244

hotspot/src/share/vm/opto/superword.hpp: "Exploiting SuperWord Level Parallelism with Multimedia Instruction Sets" by Samuel Larsen and Saman Amarasinghe [...] published in ACM SIGPLAN Notices, Proceedings of ACM PLDI '00, Volume 35 Issue 5

jdk/src/java.base/share/classes/java/util/SplittableRandom.java: Leiserson, Schardl, and Sukha "Deterministic Parallel Random-Number Generation for Dynamic-Multithreading Platforms", PPoPP 2012

jdk/src/java.base/share/classes/java/util/SplittableRandom.java: "Parallel random numbers: as easy as 1, 2, 3" by Salmon, Morae, Dror, and Shaw, SC 2011

jdk/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java: "Dynamic Circular Work-Stealing Deque" by Chase and Lev, SPAA 2005

jdk/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java: "Idempotent work stealing" by Michael, Saraswat, and Vechev, PPoPP 2009

jdk/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java: "Leapfrogging: a portable technique for implementing efficient futures" by D.B. Wagner and B.G. Calder, PPoPP '93, http://dl.acm.org/citation.cfm?id=155354

jdk/src/java.base/share/classes/java/util/concurrent/LinkedTransferQueue.java: Using elimination to implement scalable and lock-free FIFO queues, Moir et al, http://portal.acm.org/citation.cfm?id=1074013

jdk/src/java.base/share/classes/java/util/concurrent/LinkedTransferQueue.java: "Bounding space usage of conservative garbage collectors", HJ Boehm, http://portal.acm.org/citation.cfm?doid=503272.503282 (this is the Boehm GC paper)

jdk/src/java.base/share/classes/java/util/concurrent/locks/StampedLock.java: Design, verification and applications of a new read-write lock algorithm, Shirako et al, SPAA 2012

hotspot/src/share/vm/opto/escape.hpp: Jong-Deok Choi, Manish Gupta, Mauricio Serrano, Vugranam C. Sreedhar, Sam Midkiff: "Escape Analysis for Java", Proceedings of ACM SIGPLAN OOPSLA Conference, November 1, 1999

hotspot/src/share/vm/runtime/os.cpp: Gilad Bracha and David Ungar: "Mirrors: Design Principles for Meta-level Facilities of Object-Oriented Programming Languages", in Proc. of the ACM Conf. on Object-Oriented Programming, Systems, Languages and Applications, October 2004

jdk/src/jdk.crypto.ec/share/native/libsunec/impl/ec_naf.c: D. Hankerson, J. Hernandez and A. Menezes, "Software implementation of elliptic curve cryptography over binary fields", Proc. CHES 2000

jdk/src/java.base/share/classes/java/util/concurrent/SynchronousQueue.java: "Nonblocking Concurrent Objects with Condition Synchronization", by W. N. Scherer III and M. L. Scott. 18th Annual Conf. on Distributed Computing, Oct. 2004
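The quick grep described at the top of this comment can be sketched in Python as follows. The keyword list is taken from the comment; the function name and the choice to skip unreadable files are assumptions, and this is not the command that was actually run.

```python
import os
import re

# Search terms listed in the comment above, matched literally.
KEYWORDS = ["et al.", "Proceedings", "Proc. ", "Symposium",
            "Conference", "Conf. ", "PPoPP", "acm.org"]
PATTERN = re.compile("|".join(re.escape(keyword) for keyword in KEYWORDS))


def find_citation_lines(root):
    """Yield (path, line_number, line) for every line under root containing a keyword."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Replace undecodable bytes so binary files do not abort the scan.
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if PATTERN.search(line):
                            yield path, number, line.strip()
            except OSError:
                continue  # unreadable file: skip it and keep going
```

Something like `for path, number, line in find_citation_lines("hotspot/src"): print(f"{path}:{number}: {line}")` would print the hits in a grep-like format. Expect false positives, since words such as "Conference" also appear in ordinary comments.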

That's a lot :) Some of your findings are actually listed in the original article, but not all of them obviously.

Ah, sorry, I didn't really check for dupes---I just skipped the ones with a pdf link in the vicinity. I'm just glad that sometimes the clever things that academics churn out are actually used in practice. Far too rarely if you ask me, but I'm biased of course ;)

I had to cite sources while implementing an artificial immune system (real valued negative selection and clonal selection algorithms). I read through a few papers for each algorithm and cited the clearest one as a source.

it would be great if it also mentioned which files the links were found in

Seconded! I really like this compilation. Very interesting to see the algorithms and data structures behind the implementation of a language, especially one of the more popular ones.

You can just grep by PDF name/url and find the code.

To be fair, sometimes code and comments get moved around, and any of us can use grep (or whatever other search tool you prefer) to find a specific link in the source.

Please, do it for the Linux source code.

About 99% of Linux (or even more) is drivers. But indeed there should be useful references in the scheduler, locking primitives, memory management and core networking code.

To be more precise, it's actually a list of scientific papers referenced in the OpenJDK source code.

... as direct pdf links found via grep.

There might be more references without a pdf link.

I'm surprised the author didn't search for DOI links.

Asking as someone not as familiar with the research community as I'd like to be, what are DOI files, what advantages do they have over PDF/PostScript, and are they common?

It's not a file format, it's a digital identifier. The APA can explain it better than I can:

"A digital object identifier (DOI) is a unique alphanumeric string assigned by a registration agency (the International DOI Foundation) to identify content and provide a persistent link to its location on the Internet. The publisher assigns a DOI when your article is published and made available electronically."

So you can access a journal article by going to http://dx.doi.org/DOI-GOES-HERE. doi.org doesn't host files; it just resolves the identifier to the paper's current location. For example, the DOI for the first paper in TFA is 10.1007/11427186_42, so it can be accessed at http://dx.doi.org/10.1007/11427186_42

If you know the DOI, you know where to find the paper.
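The DOI-to-URL mapping described above is simple enough to sketch. This assumes the resolver prefix used in this comment (http://dx.doi.org/); the function name is made up, and the code only builds the URL without contacting the resolver.

```python
from urllib.parse import quote

# Resolver prefix as given in the comment above.
RESOLVER_PREFIX = "http://dx.doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver URL for a DOI, percent-encoding unsafe characters."""
    # Keep "/" unescaped: it separates the DOI prefix from the suffix.
    return RESOLVER_PREFIX + quote(doi.strip(), safe="/")


print(doi_to_url("10.1007/11427186_42"))  # http://dx.doi.org/10.1007/11427186_42
```

Following that URL in a browser redirects to wherever the publisher currently hosts the paper.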

DOI isn't a file format. It's an object identifier for papers like ISBN is for books.

TIL about DOI :)

Thanks, we've updated the title to clarify.

Are you saying you can't run a PDF? :)
