
Ask HN: Papers you read in 2015? - racoonear
Curious about scientific publications in your field that are worth sharing.
======
nrmn
I've been trying to read a paper a day since midsummer. These are a few of the
papers I've personally found interesting since then:

Generating Sequences With Recurrent Neural Networks -
[http://arxiv.org/abs/1308.0850](http://arxiv.org/abs/1308.0850) Older one,
but important to understand deeply since other recent ideas have come from
this!

Unsupervised Representation Learning with Deep Convolutional Generative
Adversarial Networks -
[http://arxiv.org/abs/1511.06434](http://arxiv.org/abs/1511.06434)

Unitary Evolution Recurrent Neural Networks -
[http://arxiv.org/abs/1511.06464](http://arxiv.org/abs/1511.06464)

State of the Art Control of Atari Games Using Shallow Reinforcement Learning -
[http://arxiv.org/abs/1512.01563](http://arxiv.org/abs/1512.01563) Interesting
discussion in section 6.1 on the shortcomings of DeepMind's DQN work

Spectral Representations for Convolutional Neural Networks -
[http://arxiv.org/abs/1506.03767](http://arxiv.org/abs/1506.03767)

Deep Residual Learning for Image Recognition -
[http://arxiv.org/abs/1512.03385](http://arxiv.org/abs/1512.03385)

Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) -
[http://arxiv.org/abs/1511.07289](http://arxiv.org/abs/1511.07289) I wish they
had done more comparisons between similar network architectures with only the
units swapped out, e.g. AlexNet with ReLU vs. AlexNet with ELU.

On Learning to Think: Algorithmic Information Theory for Novel Combinations of
Reinforcement Learning Controllers and Recurrent Neural World Models -
[http://arxiv.org/abs/1511.09249](http://arxiv.org/abs/1511.09249)

Just a few from my list :)

~~~
sgt101
Crumbs, it takes me about two weeks to get through a paper properly!

~~~
wodenokoto
I can't speak for the parent, but I believe people who read a paper a day
don't try to understand it deeply enough to start implementing whatever the
paper describes. Rather, they read it to get an idea of the approach, the kind
of results it gives, and the kind of problems it can solve.

~~~
smhx
For people actively working full-time in the field, some papers have ideas
simple but powerful enough that reading the paper for 2 hours (or glancing at
the key diagram / formula) is enough to implement them.

For example:

Deep Residual Learning for Image Recognition

Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

------
yankoff
Generating Sequences With Recurrent Neural Networks
[http://arxiv.org/abs/1308.0850](http://arxiv.org/abs/1308.0850)

Generating Text with Recurrent Neural Networks
[http://www.cs.utoronto.ca/~ilya/pubs/2011/LANG-
RNN.pdf](http://www.cs.utoronto.ca/~ilya/pubs/2011/LANG-RNN.pdf)

Bitcoin whitepaper
[https://bitcoin.org/bitcoin.pdf](https://bitcoin.org/bitcoin.pdf)

Ethereum paper [http://gavwood.com/Paper.pdf](http://gavwood.com/Paper.pdf)

------
norswap
Not an academic paper, but I found the Roslyn (new C# compiler) whitepaper to
be an interesting window into the future of programming languages:
[http://www.microsoft.com/en-
us/download/details.aspx?id=2774...](http://www.microsoft.com/en-
us/download/details.aspx?id=27744)

"Tackling the Awkward Squad: monadic input/output, concurrency, exceptions,
and foreign-language calls in Haskell" ([http://research.microsoft.com/en-
us/um/people/simonpj/papers...](http://research.microsoft.com/en-
us/um/people/simonpj/papers/marktoberdorf/mark.pdf)) finally made me
understand monads. Or rather, why they have such an unreasonable draw on
Haskell people. tl;dr: Monads are useful to thread data (state, side effects,
...) through a computation, without modifying all your function signatures
(the functions can be lifted to work with the monad). But mostly, it turns out
you NEED monads (or something like it) to sequence side-effects (since Haskell
is lazy).
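To make the state-threading point concrete, here's a minimal Python sketch of the idea (an illustration only, not Haskell's actual machinery; `unit`, `bind`, and `tick` are hypothetical names for this sketch). A stateful computation is a function from a state to a (result, new state) pair, and `bind` sequences two computations while threading the state invisibly, so no function signature has to mention the state explicitly:

```python
# State-"monad" sketch: a computation is a function state -> (result, new_state).

def unit(value):
    """Wrap a plain value as a computation that leaves the state untouched."""
    return lambda state: (value, state)

def bind(computation, next_step):
    """Run `computation`, feed its result to `next_step`, thread the state."""
    def combined(state):
        result, new_state = computation(state)
        return next_step(result)(new_state)
    return combined

def tick():
    """Return the current counter value and increment the state."""
    return lambda count: (count, count + 1)

# Two ticks sequenced with bind; the counter is threaded invisibly.
program = bind(tick(), lambda first:
          bind(tick(), lambda second:
          unit((first, second))))

print(program(0))  # ((0, 1), 2): results of the two ticks, final state 2
```

The payoff is exactly what the paper argues: `tick` never takes or returns the counter in its visible arguments, yet the two calls are still sequenced and share state.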

------
PeCaN
Generalized LL Parsing -
[http://dotat.at/tmp/gll.pdf](http://dotat.at/tmp/gll.pdf)

Parse ambiguous context-free grammars in worst-case cubic time and unambiguous
grammars in linear time, with an intuitive recursive-descent-ish algorithm.
GLL is the future of parsing IMO, more powerful than packrat/PEG parsers and
comparatively easy to write by hand. It also handles ambiguities more
elegantly than GLR, IMO.

Dependency-Based Word Embeddings -
[https://levyomer.files.wordpress.com/2014/04/dependency-
base...](https://levyomer.files.wordpress.com/2014/04/dependency-based-word-
embeddings-acl-2014.pdf)

word2vec algorithm with context based on linguistic dependencies instead of a
skip-gram approach. A quick explanation: skip-grams give words related to the
embedding (ex: Hogwarts -> Dumbledore) and dependencies give words that can be
used like the embedding (ex: Hogwarts -> Sunnydale). It's not meant to replace
skip-grams, but to augment them; skip-gram contexts learn the domain and
dependency-based contexts learn the semantic type.
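As a toy illustration of the difference (a hand-made example, not code or data from the paper), compare linear-window skip-gram contexts with dependency contexts for a short sentence. The "parse" here is hand-annotated for illustration, not produced by a real parser:

```python
# Contrast word2vec's two context types for "australian scientist discovers star".

sentence = ["australian", "scientist", "discovers", "star"]

def skipgram_contexts(words, target, window=2):
    """Linear-window contexts: any word within `window` positions of target."""
    i = words.index(target)
    return [w for j, w in enumerate(words) if j != i and abs(j - i) <= window]

# Dependency contexts attach the syntactic arc label, so a verb sees its
# subject and object regardless of distance (hand-written parse, illustrative).
dependency_contexts = {
    "discovers": ["scientist/nsubj", "star/dobj"],
    "scientist": ["australian/amod", "discovers/nsubj-inv"],
}

print(skipgram_contexts(sentence, "discovers"))  # ['australian', 'scientist', 'star']
print(dependency_contexts["discovers"])          # ['scientist/nsubj', 'star/dobj']
```

Training the same skip-gram objective over the labeled dependency pairs instead of the window pairs is what shifts the embeddings from topical similarity toward functional similarity.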

~~~
drostie
Thanks for the first of these, I've put it on my "eventually if I ever get
serious about writing this programming language" list.

~~~
PeCaN
Heh, that's where it is on mine too. :)

One of the things I find particularly nice about GLL is that it's much more
friendly to parser combinators[1] than GLR. (LR-family parsers, and bottom-up
parsing in general, are notoriously difficult to implement in a way that lets
parsers be combined, and the resulting framework would be rather awkward to
use.)

1: Indeed, it's already been done:
[https://github.com/bawerd/gll.js](https://github.com/bawerd/gll.js)
[https://github.com/epsil/gll](https://github.com/epsil/gll)
[https://github.com/djspiewak/gll-
combinators](https://github.com/djspiewak/gll-combinators)

------
mtrn
* Program design in the UNIX environment (1984): [http://harmful.cat-v.org/cat-v/unix_prog_design.pdf](http://harmful.cat-v.org/cat-v/unix_prog_design.pdf)

Glimpse into Unix essentials. It teaches "do one thing well" through an
example you'll never forget. Doing less requires more care and attention to
detail.

* From Frequency to Meaning (2010): [https://www.jair.org/media/2934/live-2934-4846-jair.pdf](https://www.jair.org/media/2934/live-2934-4846-jair.pdf)

A nice summary of vector space models along with three basic matrix layouts
(term-document, word-context, and pair-pattern) and the resulting
applications and algorithms.

* A Roadmap towards Machine Intelligence (2015): [http://arxiv.org/pdf/1511.08130v1.pdf](http://arxiv.org/pdf/1511.08130v1.pdf)

Emphasis on communication. I liked the fact that the AI is pictured as a
research assistant, since I would love to see more dialogue-oriented
interaction with machines.

* 50 years of Data Science (2015): [https://dl.dropboxusercontent.com/u/23421017/50YearsDataScie...](https://dl.dropboxusercontent.com/u/23421017/50YearsDataScience.pdf)

Great essay on how the past already had a handle on today's data analysis
landscape, just without the enormous computing power and data availability
that we have today.

------
DanBC
National Confidential Inquiry into Suicide and Homicide by People with Mental
Illness (UK):
[http://www.bbmh.manchester.ac.uk/cmhs/research/centreforsuic...](http://www.bbmh.manchester.ac.uk/cmhs/research/centreforsuicideprevention/nci/)

That paper tells us that pain medication is often used in completed suicide
(paracetamol, paracetamol combined with opioids, and opioids alone are three
of the top five most commonly used meds).

So I have an interest in pain medication from the angle of suicide prevention,
which is why these two are interesting.

Efficacy and safety of paracetamol for spinal pain and osteoarthritis:
systematic review and meta-analysis of randomised placebo controlled trials:
[http://www.bmj.com/content/350/bmj.h1225](http://www.bmj.com/content/350/bmj.h1225)

(Paracetamol probably doesn't help with long term musculo-skeletal pain, and
increases risk of liver damage)

[http://www.thelancet.com/journals/lancet/article/PIIS0140-67...](http://www.thelancet.com/journals/lancet/article/PIIS0140-6736\(14\)60805-9/abstract)

(Paracetamol probably no better than placebo for long term back pain)

~~~
MichaelGG
It's a bit confusing. For instance around page 80:

Table 5: Male suicide deaths and those aged 45-54 in the general population,
by UK country vs Table 7: Patient suicide: male suicide deaths and those aged
45-54, by UK country.

Table 5 shows the rate. Table 7 shows the actual numbers. Why? Even the first
key finding speculates about patient suicide increase due to higher numbers of
patients. Do they not have this seemingly important statistic? A quick search
says "a quarter" of the population will have a mental illness during the year.
If true, then we'd expect around 25% of suicides to be from patients, right?

Why separate the APAP/opioid combination in light of suicide if the APAP wasn't
a relevant cause? It seems like respiratory depression and liver poisoning
aren't that synergistic are they? An opiate naive user with 10/325 oxy/apap
would almost certainly hit opiate overdose before liver damage was a life-
threatening issue.

The study recommends "safe prescribing" but then shows that the majority of
opiate suicide isn't with a prescription, and that prescription overdose
skews heavily toward older females with a "major physical illness". And
there's no comparison of how rx abuse compares with non-mentally-ill
patients. Edit: and rx rates, too. I'm guessing older patients generally get
far more opiates prescribed than younger ones.

Interesting read though, thanks.

~~~
DanBC
These are great questions. They're normally pretty good at responding if you
want more information.

Here "patient" means "under the care of secondary MH services", so doesn't
include people who are being treated by their GP rather than by eg a community
MH team.

I think the opioid / APAP stuff is based on bits of history. Co-Proxamol was
for years the most common med used in completed suicide. It was put on more
restrictive prescribing, and use dropped. But then plain paracetamol use in
completed suicide increased. (And also attempted suicide: for a while
paracetamol overdose was 4% of UK liver transplants, but 25% of the super-
urgent transplants.) Rules about paracetamol tightened, so we've seen
reductions in its use. So, from a public health POV, it's useful to see
whether plain paracetamol, or the combination, or plain opioids are being
used more often, because that means they can look at what's driving sales or
prescriptions.

About safe prescribing: one source of medication used in completed suicide is
either from your own prescription, or from a relative's prescription. This is
often a preventable cause of death, so it's useful to see if safe prescribing
helps. It ties into things like "Triangle of Care" and also "Pills Project"
(which I want to try to use outside care homes).

You're right about older people. They also often don't lock up the meds in a
cupboard (they don't have children in the home anymore, they don't see a need)
and tragically grand-children come to visit and accidentally overdose.

[https://www.carers.org/triangle-care](https://www.carers.org/triangle-care)

[http://www.health.org.uk/pills](http://www.health.org.uk/pills)

------
mrdrozdov
A lot of NLP related papers. Here are a few of my favorites.

\- HMMs and Perceptrons for Part-of-Speech Tagging and Chunking -
[http://www.aclweb.org/anthology/W02-1001](http://www.aclweb.org/anthology/W02-1001)

\- MaxEnt for Part-of-Speech Tagging -
[http://www.aclweb.org/anthology/W96-0213](http://www.aclweb.org/anthology/W96-0213)

\- RNNs for Slot Filling -
[http://www.iro.umontreal.ca/~lisa/pointeurs/RNNSpokenLanguag...](http://www.iro.umontreal.ca/~lisa/pointeurs/RNNSpokenLanguage2013.pdf)

Not related to NLP, but I really like the Facebook paper that covered delta-
of-delta compression for time-series data.

\-
[http://www.vldb.org/pvldb/vol8/p1816-teller.pdf](http://www.vldb.org/pvldb/vol8/p1816-teller.pdf)
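For intuition, here's a tiny Python sketch of the delta-of-delta idea (a simplified illustration, not the paper's encoder, which packs these values into variable-length bit ranges): regularly spaced timestamps reduce to runs of zeros, which compress extremely well.

```python
# Delta-of-delta encoding sketch: store the first timestamp, the first delta,
# and then only the change in delta for each subsequent point.

def delta_of_delta(timestamps):
    """Return (first timestamp, first delta, deltas of deltas)."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods

# Near-regular sampling (every ~60s) yields mostly zeros to encode:
ts = [1000, 1060, 1120, 1180, 1241]
print(delta_of_delta(ts))  # (1000, 60, [0, 0, 1])
```

Since metrics are usually collected at fixed intervals, almost every delta-of-delta is zero and can be stored in a single bit, which is where the big compression wins come from.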

------
racoonear
Deep Residual Learning for Image Recognition
[http://arxiv.org/abs/1512.03385](http://arxiv.org/abs/1512.03385)

Batch Normalization
[http://jmlr.org/proceedings/papers/v37/ioffe15.pdf](http://jmlr.org/proceedings/papers/v37/ioffe15.pdf)

Deep Neural Decision Forests
[http://research.microsoft.com/pubs/255952/ICCV15_DeepNDF_mai...](http://research.microsoft.com/pubs/255952/ICCV15_DeepNDF_main.pdf)

Spatial Transformer Networks [https://papers.nips.cc/paper/5854-spatial-
transformer-networ...](https://papers.nips.cc/paper/5854-spatial-transformer-
networks)

------
web007
DagCoin : a bitcoin-like cryptocurrency with a "decentralized" blockchain
based on directed acyclic graphs -
[https://bitslog.files.wordpress.com/2015/09/dagcoin-v41.pdf](https://bitslog.files.wordpress.com/2015/09/dagcoin-v41.pdf)

Visual Search at Pinterest -
[http://arxiv.org/pdf/1505.07647v1.pdf](http://arxiv.org/pdf/1505.07647v1.pdf)

Fast Search in Hamming Space with Multi-Index Hashing -
[http://www.cs.toronto.edu/~norouzi/research/papers/multi_ind...](http://www.cs.toronto.edu/~norouzi/research/papers/multi_index_hashing.pdf)

------
temuze
One weird trick for parallelizing ConvNets:
[http://arxiv.org/abs/1404.5997](http://arxiv.org/abs/1404.5997)

Unsupervised Representation Learning with Deep Convolutional Generative
Adversarial Networks:
[http://arxiv.org/abs/1511.06434](http://arxiv.org/abs/1511.06434)

A Neural Algorithm of Artistic Style:
[http://arxiv.org/abs/1508.06576](http://arxiv.org/abs/1508.06576)

------
tedyoung
Surprised nobody mentioned The Morning Paper yet:
[http://blog.acolyer.org/](http://blog.acolyer.org/)

------
anildigital
Comparison of Erlang Runtime System and Java Virtual Machine
[http://ds.cs.ut.ee/courses/course-
files/To303nis%20Pool%20.p...](http://ds.cs.ut.ee/courses/course-
files/To303nis%20Pool%20.pdf)

------
mrswag
I've read two rather old papers on different cross-validation techniques:

A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model
Selection (1995)
[http://robotics.stanford.edu/~ronnyk/accEst.pdf](http://robotics.stanford.edu/~ronnyk/accEst.pdf)

Improvements on Cross-Validation: The .632+ Bootstrap Method (1997)
[http://www.stat.washington.edu/courses/stat527/s13/readings/...](http://www.stat.washington.edu/courses/stat527/s13/readings/EfronTibshirani_JASA_1997.pdf)

And one on MIMO techniques:

V-BLAST: An Architecture for Realizing Very High Data Rates Over the Rich-
Scattering Wireless Channel (1998)
[http://www.ee.columbia.edu/~jiantan/E6909/wolnianskyandfosch...](http://www.ee.columbia.edu/~jiantan/E6909/wolnianskyandfoschini.pdf)

I find it to be a good way to get concise and accessible introductions (with
the associated results) to current practices.

------
serzh
This year I read some cool papers:

Big Ball of Mud

Brian Foote and Joseph Yoder

About the reasons why good software becomes ugly and complex.

[http://www.laputan.org/mud/](http://www.laputan.org/mud/)

\----------

The Inevitable Pain of Software Development

Daniel M. Berry

About changing requirements for software.

[https://cs.uwaterloo.ca/~dberry/FTP_SITE/reprints.journals.c...](https://cs.uwaterloo.ca/~dberry/FTP_SITE/reprints.journals.conferences/tcre.painpaper.pdf)

\----------

No Silver Bullet

Frederick P. Brooks, Jr.

Software development is in essence very complex.

[http://www.cs.nott.ac.uk/~pszcah/G51ISS/Documents/NoSilverBu...](http://www.cs.nott.ac.uk/~pszcah/G51ISS/Documents/NoSilverBullet.html)

\----------

Notes On Structured Programming

Edsger W. Dijkstra

Why we don't need to use _goto_.

[https://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF](https://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF)

\----------

Watermarking, tamper-proofing, and obfuscation - tools for software protection

Collberg, C.S. and Thomborson, C. (Dept. of Comput. Sci., University of
Arizona, Tucson, AZ, USA)

[http://dx.doi.org/10.1109/TSE.2002.1027797](http://dx.doi.org/10.1109/TSE.2002.1027797)

------
travjones
"Unified-theory-of-reinforcement neural networks do not simulate the blocking
effect"

Sci-hub link: [http://www.sciencedirect.com.sci-
hub.io/science/article/pii/...](http://www.sciencedirect.com.sci-
hub.io/science/article/pii/S0376635715300267)

------
genbit
What is a good resource for someone who wants to read good papers from time
to time?

~~~
chrisseaton
Follow some academics in the field you are interested in on Twitter.

~~~
genbit
Whom would you suggest starting with?

~~~
chrisseaton
Well what field are you interested in? I only know people in programming
languages and systems really.

------
georgerobinson
In search of an understandable consensus algorithm
([https://www.usenix.org/conference/atc14/technical-
sessions/p...](https://www.usenix.org/conference/atc14/technical-
sessions/presentation/ongaro))

SWIM: Scalable Weakly-consistent Infection-style Process Group Membership
Protocol
([http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf))

------
DyslexicAtheist
these "classics":

[http://blog.valbonne-consulting.com/2014/06/09/an-
incomplete...](http://blog.valbonne-consulting.com/2014/06/09/an-incomplete-
list-of-classic-papers-every-software-architect-should-read/)

------
ipunchghosts
"Development and Validation of a Biomarker for Diarrhea-Predominant Irritable
Bowel Syndrome in Human Subjects"

[http://journals.plos.org/plosone/article?id=10.1371/journal....](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0126438)

Large cohort study determining what the biological mechanisms of IBS
(irritable bowel syndrome) are.

Defines a new test, IBSChek, which can be used to determine whether a patient
has a subtype of IBS. Anyone can get this test done now.

------
NotOscarWilde
Elaine Levey, Thomas Rothvoss - A Lasserre-based (1+ε)-approximation for
Pm∣pj=1,prec∣Cmax
[http://arxiv.org/abs/1509.07808](http://arxiv.org/abs/1509.07808)

People are very excited about graph isomorphism being solvable in
quasipolynomial time, but there are a few more problems from the seminal
Garey and Johnson book that are still not known to be in P, NP-complete, or
neither. One of them is computing the optimal schedule for three machines
processing some tasks (jobs), when the tasks all have the same size, but
there are dependencies among some of them and you have to do them in order.

This paper proves that there is a (1+ε)-approximation of this problem in
"slightly more than quasipolynomial time" (I love this phrasing).

The technique they use is the Lasserre hierarchy, which is a very exciting
tool in theoretical computer science, although there still exist only a
couple of results where this hierarchy approach brings more to the table than
other methods for designing efficient algorithms. This is one more for the
list!
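To make the scheduling problem itself concrete, here's a toy Python list scheduler for unit-length jobs with precedence constraints (a simple greedy heuristic for illustration only; this is NOT the paper's Lasserre-based algorithm, and greedy scheduling is not optimal in general):

```python
# Greedy list scheduling: at each time step, run up to `machines` jobs whose
# predecessors have all finished. Assumes the precedence graph is acyclic.

def list_schedule(num_jobs, precedences, machines=3):
    """Schedule unit jobs 0..num_jobs-1 respecting (before, after) pairs;
    return the makespan (total time steps used)."""
    done, time = set(), 0
    while len(done) < num_jobs:
        ready = [j for j in range(num_jobs)
                 if j not in done
                 and all(p in done for p, q in precedences if q == j)]
        done.update(ready[:machines])  # run up to `machines` ready jobs
        time += 1
    return time

# A chain 0 -> 1 -> 2 plus two independent jobs, on 3 machines:
print(list_schedule(5, [(0, 1), (1, 2)]))  # 3 (the chain forces 3 steps)
```

Even in this tiny example you can see why the problem is hard: the chain constrains the makespan no matter how many machines are free, and deciding the true optimum for arbitrary precedence graphs on a fixed number of machines is exactly the open question the paper attacks.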

------
bra-ket
memory networks:
[http://www.thespermwhale.com/jaseweston/ram/](http://www.thespermwhale.com/jaseweston/ram/)

~~~
versteegen
Thanks for linking this! Your post was easy to overlook, though. It links to
the "Reasoning, Attention, Memory (RAM) NIPS Workshop 2015".

------
hedgehog
Using state lattices for motion planning was new to me but seems like an
elegant approach:

[http://people.csail.mit.edu/rak/www/sites/default/files/pubs...](http://people.csail.mit.edu/rak/www/sites/default/files/pubs/PivKneKel09.pdf)

------
marcodena
Spotify – Large Scale, Low Latency, P2P Music-on-Demand Streaming
[http://www3.cs.stonybrook.edu/~phillipa/CSE390/spotify-p2p10...](http://www3.cs.stonybrook.edu/~phillipa/CSE390/spotify-p2p10.pdf)

and many others but this is the one I liked the most

~~~
jvandonsel
That Spotify paper was undated. When was it written?

~~~
freyr
Presented August 2010 at the IEEE Conference on Peer-to-Peer Computing (P2P)

------
chaoxu
Here is an accessible algorithms paper. It's a cute puzzle problem. It was
inspired by answers on cs.stackexchange.

Efficient Algorithms for Envy-Free Stick Division With Fewest Cuts
[http://arxiv.org/abs/1502.04048](http://arxiv.org/abs/1502.04048)

------
afancy
Benchmarking Smart Meter Data Analytics
[http://openproceedings.org/2015/conf/edbt/paper-55.pdf](http://openproceedings.org/2015/conf/edbt/paper-55.pdf)

------
binarymax
A couple years behind the times but I got really into word2vec and plenty of
associated works. On a mobile so not easy to post links, but if you haven't
checked out w2v I highly recommend it.

------
throwawaykf05
Not a specific list of papers, but I find Sigcomm to generally have very good
papers in the field of networking and communications. Here's the link for this
year's conference:

[http://conferences.sigcomm.org/sigcomm/2015/program.php](http://conferences.sigcomm.org/sigcomm/2015/program.php)

------
roninb
It didn't get published this year, but I thought Robust De-Anonymization of
Large Datasets was worth sharing -
[http://arxiv.org/pdf/cs/0610105.pdf](http://arxiv.org/pdf/cs/0610105.pdf)

------
j_juggernaut
Wow. Didn't realize how HOT deep learning was in 2015.

