Generating Sequences With Recurrent Neural Networks - http://arxiv.org/abs/1308.0850
An older one, but important to understand deeply, since other recent ideas have come from it!
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks - http://arxiv.org/abs/1511.06434
Unitary Evolution Recurrent Neural Networks - http://arxiv.org/abs/1511.06464
State of the Art Control of Atari Games Using Shallow Reinforcement Learning - http://arxiv.org/abs/1512.01563
Interesting discussion in section 6.1 on the shortcomings/issues of DeepMind's DQN
Spectral Representations for Convolutional Neural Networks - http://arxiv.org/abs/1506.03767
Deep Residual Learning for Image Recognition - http://arxiv.org/abs/1512.03385
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) - http://arxiv.org/abs/1511.07289
I wish they had done more comparisons between similar network architectures with only the units swapped out, e.g. AlexNet with ReLU vs. AlexNet with ELU.
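For reference, the unit itself is one line: identity for positive inputs and alpha * (exp(x) - 1) below zero, so it saturates smoothly at -alpha instead of cutting off at 0 like ReLU. A minimal NumPy sketch (my own, not the paper's code):

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential Linear Unit: x for x > 0,
    alpha * (exp(x) - 1) for x <= 0 (saturates toward -alpha)."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(elu(x))  # negatives squashed toward -alpha, positives pass through
```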
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models - http://arxiv.org/abs/1511.09249
Just a few from my list :)
Deep Residual Learning for Image Recognition
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
Now, a good paper is one I can read and grok in a few hours, to the point where I can implement the basics. But a classic paper is like a well-thumbed book: it might take years to fully grasp.
Generating Text with Recurrent Neural Networks
"Tackling the Awkward Squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell" (http://research.microsoft.com/en-us/um/people/simonpj/papers...) finally made me understand monads. Or rather, why they have such an unreasonable draw on Haskell people. tl;dr: Monads are useful to thread data (state, side effects, ...) through a computation without modifying all your function signatures (the functions can be lifted to work with the monad). But mostly, it turns out you NEED monads (or something like them) to sequence side effects (since Haskell is lazy).
Parse ambiguous context-free grammars in worst-case cubic time and unambiguous grammars in linear time, with an intuitive recursive-descent-ish algorithm. GLL is the future of parsing IMO, more powerful than packrat/PEG parsers and comparatively easy to write by hand. It also handles ambiguities more elegantly than GLR, IMO.
Dependency-Based Word Embeddings - https://levyomer.files.wordpress.com/2014/04/dependency-base...
A word2vec-style algorithm with contexts based on linguistic dependencies instead of a skip-gram window. A quick explanation: skip-grams give words related to the embedding (ex: Hogwarts -> Dumbledore), while dependencies give words that can be used like the embedding (ex: Hogwarts -> Sunnydale). It's not meant to replace skip-grams but to augment them; skip-gram contexts learn the domain, and dependency-based contexts learn the semantic type.
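The difference in what each method actually feeds the embedding model is easy to see on a toy sentence. A sketch with a hand-written parse (my own toy; a real pipeline would get the arcs from a dependency parser, as the paper does):

```python
# Sentence: "Australian scientist discovers star with telescope"
words = ["australian", "scientist", "discovers", "star", "with", "telescope"]

# Hand-written dependency arcs: (head_index, dependent_index, relation)
arcs = [(1, 0, "amod"), (2, 1, "nsubj"), (2, 3, "dobj"), (3, 5, "prep_with")]

def window_contexts(i, k=2):
    """Skip-gram style contexts: the k words on either side, whatever their role."""
    lo, hi = max(0, i - k), min(len(words), i + k + 1)
    return [words[j] for j in range(lo, hi) if j != i]

def dependency_contexts(i):
    """Dependency contexts: only syntactic neighbours, labelled with the relation."""
    ctx = []
    for head, dep, rel in arcs:
        if head == i:
            ctx.append(f"{words[dep]}/{rel}")
        if dep == i:
            ctx.append(f"{words[head]}/{rel}-1")  # inverse direction of the arc
    return ctx

target = words.index("discovers")
print(window_contexts(target))      # nearby words, syntactic role ignored
print(dependency_contexts(target))  # only the verb's arguments, with labels
```

The window picks up "with", which says little about the verb; the dependency contexts keep only its subject and object, which is why they capture the word's type rather than its topic.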
One of the things I find particularly nice about GLL is that it's much more friendly to parser combinators than GLR. (LR-family parsers, and bottom-up parsing in general, are notoriously difficult to implement in a way such that parsers can be combined, and the resulting framework would be rather awkward to use.)
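For flavour, here is what the combinator style looks like: a minimal recursive-descent combinator sketch in Python (this is plain recursive descent, not GLL; a real GLL implementation adds a graph-structured stack so left recursion and ambiguity don't cause non-termination or blow-up):

```python
# A parser is a function from (input, position) to a list of
# (value, new_position) results; an empty list means failure.

def lit(c):
    """Match a single literal character."""
    return lambda s, i: [(c, i + 1)] if i < len(s) and s[i] == c else []

def alt(p, q):
    """Alternation: return the results of both branches."""
    return lambda s, i: p(s, i) + q(s, i)

def seq(p, q, combine=lambda a, b: (a, b)):
    """Run p, then q on what remains, combining the two values."""
    def run(s, i):
        return [(combine(a, b), k)
                for a, j in p(s, i)
                for b, k in q(s, j)]
    return run

# Grammar: E -> 'a' | '(' E ')'
def expr(s, i):
    wrapped = seq(lit("("), seq(expr, lit(")")), lambda _, inner: inner[0])
    return alt(lit("a"), wrapped)(s, i)

def parse(s):
    """Keep only parses that consume the whole input."""
    return [v for v, j in expr(s, 0) if j == len(s)]

print(parse("((a))"))  # -> ['a']
```

The grammar reads almost like its BNF, which is the property the comment is praising; GLL keeps that property while handling grammars this naive scheme cannot.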
1: Indeed, it's already been done:
A glimpse into unix essentials: do one thing well, shown on an example you'll never forget. Doing less requires more care and attention to detail.
* From Frequency to Meaning (2010): https://www.jair.org/media/2934/live-2934-4846-jair.pdf
A nice summary of vector space models, along with three basic matrix layouts (term-document, word-context, pair-pattern) and the resulting applications and algorithms.
* A Roadmap towards Machine Intelligence (2015): http://arxiv.org/pdf/1511.08130v1.pdf
Emphasis on communication. I liked that the AI is pictured as a research assistant, since I would love to see more dialog-oriented interaction with machines.
* 50 years of Data Science (2015): https://dl.dropboxusercontent.com/u/23421017/50YearsDataScie...
Great essay on how the past had a handle on today's data-analysis landscape, just without the enormous computing power and data availability that we have today.
That paper tells us that pain medication is often used in completed suicide (paracetamol, paracetamol/opioid combinations, and opioids are three of the top five most commonly used meds).
So I have an interest in pain medication from the angle of suicide prevention, which is why these two are interesting.
Efficacy and safety of paracetamol for spinal pain and osteoarthritis: systematic review and meta-analysis of randomised placebo controlled trials: http://www.bmj.com/content/350/bmj.h1225
(Paracetamol probably doesn't help with long term musculo-skeletal pain, and increases risk of liver damage)
(Paracetamol probably no better than placebo for long term back pain)
Table 5: Male suicide deaths and those aged 45-54 in the general population, by UK country vs Table 7: Patient suicide: male suicide deaths and those aged 45-54, by UK country.
Table 5 shows the rate. Table 7 shows the actual numbers. Why? Even the first key finding speculates about patient suicide increase due to higher numbers of patients. Do they not have this seemingly important statistic? A quick search says "a quarter" of the population will have a mental illness during the year. If true, then we'd expect around 25% of suicides to be from patients, right?
Why separate out the APAP/opioid combination in light of suicide if the APAP wasn't a relevant cause? It seems like respiratory depression and liver poisoning aren't that synergistic, are they? An opiate-naive user taking 10/325 oxy/APAP would almost certainly hit opiate overdose before liver damage became a life-threatening issue.
The study recommends "safe prescribing" but then shows that the majority of opiate suicide isn't with a prescription, and that prescription overdose skews heavily toward older females with a "major physical illness". And there's no comparison of how rx abuse compares with non-mentally-ill patients. Edit: And rx rates, too. I'm guessing older patients generally get way more opiates prescribed than younger ones.
Interesting read though, thanks.
Here "patient" means "under the care of secondary MH services", so doesn't include people who are being treated by their GP rather than by eg a community MH team.
I think the opioid / APAP stuff is based on bits of history. Co-proxamol was for years the most common med used in completed suicide. It was put on more restrictive prescribing, and use dropped. But then plain paracetamol use in completed suicide increased. (And in attempted suicide: for a while paracetamol overdose accounted for 4% of UK liver transplants, but 25% of the super-urgent transplants.) Rules about paracetamol tightened, so we've seen reductions in its use. So, from a public health POV, it's useful to see whether plain paracetamol, the combination, or plain opioids are being used more often, because then they can look at what's driving sales or prescriptions.
About safe prescribing: one source of medication used in completed suicide is either from your own prescription, or from a relative's prescription. This is often a preventable cause of death, so it's useful to see if safe prescribing helps. It ties into things like "Triangle of Care" and also "Pills Project" (which I want to try to use outside care homes).
You're right about older people. They also often don't lock their meds in a cupboard (they no longer have children in the home, so they don't see a need), and tragically grandchildren come to visit and accidentally overdose.
- HMMs and Perceptrons for Part-of-Speech Tagging and Chunking - http://www.aclweb.org/anthology/W02-1001
- MaxEnt for Part-of-Speech Tagging - http://www.aclweb.org/anthology/W96-0213
- RNNs for Slot Filling - http://www.iro.umontreal.ca/~lisa/pointeurs/RNNSpokenLanguag...
Not related to NLP, but I really like the Facebook paper that covered delta-of-delta compression for time-series data.
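That's the Gorilla paper; its timestamp scheme stores each timestamp as a delta-of-delta, which is almost always zero for regularly sampled series. A sketch of just the arithmetic (my own; the real format then bit-packs each value into a variable-length code, so the zeros cost a single bit):

```python
def encode(timestamps):
    """[t0, t1, ...] (at least 2 values) -> (t0, first_delta, deltas-of-deltas)."""
    t0 = timestamps[0]
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return t0, deltas[0], dods

def decode(t0, first_delta, dods):
    """Invert encode(): re-accumulate deltas, then timestamps."""
    out, delta = [t0, t0 + first_delta], first_delta
    for dod in dods:
        delta += dod
        out.append(out[-1] + delta)
    return out

# Roughly one point per minute, with a little jitter: the
# deltas-of-deltas are mostly zero, hence highly compressible.
ts = [1455555000, 1455555060, 1455555120, 1455555181, 1455555241]
t0, d0, dods = encode(ts)
print(d0, dods)  # 60 [0, 1, -1]
assert decode(t0, d0, dods) == ts
```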
Batch Normalization http://jmlr.org/proceedings/papers/v37/ioffe15.pdf
Deep Neural Decision Forests http://research.microsoft.com/pubs/255952/ICCV15_DeepNDF_mai...
Spatial Transformer Networks https://papers.nips.cc/paper/5854-spatial-transformer-networ...
Visual Search at Pinterest - http://arxiv.org/pdf/1505.07647v1.pdf
Fast Search in Hamming Space with Multi-Index Hashing - http://www.cs.toronto.edu/~norouzi/research/papers/multi_ind...
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks:
A Neural Algorithm of Artistic Style:
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (1995) http://robotics.stanford.edu/~ronnyk/accEst.pdf
Improvements on Cross-Validation: The .632+ Bootstrap Method (1997) http://www.stat.washington.edu/courses/stat527/s13/readings/...
And one on MIMO techniques:
V-BLAST: An Architecture for Realizing Very High Data Rates Over the Rich-Scattering Wireless Channel (1998) http://www.ee.columbia.edu/~jiantan/E6909/wolnianskyandfosch...
I find it to be a good way to get concise and accessible introductions (with the associated results) to current practices.
Big Ball of Mud
Brian Foote and Joseph Yoder
About the reasons why good software becomes ugly and complex.
The Inevitable Pain of Software Development
Daniel M. Berry
About changing requirements for software.
No Silver Bullet
Frederick P. Brooks, Jr.
Software development is in essence very complex.
Notes On Structured Programming
Edsger W. Dijkstra
Why we don't have to use goto.
Watermarking, tamper-proofing, and obfuscation - tools for software protection
Collberg, C.S. ; Dept. of Comput. Sci., Arizona Univ., Tucson, AZ, USA ; Thomborson, C.
Sci-hub link: http://www.sciencedirect.com.sci-hub.io/science/article/pii/...
SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol (http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf)
A large cohort study determining the biological mechanisms of IBS (irritable bowel syndrome).
Defines a new test, IBSChek, which can be used to determine whether a patient has a subtype of IBS. Anyone can get this test done now.
People are very excited about graph isomorphism being solvable in quasipolynomial time, but there are a few more problems from the seminal Garey and Johnson book that are still not known to be in P, NP-complete, or neither. One of them is computing the optimal schedule for three machines processing some tasks (jobs), when the tasks all have the same size but there are dependencies among some of them, and dependent tasks have to be done in order.
This paper proves that there is a (1+ε)-approximation of this problem in "slightly more than quasipolynomial time" (I love this phrasing).
The technique they use is a Lasserre hierarchy which is a very exciting tool in theoretical computer science, although there still exist only a couple results where this hierarchy approach brings more to the table than other methods for designing efficient algorithms. This is one more to the list!
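For concreteness, the problem itself (P3 | prec, p_j = 1 | C_max in the usual three-field scheduling notation) is easy to state and to brute-force on tiny instances; only its general complexity status is open. A sketch of exhaustive search (my own, nothing to do with the paper's Lasserre machinery):

```python
from itertools import combinations

def min_makespan(n_tasks, deps, machines=3):
    """Exact makespan for unit-length tasks with acyclic precedence
    constraints, by exhaustive search over per-step choices.
    deps maps a task to the set of tasks that must finish before it.
    Tiny instances only: the search is exponential."""
    all_tasks = frozenset(range(n_tasks))
    best = [n_tasks]  # trivial upper bound: one task per time step

    def search(done, t):
        if t >= best[0]:
            return  # prune: cannot beat the current best
        if done == all_tasks:
            best[0] = t
            return
        ready = [v for v in all_tasks - done
                 if all(u in done for u in deps.get(v, ()))]
        # For unit tasks, leaving a machine idle while a task is ready
        # never helps, so only maximal sets of ready tasks are tried.
        k = min(machines, len(ready))
        for chosen in combinations(ready, k):
            search(done | set(chosen), t + 1)

    search(frozenset(), 0)
    return best[0]

# A chain 0 -> 1 -> 2 plus two free tasks: the chain forces 3 steps.
print(min_makespan(5, {1: {0}, 2: {1}}))  # -> 3
```

The exhaustive version makes clear why the problem is hard: the choice of *which* maximal ready set to run interacts with dependencies many steps later.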
And many others, but this is the one I liked the most:
Efficient Algorithms for Envy-Free Stick Division With Fewest Cuts