The link to Snorkel [1] is really interesting. Labeling data in a low-quality, programmatic way and then feeding the result through a neural network to produce high-quality labels is really smart.
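To make the idea concrete, here is a minimal sketch of the weak-supervision step, assuming a toy spam-vs-ham task. The labeling functions (`lf_contains_link`, `lf_mentions_free`, `lf_short_reply`) and the `majority_vote` combiner are hypothetical illustrations, not Snorkel's API. Snorkel itself replaces the simple majority vote with a learned label model that estimates how accurate each labeling function is. The resulting weak labels would then be used to train a discriminative model such as a neural network.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function may decline to vote
SPAM, HAM = 1, 0

# Hypothetical labeling functions: cheap, noisy heuristics written as code.
def lf_contains_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_free(text: str) -> int:
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_short_reply(text: str) -> int:
    return HAM if len(text.split()) < 5 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_free, lf_short_reply]

def majority_vote(text: str) -> int:
    """Combine non-abstaining votes; return ABSTAIN if there are no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # the labeling functions disagree evenly
    return ranked[0][0]

unlabeled = [
    "Get a FREE prize at http://example.com",
    "thanks, sounds good",
    "Meeting notes attached for tomorrow's review session",
]
weak_labels = [majority_vote(t) for t in unlabeled]
print(weak_labels)  # [1, 0, -1]
```

Examples that every function abstains on stay unlabeled (`-1`), which is one reason a downstream model trained on the weakly labeled examples is needed to generalize beyond the heuristics.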
Yes, Snorkel and DeepDive look extremely useful. At my job we have a lot of data, but it's unlabeled, and it would cost millions of dollars to outsource it to India for labeling/data entry.
This is great. In some ways it reminds me of the recent "Software 2.0" posts around here: make the coding and architecture so easy that we start teaching machines by creating data rather than by writing code.
I like many of these ideas; they address real, practical problems in the area, and new research is always welcome. It's not clear to me how this will all be packaged into a working environment, but even the individual parts should be useful.
Matei Zaharia (one of the PIs on DAWN) here. Snorkel, MacroBase and ASAP are already being used in production at several companies, and we intend to continue publishing everything as open source. We only started this lab a year ago, so a lot of the projects listed are still new.
Finding the "nearest matching pattern" is part of just about ANY pattern matching. The devil is in the details: dealing with noise, trading precision for speed, generalization (compression), tuning, etc. This attempts to break those problems down into chunks that staff can digest.