DAWN: Tools for AI and Data Product Development (stanford.edu)
215 points by indescions_2017 on Dec 20, 2017 | 14 comments

The link to Snorkel [1] is really interesting. Labeling data programmatically in a noisy, low-quality way, then denoising those labels with a model to produce high-quality training data, is really smart.

[1] https://github.com/HazyResearch/snorkel
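The idea above can be sketched in plain Python. This is not Snorkel's actual API; it is a minimal illustration in which several noisy, hand-written "labeling functions" vote on each example and the votes are combined with a simple majority vote (Snorkel instead fits a generative model to estimate each function's accuracy and emit probabilistic labels). The spam/ham task and all function names are hypothetical.

```python
from collections import Counter

ABSTAIN, SPAM, HAM = None, 1, 0

# Each labeling function encodes one cheap heuristic; any of them may
# be wrong on any given example, and each may abstain.
def lf_contains_link(text):
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_short(text):
    return HAM if len(text.split()) < 5 else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_short, lf_mentions_prize]

def weak_label(text):
    """Combine noisy labeling-function votes into one label (or abstain)."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

print(weak_label("Click http://spam.example to claim your prize now!"))  # 1 (SPAM)
print(weak_label("see you tonight"))                                     # 0 (HAM)
```

The weak labels produced this way would then be used to train a discriminative model (e.g. a neural network), which can generalize beyond the heuristics that generated them.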

Yes, Snorkel and DeepDive look extremely useful. At my job we have a lot of data, but it's unlabeled, and outsourcing the labeling/data entry to India would cost millions of dollars.

This is great. In some ways it reminds me of the recent "Software 2.0" posts around here -- make coding and architecture design so easy that we begin teaching machines by creating data rather than writing code.

A lot of research has been done in that direction.


I always thought the Wolfram language was a good step toward "Software 2.0". It's a shame the language isn't open source.

I like many of these ideas; they address real practical problems in the area, and new research is always welcome. How this will all be packaged into a working environment isn't clear to me, but even the individual parts should be useful.

So is anyone using these tools in a live/production environment?

Matei Zaharia (one of the PIs on DAWN) here. Snorkel, MacroBase and ASAP are already being used in production at several companies, and we intend to continue publishing everything as open source. We only started this lab a year ago, so a lot of the projects listed are still new.

I'm trying to find sample projects to learn from; if possible, please share.

Would be interesting to hear about any experiences. These researchers have backgrounds in Spark etc., so setup might not be that difficult.

Seems it has similar goals to the idea behind factor tables: https://github.com/RowColz/AI

I'm pretty sure that guy just reinvented nearest neighbor.

Finding the "nearest matching pattern" is part of just about ANY pattern matching. The devil is in the details: dealing with noise, trading precision for speed, generalization (compression), tuning, etc. This attempts to break all that down into staff-digestible chunks.
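For reference, the "nearest matching pattern" core being discussed is tiny; this hypothetical 1-nearest-neighbor sketch shows it, while the hard parts named above (noise handling, speed/precision trade-offs, generalization, tuning) all live outside this loop. The example points and labels are made up for illustration.

```python
import math

def nearest_neighbor(query, examples):
    """Return the label of the example closest to `query` (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # The entire "nearest matching pattern" step is this one line.
    return min(examples, key=lambda ex: dist(query, ex[0]))[1]

examples = [((0.0, 0.0), "A"), ((1.0, 1.0), "B"), ((5.0, 5.0), "C")]
print(nearest_neighbor((0.9, 1.2), examples))  # B
```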
