
Extracting Structured Data from Recipes Using Conditional Random Fields - aaronbrethorst
http://open.blogs.nytimes.com/2015/04/09/extracting-structured-data-from-recipes-using-conditional-random-fields/?_r=0
======
tylerpachal
Could anyone go into more detail on why a CRF is a good model for this kind of
task? In the article they say that it was good for similar tasks:

> We chose to use a discriminative structured prediction model called a
> linear-chain conditional random field (CRF), which has been successful on
> similar tasks such as part-of-speech tagging and named entity recognition.

But they don't say why they chose it over some sort of Markov model (chain,
hidden, etc.).

~~~
Dn_Ab
All the things you mentioned (plus e.g. Bayesian networks and Restricted
Boltzmann Machines) are examples of graphical models. You can roughly think of
(linear-chain) CRFs as being to HMMs what logistic regression is to Naive
Bayes. HMMs and Naive Bayes learn a joint probability distribution over the
data, while logistic regression and CRFs fit conditional probabilities.
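
To make the analogy concrete, these are the standard textbook factorizations
(not spelled out in the article):

    HMM (generative, models the joint):
        p(x, y) = \prod_t p(y_t | y_{t-1}) p(x_t | y_t)

    Linear-chain CRF (discriminative, models the conditional):
        p(y | x) = (1 / Z(x)) exp( \sum_t \sum_k w_k f_k(y_{t-1}, y_t, x, t) )

The f_k are arbitrary feature functions over the whole observed sequence,
which is where the "richer features" below come in.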

If none of that makes sense then, basically, in general and with more data,
the CRF (or discriminative classifier) will tend to make better predictions,
because it doesn't try to _directly_ model complicated things that don't
really matter for prediction anyway. Because of this it can use richer
features without having to worry about how such and such relates to this or
that. All this ends up making discriminative classifiers more robust when
model assumptions are violated, because they don't sacrifice as much to
remain tractable (or rather, the trade-off they make tends not to matter as
much when prediction accuracy is your main concern).
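
To illustrate the "richer features" point, here's a minimal sketch using
sklearn-crfsuite (my library choice, not the NYT's, and the labels are made
up). Each token gets arbitrary, overlapping features, and the CRF never has
to model how those features interact with each other:

    import sklearn_crfsuite

    def token_features(tokens, i):
        word = tokens[i]
        return {
            'lower': word.lower(),       # word identity
            'is_digit': word.isdigit(),  # catches quantities like '2'
            'suffix3': word[-3:],        # crude morphology, overlaps 'lower'
            'prev': tokens[i - 1].lower() if i > 0 else '<s>',
        }

    # Two toy ingredient lines; the QTY/UNIT/NAME tags are invented here.
    sents = [['2', 'cups', 'flour'], ['1', 'egg', 'yolk']]
    X = [[token_features(s, i) for i in range(len(s))] for s in sents]
    y = [['QTY', 'UNIT', 'NAME'], ['QTY', 'NAME', 'NAME']]

    crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=50)
    crf.fit(X, y)
    print(crf.predict(X))

A generative model would have to specify how 'lower', 'suffix3', and 'prev'
jointly arise; the CRF just weights them.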

So in short: you use an HMM instead of a Markov chain when the sequence
you're trying to predict is not visible. Say you want to predict parts of
speech but only have access to words; you use the visible sequence of words
to infer the hidden sequence of part-of-speech labels. You use CRFs instead
of HMMs because they tend to make better predictions while remaining
tractable. The downside is that discriminative classifiers will not
necessarily learn the most meaningful decision boundaries, which starts to
matter when you want to move beyond just prediction.

------
macNchz
Very cool solution to something that can be really painful. I built a recipe
catalog mobile app a few years ago that incorporated a 'smart' shopping cart
that would sum up ingredients from different recipes, like '2 hard boiled
eggs' and '1 egg yolk', to show them in your cart as '3 eggs'. We had only a
hundred-odd recipes, so this became a manual process (including entering the
structured data in a crappy CMS much like the one described in the article),
but I felt the whole time that this was a problem aching for a proper
solution.
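
The summing step itself is simple once the data is structured; the parsing is
what hurt. A sketch with invented field names, Python just for illustration:

    from collections import defaultdict

    # Assumes parsing has already normalized each ingredient line to a
    # base ingredient and numeric quantity ('2 hard boiled eggs' -> ('egg', 2)).
    def sum_cart(items):
        totals = defaultdict(float)
        for name, qty in items:
            totals[name] += qty
        return dict(totals)

    print(sum_cart([('egg', 2), ('egg', 1), ('flour', 0.5)]))
    # {'egg': 3.0, 'flour': 0.5}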

~~~
lqdc13
Definitely. A lot of problems can benefit from sequence labeling. I did a
similar thing with labeling malware names[1] and it worked really well with
only about a hundred examples for each type of sequence.

[1] [http://lqdc.github.io/using-machine-learning-to-name-malware.html](http://lqdc.github.io/using-machine-learning-to-name-malware.html)

------
bglazer
Previous discussion here:

[https://news.ycombinator.com/item?id=9349918](https://news.ycombinator.com/item?id=9349918)

------
lqdc13
I will buy a beer for anyone who implements a good, fast 2nd-order
linear-chain CRF in an imperative lang like C++/Julia/Cython/Python+Numba.
Many, many models would benefit from higher-order information.

My current implementation is slowish, and the fast ones like crfsuite are
only 1st order. FlexCRF is buggy. There are literally no good 2nd-order CRF
libs.
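
For what it's worth, the standard trick is to expand the state space to label
pairs, which turns a 2nd-order chain over K labels into a 1st-order chain
over K^2 states (so O(T*K^3) Viterbi). Here's a numpy sketch of just the
decoder, my own illustration rather than a library; training via
forward-backward over the same pair states works analogously:

    import numpy as np

    def viterbi2(emit, trans2):
        # emit:   (T, K) log-scores for label k at position t
        # trans2: (K, K, K) log-score of triple (y[t-2], y[t-1], y[t])
        # (no special start scores; fold them into emit[0]/emit[1] if needed)
        T, K = emit.shape
        assert T >= 2
        # dp[a, b] = best score of a prefix ending with (y[t-1]=a, y[t]=b)
        dp = emit[0][:, None] + emit[1][None, :]
        bps = []
        for t in range(2, T):
            scores = dp[:, :, None] + trans2     # scores[z, a, b]
            bps.append(scores.argmax(axis=0))    # best y[t-2] per pair (a, b)
            dp = scores.max(axis=0) + emit[t][None, :]
        # Backtrack from the best final pair.
        a, b = np.unravel_index(dp.argmax(), dp.shape)
        path = [int(a), int(b)]
        for bp in reversed(bps):
            path.insert(0, int(bp[path[0], path[1]]))
        return path

    rng = np.random.default_rng(0)
    print(viterbi2(rng.normal(size=(6, 3)), rng.normal(size=(3, 3, 3))))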

