Hacker News

Can you provide articles comparing CRFs directly with LSTMs? Most articles on LSTMs don't actually compare against CRFs, and an LSTM isn't a drop-in replacement for a CRF. I haven't personally seen evidence that neural networks uniformly beat CRFs across all tasks. E.g. [1] directly compares a CRF and an LSTM: the CRF achieves an F1 of 97.533 while the LSTM gets 97.848.

In fact, because CRFs remain so competitive, many works combine them with neural networks (e.g. [2]).

[1] https://arxiv.org/pdf/1606.03475.pdf

[2] https://arxiv.org/abs/1508.01991




tensor: my main point was and is that features learned by a suitable deep model (whether recurrent or attention-based) routinely outperform human-designed features. This has been shown in a large and growing number of sequence tasks (WMT language translation datasets, Stanford Question Answering Dataset, WikiText language modeling datasets, Penn Treebank dataset, IMDB and Stanford Sentiment Treebank movie review datasets, etc. -- to name a few).

Now, in some cases, and depending on the task, it might make sense to have the last layer of a deep model be a CRF layer. In the OP's case, for example, one could try replacing all those one-off feature functions with a proven deep architecture -- in other words, instead of having ψ at each time step be equal to exp(sum(weighted feature functions)), have it be a function of the output of the deep model.
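To make the contrast concrete, here's a minimal sketch of the two ways of defining the per-step potential ψ. The function names and toy inputs are illustrative only (not from any particular library); the "neural" version just assumes the deep model has already produced a scalar score for the tag at that time step:

```python
import math

def psi_handcrafted(weights, feature_values):
    """Classic CRF potential: exp of a weighted sum of hand-designed
    feature functions evaluated at this time step."""
    return math.exp(sum(w * f for w, f in zip(weights, feature_values)))

def psi_neural(emission_score):
    """Neural-CRF potential: exp of a score emitted by a deep model
    (e.g. the LSTM output projected to this tag's dimension)."""
    return math.exp(emission_score)

# Same downstream CRF machinery either way; only the source of the
# score changes.
print(psi_handcrafted([0.5, -1.0], [1.0, 0.0]))  # exp(0.5) ≈ 1.6487
print(psi_neural(0.5))                           # identical value
```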

That said, for something like the OP's task, the first thing I would try would be one of the readily available LSTM architectures[a], with a standard softmax layer predicting a distribution over the vocabulary of tags at each time step, and feeding that into a standard beam search.[b]

[a] Example: https://github.com/salesforce/awd-lstm-lm/blob/master/model....

[b] Intro to beam search algorithm: https://www.youtube.com/watch?v=UXW6Cs82UKo
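For anyone unfamiliar with the decoding step in [b], here's a toy sketch of beam search over per-step tag distributions. It's deliberately simplified: `log_probs[t]` maps each tag to its log-probability at step t, and a real tagger would condition each step's distribution on the previously predicted tags rather than treating steps independently:

```python
import math

def beam_search(log_probs, beam_width=2):
    """Keep the beam_width highest-scoring partial tag sequences
    at each step; return the best complete sequence."""
    beams = [([], 0.0)]  # (tag sequence, cumulative log-prob)
    for step in log_probs:
        candidates = [
            (seq + [tag], score + lp)
            for seq, score in beams
            for tag, lp in step.items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

# Toy two-step example with a two-tag vocabulary:
steps = [
    {"NOUN": math.log(0.6), "VERB": math.log(0.4)},
    {"NOUN": math.log(0.3), "VERB": math.log(0.7)},
]
print(beam_search(steps))  # ['NOUN', 'VERB']
```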



