Hacker News

What surprises me most is that the headlines don't seem to be much better than your average Markov chain output.



I think this is for three main reasons:

1. You can do really well with a simple grammar

2. You only need short output

3. Lack of training data

There's not an incredibly rich structure to extract, and with short outputs the weirdness doesn't compound and cycles aren't as likely. A common small dataset for playing with RNNs is all of Shakespeare which is somewhere in the region of 1M words.
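To make the comparison concrete: a word-level Markov chain headline generator really is only a few lines. This is a toy sketch with a made-up three-headline corpus, not anything from the actual article:

```python
import random
from collections import defaultdict

# Toy training set, invented for illustration only.
headlines = [
    "Show HN: A tool for generating headlines",
    "Ask HN: How do you learn machine learning",
    "A tool for learning machine learning",
]

START, END = "<s>", "</s>"

# Bigram transition table: each word maps to the list of words
# observed to follow it (duplicates preserved, so sampling is
# proportional to observed frequency).
transitions = defaultdict(list)
for h in headlines:
    words = [START] + h.split() + [END]
    for a, b in zip(words, words[1:]):
        transitions[a].append(b)

def generate(max_len=12):
    # Walk the chain from the start token until we hit the end
    # token or the length cap.
    word, out = START, []
    while len(out) < max_len:
        word = random.choice(transitions[word])
        if word == END:
            break
        out.append(word)
    return " ".join(out)

print(generate())
```

With short outputs like headlines, a chain this simple rarely runs long enough for its lack of long-range structure to show, which is the point being made above.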

However, this is still fun and interesting!


> 3. Lack of training data

> [...]

> There's not an incredibly rich structure to extract, and with short outputs the weirdness doesn't compound and cycles aren't as likely. A common small dataset for playing with RNNs is all of Shakespeare which is somewhere in the region of 1M words.

He does state that the network is trained on 2M headlines, i.e. roughly 5–20M words. That should be enough.

I would have thought that an RNN would work better. It would be interesting to see a direct comparison of fake Hacker News headlines generated with Markov chains versus an RNN.


True, I had managed to miss that, although it's working on 200-dimensional word vectors rather than single characters as in the small Shakespeare dataset. That feels like it might make it harder to train. I've personally run into more problems with GloVe vectors than with word2vec ones, but I don't have any hard data on that.
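For anyone unfamiliar with the distinction: a character-level model sees a small one-hot alphabet, while a word-level model sees each word as a dense vector. A minimal sketch of the word-vector input representation, with random vectors standing in for real GloVe embeddings (which would normally be loaded from a pretrained file):

```python
import numpy as np

# Assumed toy setup: a tiny vocabulary and random 200-dimensional
# vectors standing in for pretrained GloVe embeddings.
rng = np.random.default_rng(0)
vocab = ["show", "hn", "a", "tool", "for", "headlines"]
dim = 200
embeddings = {w: rng.normal(size=dim) for w in vocab}

def encode(headline):
    # Map each word to its vector; unknown words fall back to a
    # zero vector in this toy setup.
    return np.stack(
        [embeddings.get(w, np.zeros(dim)) for w in headline.lower().split()]
    )

x = encode("Show HN a tool")
print(x.shape)  # (4, 200): one 200-dim vector per word
```

The input to the RNN is then a sequence of these dense vectors rather than a sequence of one-hot characters, which changes both the model size and what the network has to learn.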




