
LSTM trained on HN - enkiv2
https://medium.com/@hondanhon/i-trained-an-lstm-neural-net-to-generate-hacker-news-submissions-9213e1225208
======
minimaxir
As someone who's cited by the OP _and_ coincidentally released a tool today
that easily trains an LSTM on any dataset, _demoed_ with Hacker News data
([https://github.com/minimaxir/textgenrnn](https://github.com/minimaxir/textgenrnn)),
I have a couple of comments:

1) My Get All Hacker News Submissions script is somewhat obsolete, since all
HN data is now on BigQuery (I'll add a note to the README today)

2) There is a massive, _massive_ selection bias in the quality of the
generated texts that were selected. If you look at the sample output
([https://github.com/danhon/deep-hackernews/blob/master/sample...](https://github.com/danhon/deep-hackernews/blob/master/sample-output/hn-titles-rsize512-temp0.45-1e6-1.txt)),
there is a large disparity in quality between the submissions chosen for the
blog post and a typical generated submission. (This is expected, although the
need for human curation undercuts the "Turing test" suggestions proposed by
other commenters.)

Compare with the sample output at a similar temperature from my 128-cell LSTM
network
([https://github.com/minimaxir/textgenrnn/blob/master/outputs/...](https://github.com/minimaxir/textgenrnn/blob/master/outputs/hacker_news_temp_0_5.txt)),
which is a much smaller network (OP uses a 512-cell network) trained on
orders of magnitude less data. (I am curious how _long_ OP spent training the
network.)
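The "temperature" referenced in both filenames (0.45 and 0.5) controls how sharply the model's next-token distribution is peaked before sampling: lower values make generation more conservative and repetitive, higher values make it more surprising and error-prone. A minimal illustrative sketch in plain Python (not the OP's or textgenrnn's actual code; the function name and logits are made up for the example):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Sample an index from raw logits after temperature scaling.

    Dividing logits by a temperature < 1 sharpens the softmax
    distribution toward the most likely token; a temperature > 1
    flattens it toward uniform."""
    scaled = [l / temperature for l in logits]
    # Softmax with max-subtraction for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the resulting distribution.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1
```

At temperature 0.1 the sampler almost always picks the argmax token; at 1.0 it samples from the unmodified softmax.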

------
woliveirajr
Next step: someone will build a mix of real headlines and LSTM-generated ones
and post it as a challenge, where a perfect 20 out of 20 will never be
achieved.

~~~
gallerdude
A new sort of Turing Test...

------
dgreensp
_Brain drivers are reality as a service_

Some pretty hilarious and/or thought-provoking headlines! LSTMs capture so
much about the input (compared to N-grams, say) that it can be almost eerie.

~~~
noiv
I scanned the list and assigned a subjective clickbaitiness score to each
'headline', without reading the article, of course. Yours got 100%.

I learned two things: this network knows how to make HN visitors click, and
it is hard to extract knowledge from a NN.

------
udsloiwdaa
>Show HN: Universal Basic Income in Elixir

fucking lost

------
Talyen42
My favorites:

# I like Self-Driving Cars Using Docker Components

# Elon Musk says he will be more than a secret artificial intelligence

# Show HN: A web browser for harassment in 2017

# Slack is the future of the world

# China has died

~~~
andybak
"Technology reveals the best way to stop using Rust" won for me.

~~~
reitanqild
Some of my favourites:

The Man Who Continues to Code (2013)

How to Start a Basic Income as a Service

Ask HN: What is the best way to sell the blockchain in 2017?

Self-Driving Cars Using Docker Components

⏱ Show HN: A command-line tool for creating a startup in 100 mins

------
ak_yo
How does this compare to 2-gram Markov chains? The results look pretty similar
(though perhaps a bit more grammatical...)
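For reference, a 2-gram (bigram) Markov chain just records which word followed which in the training titles and samples from those counts, with no longer-range context. A minimal sketch in plain Python (illustrative only; `build_bigram_model` and `generate` are hypothetical names, not from the OP's setup):

```python
import random
from collections import defaultdict

def build_bigram_model(titles):
    """Map each word to the list of words that followed it in the corpus.

    Repeated successors appear multiple times, so sampling uniformly
    from the list reproduces the observed bigram frequencies."""
    model = defaultdict(list)
    for title in titles:
        words = ["<s>"] + title.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            model[prev].append(nxt)
    return model

def generate(model, rng=random.Random(42), max_words=12):
    """Walk the chain from the start token until an end token or length cap."""
    word, out = "<s>", []
    while len(out) < max_words:
        word = rng.choice(model[word])
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)
```

Unlike an LSTM, the chain can only stitch together word pairs it has literally seen, which is why its output tends to drift off grammatically after a few words.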

------
tyingq
I wonder if it uses words like "orthogonal" and "bikeshedding" more than it
should.

------
rcarmo
I thought " The State of the Post Neural Network Attack" was both pretty meta
and superbly on point.

But yes, reading through the titles was eerily realistic.

------
AdmiralAsshat
> Ask HN: What do you use for developers?

Brilliant.

