Don’t expect accurate answers here - it’s an experimental weekend hack.
With this experiment, we wanted to understand how we could train a neural network on new external datasets very quickly, and to study how its language and answers differ depending on the topic of the dataset.
No scripted chatbot here: we are talking about neural nets trained in the wild :) We use the well-known seq2seq model, which computes a “thought vector” from an input sentence and generates an output sentence conditioned on this vector. We gathered various datasets from Stack Exchange and launched a big overnight training run so we could have some surreal morning coffee talks with our AI.
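To make the “thought vector” idea concrete, here is a minimal sketch of the encode-then-decode flow in plain Python. It is not our actual model: the vocabulary, the hidden size, the plain tanh recurrence and the random untrained weights are all assumptions made up for illustration, so the reply it produces is gibberish by design.

```python
import math
import random

# Hypothetical toy setup: vocabulary, sizes and weights are illustrative only.
VOCAB = ["<eos>", "hello", "how", "are", "you", "fine", "thanks"]
HIDDEN = 8
rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Random (untrained) weight matrix as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


EMBED = rand_matrix(len(VOCAB), HIDDEN)  # one embedding row per token
W_ENC = rand_matrix(HIDDEN, HIDDEN)      # encoder recurrence
W_DEC = rand_matrix(HIDDEN, HIDDEN)      # decoder recurrence
W_OUT = rand_matrix(len(VOCAB), HIDDEN)  # hidden state -> token scores


def step(w, h, token_id):
    """One recurrent step: h' = tanh(W h + embed(token))."""
    return [math.tanh(p + e) for p, e in zip(matvec(w, h), EMBED[token_id])]


def encode(tokens):
    """Fold the whole input sentence into one fixed-size vector."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        h = step(W_ENC, h, VOCAB.index(tok))
    return h


def decode(thought, max_len=5):
    """Greedily generate tokens, starting from the thought vector."""
    h, prev, out = thought, 0, []  # token id 0 (<eos>) doubles as start symbol
    for _ in range(max_len):
        h = step(W_DEC, h, prev)
        scores = matvec(W_OUT, h)
        prev = max(range(len(VOCAB)), key=scores.__getitem__)
        if VOCAB[prev] == "<eos>":
            break
        out.append(VOCAB[prev])
    return out


thought = encode(["how", "are", "you"])
reply = decode(thought)
print(reply)
```

The point of the sketch is the shape of the computation: the decoder never sees the input sentence, only the vector the encoder produced. Training is what turns that vector into something worth conditioning on.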
There’s still a ton of work to do before the answers start to make sense. Longer training and bigger datasets would of course improve the quality, and we can also easily add components to improve the variety and coherence of the responses. Even so, the answers already differ noticeably depending on which dataset the model was trained on.
This setup will let us test a variety of new datasets much faster than before.
Let us know what you think, and if you have a cool dataset in mind for us to test, tell us on GitHub.
me: yes
gg: I'll send screen shots
....
