
Python code to generate text using a pretrained character-based RNN - jjwiseman
https://github.com/minimaxir/textgenrnn
======
popcorncolonel
Very simple interface, but even the cherry-picked examples in the readme are
quite poor in terms of meaning and grammatical correctness. Even though it
uses fairly state-of-the-art techniques such as LSTMs and word embeddings, the
results just aren't there.

Can someone point me to some of the best examples of coherent and long (as in
many words/sentences) automatic text generation? Or an explanation of why
we're so far from being able to tackle this problem? I haven't seen anything
in text generation worth writing home about.

~~~
minimaxir
Repo author here:

Unfortunately, a _lot_ of the examples people give for text generation in
general are cherry-picked, which creates a selection bias that overstates the
robustness of text generation (see my previous rant:
[https://news.ycombinator.com/item?id=14949220](https://news.ycombinator.com/item?id=14949220))

In terms of size vs. performance (including retraining), the 128-cell LSTM was
the best balance for a repo like this. (The 3-layer 512-cell networks used in
the original Karpathy examples are hundreds of MB)
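As a rough illustration of the size gap between those two configurations, here is a back-of-the-envelope parameter count using the standard LSTM weight formula. The embedding size and the exclusion of embedding/output layers are my assumptions, not figures from the repo:

```python
def lstm_params(input_dim, units):
    # An LSTM layer has 4 gates; each gate has a kernel over the input,
    # a recurrent kernel over the hidden state, and a bias vector.
    return 4 * (units * input_dim + units * units + units)

# Assumed character-embedding width (actual value may differ).
embed_dim = 100

small = lstm_params(embed_dim, 128)           # single 128-cell layer
big = (lstm_params(embed_dim, 512)            # first 512-cell layer
       + 2 * lstm_params(512, 512))           # two stacked 512-cell layers

print(f"128-cell LSTM:    ~{small / 1e6:.2f}M recurrent weights")
print(f"3x512-cell LSTM:  ~{big / 1e6:.2f}M recurrent weights")
print(f"Size ratio:       ~{big / small:.0f}x")
```

The recurrent weights alone differ by well over an order of magnitude, and saved checkpoints often also carry optimizer state, which multiplies the on-disk size further.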

I recommend looking at the examples in the /output folder for more robust
examples than the ones in the README.

~~~
backpropaganda
You could make the bigger model available elsewhere, and allow the user to
download the bigger model if they want to.

Why don't you have a priming ability? That is, generating starting from some
context text. Might be useful for a lot of applications.

~~~
minimaxir
As noted in the README, supporting bigger models is an option for future
development. (I would still need to train and optimize such a model, which
takes time and money. Additionally, file hosting is not free, and I am not
currently making any revenue from this project.)

The generate() function does have priming with a `prefix` parameter; see the
demo.
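For readers unfamiliar with priming: a character-level sampler "warms up" the model's hidden state on the prefix before it starts sampling. A minimal self-contained sketch of that loop, using a toy stand-in for the model (the `step_fn` shape and `toy_step` are illustrative assumptions, not textgenrnn's actual API):

```python
import random

def sample_with_prefix(step_fn, prefix, length, seed=0):
    """Feed the prefix through the model step by step (updating its
    state), then sample `length` more characters from its predictions.
    `step_fn(state, ch)` returns (new_state, dist), where dist maps
    candidate next characters to probabilities."""
    rng = random.Random(seed)
    state, dist = None, None
    for ch in prefix:              # prime the state on the prefix
        state, dist = step_fn(state, ch)
    out = list(prefix)
    for _ in range(length):
        chars, probs = zip(*sorted(dist.items()))
        ch = rng.choices(chars, weights=probs)[0]
        out.append(ch)
        state, dist = step_fn(state, ch)
    return "".join(out)

# Toy stand-in "model": always predicts the alphabet successor of the
# last character (a real RNN would use its learned hidden state here).
def toy_step(state, ch):
    nxt = chr((ord(ch) - ord("a") + 1) % 26 + ord("a"))
    return ch, {nxt: 1.0}

print(sample_with_prefix(toy_step, "abc", 4))  # -> "abcdefg"
```

The generated text continues from the prefix, which is what makes priming useful for applications like autocomplete or constrained generation.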

------
smtpserver
Could this be used to generate place names like in this article
[https://medium.com/@hondanhon/i-trained-a-neural-net-to-generate-british-placenames-9460e907e4e9](https://medium.com/@hondanhon/i-trained-a-neural-net-to-generate-british-placenames-9460e907e4e9)
but for a different language? Or would that need a different training set?

~~~
minimaxir
The pretrained network is calibrated for English, but if you trained it on
non-English data it should still work fine since the entire network is
retrained (assuming there is enough overlap with characters in the vocabulary;
e.g. Spanish would work fine but CJK languages would not).
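One quick way to check whether a non-English corpus overlaps enough with the pretrained vocabulary is to measure character coverage. A minimal sketch, assuming an English-centric character set like the one shipped in textgenrnn_vocab.json (the exact vocabulary contents here are an assumption):

```python
import string

def vocab_coverage(corpus, vocab):
    """Fraction of characters in `corpus` that appear in `vocab`.
    Low coverage means many characters would be unknown to the
    pretrained network's character embedding."""
    if not corpus:
        return 1.0
    known = sum(1 for ch in corpus if ch in vocab)
    return known / len(corpus)

# Assumed English-centric vocabulary.
vocab = set(string.ascii_letters + string.digits + string.punctuation + " ")

print(vocab_coverage("El veloz murcielago hindu", vocab))  # Spanish: high coverage
print(vocab_coverage("吾輩は猫である", vocab))               # Japanese: no coverage
```

Spanish text (minus accented characters) is almost entirely covered, while CJK text shares essentially no characters with the pretrained vocabulary, matching the distinction drawn above.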

~~~
smtpserver
Thanks! I love the textgenrnn_vocab.json easter egg by the way :)

