
Beermind – RNN generates beer reviews from category - Homunculiheaded
http://deepx.ucsd.edu/#/home/beermind
======
zackchase
Hi, thanks for posting our work! The system generates reviews at the character
level based on both category and star rating.

~~~
IanCal
First off, this was really interesting and fun to play with. What's the tech
setup for this? What are you running the model on in the backend?

\----

Just a note on the site, if you block analytics the whole page fails to load,
not sure if that's intentional or not.

dist/angulartics-google-analytics.min.js

and

[http://www.google-analytics.com/analytics.js](http://www.google-analytics.com/analytics.js)

\----

In your paper, you have a graph which relies on colour to tell the difference
between the lines, here's what it looks like to someone with protanopia:
[http://i.imgur.com/HclRJYN.png](http://i.imgur.com/HclRJYN.png)

In general, it's worth trying to make your paper understandable if it's in
black and white. That doesn't mean you can't use colour to help, but _relying_
on colour means you're excluding quite a large group of people (~5% of men)
from understanding your work.

~~~
zackchase
Hi Ian,

First off, thanks for your interest in our work! The tech is a homegrown
recurrent neural network library (deepx), which is available via pip:
[https://pypi.python.org/pypi/deepx/](https://pypi.python.org/pypi/deepx/). We
use Theano for compilation to the GPU and an original neural network
architecture (a concatenated-input generative model) to preserve the signal of
auxiliary inputs (like star rating and beer category) across long intervals.
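
For the curious, here's a minimal NumPy sketch of the concatenated-input idea
(an illustration with made-up dimensions, not the actual deepx code): the
auxiliary vector is appended to the character input at every timestep, so the
conditioning signal never has to survive many steps of the recurrence on its
own.

    import numpy as np

    def rnn_step(x_char, aux, h, Wxh, Whh, bh):
        """One vanilla RNN step. The auxiliary conditioning vector
        (star rating + one-hot beer category) is concatenated onto
        the character input at EVERY timestep, so it is re-injected
        rather than left to decay through the recurrence."""
        x = np.concatenate([x_char, aux])       # concatenated input
        return np.tanh(Wxh @ x + Whh @ h + bh)  # new hidden state

    # Hypothetical dimensions, for illustration only.
    char_dim, aux_dim, hidden_dim = 64, 6, 128
    rng = np.random.default_rng(0)
    Wxh = rng.normal(0, 0.01, (hidden_dim, char_dim + aux_dim))
    Whh = rng.normal(0, 0.01, (hidden_dim, hidden_dim))
    bh = np.zeros(hidden_dim)

    aux = np.array([4.5, 0, 1, 0, 0, 0])   # 4.5 stars + one-hot category
    h = np.zeros(hidden_dim)
    for _ in range(10):                     # unroll over a few characters
        x_char = rng.normal(size=char_dim)  # stand-in char embedding
        h = rnn_step(x_char, aux, h, Wxh, Whh, bh)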

To run the model in reverse, we infer the probability of a category via the
likelihood of a review. Because the prior over the categories is uniform
(balanced dataset) and because the normalization term (marginal probability of
a review) doesn't depend on the category, we can exploit the fact that the
probability of a beer category given a review is proportional to the
probability of a review given a category. In this way we're able to make a
text classifier that takes into account word order.
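
Concretely, the reverse-mode classification amounts to something like this
(hypothetical code; `model` stands in for the trained network and is not the
real deepx interface): score each category by the log-likelihood the
category-conditioned model assigns to the review, and take the argmax.

    import math

    def log_likelihood(review, category, model):
        """Sum of per-character log-probabilities that the
        category-conditioned model assigns to the review.
        `model(prefix, category)` is a hypothetical stand-in that
        returns a dict of next-character probabilities."""
        total = 0.0
        for i, ch in enumerate(review):
            probs = model(review[:i], category)
            total += math.log(probs[ch])
        return total

    def classify(review, categories, model):
        # With a uniform prior, and since P(review) doesn't depend on
        # the category, argmax P(c | r) == argmax P(r | c).
        return max(categories, key=lambda c: log_likelihood(review, c, model))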

Regarding Google Analytics, we just noticed this ourselves and are fixing it
presently. Concerning color, that's a great point and I apologize for the
oversight. We certainly bear no malice towards those with protanopia. In
future work (and in any subsequent versions of this paper) we'll do our best
to make the lines dotted, dashed, etc., so they remain readable to those
unable to distinguish red and green. Generally, I agree with you that a black-
and-white printing should retain all readability.
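
For anyone else making multi-line figures, a pattern like the following (a
generic matplotlib sketch, not our actual plotting code) keeps lines
distinguishable in grayscale and to red-green colorblind readers:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 1, 100)
    # Vary linestyle and marker, not just color, so the figure
    # survives grayscale printing and color vision deficiency.
    styles = [("solid", "o"), ("dashed", "s"), ("dotted", "^")]
    for k, (ls, marker) in enumerate(styles):
        plt.plot(x, x ** (k + 1), linestyle=ls, marker=marker,
                 markevery=10, label="series %d" % (k + 1))
    plt.legend()
    plt.savefig("lines.png")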

Thanks for your interest and helpful observations.

Cheers,

Zack

------
smackfu
Two of the categories are interesting choices:

Fruit / Vegetable beer: this is kind of a catch-all category, so I wouldn't be
surprised if the data was a bit of a mess. Like you can have pumpkin beers
mixed in with strawberry beers.

American Adjunct Lager: this is the category used for American junk beer like
Bud Light. I bet most of the reviews are very low rated.

~~~
zackchase
Hi smackfu. So one wonderful virtue of this technology is that it can handle
messes quite gracefully. Part of the fun is to see that it can learn to stay
on topic - to decide it's talking about a raspberry beer and continue to talk
about raspberries. As far as Fruit/Vegetable beer being a catch-all category,
that's only within fruit beers. They're certainly distinct from IPAs, stouts,
porters, and lagers. Interestingly, for classification, by far the hardest
categories to disambiguate are Porters and Stouts (probably because stouts
really are "stout porters").

Our goal was to pick a few categories that would allow us to evaluate the
capabilities of the model. Thus having some that are crystal clear and others
that are heterogeneous was appropriate. Regarding American Adjunct Lagers
being low rated, yes, the lagers are frequently described as "piss", "watery",
and "urine".

------
larsga
Most of the generated reviews are absurd and self-contradictory, but
occasionally you get one that sounds right. I've downloaded the paper, but
haven't read it yet. To me the reviews look pretty much like what you'd get if
you built a Markov chain from the BeerAdvocate database. Is this really better
in any way?

~~~
zackchase
Hi larsga. Thanks for your interest. To begin, yes, the concatenated-input RNN
is much better. A Markov chain could not pull off this feat (or generally make
comprehensible text at the character level). Occasionally the reviews
contradict themselves, but they're actually remarkably consistent with respect
to the attributes they're conditioned on. I'd suggest you pay attention to the
"temperature". This is a parameter that determines how stochastic the
generation is. With low temperatures the reviews are less varied but "make
more sense". With higher temperatures they are more entropic.

When you tell it to make an IPA review, it stays on topic and talks about
hoppy flavor. Stouts it consistently calls black, with hints of chocolate (not
to mention that it uses the word "stout").

Here's an example of a review I just generated for "Fruit/Vegetable beer":

"This brew pours a very clear golden color. The finger head is pretty small
and fizzy and has a slightly pink color. The smell is really nice. The taste
is fruity and sweet, but not overwhelming. The flavor is a little weak and is
a bit sweeter than most beers but still very nice. I could drink this all day,
but I would probably prefer the fruit beer to be a bit more pronounced. This
beer is actually quite smooth and inviting. It has a strong taste of
raspberries but is complimented by a nice tartness that comes through as well.
The mouthfeel is smooth and creamy with a dry finish. This is a very drinkable
beer and I could see myself enjoying to try this one again."

Clearly the RNN learns to form words like "fruity" and "sweet" and
"raspberries" to describe a fruit beer. It also says the flavor is a little
"weak" and in the next sentence says it would prefer for the taste to be "more
pronounced". Keep in mind, this neural network was given no a priori notion of
words. A Markov chain cannot produce even remotely similar conditional text at
the character level.

For proof that it learns to differentiate the different types of beer, we
demonstrate in the paper that the model can be run as a classifier and
classify the type of beer from the review with 90% accuracy (on previously
unseen test data). This is almost comparable to a state-of-the-art tf-idf
n-gram logistic regression model, despite the fact that we haven't tuned the
model especially carefully to be a classifier (with regularization or
hyperparameter search, for instance).
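
For reference, the flavor of baseline I have in mind looks roughly like this
in scikit-learn (a sketch with toy stand-in data; the paper's exact baseline
setup may differ):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy stand-in data; the real experiment uses BeerAdvocate reviews.
    reviews = ["hoppy and bitter with citrus notes",
               "black as night, chocolate and coffee",
               "crisp, watery, typical adjunct lager"]
    labels = ["IPA", "Stout", "American Adjunct Lager"]

    baseline = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),  # unigram + bigram tf-idf
        LogisticRegression(max_iter=1000),
    )
    baseline.fit(reviews, labels)
    print(baseline.predict(["roasty, dark, hints of chocolate"]))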

Here's another example (for an IPA):

"This is a fine IPA for sure, but not a beer I would love to drink a lot of.
This is one of the better IPA's I have ever had. I can see why the beer is
unlike any IPA should be and the best beer I've ever had. I could drink this
all night, but it is very drinkable. I could easily drink a few of these
without a problem. I don't know what the malt base but it is so faint and it
is pretty tasty. I can see why the beer is a great hop bomb and the flavors
are both subtle and superb. It's a great balance and can be a good supply of
the style. I like it, but the hops are a bit off the more I drink them, but
the hops are very pronounced. The finish is a little bitter, and the hop
flavors are great."

I'm not sure what else to say if you don't believe that the net has learned
to distinguish an IPA. Of course, it does contradict itself on sentiment. But
with lower temperatures even this is not so common. It can also be addressed
by setting extreme star ratings (we can actually put in a 0-star rating, or as
high as a 10-star rating, to induce a review of more extreme sentiment).

A comedic point that must be made here is that the source material (the
reviews from BeerAdvocate) is itself absurd and occasionally contradictory.
The English it contains is ungrammatical, ridiculous, and frequently
misspelled. Nevertheless it's a fascinating dataset on account of how
well-annotated and dense it is (the 190-core has over 250k reviews).

~~~
larsga
The part I missed is that you're doing this at the character level, not at
the word level. If you were doing this at the word level, a Markov chain could
easily tell an IPA from a porter. But at the character level it suddenly
becomes a _lot_ more impressive. Thank you! I'll read the paper now.

