In your paper, you have a graph which relies on colour to tell the difference between the lines, here's what it looks like to someone with protanopia: http://i.imgur.com/HclRJYN.png
In general, it's worth trying to make your paper understandable if it's in black and white. That doesn't mean you can't use colour to help, but relying on colour means you're excluding quite a large group of people (~5% of men) from understanding your work.
First off, thanks for your interest in our work! The tech is a homegrown recurrent neural network (deepx) which is available via pip. https://pypi.python.org/pypi/deepx/
We use theano for compilation to GPU and an original neural network architecture (a concatenated input generative model) to preserve the signal of auxiliary inputs (like star rating and beer category) across long intervals.
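To make "concatenated input" concrete, here's a minimal sketch (not our actual deepx/theano code; the alphabet, dimensions, and auxiliary encoding are made up for illustration): at every character step, the one-hot character vector is concatenated with the auxiliary vector, so the side information is re-presented at every timestep instead of fading over long intervals.

```python
def one_hot(index, size):
    """Return a one-hot list of the given size."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def build_inputs(text, alphabet, aux):
    """Build the per-timestep input vectors for a concatenated-input RNN.

    aux is a hypothetical auxiliary vector, e.g. a scaled star rating
    followed by a one-hot beer category. It is appended to the one-hot
    character vector at EVERY step.
    """
    inputs = []
    for ch in text:
        x = one_hot(alphabet.index(ch), len(alphabet)) + aux  # concatenation
        inputs.append(x)
    return inputs

alphabet = "abcdefghijklmnopqrstuvwxyz "
aux = [0.8, 1.0, 0.0, 0.0]  # made-up: 4/5-star rating + 3-way category
xs = build_inputs("hoppy ipa", alphabet, aux)
# One input vector per character, each of size |alphabet| + |aux|.
```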
To run the model in reverse, we infer the probability of a category via the likelihood of a review. Because the prior over the categories is uniform (balanced dataset) and because the normalization term (marginal probability of a review) doesn't depend on the category, we can exploit the fact that the probability of a beer category given a review is proportional to the probability of a review given a category. In this way we're able to make a text classifier that takes into account word order.
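In code, the reverse-mode classification boils down to an argmax over per-category likelihoods. A toy sketch (the `log_likelihood` function here is a hypothetical stand-in for scoring a review under the trained RNN conditioned on each category; the numbers are fabricated for illustration):

```python
def classify(review, categories, log_likelihood):
    """Pick the category maximizing P(category | review).

    With a uniform prior over categories and a normalizer that doesn't
    depend on the category, argmax_c P(c | review) == argmax_c P(review | c),
    so we just compare log-likelihoods.
    """
    return max(categories, key=lambda c: log_likelihood(review, c))

# Fabricated stand-in log-likelihoods, for illustration only.
scores = {"IPA": -120.3, "Stout": -135.9, "Lager": -141.2}
best = classify("hoppy citrus aroma...", list(scores), lambda r, c: scores[c])
```

Because the RNN scores whole character sequences, this classifier is sensitive to word order, unlike a bag-of-words model.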
Regarding Google Analytics, we just noticed this ourselves and are fixing it presently. Concerning color, that's a great point and I apologize for the oversight. We certainly bear no malice towards those with protanopia. In future work (and in any subsequent versions of this paper) we'll do our best to make the lines dotted, dashed, etc., so they're more readable to those unable to distinguish red-green. Generally, I agree with you that a black-and-white printout should retain full readability.
Thanks for your interest and helpful observations.
Fruit / Vegetable beer: this is kind of a catch-all category, so I wouldn't be surprised if the data was a bit of a mess. Like you can have pumpkin beers mixed in with strawberry beers.
American Adjunct Lager: this is the category used for American junk beer like Bud Light. I bet most of the reviews are very low rated.
Hi smackfu. So one wonderful virtue of this technology is that it can handle messes quite gracefully. Part of the fun is to see that it can learn to stay on topic - to decide it's talking about a raspberry beer and continue to talk about raspberries. As far as Fruit/Vegetable beer being a catch-all category, that's only within fruit beers. They're certainly distinct from IPAs, stouts, porters, and lagers. Interestingly, for classification, by far the hardest categories to disambiguate are Porters and Stouts (probably because stouts really are "stout porters").
Our goal was to pick a few categories that would allow us to evaluate the capabilities of the model, so having some that are crystal clear and others that are heterogeneous was appropriate. Regarding American Adjunct Lagers being low rated, yes, the lagers are frequently described as "piss", "watery", and "urine".
Most of the ratings are absurd and self-contradictory, but occasionally you get one that sounds right. I've downloaded the paper, but haven't read it yet. To me the reviews look pretty much like what you'd get if you built a Markov chain from the BeerAdvocate database. Is this really better in any way?
Hi larsga. Thanks for your interest. To begin, yes, the concatenated input RNN is much better. A Markov chain could not pull off this feat (or make comprehensible text at the character level generally). Occasionally the reviews contradict themselves, but they're actually remarkably consistent regarding the conditioned-upon attributes. I'd suggest you pay attention to the "temperature". This is a parameter that determines how stochastic the generation is. With low temperatures the reviews are less varied but "make more sense". With higher temperatures they are more entropic.
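For the curious, temperature is just a divisor applied to the logits before the softmax when sampling each character. A minimal sketch in plain Python (not our actual sampling code):

```python
import math
import random

def sample_char(logits, temperature):
    """Sample a character index from logits softened by temperature.

    Low temperature -> near-greedy, less varied but more coherent text;
    high temperature -> closer to uniform, more entropic text.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# At a very low temperature the argmax dominates almost surely.
idx = sample_char([2.0, 1.0, 0.1], 0.01)
```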
When you tell it to make an IPA review, it stays on topic and talks about hoppy flavor. It consistently calls stouts black, with hints of chocolate (not to mention using the word "stout").
Here's an example of a review I just generated for "Fruit/Vegetable beer":
"This brew pours a very clear golden color. The finger head is pretty small and fizzy and has a slightly pink color. The smell is really nice. The taste is fruity and sweet, but not overwhelming. The flavor is a little weak and is a bit sweeter than most beers but still very nice. I could drink this all day, but I would probably prefer the fruit beer to be a bit more pronounced. This beer is actually quite smooth and inviting. It has a strong taste of raspberries but is complimented by a nice tartness that comes through as well. The mouthfeel is smooth and creamy with a dry finish. This is a very drinkable beer and I could see myself enjoying to try this one again."
Clearly the RNN learns to form words like "fruity" and "sweet" and "raspberries" to describe a fruit beer. It also says the flavor is a little "weak" and in the next sentence says it would prefer for the taste to be "more pronounced". Keep in mind, this neural network was given no a priori notion of words. A Markov chain cannot produce even remotely similar conditional text at the character level.
For proof that it learns to differentiate the different types of beer, we demonstrate in the paper that the model can be run as a classifier and classify the type of beer from the review with 90% accuracy (on previously unseen test data). This is almost comparable to a state-of-the-art tf-idf n-gram logistic regression model, despite the fact that we haven't even tuned the model especially carefully to be a classifier (with regularization or a hyper-parameter search, for instance).
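For reference, a baseline of that kind featurizes overlapping n-grams before weighting them with tf-idf and feeding a logistic regression. A sketch of the n-gram extraction step (character n-grams shown purely for illustration; the baseline's exact featurization is a detail of the paper):

```python
def char_ngrams(text, n):
    """Extract overlapping character n-grams from a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

grams = char_ngrams("hoppy", 3)
```

The key contrast is that such a featurizer discards order beyond the n-gram window, while the RNN conditions on the entire preceding sequence.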
Here's another example (for an IPA):
"This is a fine IPA for sure, but not a beer I would love to drink a lot of. This is one of the better IPA's I have ever had. I can see why the beer is unlike any IPA should be and the best beer I've ever had. I could drink this all night, but it is very drinkable. I could easily drink a few of these without a problem. I don't know what the malt base but it is so faint and it is pretty tasty. I can see why the beer is a great hop bomb and the flavors are both subtle and superb. It's a great balance and can be a good supply of the style. I like it, but the hops are a bit off the more I drink them, but the hops are very pronounced. The finish is a little bitter, and the hop flavors are great."
I'm not sure what else to say if you don't believe that the net has learned to distinguish an IPA. Of course, it does contradict itself on sentiment, but with lower temperatures even this is not so common. It can also be addressed by setting extreme star ratings (we can actually put in a 0-star rating, or as high as a 10-star rating, to induce a review of more extreme sentiment).
A comedic point that must be made here is that the source material (the reviews from BeerAdvocate) is itself absurd and occasionally contradictory. The English it contains is ungrammatical, ridiculous, and frequently misspelled. Nevertheless, it's a fascinating dataset on account of how well-annotated and dense it is (the 190-core has over 250k reviews).
The part I missed is that you're doing this at the character level, and not at the word level. If you were doing this at the word level a Markov chain could easily tell an IPA from a porter. But at the character level it suddenly becomes a lot more impressive. Thank you! I'll read the paper now.