

Beer and Data Science – Looking at topic modeling in multi-aspect reviews - bcohen123
http://nbviewer.ipython.org/gist/benjamincohen1/d7caaa3d07bbb89cd39a

======
bcaine
Nice read. I did something sort of similar with the same dataset about a year
ago. I compared LDA (Latent Dirichlet Allocation) to TF-IDF as tools to find
similar beers based on their review text. Lots of intuitive and funny topics
discovered.

I suggest you play with LDA; it seemed to work really well at generating
topics. There is also a lot of fascinating, very readable research using it.
Check out SNAP's work on the same dataset [1] and some of the Yelp Dataset
Challenge winners [2]. If you end up interested in doing so, Gensim [3] was
pleasant enough to work with.

[1] [http://snap.stanford.edu/data/web-BeerAdvocate.html](http://snap.stanford.edu/data/web-BeerAdvocate.html)

[2] [http://www.yelp.com/dataset_challenge](http://www.yelp.com/dataset_challenge)

[3] [https://radimrehurek.com/gensim/wiki.html#latent-dirichlet-allocation](https://radimrehurek.com/gensim/wiki.html#latent-dirichlet-allocation)
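
For anyone curious what the TF-IDF side of that comparison can look like, here is a minimal sketch using only the standard library. It is not the code from my experiment: the toy reviews, the beer names, and the helper names (`tfidf_vectors`, `cosine`, `most_similar`) are all made up for illustration, and the weighting (raw term count times log(N / document frequency)) is just one common variant. LDA would need a library such as Gensim and is not shown here.

```python
import math
from collections import Counter

# Hypothetical toy reviews, invented for illustration only.
reviews = {
    "stout_a": "dark roasted coffee chocolate roasted malt",
    "stout_b": "roasted chocolate coffee with a creamy head",
    "ipa_a": "bright citrus hops with pine and grapefruit",
}


def tfidf_vectors(docs):
    """Return {name: {term: weight}} with weight = count * log(N / df)."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: the number of reviews each term appears in.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(name, vectors):
    """Return (other_name, similarity) for the closest other beer."""
    others = [
        (other, cosine(vectors[name], vec))
        for other, vec in vectors.items()
        if other != name
    ]
    return max(others, key=lambda pair: pair[1])


vectors = tfidf_vectors(reviews)
best_match, score = most_similar("stout_a", vectors)
print(best_match, round(score, 3))
```

On real review text you would want proper tokenization and stop-word removal before building the vectors; whitespace splitting is only enough for a toy example.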

------
gjreda
Great post! I've been thinking about writing something similar with that same
BeerAdvocate data. Good job beating me to it :)

Instead, I ended up writing a satirical beer snob bot [1] which tweets
nonsensical beer reviews using Markov chains. Some are bad, but some are pure
gold. You can read about it here [2]. The code's also on GitHub [3].

[1] [https://twitter.com/BeerSnobSays](https://twitter.com/BeerSnobSays)

[2] [http://www.gregreda.com/2015/03/30/beer-review-markov-chains/](http://www.gregreda.com/2015/03/30/beer-review-markov-chains/)

[3] [https://github.com/gjreda/beer-snob-says](https://github.com/gjreda/beer-snob-says)
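
For readers who haven't seen the technique, here is a rough sketch of a first-order, word-level Markov chain text generator. It is not the bot's actual code (that lives in the repo above); the corpus sentences and the function names `build_chain` and `generate` are invented for illustration, and a real bot would likely train on far more text and may use a higher-order chain.

```python
import random
from collections import defaultdict

# Hypothetical review snippets, invented for illustration only.
corpus = [
    "pours a hazy gold with a thin white head",
    "pours a deep amber with a creamy tan head",
    "smells of citrus and pine with a hint of caramel",
]


def build_chain(sentences):
    """Map each word to the list of words observed immediately after it."""
    chain = defaultdict(list)
    starts = []
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        starts.append(words[0])
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain, starts


def generate(chain, starts, max_words=15, rng=random):
    """Walk the chain from a random start word until a dead end or max_words."""
    word = rng.choice(starts)
    output = [word]
    # chain.get avoids creating empty entries for words with no successor.
    while len(output) < max_words and chain.get(word):
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


chain, starts = build_chain(corpus)
print(generate(chain, starts, rng=random.Random(0)))
```

Because successors are stored with repetition, a word that follows another more often in the corpus is proportionally more likely to be picked, which is what makes the output sound vaguely like the source reviews.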

~~~
bcohen123
Cool stuff, followed! Feel free to steal any parts of my work you think may
improve it. May be cool to be able to control the polarity of the review
you're tweeting.

------
JasonCEC
For anyone interested in beer and data science, my startup[1] uses machine
learning and artificial intelligence to build flavor profiling and quality
control tools for craft beverage producers.

Our models flag and predict flaws, taints, contaminations, and batch-to-batch
deviations in real time from human sensory data. We then leverage our clients'
quality control data for flavor profile optimization, demographic targeting,
and cognitive marketing - helping them sell consistently better products to
their most valuable consumers.

[1] www.Gastrograph.com

~~~
bcohen123
Cool stuff! And nice last name!

------
socceroos
I just came across a relevant site this morning. Hilarious hipster brew review
satire: [http://vicioustasting.com/](http://vicioustasting.com/)

~~~
bcohen123
Wow, that's pretty well done. Any indication how it's made? Looks too good to
be an HMM.

------
archimedespi
Love the license =D

~~~
igravious
heh, logged in to post it :)

-- spoiler alert --

[...]

If the Author of the Software (the "Author") needs a place to crash and you
have a sofa available, you should maybe give the Author a break and let him
sleep on your couch.

If you are caught in a dire situation wherein you only have enough time to
save one person out of a group, and the Author is a member of that group, you
must save the Author.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO BLAH BLAH BLAH ISN'T IT FUNNY HOW UPPER-
CASE MAKES IT SOUND LIKE THE LICENSE IS ANGRY AND SHOUTING AT YOU.

--

------
cobranet
How can I get the data?

~~~
bcohen123
I grabbed the data from here a while back:
[https://snap.stanford.edu/data/web-BeerAdvocate.html](https://snap.stanford.edu/data/web-BeerAdvocate.html)
Unfortunately, it appears to no longer be available for download. This
([https://snap.stanford.edu/data/web-RateBeer.html](https://snap.stanford.edu/data/web-RateBeer.html))
seems to be a similar dataset which may be available for use.
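
If you do get hold of a dump, a loader can be as small as the sketch below. It assumes the file is plain text with one `key: value` field per line and a blank line between reviews; that layout, the field names in the sample (`beer/name`, `review/overall`, `review/text`), and the function name `parse_reviews` are assumptions for illustration, so check them against whatever file you actually download.

```python
def parse_reviews(lines):
    """Yield one dict per review from 'key: value' lines split by blank lines."""
    record = {}
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current review, if there is one.
            if record:
                yield record
                record = {}
            continue
        # Split on the first colon only, so colons inside the value survive.
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    # Emit the last review when the input does not end with a blank line.
    if record:
        yield record


# Hypothetical sample in the assumed layout, invented for illustration.
sample = """\
beer/name: Example Stout
review/overall: 4.5
review/text: Roasty and smooth.

beer/name: Example IPA
review/overall: 3.5
review/text: Piney: bitter finish.
"""

reviews = list(parse_reviews(sample.splitlines()))
print(len(reviews), reviews[0]["beer/name"])
```

Since `parse_reviews` is a generator, you can pass it an open file object and stream a large dump without loading it all into memory.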

