
I once worked at a major big box retailer where somebody came up with a visualization that purported to show, for a given product category, purchases made in other categories. One surprising purchase correlation was customers bought TV stands after buying DVD players. So, this nugget was trumpeted at countless meetings about the value of big data analytics. Multiple marketing campaigns were designed around this discovery.

Of course, that made no sense, so I checked a little deeper. You know what else people buy when they buy DVD players? TVs. The DVD/furniture relationship was an artifact of the high degree of correlation between TVs and DVD players, which the visualization tool failed to account for.

I brought this up immediately, but received a tepid response. Of course, months later, I was still hearing about DVD players and furniture. It had become part of the institutional lore, and no facts were going to replace that.
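
For anyone who wants to see the artifact in numbers, here is a minimal sketch. The purchase counts are made up for illustration (not the retailer's data), and `lift` and `make_baskets` are hypothetical helper names. The data is built so that DVD players and TV stands are independent once you know whether the customer bought a TV.

```python
def lift(baskets, a, b):
    """Lift = P(a and b) / (P(a) * P(b)); 1.0 means a and b look independent."""
    n = len(baskets)
    n_a = sum(a in basket for basket in baskets)
    n_b = sum(b in basket for basket in baskets)
    n_ab = sum(a in basket and b in basket for basket in baskets)
    return n_ab * n / (n_a * n_b)


def make_baskets():
    """Made-up purchase histories (an assumption for illustration only).

    TV buyers purchase DVD players and TV stands at a 50% rate each,
    everyone else at a 10% rate each, independently within each group.
    """
    tv_buyers = (
        [{"tv", "dvd", "stand"}] * 100
        + [{"tv", "dvd"}] * 100
        + [{"tv", "stand"}] * 100
        + [{"tv"}] * 100
    )
    others = (
        [{"dvd", "stand"}] * 6
        + [{"dvd"}] * 54
        + [{"stand"}] * 54
        + [set()] * 486
    )
    return tv_buyers + others


baskets = make_baskets()

# Pooled view: what a naive category-vs-category visualization would report.
overall = lift(baskets, "dvd", "stand")

# Stratified view: the same measure computed separately by TV purchase.
with_tv = lift([s for s in baskets if "tv" in s], "dvd", "stand")
without_tv = lift([s for s in baskets if "tv" not in s], "dvd", "stand")

print(f"overall lift:        {overall:.2f}")
print(f"lift among TV buyers: {with_tv:.2f}")
print(f"lift among the rest:  {without_tv:.2f}")
```

On this toy data the pooled lift comes out around 1.57, while both strata sit at exactly 1.0. The apparent DVD/stand link disappears once TV purchases are held fixed.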




> One surprising purchase correlation was customers bought TV stands after buying DVD players. So, this nugget was trumpeted at countless meetings about the value of big data analytics. Multiple marketing campaigns were designed around this discovery.

> Of course, that made no sense, so I checked a little deeper.

And there was me thinking it made sense, because I'd done the same thing. A TV can stand on many surfaces, but once you've got a DVD player (some thin, wide rectangle) it makes a lot more sense to get a TV cabinet to put the DVD player in.

Perhaps not so much sense now, but 8-9 years ago I went through this logic.


This is absolutely hilarious. One of the things I talk about in my presentations is that the hardest part of "data analytics" and especially "advanced visualization" is that it's hard to know when you're sucking. It's actually pretty easy to come up with some basic interesting things, but then what do you benchmark against? If you don't do the hard work to evaluate significance, you can start convincing yourself that you're more insightful than you actually are.

What the business people don't appreciate is that the ML models don't know they're looking at DVD players and TV stands. They just know that vector elements 27 and 291 have the strongest correlation. It takes a human in the loop to say, "item 291 is technically just TV Stands, but we've done dimensional reduction to pool TVs, TV Stands, and Projectors all into cluster 14, which then correlates to cluster 12, consisting of DVD players and Xboxes".
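
A rough sketch of that point, with an invented 0/1 purchase matrix and invented column labels (the names `pearson`, `strongest_pair`, and `labels` are hypothetical): the search only ever sees column indices, and the product names are attached afterwards by a person.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_pair(rows):
    """Return the (i, j) column indices with the largest |correlation|."""
    cols = list(zip(*rows))
    return max(
        combinations(range(len(cols)), 2),
        key=lambda ij: abs(pearson(cols[ij[0]], cols[ij[1]])),
    )


# Invented data: one row per customer, one 0/1 column per product.
rows = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]

i, j = strongest_pair(rows)
print(f"model's view: columns {i} and {j}")

# Only here does a human supply meaning (labels are made up).
labels = {0: "DVD players", 1: "Batteries", 2: "TV stands", 3: "Cables"}
print(f"human's view: {labels[i]} and {labels[j]}")
```

Nothing in `strongest_pair` knows what the columns represent, so any sanity check on whether the pairing is plausible has to come from whoever reads the labels.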


Whenever you say "that made no sense", I think that you are using too much bias and not giving enough credit to what the data is telling you.

If you look at the most "controversial" data science paper from 2013, where a study correlated intelligence with Liking the Facebook pages "Curly Fries" and "Thunderstorms" (here is a summary: http://www.wired.com/2013/03/facebook-like-research/), there were a lot of critics saying that there was no causation, that the correlation was not founded, etc.

Of course, you would say the study "makes no sense". Intelligence can't be predicted by Facebook Likes. There is no correlation there, etc. But why not? If you read the paper (http://www.pnas.org/content/110/15/5802.full.pdf) their logic is sound. Are the marketing campaigns that the company bought based on the TV Stand<>DVD Player connection any different from other marketing campaigns? Facebook does all of their ad display based on data analysis similar to the above, and it seems to be working for them.

Note: There is the not-so-hidden machine learning feedback loop now (explained better here: http://www.john-foreman.com/blog/the-perilous-world-of-machi...), where people Like the 'Curly Fries' and 'Thunderstorms' pages because of the research.


> Whenever you say "that made no sense", I think that you are using too much bias and not giving enough credit to what the data is telling you.

What? If a data scientist sees something that seems illogical, there is no reason not to investigate it and see if he/she can find a more logical explanation. Sure, if the effect seems real but unexplained, you can accept and use it, but advocating a kind of big data mysticism ("don't investigate, accept") seems to be buying into the senseless hype. And if you read the post, you'll notice the parent actually discovered the association was just an artifact of an easily explained association.

And, no, there's not much reason for companies to advertise just a TV stand and DVD player. Common sense tells one what the data actually said: those two items, by themselves, aren't and weren't what many people were dreaming about.


How are association rules "big data analytics"?

The article is very refreshing and I bookmarked the site. What I am more frustrated with is that a lot of people use this stupid term "big data" for things which do not fit the description. If it's structured, it's not big data. If it comes at 2MB/s it's not big data. If it fucking fits in your RAM, it most certainly is not big data.


What on earth are you talking about?

(a) Association rules are big data when you are doing them on large data sets with many variables. I work at a company that sells tens of thousands of different products to tens of millions of customers. It definitely takes us a while to compute those rules.

(b) The majority of big data is structured. For most big data projects it is typically stored in old school Oracle/Teradata/etc data warehouses and shipped into a Hadoop cluster. It may not be consolidated but it is definitely structured.

(c) The total RAM of our Hadoop cluster is 4TB and ours is small. I would consider that to be big data in the sense that it overwhelms any applications that directly try to access the raw data.
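
To give a feel for why point (a) gets expensive, here is a naive pair-only sketch of support and confidence counting. It is an illustration on toy baskets, not how any particular company computes its rules; `pair_rules` is a hypothetical name, and real miners such as Apriori or FP-Growth also handle larger itemsets and prune far more aggressively.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support):
    """Return {(antecedent, consequent): (support, confidence)} for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(basket)
        item_counts.update(items)
        # Each basket contributes every pair of its items, so the work per
        # basket grows roughly with the square of the basket size.
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # confidence(a -> b) = P(b | a) = count(a and b) / count(a)
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Toy baskets (made up); a real run would stream millions of these.
baskets = [
    {"tv", "dvd", "stand"},
    {"tv", "dvd"},
    {"tv", "stand"},
    {"dvd"},
]

rules = pair_rules(baskets, min_support=0.5)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f} confidence={confidence:.2f}")
```

With tens of thousands of products the number of candidate pairs alone runs into the hundreds of millions, before any longer itemsets are considered, which is where the compute time goes.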


You can stick 6TB in a single 4U Proliant from HP: http://www8.hp.com/us/en/products/servers/proliant-servers.h...

If you need a few PBs of spindle storage, hook that server up to a DDN or Panasas rack.



