

The big data disaster - sirteno
http://thejonathanmacdonald.blogspot.co.uk/2012/11/the-big-data-disaster.html

======
smoyer
"Big Data" has an identity crisis ... everyone you ask has "Big Data" but I've
heard it used to describe 200K (yes thousand) rows of data. I've also seen
terabytes of data successfully stored in an SQL database. I think "Big Data"
needs a divorce.

We've always had data that defied structure, but now (with NoSQL, map-reduce,
etc) we have a better idea of how to deal with it. We've still got data
processing tasks that can easily be parallelized and others that are
inherently linear.

Amazingly, when you actually have "Big Data", you need to analyze how best you
could process it and architect your systems and software based on the facets
listed above. In a corporate BI environment, you should know what questions
you're trying to answer before you start. Unfortunately, in a research
environment you don't always know what you're looking for.

We're trying to bring research scientists and computer scientists together to
optimize how data is analyzed at Penn State University ... check out
@HackingScience on Twitter.

~~~
VonGuard
Yeah, you're absolutely right. For a lot of people, big data means Hadoop. For
others, it means NoSQL. What I really think it means is just having all of
your data in one live, accessible and analytics-ready place. That means logs,
customer info, geographic info, demographic data, weather patterns, and just
about anything you could ever think of.

I like to think that big data is more about aggregating the information
available in multiple datasets, and then figuring out how you can read those
entrails, as it were.

Honestly, big data today is mostly about a potential replacement for data
warehouses. The real revolution around big data is that we now have a place to
put it for analysis. That was a big hole that Hadoop filled. Now, the next 10
years will be spent actually figuring out how to analyze these big pools of
data effectively. Frankly, that's the harder task, I think.

We now have the ability to ask questions of ALL the data. Too bad we have no
clue what the questions should be.

------
amalag
I do think the big data promise is overblown. Looking at my local grocery
store, the American one is always trying to collect my phone number and tie it
to my purchases. Will that give them an edge worth more than the multi-million
dollar IT budget they have to pay for it? The store is barely filled during
peak times.

Meanwhile, the Asian grocery store down the road has old credit card terminals
and no phone tracking or other gimmicky crap. They sell cheap vegetables and
are PACKED and are huge and they rent out their space to other vendors,
Walmart style. No place to move in peak shopping time. This is because of
their greater selection and vastly lower prices.

There are other factors, but I don't think keeping track of what everyone buys
will give them some insight into raising prices on a few items.

The IT edge was realized by companies like Walmart and Costco on the supply
side and in inventory. I don't think big data will give them the similar edge
they are aiming for. Target sent targeted advertising to pregnant women, but
will that edge really offset their number-crunching IT costs?

~~~
mattmanser
Being packed or not has no relation to profit.

If they can increase their revenue by 10%, and it costs them 5% of their
revenue to do, then it's worth it.
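The arithmetic behind that comparison can be sketched as follows. This is only an illustration: the revenue figure is made up, and the `margin` parameter (how much of each extra dollar of revenue is profit) is an assumption that is not in the comment. With `margin=1.0` the extra revenue is treated as pure profit, which matches the framing above.

```python
def net_gain(revenue: float, lift: float, cost_share: float, margin: float) -> float:
    """Extra profit from a programme that lifts revenue by `lift` and costs
    `cost_share` of current revenue, when extra sales earn `margin` profit."""
    extra_profit = revenue * lift * margin
    programme_cost = revenue * cost_share
    return extra_profit - programme_cost


revenue = 1_000_000.0  # hypothetical annual revenue, not a real figure

# margin is an assumed parameter; 1.0 treats the whole lift as profit.
results = {
    margin: net_gain(revenue, lift=0.10, cost_share=0.05, margin=margin)
    for margin in (1.0, 0.5, 0.25)
}
for margin, gain in results.items():
    print(f"margin={margin:.2f} net gain={gain:,.0f}")
```

Under these assumptions the 10%-for-5% trade pays off whenever the margin on the extra sales is above one half.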

You're missing what they'd do with that data.

Keeping track of what you buy means they can email/text/whatever to say 'hey,
this %reallyinterestingthing% is on offer'. And that really interesting thing
really is interesting to you.

It's like an article I read a while back about how supermarkets can predict if
you're pregnant by the things you're buying before you've even told anyone.

That's valuable to them, although extremely creepy. They can start bombarding
you with vitamin supplement adverts or books or whatever and you're more
likely to buy as you actually want those things.

~~~
ams6110
I get those texts, emails, and mailings. They all go right in the trash. Why
should you discount something someone is predisposed to buy anyway? You should
raise the price instead.

~~~
nostrademons
The point of discounts, coupons, etc. isn't to get you to buy, it's to build
very comprehensive data on the demand elasticity curves of each good they
sell.

If you've taken an introductory microeconomics course, you'll remember that
almost all the models of a "rational" firm involve maximizing profit by
picking a price point where people still buy yet will pay a maximum price per
unit. How firms determine that demand curve is left as an exercise - it's
assumed that firms who get it right will survive while firms that don't will
go out of business, leaving only firms who get it right. This is cold comfort
if you're a business owner and want _your_ firm to get it right.

So what do the big retailers do? They A/B test. They divide customers into two
groups (say, those who have a Safeway card and those who don't) and offer a
different price to each. Then they measure how much of each good people will
buy at each price, and presumably do some statistical corrections for
demographics, the "sale" effect (where people buy whatever's on sale
regardless of price), folks who won't use coupons no matter how cheap, etc.
They end the sale, and then pick a different price next month. Over time, they
build up extensive data about just how high they can raise their prices before
people stop buying entirely.

So when you ignore the discount, that's Mission Fucking Accomplished for the
store. It's telling them that the discount doesn't matter, and you'll buy
regardless, and so they should try raising the price instead. They don't care
about the _sale_ , they care about the _data_ , so that they can make more
money from all their other customers.
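The two-price test described above can be sketched roughly like this. It is a minimal illustration, not how any named retailer actually does it: the prices, group sizes, and unit counts are invented, `PriceGroup` and `arc_elasticity` are hypothetical names, and the statistical corrections mentioned above are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class PriceGroup:
    price: float      # price shown to this group
    customers: int    # shoppers in the group
    units_sold: int   # units of the good the group bought

    @property
    def units_per_customer(self) -> float:
        return self.units_sold / self.customers


def arc_elasticity(a: PriceGroup, b: PriceGroup) -> float:
    """Midpoint (arc) price elasticity of demand between two price points."""
    q_a, q_b = a.units_per_customer, b.units_per_customer
    pct_quantity = (q_b - q_a) / ((q_a + q_b) / 2)
    pct_price = (b.price - a.price) / ((a.price + b.price) / 2)
    return pct_quantity / pct_price


# Invented numbers for illustration only.
card_holders = PriceGroup(price=3.00, customers=1000, units_sold=500)
everyone_else = PriceGroup(price=4.00, customers=1000, units_sold=450)

elasticity = arc_elasticity(card_holders, everyone_else)
print(f"elasticity: {elasticity:.2f}")

# Revenue per customer at each price point.
for group in (card_holders, everyone_else):
    print(group.price, group.price * group.units_per_customer)
```

An elasticity between -1 and 0 means demand is inelastic over that range, so revenue per customer goes up with the higher price. That is the signal to try raising the price next month.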

~~~
dreamdu5t
Honestly, you just kind of blew my mind. I find it hard to believe they
actually do that though?

~~~
nostrademons
I've never worked at a brick & mortar retailer, so I can't know for sure. I
know _for sure_ that online retailers do this - a few have been caught showing
different prices to different customers, or to the same customer in different
browsers. I'd be _very_ surprised if Walmart/Target/Safeway/CVS/etc. don't -
what else is the point of the "rewards card" if not to gather detailed data?
Why do they seemingly have a different "everyday low price" every month?

~~~
rohamg
While I think you're right re: data (got my upvote for that!), the Safeway club
card is a bad example: 1) it's basket-wide and 2) almost everyone uses it. The
Safeway club card is good old-fashioned price discrimination, which is designed
to capture consumer surplus from customers whose willingness to pay is greater
than the market price. There are many other examples, including airline
tickets: the extra charge for first class or for a flexible ticket isn't
because it costs that much to keep your spot flexible in their model, it's
simply to price discriminate and charge as high as the market will pay. Nothing
wrong with it, but the key difference is that the sale itself does matter.

------
blauwbilgorgel
Social was a hot topic. Mobile and tablets are a hot topic and big data will
become a hot topic.

Did many dive head first into the social and mobile bandwagon? Sure. But to
say there is no bandwagon or profit to be made in these niches is not
realistic.

Everyone knows there is a hype, but that hype is based on reality. Big
consumer data already has a value per GB.

Some say Big Data is just Business Intelligence over a large amount of data. I
don't disagree and I see a solid future for BI.

Managers and consultants should prepare for the big data storm that is to
come. Learn about the possibilities and impossibilities of cloud computing and
Hadoop. I believe in a few years even tech-savvy consumers (who now own an
iPad) will be able to use Big Data on their own data. Anyone can already rent a
few Amazon resources to compute, and work with technology invented at Google a
few years ago (MapReduce, BigTable). Or if you don't want to reinvent the
wheel, go for Big Data as a Service, like Cloudera or Splunk.

In exchange for discounts, consumers will hand over their data to companies
for free. Coordinates or purchasing behaviour: everything gets stored.
Companies are busy exporting old data from tapes, so they can run more complete
aggregate queries.

When more big data analysts emerge, some companies will be ready: They'll have
data warehouses filled with data. If you do not invest in big data, you'll soon
lose your competitive edge with regard to BI and consumer data.

The people who are extolling the virtues of big data are usually external
consultants. They are the same people who told companies to focus on mobile
apps, and before that, to create a social presence. You can rail against that,
but that is merely an epiphenomenon of every hype, not what it is really about.

All big tech companies are preparing for the future of big data (Amazon,
Akamai, IBM, Microsoft). Facebook has the largest known Hadoop cluster in the
world (over 100PB). Google might process around 25PB of data each day...

------
goodside
"Whether you agree with my thesis or not, please keep my predictions in mind
as you read through the papers. Look at the pages of the business news today
(whatever day it is you read this). Which company is showing one of the five
predictions to be true? I bet you there is one."

For any five predictions that are even remotely possible, any business
newspaper from any day in history will contain at least one example of a
company that meets one of those predictions.

~~~
Evbn
Dunno, I need a Big Data setup to crunch stats over all the newspaper
articles.

------
thom
Eventually we'll have enough data on big data startups that we'll be able to
algorithmically create better big data startups, so the problem really fixes
itself.

------
PaulHoule
huh?

for me "big data" is the emergence of scalable tools for handling data sets
that are much bigger than you can handle with one computer. (ex. Hadoop)

there's definitely a feeding frenzy here because the spend on hardware could
get high (say 200 powerful servers), therefore the spend on associated
software and services could be high too.

a few issues stalk big data

for any IT project there's the issue of how much value you can create per bit.
Imagine Facebook has a $10 ARPU, for instance. They can't afford to use more
than $10 a year worth of hardware and electricity to offer you the service.

Thus, if your "big data" project is going to cost $500,000 you ought to have a
plan to create more than that in value.

often the best way to deal with "big" data is to make it small. I had my eyes
on a 50 TB data set a few months ago but didn't have the budget to work with
the whole thing.

I realized I could take a statistical sample of just 50 GB which I could
download to my house and process on my cluster here. I can't get an accurate
answer to ~every~ possible question with my 50GB sample, but I got answers to
the most pressing questions and discovered that a 1 TB sample would be good
enough for the toughest questions I wanted to ask.
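The "make it small" approach can be sketched as a one-pass random sample. This is only an illustration under assumptions: the data here is synthetic, `bernoulli_sample` is a hypothetical helper, and the 1-in-1000 rate simply mirrors the 50 GB out of 50 TB ratio above. It is not the actual pipeline used on that data set.

```python
import random
from typing import Iterable, Iterator


def bernoulli_sample(records: Iterable[float], rate: float, seed: int = 0) -> Iterator[float]:
    """Yield each record independently with probability `rate`, in one pass."""
    rng = random.Random(seed)
    for record in records:
        if rng.random() < rate:
            yield record


# Synthetic stand-in for the full data set (hypothetical values).
data_rng = random.Random(42)
full = [data_rng.gauss(100.0, 15.0) for _ in range(500_000)]

# 50 GB out of 50 TB is a sampling rate of 1/1000.
sample = list(bernoulli_sample(full, rate=0.001))

full_mean = sum(full) / len(full)
sample_mean = sum(sample) / len(sample)
print(len(sample), round(full_mean, 2), round(sample_mean, 2))
```

A sample like this gets close on broad aggregates such as a mean. Questions about rare records are where it falls short, since only a handful of them survive the sampling, which is why a larger sample is needed for the tougher questions.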

~~~
Anon84
I'm curious what those 50TB were of. I don't know of many datasets of that
size.

------
pav3l
"Big data" technologies just help you perform data analysis on larger
datasets. It can be anything from calculating some average to building
advanced statistical models. If you are claiming it's useless then you're
either claiming 1) that larger datasets don't add any value or 2) that data
analysis in general is useless. It is not clear to me from the article which
one the author claims. Either way, it reads like some unsubstantiated rant
about a perceived buzzword.

------
mcphilip
TLDR: Mining big data collected on users won't boost the ability of companies
to sell products to the degree big data proponents claim.

This article is garbage since it totally discounts all the other uses of big
data. I, for one, work in the risk analytics sector where 'big data' is mined
to try and identify which assets have the biggest impact in a risk profile.

~~~
amalag
Risk analytics has always been an area for data crunching. It is not new and
will benefit from big data advances. I think the argument is about how far you
can go before you hit financially limiting diminishing returns.

------
paulgb
Big data means so many things to so many different people, I can't decipher
what positive statements are being made here.

~~~
ihsw
It can easily be construed as broad-sweeping and baseless collection of user
data, specifically in the hopes of somehow monetizing it.

It is baseless because it is passive and without direction.

I'm not bashing it, as a matter of fact it excites me as a programmer.

~~~
paulgb
Thanks, the article makes more sense when I read it with that definition of
big data.

------
1337biz
I, for one, completely reject the author's claims. He must be wrong just by
the simple fact that he is not citing enough data to back up his predictions.
Definitely needs a better forecasting model.

------
justin_hancock
This is just a discussion of the hype cycle; you can apply the very general
rules to almost any technology. No new insight here, it feels like link bait.

------
jfb
This is gibberish.

------
Robin_Message
Since no-one else has mentioned it, surely the prediction is number 4
(allegations of fraud after the acquisition of a big-data company) and the
company in question was purchased by a well-known printer manufacturer with
the same name as a popular table sauce?

Edit to add: On re-reading the article I see that I mistook a general
statement for a rhetorical denouement. Still, the example I gave is a strong
one, although not a prediction since it already happened (indeed, it may have
prompted this very blog post.)

------
trimbo
Here's my big data trap motto: "a 1% lift of zero is zero".

What I mean by that is you can optimize one or several values to death, but if
the core product is broken, you'll still lose all of your customers. So while
you'll get a 1% lift over control for making the button blue over green, what
you failed to notice is that the product sucks.

Big data (née "Business Intelligence"?) has its place, but beware of it
becoming your entire product strategy.

------
fnl
Very, very few actual facts - but maybe I missed something?

------
rco8786
It says the decision is binary...but after reading the article I have no idea
what is the 1 and what is the 0?

Not very clear on what he's talking about.

------
kordless
> Abracadabra. Magic isn't it?

This post should be called "The No Data Blog Disaster" for assuming we are
mind readers.

------
bernatfp
It's funny how all the business media (e.g. Harvard Business Review last
month, the latest MIT Sloan issue...) hype Big Data and Hadoop as some sort of
magical software that will solve all their problems better, faster and
cheaper, when in most cases it is an overkill solution.

------
johnrgrace
I always define big data as: too big for Excel spreadsheets.

~~~
nivertech
There is an MPI add-on for Excel, which allows you to work with huge
spreadsheets on HPC clusters. It was originally developed for some Swiss
investment bank. So Excel does scale.

------
nickbarone
Seems to me that the problem you're describing is one of confusing the tool
for the task, rather than the tool being over-hyped.

------
angel_007
luks like its all full of big claims without substantial reasoning :S ... or
mebbe i dint read it right :(

------
d--b
what is big data ?

~~~
elchief
it's when you need at least 2 computers to make a chart

