

Big Data's invisible open source community - abennett
http://www.itworld.com/big-datahadoop/254898/wild-west-big-data

======
NyxWulf
I'm at the airport heading home from Strata right now. I have to say the
conference was very disappointing. There were far too many commercially
"sponsored" sessions. There seemed to be no quality or content guidelines for
the talks and most of them were either sales pitches, vague and fluffy but
full of hype and buzzwords, or they were entry level.

This was my first time at Strata, but compared to other technical conferences,
I walked away thinking it was a serious waste of time.

It's possible that's just a sign of the gold rushy nature of Big Data right
now, but as a serious practitioner, I can't recommend this conference.

Edit: I will add that it's possible my selection turned out to be very
poor, and there were good sessions I didn't make it to. I really hope that was
the case actually. If anyone was there, did you find any sessions particularly
valuable? I would be interested in input from people who are doing Big Data
now, not those thinking about it.

~~~
jgrahamc
I think this is valid criticism based on presentations I've read from past
conferences. I've been trying to find out concrete information about building
a petascale data store to contain log file data for ad hoc analytics and it's
extremely difficult to get really good information.

~~~
mhansen
You've probably already seen this, but for the sake of others, Google recently
released the research paper detailing their petabyte-scale ad-hoc logs
analysis platform, "Dremel".

<http://research.google.com/pubs/pub36632.html>

------
joshklein
Contributors to several open source data science libraries for Python will be
gathering to discuss how to do a better job of coordinating the community at
PyCon next week. If you're interested, please leave your name on the open
space and check back for a final time & location:
<https://us.pycon.org/2012/community/openspaces/pydata/>

There's a seed of an idea for "PyData" we hope to flesh out and decide if it
merits pursuing.

------
lacunainc
The Expo at the conference was a little disappointing as well. Many companies
claim to provide "Big Data Analytics solutions" but deliver products
that are little more than wrappers around Hadoop MR or Hive.

------
peteforde
I believe that the data scene is nascent but growing — both in the open source
world and inside of organizations. However, I am not convinced that the Strata
attendees are attacking the real problems that organizations who work with
data already had... long before Hadoop was a gleam in the eye.

Specifically, working with and sharing data (whether publicly or in a team) is
stuck in the 1990s. Every time someone emails an Excel spreadsheet, a kitten
dies! The obvious comparison is people emailing video files in the pre-YouTube
era... if someone emailed you a 10 MB dancing baby AVI, you'd think that their
computer had been hacked.

So why do we put up with this sad state of affairs? Where is the GitHub of
datasets to rescue us and let a real data community take hold?

That's why we built BuzzData:

<http://vimeo.com/35262021>

The next time you read an article about where the money in "Big Data" is going
to come from, stop to question why people are looking for new problems when we
haven't solved the ones we already have.

<http://blog.buzzdata.com/post/18555391280/data-even-if-its-not-big-it-should-be-clever>

~~~
Drbble
This content-free shilling is exactly what people disliked about Strata.

