

I.B.M. Unveils Real-Time Software to Find Trends in Vast Data Sets - brandonkm
http://www.nytimes.com/2009/05/21/technology/business-computing/21stream.html?_r=1&ref=technology

======
smokinn
People seem to be confused about what IBM's System S actually is. For a more
technical look at their new offering I suggest these two posts:
<http://www.dbms2.com/2009/05/13/ibm-system-s-infosphere-streams-processing/>
<http://www.dbms2.com/2009/05/18/followup-on-ibm-system-sinfosphere-streams/>

Quick summary: It's for doing complex event processing of large streams of
data. Think financial data or any other never-ending flow of massive amounts
of data that you need to process very quickly to identify patterns, or to run
other analysis such as determining when you hit certain thresholds or trends.
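
As a rough illustration of the kind of threshold check described above (plain
Python, not IBM's processing language; the function name, window size, and
limit are all made up for this sketch), a moving average over an unbounded
stream could be watched like this:

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def threshold_alerts(
    values: Iterable[float], window: int, limit: float
) -> Iterator[Tuple[int, float]]:
    """Yield (index, moving_average) each time the average of the last
    `window` values exceeds `limit`. Hypothetical helper for illustration."""
    recent = deque(maxlen=window)  # only the last `window` values stay in memory
    total = 0.0
    for i, value in enumerate(values):
        if len(recent) == window:
            total -= recent[0]  # this value is about to be evicted by append()
        recent.append(value)
        total += value
        if len(recent) == window:
            average = total / window
            if average > limit:
                yield i, average


if __name__ == "__main__":
    ticks = [1, 2, 3, 10, 10]
    for index, average in threshold_alerts(ticks, window=2, limit=5):
        print(index, average)
```

Because it is a generator, it handles each value as it arrives and never needs
the whole data set in memory, which is the basic property a stream processor
depends on.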

It's supposed to be very flexible, so that it can act on large amounts of
unstructured data, and also very scalable. It defines its own processing
language too.

~~~
tom_b
There seem to be echoes of what I've seen in kdb+
(<http://kx.com/Products/kdb+.php>) in this offering from IBM. I know that
kdb+ is an entrenched part of the software developed internally at financial
companies to deal with huge streaming datasets.

Any ideas how they compare? System S and kdb+ sound very similar, both
offering a solution to development centered around streaming datasets with
extremely low latencies, both with their own programming language.

------
jnorthrop
This sounds a bit too good to be true, and an awfully good way for IBM to sell
its consulting services. How can a system be capable of analyzing financial
data, weather systems, medical information, etc. without someone plugging in
all of the data sources and explicitly telling the system how to correlate the
data? Of course, to do that you simply pay for a large helping of
consulting... cha-ching!

Maybe I'm missing something fundamental (the article was vague) but I just
don't see anything really new here. I read it as marketing.

~~~
wmf
To be clear, System S is a framework and programming language for building
streaming apps; it doesn't actually know anything about finance, weather, etc.

For the technical research underlying it, see
<http://domino.research.ibm.com/comm/research_projects.nsf/pages/esps.index.html>

------
dkarl
What does "real-time" mean? Oh yeah, it means "faster than our competitors."
And "stream" means "the Cell marketing team encouraged us to use the word
'stream' a lot."

Factor out those two buzzwords, and you have an article about IBM releasing
some software that analyzes data really fast and helps you make decisions
based on that data.

If newspapers are in such dire financial straits they really need to start
charging for stuff like this.

~~~
wmf
I know I shouldn't, but...

 _What does "real-time" mean?_

It means that data is processed at the same rate that it is produced. (Or as
Wikipedia says, "originally it referred to a simulation that proceeded at a
rate that matched that of the real process it was simulating.") A side effect
of real-time is that RAM can be used for buffering rather than using disk for
storage.
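
A toy model of that point, assuming a fixed processing capacity per tick (the
function name and the numbers are invented for illustration): if the consumer
keeps up with the arrival rate, the backlog stays small enough to sit in a RAM
buffer; if it doesn't, the backlog grows without bound.

```python
from typing import Iterable, List


def backlog_over_time(arrivals: Iterable[int], capacity_per_tick: int) -> List[int]:
    """Return the number of unprocessed items left after each tick.
    Hypothetical model: `arrivals` holds items produced per tick, and the
    consumer can process at most `capacity_per_tick` items per tick."""
    backlog = 0
    history = []
    for arrived in arrivals:
        backlog += arrived
        backlog -= min(backlog, capacity_per_tick)  # process what we can
        history.append(backlog)
    return history


if __name__ == "__main__":
    print(backlog_over_time([3, 3, 3, 3], capacity_per_tick=3))  # keeps up
    print(backlog_over_time([3, 3, 3, 3], capacity_per_tick=2))  # falls behind
    print(backlog_over_time([5, 0, 5, 0], capacity_per_tick=3))  # absorbs bursts
```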

 _And "stream" means "the Cell marketing team encouraged us to use the word
'stream' a lot."_

Wikipedia defines a stream as "a succession of data elements made available
over time" and stream processing as "the quasi-continuous flow of data which
is processed in a dataflow programming language as soon as the program state
meets the starting condition of the stream". I don't think System S even runs
on Cell.

While "real-time" and "stream" are too frequently used as buzzwords these
days, IBM is actually using them correctly in this case.

~~~
dkarl
"Real-time" is relative to the time scale on which you look at it. Nobody
dealing with a daily flow of business data actually falls behind and ends up
making decisions on older and older data because processing is backed up.
"Real-time" just means "faster;" it's relative to human expectations. If
seismic data gathered this morning were processed and available for inspection
late this afternoon, that would be "real-time." If an MRI was performed this
morning, and the image wasn't available until this afternoon, that would not
be "real-time." It's a mooshy term -- it has meaning in a specific domain,
relative to human expectations in that domain. In the article, outside any
specific domain, it doesn't mean anything.

 _"a succession of data elements made available over time" and stream
processing as "the quasi-continuous flow of data which is processed in..._

Basically automated batch processing where the "chunkiness" lies beneath a
certain threshold of perception.
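
To make the "chunkiness" idea concrete, here is a minimal sketch (the helper
name and batch sizes are assumptions, not anything from System S) that cuts a
stream into batches. As the batch size shrinks toward one, batch processing
and per-item stream processing become the same thing:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group a (possibly endless) stream into lists of at most `size` items.
    Hypothetical helper for illustration."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:  # stream exhausted
            return
        yield batch


if __name__ == "__main__":
    print(list(micro_batches(range(5), 2)))
    print(list(micro_batches(range(3), 1)))
```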

 _While "real-time" and "stream" are too frequently used as buzzwords these
days, IBM is actually using them correctly in this case._

How can you tell? If you have a better link, with real information, post it!

~~~
scott_s
<http://portal.acm.org/citation.cfm?id=1376616.1376729>

<http://portal.acm.org/citation.cfm?id=1289612.1289615>

~~~
dkarl
This looks interesting. Now if somebody could explain how I was supposed to
realize there was something interesting behind that crappy New York Times
article (short of already knowing)? My original post got modded into oblivion,
but I stand by it. The article looks like press material provided by IBM and
lightly worked over by the Times writer whose name appears on it.

~~~
wmf
I know where you're coming from; I love writing cynical "buzzword alarm"
comments. This time you got unlucky and criticized a legitimate project; I
wouldn't worry too much about it.

~~~
dkarl
Just because marketing drivel happens to be true doesn't make it right to post
it on Hacker News. A stopped clock is right twice a day. I don't have anything
against the project, just the fact that the marketroids got their copy
published in the New York Times and then linked on Hacker News.

~~~
wmf
Yeah, I wish people would submit the technical papers instead.

