

Data-Visualization Firm’s New Software Autonomously Finds Abstract Connections - edmaroferreira
http://www.wired.com/design/2013/01/data-viz-ayasdi-iris/

======
saosebastiao
Automatically finding connections in your data is easy. The problem is finding
connections that are 1) non-obvious and 2) actionable, and then filtering them
so that they stand out from all the obvious, non-actionable connections.

In other words, if I'm trying to find out why a plane crashed, and the first
thing this system tells me is gravity, then it isn't a practical advancement.
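
To make the "easy part vs. hard part" split concrete, here is a minimal Python
sketch. The flight-log columns, the numbers, and the `OBVIOUS` set are all made
up for illustration; nothing here reflects how Ayasdi's product works.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical flight-log columns, invented for this example only.
data = {
    "altitude_ft": [30000, 25000, 20000, 15000, 10000, 5000],
    "air_pressure": [30, 38, 47, 57, 70, 84],
    "engine_temp": [600, 610, 640, 700, 790, 900],
    "cabin_lights": [1, 0, 1, 1, 0, 1],
}

# Step 1 (easy): score every pair of columns by absolute correlation.
connections = sorted(
    ((abs(pearson(data[a], data[b])), a, b) for a, b in combinations(data, 2)),
    reverse=True,
)

# Step 2 (hard): someone with domain knowledge has to list what is obvious.
# This set is an assumption supplied by hand, not something discovered.
OBVIOUS = {frozenset(("altitude_ft", "air_pressure"))}


def interesting(connections, obvious, threshold=0.8):
    """Keep strong connections that are not on the hand-written obvious list."""
    return [
        (a, b, round(r, 2))
        for r, a, b in connections
        if r >= threshold and frozenset((a, b)) not in obvious
    ]


for a, b, r in interesting(connections, OBVIOUS):
    print(f"{a} <-> {b}: |r| = {r}")
```

Step 1 is a few lines. Step 2 only works because a human wrote `OBVIOUS`, and
nothing in the code can tell whether the survivors are actionable.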

------
danialtz
So far what I see is an "innovative" service/tool with a non-innovative old
(e.g. biotech) software business plan. I have to jump through hoops to get my
hands on this tool: have a phone call with them, talk about my data, etc.
Then what is the point of an easy UI that needs no human touch? Not to mention
I have no idea about their pricing model.

I don't understand what it costs them to let people try the tool first,
unless of course it runs on their machines. Put it somewhere and let
me try it with some simple data (e.g. iris in R).

------
sonabinu
Looks very exciting, but what does it do that R does not do? For example, a lot
of the graphics on nytimes.com are done using R. I assume that they use massive
data sets. If a platform like Hadoop is used to get the speed, won't R get the
same results? How is Ayasdi different?

------
darkxanthos
"...has developed data visualization software it says uses big data to answer
the questions you never thought to ask..."

Replace "questions" with "hypotheses" and "thought" with "cared".

I'm all for having a tool that can highlight a ton of interesting things in my
data... I'd make a business case to buy that any day. But finding all the
connections is so noisy compared to finding the _right_ connections. That's
the cynic in me.

The optimist in me is looking forward to when these kinds of techniques are
just another R package I can load and turn loose on my data. :)

~~~
disgruntledphd2
You could probably already do that with the R package caret.

It can train models for you automatically, so you could loop over every model
available and return the top ten predictions. Of course it wouldn't get around
the need for feature engineering, but it's theoretically possible.

Note - do not attempt this unless you have lots and lots of machines, as it
will take a very, very long time.
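
A rough Python sketch of the "loop over every model and rank them" idea, for
anyone who wants to see its shape. This is not caret and not R: the three toy
models, the synthetic data, and every name below are stand-ins written for this
example, and real use would swap in a proper library and real features.

```python
import random
from collections import Counter


def majority(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor(train):
    """1-NN: predict the label of the closest training point."""
    def predict(x):
        closest = min(
            train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
        )
        return closest[1]
    return predict


def nearest_centroid(train):
    """Predict the class whose mean feature vector is closest."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()
    }

    def predict(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
        )
    return predict


def cv_accuracy(fit, rows, k=5):
    """Mean accuracy over k folds; rows are assumed to be shuffled already."""
    scores = []
    for i in range(k):
        test = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        model = fit(train)
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return sum(scores) / k


# Synthetic two-class data, invented for the example.
rng = random.Random(0)
rows = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(50)]
rows += [((rng.gauss(3, 1), rng.gauss(3, 1)), "b") for _ in range(50)]
rng.shuffle(rows)

# The "loop over every model available" step, with three candidates.
candidates = {
    "majority": majority,
    "1-nn": nearest_neighbor,
    "centroid": nearest_centroid,
}
leaderboard = sorted(
    ((cv_accuracy(fit, rows), name) for name, fit in candidates.items()),
    reverse=True,
)

for score, name in leaderboard:
    print(f"{name}: {score:.2f}")
```

With three tiny models this runs instantly. The cost warning above comes from
doing the same loop over hundreds of real models, each with its own tuning
grid, on a large data set.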

------
danso
> _“The power of Ayasdi is its unique ability to automatically discover
> insights — regardless of complexity — without asking questions. Ayasdi’s
> customers can finally learn the answers to questions that they didn’t know
> to ask in the first place. Simply stated, Ayasdi is ‘digital serendipity’.”_

_It’s a bold statement, however by using algebraic topology Ayasdi has
managed to totally remove the human element that goes into data mining — and,
as such, all the human bias that goes with it._

---

Don't companies with massive datasets already do this in some way? Google's
system for piecing together what people "really want" seems like such a large,
open-ended inquiry that _some_ automated insight discovery must be done.

Also, wouldn't a machine that generates automated insights require another
layer of meta-analysis to be able to sift through which insights are actually
usable? Using the example of sports statistics, you could generate an infinite
number of trivial relations between players and plays and teams...it seems
that at some point, a human with real domain knowledge has to go in and
program a filter system, which seems to be about the same amount of work as
the inquiry-generation that this software automates.

Finally, why all the emphasis on visualization? Visualization helps to
illustrate to humans the possibilities of investigation...for a machine that
can supposedly ask (and answer) all the correct questions, isn't visualization
merely eye candy?

It'd be great to see a concrete example of this software in action. Perhaps
feed it the NFL play-by-play data that was posted on HN a few weeks ago and
see if it can generate usable strategy.

~~~
mjn
It's a bit of an odd article, but my guess is that it's a gloss of a press
release, and that kind of writing is par for the course for AI/ML press
releases.

Gunnar Carlsson, the researcher mentioned, seems to work mainly on an approach
related to manifold learning (roughly, finding lower-dimensional structure in
high-dimensional data) that's based on algebraic topology. He co-ran a
workshop on that last year at NIPS, one of the main machine-learning
conferences: <https://sites.google.com/site/nips2012topology/>. He's written
some highly cited papers on the subject, though it'd be a bit of a stretch to
claim he invented the area, since there have been workshops going back at
least to 2007, with that one organized by a group of French researchers:
<http://topolearnnips2007.insa-rouen.fr/description.html>

I would guess the part about removing the human element from data mining is
putting an optimistic PR spin on the basic idea of automatically extracting
lower-dimensional structure, which, if it works, should allow for less feature
engineering. The emphasis on visualization makes sense in that light, if
they're planning to sell it as a no-expertise-necessary system: feed it data
and get interpretable-by-non-experts viz as output, with any complexity that
would normally require "data scientists" being handled automatically.
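
For a feel of what "topological structure" in a point cloud can mean, here is
about the simplest summary there is: link points that lie within some distance
`eps` of each other and count the connected pieces. This is only an
illustrative sketch with made-up points; it is not Carlsson's method or
Ayasdi's product, and the function name and data are my own.

```python
from itertools import combinations
from math import dist


def count_components(points, eps):
    """Number of connected components when points within eps are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Made-up 3-D points: two tight clusters that are far apart.
cloud = [
    (0, 0, 0), (0.5, 0, 0), (0, 0.5, 0),
    (10, 10, 10), (10.5, 10, 10), (10, 10, 10.5),
]

for eps in (0.1, 1.0, 20.0):
    print(eps, count_components(cloud, eps))
```

At `eps = 0.1` every point is its own component (6), at `eps = 1.0` the two
clusters appear (2), and at `eps = 20.0` everything merges (1). The count that
persists over a wide range of scales is the kind of structure such methods
report without anyone choosing features by hand.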

------
sakai
How does this differ from Quid's offering? (<http://www.quid.com>)

------
mikhailfranco
Shame about the name. 'Iris' was the name for a range of SGI's 3D workstations
and for their scientific visualization application 'IRIS Explorer' (an AVS
clone), which was later sold to NAG Ltd:

<http://www.nag.com/Welcome_iec.asp>

Mik

------
skilesare
If you visualize the data properly, you rarely need automatic discovery.
The answers literally jump out at you. We've been doing this with AlphaVision
for years:

<https://aqumin.fogbugz.com/default.asp?W45>

