

Ask HN: Good tool to visualize data? - hyuen

Hi all, I need to visualize some time series data. The problem is that I want to impose restrictions in an ad-hoc way, such as limiting the time range or filtering out some categories, and then get statistics on the result, such as the mean, a histogram, or other things.

What I am doing right now is a bunch of gnuplot scripts together with Python/C programs to grab the data, but this is getting tedious.

Is there any open source tool/library that does something similar?

Thanks
======
jjguy
The suggestions below are solid:

    
    
      R 
      Python + matplotlib
    

I've used and been impressed by both. Starlight and Palantir may fit your
needs, but those tend closer to enterprise applications -- I get the sense
you're searching for something lighter.

Some more suggestions:

 _DAVIX_ [D] A live CD distribution containing lots of visualization tools,
from capture to parsing to presentation.

 _SecViz_ and their "graph exchange" [0] -- lots of pictures of various
datasets, usually with details of how it was produced.

Personally, I found most packages too restrictive and fell back on Processing:
<http://processing.org> A project out of the MIT Media Lab, it's a generic
graphics framework in which you can (among other things) produce 2D time
series graphs. It's still tedious, but tedium is traded for control.

Some of my projects in Processing:

<http://jjguy.com/som/> \- Self Organizing Maps

<http://jjguy.com/life/> \- Conway's Game of Life

I don't have a great time series write-up posted, but I've been working with
Robert May's population model recently. [1] You can find the source at [2] and
example output image at [3].

[D] <http://secviz.org/content/the-davix-live-cd>

[0] <http://secviz.org/category/image-galleries/graph-exchange>

[1]
[http://en.wikipedia.org/wiki/Logistic_function#In_ecology:_m...](http://en.wikipedia.org/wiki/Logistic_function#In_ecology:_modeling_population_growth)

[2] <http://jjguy.com/populationModel.pde>

[3] <http://jjguy.com/normal.png>
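
For anyone without Processing installed, here is a minimal Python sketch of the kind of series such a model produces. It assumes the model in question is the discrete logistic map usually associated with Robert May, x_next = r * x * (1 - x). It is not the code from [2], and the function name and parameter values are only illustrative.

```python
def logistic_map(r, x0, steps):
    """Iterate x_next = r * x * (1 - x) and return the whole series."""
    series = [x0]
    for _ in range(steps):
        x = series[-1]
        series.append(r * x * (1.0 - x))
    return series


# Illustrative parameters (assumptions, not taken from the original source):
# r = 2.5 settles toward the fixed point 1 - 1/r, r = 3.9 behaves chaotically.
stable = logistic_map(2.5, 0.2, 200)
chaotic = logistic_map(3.9, 0.2, 200)
```

Each returned list is a time series indexed by generation, so it can be handed to whatever plotting tool you prefer.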

------
kalendae
R <http://www.r-project.org/> I would consider it to have a steep learning
curve, though.

~~~
hazexp
I personally didn't find the learning curve to be that steep. Perhaps because I
was in a course taught by one of the founders of the R Project ;).

The lecture slides he created were great, and I recommend that anyone who is
interested in R have a look at them:
<http://www.stat.auckland.ac.nz/~stat380/?Lecture_Slides>

------
beambot
Since you're already using Python, I'd highly recommend looking at Matplotlib
(pylab). We've used it to great utility in the past.
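
A rough sketch of the ad-hoc workflow from the question (restrict the time range, filter categories, then compute a mean and a histogram). The record layout, category names, and helper names are hypothetical. The filtering and statistics use only the standard library; `plot_histogram` is an optional step that assumes matplotlib is installed.

```python
from datetime import datetime
from statistics import mean

# Hypothetical records: (timestamp, category, value)
records = [
    (datetime(2010, 1, 1, 0), "cpu", 1.0),
    (datetime(2010, 1, 1, 1), "disk", 7.0),
    (datetime(2010, 1, 1, 2), "cpu", 2.0),
    (datetime(2010, 1, 1, 3), "cpu", 4.0),
    (datetime(2010, 1, 2, 0), "cpu", 9.0),
]


def select(rows, start, end, categories):
    """Keep values with start <= timestamp < end and category in categories."""
    return [v for t, c, v in rows if start <= t < end and c in categories]


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = {}
    for v in values:
        edge = (v // bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return counts


def plot_histogram(values, path, bins=10):
    """Optional: draw the selected values with matplotlib and save to a file."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    plt.hist(values, bins=bins)
    plt.savefig(path)
    plt.close()


# One ad-hoc query: only "cpu" samples from January 1st.
values = select(records, datetime(2010, 1, 1), datetime(2010, 1, 2), {"cpu"})
average = mean(values)
bins = histogram(values, 2.0)
```

Changing the restriction is a one-line edit to the `select` call, which is the part that tends to get tedious with a pile of separate gnuplot scripts.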

------
apu
I've been building up a javascript library to do this, as visualizing
results/data takes up more than half my research time.

Unfortunately, this library is nowhere close to ready to be released, but I
can describe roughly the architecture, in case it's useful to you.

I take all my data and write a python script which will dump it to a .JSON
file. This includes the raw data/results, names of different fields, groups of
fields to toggle on/off together, and what kinds of visualizations I want to
use with a given set of data (view as numbers, as bar graph, as line graph, as
linear time-series data, as color-coded images, etc.)
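
A minimal sketch of what such a dump script could look like. The library is unreleased, so every key name here ("fields", "groups", "views", "rows") and the sample data are assumptions made for illustration, not the actual format.

```python
import json


def build_payload(rows, field_names, groups, views):
    """Bundle raw data with the metadata a generic viewer would need."""
    return {
        "fields": field_names,  # column names, in row order
        "groups": groups,       # fields to toggle on/off together
        "views": views,         # visualizations to offer for this data
        "rows": rows,           # the raw data/results
    }


def dump(payload, path):
    """Write the payload to a .json file for the HTML page to load."""
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)


# Hypothetical sample data
payload = build_payload(
    rows=[[0, 1.5, 2.0], [1, 1.7, 1.9]],
    field_names=["t", "train_error", "test_error"],
    groups={"errors": ["train_error", "test_error"]},
    views=["table", "line"],
)
text = json.dumps(payload)
```

Fixing the top-level keys once is what lets the viewer side stay generic: only the contents change per dataset.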

Then I have a standard html + javascript file in which I simply load in this
JSON file. Because I've prespecified the format of the JSON file (i.e., what
fields it has and how the data is stored), I only need to customize a few
functions to display results. Things like filtering data, searching and
sorting I get "for free", since they're in the library.

The main advantages:

\- Interactive browsing of data in various formats, all in the web browser,
with no plugins etc. required.

\- Can be viewed locally or across the network

\- No need for a server to be running

\- Processing done on each local computer, as opposed to on some server

\- HTML 5 is now good enough (just barely) to offer all the kinds of
interaction I require

\- Most of the code is in my standard library for visualizations, and so the
time to create a new visualization for a new set of data is quite small.

Drawbacks:

\- HTML 5 is still slow

\- Can't do very advanced stuff yet without writing a lot of custom code. On
the other hand, using jQuery + jQuery UI, it's very easy to make things
draggable, for example, to compare various things side-by-side or even on top
of each other (with transparency)

\- No server, so can't "save" complicated settings or parameters (although
cookies help).

Anyway, in case you end up building something of your own, hopefully some of
these ideas might help.

------
araneae
Flot is super cool but probably a bit off-topic for your needs
<http://code.google.com/p/flot/>

Every time I try to use R I just say "fuck it" and use Matlab. Matlab is
pretty flexible w.r.t. manipulating data and there are a lot of tools to deal
specifically with time series analysis:
[http://www.mathworks.com/access/helpdesk/help/techdoc/data_a...](http://www.mathworks.com/access/helpdesk/help/techdoc/data_analysis/brenonn.html)

The open source version of Matlab is Octave but they aren't really comparable.
But if you're dead set on open source, that's what I'd recommend.

If you want the flexibility of Matlab and the prettiness and manipulation
ability of flot, Mathematica produces some pretty elegant figures, but I hate
the notebook interface. Yes, I know, you can use a command line interface with
it but the UI for Matlab is actually useful and awesome (like being able to
see the contents of all the objects you create.)

------
elblanco
Depends on the kind of visualization you are talking about.

I've worked extensively with tools like Starlight
<http://www.futurepointsystms.com>, Palantir (mentioned previously)
<http://www.palantirtech.com>, I2's tools <http://www.i2inc.com/>, and others.

There's also scientific data visualization, for looking at things like heat
dissipation in an engine...

But it sounds like you are looking for something like Matlab.

------
hotpockets
Octave? (sort of an open source matlab)

------
fizx
Look into rrdtool.

------
hyperbovine
Ggobi

------
keefe
mathematica

