Ask HN: Good tool to visualize data?
20 points by hyuen on Nov 8, 2009 | 11 comments
Hi all, I need to visualize some time series data. The problem is that I want to impose restrictions in an ad-hoc way, such as limiting the time range or filtering out some categories, and then get some statistics on that data, such as the mean, a histogram, or other things.

What I am doing right now is a bunch of gnuplot scripts together with Python/C programs to grab the data, but this is getting tedious.

Is there any open source tool/library that does something similar?

Thanks




The suggestions below are solid:

  R 
  Python + matplotlib
I've used and been impressed by both. Starlight and Palantir may fit your needs, but those tend closer to enterprise applications -- I get the sense you're searching for something lighter.

Some more suggestions:

DAVIX [D]: a live CD distribution containing lots of visualization tools, from capture to parsing to presentation.

SecViz and their "graph exchange" [0] -- lots of pictures of various datasets, usually with details of how it was produced.

Personally, I found most packages too restrictive and fell back on Processing (http://processing.org). A project out of the MIT Media Lab, it's a generic graphics framework in which you can (among other things) produce 2D time series graphs. It's still tedious, but tedium is traded for control.

Some of my projects in Processing:

http://jjguy.com/som/ - Self Organizing Maps

http://jjguy.com/life/ - Conway's Game of Life

I don't have a great time series write-up posted, but I've been working with Robert May's population model recently. [1] You can find the source at [2] and an example output image at [3].

[D] http://secviz.org/content/the-davix-live-cd

[0] http://secviz.org/category/image-galleries/graph-exchange

[1] http://en.wikipedia.org/wiki/Logistic_function#In_ecology:_m...

[2] http://jjguy.com/populationModel.pde

[3] http://jjguy.com/normal.png
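
For readers who don't run Processing: the population model in [1] is usually written as the logistic map x(n+1) = r * x(n) * (1 - x(n)). Below is a minimal Python sketch of that recurrence. It is not the Processing source at [2]; the function name and the parameter values are illustrative assumptions.

```python
def logistic_map(r, x0, steps):
    """Iterate x(n+1) = r * x(n) * (1 - x(n)) and return the whole series.

    r is the growth rate and x0 the starting population as a fraction
    of the carrying capacity (both names are hypothetical).
    """
    series = [x0]
    for _ in range(steps):
        x = series[-1]
        series.append(r * x * (1.0 - x))
    return series


if __name__ == "__main__":
    # Print the last few values for a few growth rates to compare behaviour.
    for r in (2.5, 3.2, 3.9):
        tail = logistic_map(r, 0.2, 200)[-4:]
        print(r, [round(x, 4) for x in tail])
```

With r = 2.5 the series settles at the fixed point 1 - 1/r = 0.6; larger r values oscillate or turn chaotic, which is what makes the series interesting to plot.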


R (http://www.r-project.org/). I would consider it to have a steep learning curve, though.


I personally didn't find the learning curve to be that high. Perhaps because I was in a course taught by one of the founders of the R Project ;).

The lecture slides he created were great, and I recommend anyone that is interested in R to have a look at them: http://www.stat.auckland.ac.nz/~stat380/?Lecture_Slides


Since you're already using Python, I'd highly recommend looking at Matplotlib (pylab). We've used it to great utility in the past.
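
As a rough illustration of the ad-hoc filtering the question asks about, here is a minimal sketch: restrict a series to a time window, drop some categories, then take the mean and plot a histogram. The record layout, the sample values, and the names `select` and `plot` are all assumptions made up for this example; only `plot` needs matplotlib, the rest is standard library.

```python
from datetime import datetime
from statistics import mean

# Hypothetical sample records: (timestamp, category, value).
SAMPLES = [
    (datetime(2009, 11, 1), "web", 12.0),
    (datetime(2009, 11, 2), "db", 30.0),
    (datetime(2009, 11, 3), "web", 18.0),
    (datetime(2009, 11, 4), "web", 15.0),
    (datetime(2009, 11, 9), "web", 99.0),
]


def select(samples, start=None, end=None, exclude=()):
    """Keep samples with start <= timestamp < end whose category is not excluded."""
    return [
        (t, c, v)
        for t, c, v in samples
        if (start is None or t >= start)
        and (end is None or t < end)
        and c not in exclude
    ]


def plot(samples, path="series.png"):
    """Write a line plot of the series and a histogram of its values to a file."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    times = [t for t, _, _ in samples]
    values = [v for _, _, v in samples]
    fig, (top, bottom) = plt.subplots(2, 1)
    top.plot(times, values, marker="o")  # the time series itself
    bottom.hist(values, bins=5)          # distribution of the values
    fig.savefig(path)


if __name__ == "__main__":
    # First week of November, ignoring the "db" category.
    week = select(SAMPLES, end=datetime(2009, 11, 8), exclude={"db"})
    print(mean(v for _, _, v in week))
```

Calling `plot(week)` afterwards writes `series.png`. Keeping the selection separate from the plotting means each new restriction is one more keyword argument rather than another script.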


Flot is super cool but probably a bit off-topic for your needs http://code.google.com/p/flot/

Every time I try to use R I just say "fuck it" and use Matlab. Matlab is pretty flexible w.r.t. manipulating data and there's a lot of tools to deal specifically with time series analysis http://www.mathworks.com/access/helpdesk/help/techdoc/data_a...

The open source version of Matlab is Octave but they aren't really comparable. But if you're dead set on open source, that's what I'd recommend.

If you want the flexibility of Matlab and the prettiness and manipulation ability of flot, Mathematica produces some pretty elegant figures, but I hate the notebook interface. Yes, I know, you can use a command line interface with it but the UI for Matlab is actually useful and awesome (like being able to see the contents of all the objects you create.)


I've been building up a javascript library to do this, as visualizing results/data takes up more than half my research time.

Unfortunately, this library is nowhere close to ready to be released, but I can describe roughly the architecture, in case it's useful to you.

I take all my data and write a python script which will dump it to a .JSON file. This includes the raw data/results, names of different fields, groups of fields to toggle on/off together, and what kinds of visualizations I want to use with a given set of data (view as numbers, as bar graph, as line graph, as linear time-series data, as color-coded images, etc.)
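
A dump script along those lines might look like the sketch below. The library itself isn't released, so the key names (`fields`, `groups`, `views`, `rows`), the sample data, and the output filename are hypothetical stand-ins for whatever the real format uses.

```python
import json


def build_payload(rows, fields, groups, views):
    """Bundle raw rows with the metadata a generic viewer would need.

    fields: column names, in the same order as the values in each row
    groups: named sets of fields that should toggle on/off together
    views:  which kinds of visualization to offer for this data set
    """
    unknown = {name for members in groups.values() for name in members} - set(fields)
    if unknown:
        raise ValueError(f"groups mention unknown fields: {sorted(unknown)}")
    return {"fields": fields, "groups": groups, "views": views, "rows": rows}


if __name__ == "__main__":
    payload = build_payload(
        rows=[[0, 0.91, 0.40], [1, 0.93, 0.35]],
        fields=["epoch", "accuracy", "loss"],
        groups={"metrics": ["accuracy", "loss"]},
        views=["table", "line"],
    )
    # One self-describing file that a static page can load without a server.
    with open("results.json", "w") as f:
        json.dump(payload, f, indent=2)
```

Putting the field names and view hints in the same file as the data is what would let the display side stay generic.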

Then I have a standard html + javascript file in which I simply load in this JSON file. Because I've prespecified the format of the JSON file (i.e., what fields it has and how the data is stored), I only need to customize a few functions to display results. Things like filtering data, searching and sorting I get "for free", since they're in the library.

The main advantages:

- Interactive browsing of data in various formats, all in the web browser, with no plugins etc. required.

- Can be viewed locally or across the network

- No need for a server to be running

- Processing done on each local computer, as opposed to on some server

- HTML 5 is now good enough (just barely) to offer all the kinds of interaction I require

- Most of the code is in my standard library for visualizations, and so the time to create a new visualization for a new set of data is quite small.

Drawbacks:

- HTML 5 is still slow

- Can't do very advanced stuff yet, without writing a lot of custom code. On the other hand, using jQuery + jQuery UI, it's very easy to make things draggable, for example, to compare various things side-by-side or even on top of each other (with transparency)

- No server, so can't "save" complicated settings or parameters (although cookies help).

Anyway, in case you end up building something of your own, hopefully some of these ideas might help.


Depends on the kind of visualization you are talking about.

I've worked extensively with tools like Starlight (http://www.futurepointsystms.com), Palantir (mentioned previously; http://www.palantirtech.com), I2's tools (http://www.i2inc.com/), and others.

There's also scientific data visualization, for looking at things like heat dissipation in an engine...

But it sounds like you are looking for something like Matlab.


Octave? (sort of an open source matlab)


Ggobi


Look into rrdtool.


mathematica



