

Pandas 0.7.0 released: Python data analysis library - wesm
http://pandas.pydata.org/

======
rch
Pandas is looking very nice in general, and I'm happy to find HDF5 in there
too :)

<http://pandas.pydata.org/pandas-docs/dev/io.html#hdf5-pytables>

------
joelthelion
Ooooh, this is really cool! R is nice, but switching to it is a pain when
working in python.

Together with scikits.learn, this could prove really useful in machine
learning and data analysis projects.

------
acslater00
Pandas is literally in my top 3 favorite open-source projects. I use v0.6x
regularly and it is absolutely fantastic. Highly recommend. Can't wait to try
some of these new features as well.

~~~
dshah
I'm curious: What are your other two favorites?

------
radikalus
If I were starting fresh, I'm not sure I'd go with R anymore -- but it's so
hard to leave R when you've got a toolkit of 50+ packages you need. =(

------
coda_
Wes, was just on your blog and see you are also into data visualization.
Wondering if you have any recommendations for web based charting tools? I've
used flot (jquery plugin), but looking for alternatives. Thanks!

~~~
wesm
Very interested in d3 integration. Some people
(<http://github.com/mikedewar/D3py>) have already started working in that
direction. The IPython HTML notebook makes JavaScript visualization combined
with pandas a very attractive option going forward, especially if you can come
up with a way to have an interactive plot with backend computations being
handled by pandas. pandas currently does not emit JSON; I would love to adapt
UltraJSON or another library to turn DataFrame objects into JSON very fast and
efficiently.
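
As a rough illustration of what "turning a DataFrame into JSON" could look like, here is a minimal sketch that uses only the standard-library `json` module rather than UltraJSON. The function name `frame_to_json` and the output layout (an index list plus one list per column) are assumptions made for this example, not an existing pandas format, and the sketch assumes string column labels and plain numeric values.

```python
import json

import pandas as pd


def frame_to_json(frame):
    """Serialize a DataFrame to a JSON string (hypothetical layout).

    tolist() converts NumPy scalars to plain Python values, which the
    standard json module knows how to encode.
    """
    payload = {
        "index": frame.index.tolist(),
        "columns": {name: frame[name].tolist() for name in frame.columns},
    }
    return json.dumps(payload)


# Small made-up frame, just to exercise the function.
df = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]}, index=["a", "b", "c"])
text = frame_to_json(df)
```

A browser-side charting library could then parse `text` and plot the columns directly. This pure-Python route is slow for large frames, which is the motivation for a fast C encoder; later pandas releases ship a built-in `DataFrame.to_json`.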

------
wildmXranat
I tried to find information on how fast the operations are in Pandas, but
couldn't see any numbers. Does anybody have opinions about that aspect?

~~~
wesm
I've written quite a bit about performance on my blog:
<http://blog.wesmckinney.com>. The historical (v)benchmarks page is a good
resource (but doesn't compare to any other libraries):
<http://pandas.pydata.org/pandas-docs/vbench/>

~~~
hhimanshu
What are you using to display code on your blog? It's really nice!

~~~
wesm
Recent posts use the Crayon syntax highlighter for Wordpress. Though I'm
thinking about ditching WP eventually for a workflow more like
<http://jseabold.net/blog/2012/01/project-genesis.html>.

------
gourneau
Wes is a rockstar

------
regularfry
> NaN (not a number) is the standard missing data marker used in pandas

That's just _wrong_.

~~~
wesm
Is it? For lack of NA bit patterns in NumPy it's either use a special value
(like NaN) or use masked arrays. If you choose the latter, I say to you: good
luck.
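
To make the two options concrete, here is a minimal sketch in plain NumPy. The arrays are made-up data, and this only shows the basic mechanics, not how pandas handles missing values internally.

```python
import numpy as np

# Option 1: a special value. NaN marks the missing slot in a float array.
with_nan = np.array([1.0, np.nan, 3.0])
nan_mean = np.nanmean(with_nan)  # NaN-aware reduction skips the marker

# Option 2: a masked array. A separate boolean mask marks the missing slot,
# which also works for integer data that cannot hold NaN.
masked = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
masked_sum = masked.sum()  # masked entries are left out of the sum
```

The trade-off is that NaN only exists for floating-point data, while masks carry an extra array alongside the values and have to be respected by every function the data passes through.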

~~~
regularfry
NaN as commonly used already has a meaning: it's the result of a calculation
whose inputs were _known_, and the calculation is known to be undefined for
those specific inputs. "Unknown" means something entirely different: that we
don't know what the inputs were, but _if we did_ they are unlikely to have
been NaN.

Conflating the two concepts means you can't tell the difference given the
result set. It's just a happy accident that "unknown" and NaN have identical
propagation rules, but that doesn't mean that it's safe to use one in place of
the other. Reading up on it, it looks like Octave and Matlab can treat NaN as
"missing data", though, so I guess there's a certain "industry standard
behaviour" to follow so as not to surprise users, but it's still less than
ideal.
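
A small sketch of that conflation, assuming pandas and NumPy are installed. The `readings` values are invented: the second one stands for a value that was never recorded, and the third is a genuine zero.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: the second value was never recorded (missing).
readings = pd.Series([2.0, np.nan, 0.0])

# 0.0 / 0.0 is an undefined calculation on known inputs, so it also yields NaN.
ratio = readings / readings

# Both the missing input and the undefined result are flagged the same way.
flags = ratio.isna().tolist()
```

After the division, nothing in `ratio` records which NaN came from missing data and which came from dividing zero by zero.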

In an ideal world, we could define an explicit "missing data" quiet NaN which
would have a distinct visual representation - I suspect this is doable with
access to the float exponent bits, but I don't know how Python could take
advantage of it.
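
For what it's worth, a tagged NaN can be built from Python with the standard `struct` module. In an IEEE 754 double the exponent bits of every NaN are all ones, so the spare room for a tag is in the significand (the "payload") rather than the exponent. The sketch below is only an illustration: the names `make_nan`, `nan_payload`, `is_missing`, and the payload value 1 are all made up for this example, and whether a payload survives arithmetic depends on the hardware and libraries involved.

```python
import math
import struct

QUIET_NAN_BITS = 0x7FF8000000000000  # exponent all ones, quiet bit set
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF    # remaining low 51 bits of the significand
MISSING_PAYLOAD = 1                  # hypothetical tag meaning "missing data"


def make_nan(payload=0):
    """Build a quiet NaN carrying the given payload in its low bits."""
    bits = QUIET_NAN_BITS | (payload & PAYLOAD_MASK)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def nan_payload(x):
    """Return the low 51 significand bits of a float."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & PAYLOAD_MASK


def is_missing(x):
    """True only for NaNs that carry the hypothetical missing-data tag."""
    return math.isnan(x) and nan_payload(x) == MISSING_PAYLOAD


MISSING = make_nan(MISSING_PAYLOAD)
```

`MISSING` still behaves like any other NaN (`math.isnan(MISSING)` is true), but `is_missing` can tell it apart from a NaN built with a different payload, such as `make_nan(0)`.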

------
grogenaut
It'd be better if you lost the ragging on C#/C++/Java and went positive by
accentuating the great abilities of interpreted languages like Python for
rapid prototyping, which is what you are doing when you are iteratively
improving analysis.

FYI it's fun to hear an academic ragging on "unmaintainable code".

~~~
wesm
Who's the academic you're referring to (if it's me, you're misinformed)?

One of the strengths of Python is that you can use it to build critical
production systems (which I've done for many years in the financial industry).
You come up against a lot of people who think "Java/C++/C# are the only
suitable systems languages".

~~~
grogenaut
I use Python heavily at work. I use Java, C++, C#, Ruby, and shell scripting
just as much. I use what's good for what I'm trying to do, and I like having
several choices.

I'm merely pointing out that the language bashing is not productive. The
writeup should point out the positives and stop trying to turn the differences
between languages into a parallel of the state of American political discourse.

------
atron306
Pandas + statsmodels = #rstats domination. Really like where this project is
going.

------
orp
Is there anything similar to Pandas that runs on the JVM?

~~~
ogrisel
Incanter: <http://incanter.org/>

------
zentrus
Is there anything close to this for Ruby?

