Pandas (Python data analysis toolkit) version 0.8.0 released (pydata.org)
97 points by wesm on June 29, 2012 | 16 comments

wesm, I can't thank you enough for this library.

I own and operate a quantitative finance business. Pandas (+ numpy) has been a godsend. Not only do I not have to pay for matlab licenses, but even the less experienced programmers on my team have been insanely productive.

Thank you.

What do you see as the advantages of Python/pandas/numpy/etc. over matlab? Do you use any toolboxes?

Basically, pandas, numpy, and matplotlib give me everything I could have wanted out of matlab from a numerical capabilities and graphing perspective.

In my opinion, matlab's excellent object inspection and debugging capabilities can be replaced with strict testing standards in your code-base.

On top of that, I get to use a whole slew of libraries that have nothing to do with math -- frameworks for web services, accessing ftp servers, sending e-mails -- a lot of automated "utility" stuff.

And it is all "free". Fantastic.

Not the person you're responding to, but I also made the switch in the financial industry, for a variety of reasons.

The main one for me was that python does non-math things much better than matlab. Since python is a general purpose language you can go from analysis to production application much faster, whereas with matlab it usually involved getting a software developer to rewrite it in java.

We used to take our python analysis code, wrap it up in a web app, and then use that to serve risk information to traders, and it was quite easy to do so.

Sounds like an interesting business. The link in your profile seems to be invalid, fyi.

It was my personal site that I recently "scrubbed" off the web. I updated my profile for my business site. Thanks for the reminder.

Pandas is outstanding, I love it.

This may not be the place for this, but...

I just built pandas 0.8.0 and it would not build with MinGW 0.5. The problem is -mno-cygwin is no longer recognized by MinGW's gcc. My solution was to edit distutils/cygwinccompiler.py and remove references to -mno-cygwin. THIS IS NOT PANDAS'S FAULT! I just thought I'd point it out in case other people run into the same problem.

Wes: Thanks!

Anyone reading this who wants to get started with Pandas: The early release of "Python for Data Analysis" (http://shop.oreilly.com/product/0636920023784.do) is already very helpful.

This release looks like a great upgrade to a great library. The idea of using python as my "one language" is really appealing, but I still find myself falling back on R pretty consistently when it comes to data manipulation/analysis. As pandas matures I see myself doing this less and less.

Thanks Wes and everyone else who pitched in!

I'd be interested to see some of your R use cases where you perceive that things could be improved in pandas; a year ago there were lots of things you couldn't do, but a lot has changed :) Nowadays, the tables have turned and there are lots of things you can do in pandas that are nearly impossible to do in a non-kludgy way in R (particularly many things with hierarchical indexing).
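For readers who have not met hierarchical indexing, here is a minimal sketch of the idea. The symbols, years, and return figures are made up for illustration, and the calls are written against a recent pandas, so details may differ from 0.8.0.

```python
import pandas as pd

# Hypothetical data: one value per (symbol, year) pair.
index = pd.MultiIndex.from_tuples(
    [("AAPL", 2011), ("AAPL", 2012), ("MSFT", 2011), ("MSFT", 2012)],
    names=["symbol", "year"],
)
returns = pd.Series([0.25, 0.5, -0.125, 0.0], index=index)

# Select everything under one outer key; the result is indexed by year.
aapl = returns.loc["AAPL"]

# Aggregate across symbols by grouping on the inner index level.
by_year = returns.groupby(level="year").mean()

# Pivot the inner level into columns: one row per symbol, one column per year.
wide = returns.unstack("year")
```

The same two-level index drives selection, aggregation, and reshaping without any manual key splitting.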

Wes - I think that the reason I would keep using R over pandas is all the packages in the R universe. Which I suppose is the reason why you would use pandas over R if you had more experience with python.

E.g. ggplot2 still seems to be quite a bit better than matplotlib. Also for the random data examination/sketching I absolutely love rstudio due to its integrated help/plotting/file browsing.

I guess it's been about 9 months since I really put pandas through its paces, so I'll take another look. IIRC, last time I really tried to do a project with pandas, I found some typing/data transformation issues to be the things that held me back most. If I find some time in the next couple of weeks I'll try to put together some concrete examples.

Can someone give a rundown of how Pandas compares with numpy?

It's intended to be a library for working with data stored in multi-dimensional in-memory tables. Think loading a .csv file or relational table into memory, performing some transformations, adding columns, merging with other tables, grouping, filtering, and sorting, all while handling missing data gracefully.

Maybe I would describe it as combining a spreadsheet with SQL data transformation capabilities - but better.
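A rough sketch of that workflow, assuming a recent pandas (some calls may be named differently in 0.8.0). The CSV contents, column names, and the `sectors` lookup table are all hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV; the second row has no price on purpose.
trades_csv = io.StringIO(
    "symbol,qty,price\n"
    "AAPL,10,100.0\n"
    "MSFT,5,\n"
    "AAPL,20,110.0\n"
)
# Hypothetical lookup table to merge against.
sectors = pd.DataFrame({"symbol": ["AAPL", "MSFT"], "sector": ["tech", "tech"]})

trades = pd.read_csv(trades_csv)  # the empty price is read as NaN

# Add a derived column; NaN propagates through the arithmetic.
trades["notional"] = trades["qty"] * trades["price"]

# SQL-style left join on the shared key.
merged = trades.merge(sectors, on="symbol", how="left")

# Group and aggregate; sum() skips NaN by default.
totals = merged.groupby("symbol")["notional"].sum()

# Filter rows, then sort the survivors.
filtered = merged[merged["qty"] > 5].sort_values("price", ascending=False)
```

Note that the missing price never raises an error: it flows through as NaN and is skipped at aggregation time.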

It requires numpy because it uses ndarray as its underlying data structure, and you can also use many (most?) of the numpy data analysis functions.
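A small illustration of that relationship, again assuming a recent pandas and using made-up prices. For a plain float column the values are available as a numpy array, and numpy functions can be applied to the pandas object directly.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one missing value.
prices = pd.Series([100.0, 110.0, np.nan, 121.0], index=list("abcd"))

# The values of a float Series as a plain numpy array.
raw = prices.to_numpy()

# numpy ufuncs accept a Series and return a Series with the same labels.
log_prices = np.log(prices)

# pandas reductions skip NaN by default; plain numpy reductions do not.
pandas_mean = prices.mean()
numpy_mean = np.mean(raw)
```

The last two lines show the main practical difference: pandas treats NaN as missing data, while numpy treats it as just another float.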

It requires numpy. Check out the linked article...

Awesome to see this release. Great stuff for timeseries data analysis among other things. Thanks Pandas!
