
Maybe someone can tell me the benefit, but why use Pandas rather than a real database? Pandas seems like a non-standard, in-memory, Python-specific database system that has no indexing, caching, or persistence layer aside from dumping to flat files. It's an RDBMS without the Relational bit.



I don't think you have looked at Pandas that closely. For example, Pandas has multiple ways to persist besides a flat file, including to SQLite and HDF5. - http://pandas.pydata.org/pandas-docs/stable/io.html
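To make the SQLite point concrete, here is a minimal sketch of a round trip through `DataFrame.to_sql` and `pandas.read_sql`. The data and the table name `measurements` are made up for illustration, and an in-memory database stands in for a real file.

```python
import sqlite3

import pandas as pd

# Hypothetical data; the table name "measurements" is invented for this sketch.
frame = pd.DataFrame({"X": [10, 7, 5], "Y": [4, 9, 5]})

# ":memory:" keeps the sketch self-contained; pass a file path to persist to disk.
conn = sqlite3.connect(":memory:")
frame.to_sql("measurements", conn, index=False)

# Read rows back with plain SQL, filtering inside SQLite.
restored = pd.read_sql("SELECT X, Y FROM measurements WHERE X > Y", conn)
conn.close()

print(restored)
```

The same data frame could be written to HDF5 instead, but that needs the optional PyTables dependency, so it is left out here.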

It's also aware of the relational data model. The 10 minute intro at http://pandas.pydata.org/pandas-docs/stable/10min.html?highl... says:

"pandas provides various facilities for easily combining together Series, DataFrame, and Panel objects with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations."

and http://pandas.pydata.org/pandas-docs/stable/comparison_with_... gives a mapping of basic SQL operations to Pandas.
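As a rough illustration of that kind of mapping (my own example, not one taken from the linked page), a JOIN followed by GROUP BY might look like this in Pandas. The `orders` and `customers` tables are hypothetical.

```python
import pandas as pd

# Hypothetical tables, invented for this sketch.
orders = pd.DataFrame(
    {"customer_id": [1, 2, 1, 3], "amount": [20.0, 35.0, 5.0, 12.5]}
)
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["ann", "bob", "cy"]}
)

# Roughly equivalent SQL:
#   SELECT c.name, SUM(o.amount)
#   FROM orders o JOIN customers c ON o.customer_id = c.customer_id
#   GROUP BY c.name;
totals = (
    orders.merge(customers, on="customer_id", how="inner")
    .groupby("name")["amount"]
    .sum()
)

print(totals)
```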

There is also some caching. The documentation mentions things like "A large range of dates for various offsets are pre-computed and cached under the hood in order to make generating subsequent date ranges very fast (just have to grab a slice)" and "Bug in Series update where the parent frame is not updating its cache based on changes".

Perhaps you can think of it as an in-memory relational database that has no SQL or other declarative language, but instead offers an imperative Python API with a large number of helper functions that most RDBMSes lack but which are important in data analysis. Think of it as a database engine where the user has control over the internal memory layout, so C extensions and other code (a "cartridge" or "datablade", if you will) can work with it directly.

Suppose you want to read a CSV into a data frame, compute the difference between columns "X" and "Y", and show those differences as a histogram. That's one line of Pandas. How do you do that in an RDBMS?
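For the curious, a sketch of what that looks like, spread over a few lines for readability. The sample data is made up and fed through `io.StringIO` in place of a real file, and the helper names are mine. `Series.hist()` relies on matplotlib being installed.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
SAMPLE_CSV = io.StringIO("X,Y\n10,4\n7,9\n5,5\n12,2\n")


def column_difference(source):
    """Read a CSV with columns X and Y and return the Series X - Y."""
    frame = pd.read_csv(source)
    return frame["X"] - frame["Y"]


def show_histogram(differences):
    """Draw a histogram of the differences (requires matplotlib)."""
    import matplotlib

    matplotlib.use("Agg")  # headless backend, so no display is needed
    return differences.hist()


differences = column_difference(SAMPLE_CSV)
print(differences.tolist())
```

Collapsed to the promised one line, with a hypothetical file name, that is `(lambda f: f["X"] - f["Y"])(pd.read_csv("data.csv")).hist()`, or the two-statement version most people would actually write.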



