
> this is pretty much nuke-from-orbit

That analogy might imply overkill, which if anything highlights the tactical advantages of the SFrame approach, for instance when processing a month's worth of daily-generated 1-10GB SQLite files.




Well no. It means the same situation as git vs mercurial. One may be better than the other (as you mentioned), but it really doesn't matter.

If Pandas and other high profile products are endorsing it (and may adopt it), it's going to be very hard for 99% of people to choose something else.


Python's Dask out of core dataframe can also do that.


Dask's out of core dataframes are just a thin wrapper around pandas dataframes (aided by the recent improvement in pandas to release the GIL on a bunch of operations).


Uh, no they are not. They lazily scale pandas to on-disk and distributed files.

http://dask.pydata.org/en/latest/dataframe.html

"Dask dataframes look and feel like pandas dataframes, but operate on datasets larger than memory using multiple threads."

http://blaze.pydata.org/blog/2015/09/08/reddit-comments/


Why doesn't Pandas have anything to save the entire workspace to disk (like .RData)? There are all these cool file formats like Castra, HDF5, even the vanilla pickle, but I don't see anything with a one-shot save of the workspace (something like Dill).

Is this an antipattern for Pandas?


You haven't refuted anything I said. Internally the dask dataframe operations sit on top of pandas dataframes. All dask does is automatically handle the chunking into in-memory pandas dataframes and interpret dask workflows as a series of pandas operations.
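
To make the chunking idea concrete, here is a rough sketch in plain pandas (not dask's actual implementation): a mean computed across several in-memory chunks, each handled with ordinary pandas operations. The helper name `chunked_mean` and the toy data are assumptions made up for illustration.

```python
import pandas as pd


def chunked_mean(frames, column):
    """Mean of `column` over an iterable of pandas DataFrames.

    Hypothetical helper: only one chunk needs to be in memory at a time,
    and each step is a plain pandas operation on that chunk.
    """
    total = 0.0
    count = 0
    for chunk in frames:
        total += chunk[column].sum()    # per-chunk partial sum
        count += chunk[column].count()  # per-chunk non-null count
    return total / count


# Toy stand-in for a dataset too large for memory: slice one frame into chunks.
full = pd.DataFrame({"x": range(10)})
chunks = (full.iloc[i:i + 3] for i in range(0, len(full), 3))
result = chunked_mean(chunks, "x")
```

In practice the chunks could come from something like `pd.read_csv(..., chunksize=...)`; the point is only that the combined result is assembled from per-chunk pandas calls.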



