
I have not used Spark, but I have written a lot of SQL, Polars, and pandas. I think much more in terms of SQL when I write Polars than when I write pandas. Do you have any examples of what you are referring to?



The examples I'm referring to are on the page I linked to in my comment above.

Here's one of them:

  # Polars
  df.select(
    pl.col("foo").sort().head(2),
    pl.col("bar").sort(descending=True).head(2),
  )

In SQL and Spark DataFrames, it doesn't make sense to sort columns of the same table independently like this and then just juxtapose them. Doing something like this is in fact very awkward with either of those interfaces, as you can see in the equivalent Spark code on that page. SQL will be similarly awkward.

But in Polars (and maybe in Pandas too) you can do this easily, and I'm not sure why. There is something qualitatively different about the Polars DataFrame that makes this possible.
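
To make the SQL awkwardness concrete, here is a rough sketch of one way to get the same result through Python's built-in sqlite3 module. The table name t and the sample values are made up for illustration, and it assumes an SQLite build with window function support (3.25 or newer). It is only one possible formulation, not necessarily the best one.

```python
import sqlite3

# Hypothetical toy table; the column names foo and bar mirror the Polars snippet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (foo INTEGER, bar INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(3, 20), (1, 10), (2, 30)])

# Each column needs its own ranked subquery, and the two are then
# stitched back together by joining on the rank.
query = """
WITH f AS (
    SELECT foo, ROW_NUMBER() OVER (ORDER BY foo) AS rn FROM t
),
b AS (
    SELECT bar, ROW_NUMBER() OVER (ORDER BY bar DESC) AS rn FROM t
)
SELECT f.foo, b.bar
FROM f JOIN b ON f.rn = b.rn
WHERE f.rn <= 2
ORDER BY f.rn
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that neither of the returned pairs exists as a row in the original table, which is exactly why the row-oriented interfaces make you work for it.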


Because it's column-based rather than row-based. That can definitely be a bit more of a footgun ("with great power comes great responsibility").

Long story short, the memory model operates on columns of data as opposed to rows, so fields in a conceptual "row" aren't necessarily an atomic unit.
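
A toy illustration of that difference in plain Python, with no Polars involved. The list-of-dicts and dict-of-lists layouts here are simplified stand-ins I picked for the two models, not how any real engine stores data.

```python
# Row-oriented layout: each record is one unit, so sorting moves whole rows
# and foo/bar values stay paired.
rows = [
    {"foo": 3, "bar": 20},
    {"foo": 1, "bar": 10},
    {"foo": 2, "bar": 30},
]
rows_by_foo = sorted(rows, key=lambda r: r["foo"])

# Column-oriented layout: each column is its own array, so nothing stops
# you from sorting them independently and placing the results side by side.
columns = {"foo": [3, 1, 2], "bar": [20, 10, 30]}
result = {
    "foo": sorted(columns["foo"])[:2],
    "bar": sorted(columns["bar"], reverse=True)[:2],
}
print(rows_by_foo)
print(result)
```

In the second layout the "row" (1, 30) in the output never existed in the input, which is the footgun part.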



