I have not used spark, but I have written a lot of sql, polars and pandas. I thi...

nchammas · on Feb 14, 2024

The examples I'm referring to are in that page I linked to in my comment above.

Here's one of them:

  # Polars
  df.select(
    pl.col("foo").sort().head(2),
    pl.col("bar").sort(descending=True).head(2),
  )

In SQL and Spark DataFrames, it doesn't make sense to sort columns of the same table independently like this and then place the results side by side. It's in fact very awkward to do with either of those interfaces, as you can see in the equivalent Spark code on that page. SQL will be similarly awkward.

But in Polars (and maybe in pandas too) you can do this easily, and I'm not sure why. There is something qualitatively different about the Polars DataFrame that makes this possible.

theLiminator · on Feb 17, 2024

Because it's column-based rather than row-based. That can definitely make it a bit more of a footgun ("with great power comes great responsibility").

Long story short, the memory model operates on columns of data as opposed to rows, so fields in a conceptual "row" aren't necessarily an atomic unit.
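
To make the column-vs-row distinction concrete, here is a rough plain-Python sketch. It is not how Polars is implemented; the toy data and the names `columns`, `result`, `rows`, and `rows_by_foo` are made up for illustration, with a dict of lists standing in for a column store and a list of tuples for a row store.

```python
# Hypothetical toy table, stored column by column (illustrative only).
columns = {
    "foo": [3, 1, 2],
    "bar": [10, 30, 20],
}

# Column-oriented view: each column is its own array, so each one can be
# sorted and truncated independently, then placed side by side.
result = {
    "foo": sorted(columns["foo"])[:2],
    "bar": sorted(columns["bar"], reverse=True)[:2],
}
print(result)

# Row-oriented view: a row is the atomic unit, so sorting moves whole rows
# and the "bar" value always travels with its "foo" value.
rows = list(zip(columns["foo"], columns["bar"]))
rows_by_foo = sorted(rows, key=lambda r: r[0])[:2]
print(rows_by_foo)
```

In the first result, the values on the same line no longer come from the same original row, which is exactly the footgun: nothing stops you from juxtaposing columns that have lost their row alignment.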