Believe me, there's a lot of plumbing moving stuff from point A to B and dealing with poop ("dirty data" is the industry euphemism) in the data engineering and data analyst space.
In my more analytic moments I try to convince myself that data engineering and analysis is like chemical refining, creating useful byproducts out of raw liquids, but in my cynical moments, the plumbing metaphors for it are just so much more evocative.
Which will make any function that uses floating point numbers mind-blowingly complex. But there's probably an easier way: create some transformation from (Integer -> a) to (F64 -> a) so that only the transformation gets complex.
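A rough sketch of what such a transformation could look like, in Python. The names `f64_to_bits`, `lift`, and `sign_bit` are made up for illustration, and reinterpreting the IEEE 754 bit pattern as an integer is only one assumed way to get from F64 to Integer; the comment above does not say which mapping it has in mind.

```python
import struct
from typing import Callable, TypeVar

A = TypeVar("A")


def f64_to_bits(x: float) -> int:
    """Reinterpret the 8 bytes of an IEEE 754 double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def lift(f: Callable[[int], A]) -> Callable[[float], A]:
    """Turn a function on integers into a function on floats.

    All float-specific handling lives here; `f` itself only ever sees an int.
    """
    def lifted(x: float) -> A:
        return f(f64_to_bits(x))
    return lifted


def sign_bit(bits: int) -> int:
    """Hypothetical integer-only function: extract the top bit of a 64-bit word."""
    return bits >> 63


# The lifted version works on floats without sign_bit knowing anything about them.
is_negative = lift(sign_bit)
```

With this, `is_negative(-1.5)` evaluates `sign_bit` on the integer encoding of `-1.5`, so the integer function stays simple and only `lift` has to know about floats.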
Anyway, there are many reasons people don't write actual programs this way.
I have not worked with Spark, but I have used Athena/Trino and BigQuery extensively.
For me I don't really understand the hype around Polars, other than that it fixes some annoying issues with the Pandas API by sacrificing backwards compatibility.
With a single-node engine there is a ceiling on how good it can get.
With Spark/Athena/BigQuery the sky is the limit. It is such a freedom to not be limited by available RAM or CPU. They just scale to what they need. Some queries squeeze CPU-days into just a few minutes.
I'm using both Spark and Polars. To me, the additional appeal of Polars is that it is much faster and easier to set up.
Spark is great if you have large datasets since you can easily scale, as you said. But if the dataset is small-ish (<50 million rows) you hit a lower bound in Spark in terms of how fast the job can run. Even if the job is super simple it takes 1-2 minutes. Polars on the other hand is almost instantaneous (< 1 second). Doesn't sound like much, but to me it makes a huge difference when iterating on solutions.
Google Sheets worked fine for me. The UI is intuitive and stripped down. And it has much better networking/collaboration support compared to Microsoft, which has it bolted on.
I may be being silly about this, and I probably should at least try how well Google Sheets works for me as a regular part of my workflow, but I'd really prefer an actual local piece of software to a web app.
There's nothing silly about wanting to avoid the sandcastle-built-on-quicksand world of web applications.
Sadly, the world of native apps, at least in the commercial space, seems to be drifting away from the tenets of stability and user control that the space used to exemplify. Excel 365 (or whatever the hell MS wants to call it) randomly auto-updates itself without warning or confirmation.
The finance department in a previous job rather undramatically moved over to Google Sheets from Excel after seeing the benefits of the collaboration/online environment.
They got some help from me moving the heavy stuff out to SQL/BigQuery, but that was also for the better. BigQuery and Sheets integrate very well these days, so they could even use a Google Sheet as a UI for the queries. Rows and columns, much better than any other web UI.
I think Google Sheets is nearly miraculous, with all that it can do as a web app, but it's amazing to me that anybody actually seriously using Excel in a professional capacity could migrate to it. They're not in the same weight class.
I've used both for a long time (relatively so for Sheets given its age). The online aspect aside, Sheets does tend to work for 90% of my use cases. But that last 10%? It doesn't come close to what I can accomplish with Excel. Exotic dimensions in data, frankly better pivot table support, and perhaps most maddening, the much faster macros....
I created a simple function/macro once in Sheets to help me indicate when a group of rows were done; I had to watch my 3-5k sheet painfully iterate through every line and assign new colors. The script system even gave up at some point after I closed my laptop(? So much for online!..), and I had to manually restart it.
Could you elaborate? Was it from total exhaustion? Were you taught a special technique to help you sleep? Or was it from knowing you MUST get your rest or you'll suffer the entire next day?
I think an important part of it is that it made me stop worrying. Sleep will come to me eventually and meanwhile my body will continue working. Maybe my brain does not work optimally, but well enough.
Sleep is my friend.
(I was not anything like a navy seal but a commander of ~25 soldiers).
Look around, they are everywhere.
https://maps.app.goo.gl/BxEh3gBhooeH9Pfe7?g_st=ac