Hacker News: dancrystalbeach's comments

When it comes to building modern Lake House architecture, we often get stuck in the past, doing the same old things time after time. We are human; we are lemmings; falling into traps is simply what we do. Usually, that pit is called Spark. Now, don’t get me wrong: I love Spark. We wouldn’t have what we have today in terms of Data Platforms if it weren’t for Apache Spark.


DuckDB vs Polars for Unity Catalog integration with Delta Lake.


The best Data Quality tool ever created for PySpark.


You know, for all the hordes of content, books, and videos produced in the “Data Space” over the last few years, famous or otherwise, I find there are volumes of information on the individual pieces and parts of working in Data: Data Quality, Data Modeling, Data Pipelines, Data Storage, Compute, and the list goes on. I found this to be a problem as I grew in my “Data” career over the decades.


DuckDB to process remote JSON files! Amazing!


An awesome and simple Data Stack using AWS Lambda and DuckDB, along with Delta Lake, to show the Lake House architecture doesn't have to be complex.


S3 Tables are not a Lake House; they are also not commodity file storage on which you can build your Lake House. They are Amazon table lock-in.


First, anyone who says there is NOT a Lake House battle brewing and bubbling most likely has an agenda tied to one or the other of those dueling sides.


AWS recently announced S3 Tables, and the interwebs blew up. But is it just fool's gold?


Sometimes I find myself lying in my sunroom, staring out the window at the blue sky above me while the sun plays on the maple tree, bare now but for a few red leaves … wondering what else I can do to make the already angry readers of my babbling even angrier.

