Hilbert curves are used in modern data lakehouse storage optimisation techniques, such as Databricks' "liquid clustering" [1]. This can replace the need for more traditional "hive-style" partitioning, in which the data files are partitioned based on a folder structure (e.g. `mydatafiles/YYYY/MM/DD/`).
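To make that concrete, here is a minimal sketch of the standard 2-D Hilbert mapping (the iterative bit-twiddling version of the classic algorithm). The grid size `n` must be a power of two, and this is only an illustration of the curve itself, not Databricks' actual clustering implementation:

```python
def xy2d(n, x, y):
    """Map a point (x, y) on an n x n grid (n a power of two)
    to its distance d along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so each sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sorting rows by their Hilbert index keeps points that are close in
# (x, y) close in the sorted order, so nearby values land in nearby files.
points = [(x, y) for x in range(4) for y in range(4)]
points.sort(key=lambda p: xy2d(4, *p))
```

Range-splitting that sorted order into files is (roughly) how multi-dimensional clustering preserves locality on several columns at once, whereas hive-style folders only give you locality on the leading path components.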
I wanted to build a Windows container image but I really did not want to install Docker Desktop. After some digging around I found my way to the Docker server / client binaries for Windows page that allowed me to do this: https://docs.docker.com/engine/install/binaries/#install-ser...
As long as Reddit's power users are still holding out for a retraction the move won't happen, but if Reddit keeps at it, I can see it happening, especially for the big (over 500K-subscriber) subreddits.
Lemmy is falling over with fewer than 100k users. How are they magically going to handle 5x, 10x, 100x? The platform is maybe interesting, but the devs need to own up to the fact that they don't have a scaling story -- "written in Rust" is not a valid answer. Instances are already deciding not to federate, there's no network-wide search, signup flows sound ridiculous (pick a server and write an essay? yeah, no), how do I move servers, and what happens when an instance disappears?
In one of the first big Apollo threads, one of the devs was spamming Lemmy throughout the discussion as the next great Reddit replacement: easy, federated, more performant than Reddit. I guess he almost got 1 out of 3 correct.
This take seems short-sighted, at best. Because we don't have a perfect, zero-friction solution to swap to, we shouldn't try? Most new tech goes through growing pains, including Reddit back in the day.
It's fine to be critical, but it's important to foster an environment that encourages growth and competition. Have electric cars sucked for the last decade? Yes, but we should still invest in and promote them, as they offer a better future.
This is interesting... possibly a move by Databricks to try and build on their "data lakehouse" concept to counter the recent "Fabric platform" announcements at MS Build.
Databricks coined the "Delta Lake" concept and are still (just about) leading the way, but Fabric gives Microsoft the potential to take away that market share. Databricks need to improve their "serverless SQL" offering and add a serious "data warehouse" component alongside the lake.
Fabric may eat some of the descriptive analytics portion of Databricks’ lunch, but for core data engineering workflows there is nothing in the Fabric—or Synapse or Power BI—ecosystem that comes close.
There are other fatal flaws in the Spark implementation in Synapse that I think carried over to Fabric. The worst is how clunky (or outright impossible) it is to run multiple notebooks concurrently on a cluster.
Windows 11 is the perfect time to switch to Linux (especially for first-timers), starting with WSL2. Then, once you're familiar with Linux, it's a shorter jump to replacing Windows completely (and appreciating how fast Linux can be without WSL).
1. https://docs.databricks.com/en/delta/clustering.html