Hilbert curves are used in modern data lakehouse storage optimisation techniques, such as Databricks' "liquid clustering" [1]. This can replace the need for more traditional "hive-style" partitioning, in which the data files are partitioned based on a folder structure (e.g. `mydatafiles/YYYY/MM/DD/`).
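To make that concrete, here is a minimal sketch of the standard 2-D Hilbert mapping (the iterative bit-twiddling version of the classic algorithm). The grid size `n` must be a power of two, and this is only an illustration of the curve itself, not Databricks' actual clustering implementation:

```python
def xy2d(n, x, y):
    """Map a point (x, y) on an n x n grid (n a power of two)
    to its distance d along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so each sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sorting rows by their Hilbert index keeps points that are close in
# (x, y) close in the sorted order, so nearby values land in nearby files.
points = [(x, y) for x in range(4) for y in range(4)]
points.sort(key=lambda p: xy2d(4, *p))
```

Range-splitting that sorted order into files is (roughly) how multi-dimensional clustering preserves locality on several columns at once, whereas hive-style folders only give you locality on the leading path components.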
I wanted to build a Windows container image but I really did not want to install Docker Desktop. After some digging around I found my way to the Docker server / client binaries for Windows page that allowed me to do this: https://docs.docker.com/engine/install/binaries/#install-ser...
As long as Reddit's power users are still holding out for a retraction the move won't happen, but if Reddit keeps at it, I can see it happening, especially for the big (over 500K-subscriber) subreddits.
Lemmy is falling over with fewer than 100k users. How are they magically going to handle 5x, 10x, 100x? The platform is maybe interesting, but the devs need to own up to the fact that they don't have a scaling story -- "written in Rust" is not a valid answer. Instances are already deciding not to federate, there's no network-wide search, signup flows sound ridiculous (pick a server and write an essay? yeah, no), how do I move servers, and what happens when an instance disappears?
In one of the first big Apollo threads, one of the devs was spamming Lemmy throughout the discussion as the next great Reddit replacement: easy, federated, more performant than Reddit. I guess he almost got 1 out of 3 correct.
This take seems short-sighted, at best. Because we don't have a perfect, zero-friction solution to swap to, we shouldn't try? Most new tech goes through growing pains, including Reddit back in the day.
It's fine to be critical, but it's important to foster an environment that encourages growth and competition. Have electric cars sucked for the last decade? Yes, but we should still invest in and promote them, as they offer a better future.
This is interesting... possibly a move by Databricks to try and build on their "data lakehouse" concept to counter the recent "Fabric platform" announcements at MS Build.
Databricks coined the "Delta Lake" concept and are still (just about) leading the way, but Fabric gives Microsoft the potential to take away that market share. Databricks need to improve their "serverless SQL" offering and add a serious "data warehouse" component alongside the lake.
Fabric may eat some of the descriptive analytics portion of Databricks’ lunch, but for core data engineering workflows there is nothing in the Fabric—or Synapse or Power BI—ecosystem that comes close.
There are other fatal flaws in the Spark implementation in Synapse that I think carried over to Fabric. The worst is how clunky (or outright impossible) it is to run multiple notebooks concurrently on a cluster.
Windows 11 is the perfect time to switch to Linux (especially for first-timers), starting with WSL2. Then, once you're familiar with Linux, it's a shorter jump to replacing Windows completely (and appreciating how fast Linux can be without WSL).
1. https://docs.databricks.com/en/delta/clustering.html