Garnet – A new remote cache-store from Microsoft Research (github.com/microsoft)
370 points by saganus 11 months ago | 118 comments



From the benchmark performance charts (https://microsoft.github.io/garnet/docs/benchmarking/results...), the throughput of the GET command exceeds that of Dragonfly by more than tenfold. While the median (p50) latency is slightly higher than Dragonfly's, the 99th-percentile latency is slightly lower. Both the throughput and latency of Garnet and Dragonfly are far better than Redis, indicating that Redis may require a significant performance optimization.
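
Here is a minimal sketch that makes the p50 vs p99 comparison concrete. It applies the nearest-rank percentile method to a list of latency samples. The method choice, the `percentile` helper, and the sample data are assumptions for illustration. Garnet's benchmark tooling may compute percentiles differently.

```python
import math
from fractions import Fraction

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Fraction(str(p)) avoids binary floating-point error for values like 99.9.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical latencies in microseconds: mostly fast, with a slow tail.
latencies_us = [100] * 95 + [900] * 5
print(percentile(latencies_us, 50))  # the median ignores the tail
print(percentile(latencies_us, 99))  # p99 lands in the slow tail
```

Two systems can have nearly identical medians and quite different p99 values. That is how one system can be slightly slower at p50 yet slightly faster at p99.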


> Redis may require a significant performance optimization.

Redis is single-threaded; it’s simple and effective. I’m not sure it needs optimization, and we have 3 alternatives here.

Garnet, however, is the first alternative to actually outperform Redis at both low and high levels of concurrency, which is remarkable. I can’t wait to try it out.


Redis for 99% of the intended use-cases and companies will be just fine. It has always been rock solid when we used it. It was the opposite of DNS: it was never a problem.


I really hate that meme. DNS is the most robust system out there, and one that can actually achieve 100% uptime (even though it pains me to say it, as it originates from USC and I'm a Bruin ;). Every time you type a URL on the internet, you're using it.

DNS can fail when someone managing it has no idea what the hell they are doing and abuses it to do things it wasn't meant to do.


Could you expand on "It was the opposite of DNS: it was never a problem."? Feel like I'm missing some interesting history here.


It’s just a common saying “the problem is always DNS”. When you have a weird networking issue, the problem always seems to be DNS. Either some misconfigured DNS entry or DHCP giving you the wrong server address, etc…

And then the problem is compounded by the fact that, ironically, DNS works so well that we don’t think of it as a primary point of failure. So inevitably, when there is a DNS problem, it’s the last thing we check. This just reinforces the idea that the problem is always DNS… because in those long, hard-to-troubleshoot instances… the problem was DNS.


DNS, BGP, or branch prediction. Pick three.


Stealing from the greats, "DNS, BGP, off-by-one errors, or branch prediction. Pick three."


It was off-by-one, indeed.


I just faced an issue with Redis this week. It was causing my JavaScript heap memory to go bust. I thought it was a memory leak in my code, but it turned out the Redis client was filling it up. I fixed it by simply adding a static delay every 1 million SET operations so that the garbage collector had enough time to do its job. (I was stress testing with a total of 6 million SET operations.)
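
A static delay works. A common alternative is to bound the number of in-flight requests, so the client never queues more than the connection can flush. Below is a minimal asyncio sketch of that backpressure idea. It is not the commenter's code or any specific Redis client's API. `fake_set` and `bounded_writes` are hypothetical, and the limit of 100 is arbitrary.

```python
import asyncio

async def fake_set(key, value, stats):
    """Hypothetical stand-in for a client's async SET round trip."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
    stats["in_flight"] -= 1
    stats["done"] += 1

async def bounded_writes(n_ops, max_in_flight, stats):
    """Issue n_ops writes, never allowing more than max_in_flight pending."""
    sem = asyncio.Semaphore(max_in_flight)
    pending = set()

    async def one(i):
        try:
            await fake_set(f"key:{i}", i, stats)
        finally:
            sem.release()  # free a slot for the producer loop

    for i in range(n_ops):
        await sem.acquire()  # the producer pauses here instead of queuing unboundedly
        task = asyncio.create_task(one(i))
        pending.add(task)  # keep a strong reference until the task finishes
        task.add_done_callback(pending.discard)
    await asyncio.gather(*pending)

stats = {"in_flight": 0, "peak": 0, "done": 0}
asyncio.run(bounded_writes(10_000, max_in_flight=100, stats=stats))
print(stats["done"], stats["peak"])
```

The producer stalls only when 100 writes are outstanding. Memory stays bounded without guessing a delay that suits the garbage collector.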


That's an issue with JS and/or client library though.


It's weird since DNS is one of the most rock solid systems we have, with redundancy at every level.

1. Clients query a list of servers (IPs) and handle failover when you don't quickly get a reply.

2. Most of those servers at the root and TLD level are actually anycasted from multiple locations globally, so you connect to the closest instance.

3. Those instances are often clusters of physical servers. The big ones have fully redundant networking, so any router or switch failing doesn't take it down. Some run different DNS server software on each physical server, so even software bugs won't take down the whole system.
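
The first point, client-side failover across a list of servers, can be sketched as below. This is a simulation, not a real resolver. Each "server" is a hypothetical callable that either answers or raises `TimeoutError`. Real stub resolvers also apply per-query timeouts and retries.

```python
from typing import Callable, Sequence

Resolver = Callable[[str], str]

def resolve_with_failover(name: str, servers: Sequence[Resolver]) -> str:
    """Ask each configured server in order; move on when one times out."""
    errors = []
    for server in servers:
        try:
            return server(name)
        except TimeoutError as exc:
            errors.append(exc)  # remember why this server was skipped
    raise TimeoutError(f"all {len(servers)} servers timed out for {name!r}: {errors}")

def dead_server(name: str) -> str:
    # Hypothetical server that never replies in time.
    raise TimeoutError("no reply")

def healthy_server(name: str) -> str:
    # Hypothetical fixed answer from the documentation address range.
    return "192.0.2.10"

print(resolve_with_failover("example.com", [dead_server, healthy_server]))
```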


In my years of doing this sort of thing, I've only had it really be DNS once. The issues I usually see blamed on DNS are cases where DNS is 'abused' to do things like load balancing with a very small TTL. Then you need goofy things like 'sticky sessions' and whatnot to work around that.
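
Here is a toy model of why very small TTLs push people toward sticky sessions. A cache that honors TTLs sits in front of a round-robin "DNS" that rotates through backend IPs. The class, the backend addresses, and the injected clock are all hypothetical.

```python
import itertools

class TTLCache:
    """Caches one answer per name until its TTL expires."""
    def __init__(self, resolve, ttl_seconds, clock):
        self._resolve = resolve        # upstream lookup (hypothetical)
        self._ttl = ttl_seconds
        self._clock = clock            # injected so the example is deterministic
        self._entries = {}             # name -> (answer, expires_at)

    def lookup(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry and now < entry[1]:
            return entry[0]            # still fresh: reuse the cached answer
        answer = self._resolve(name)   # expired or missing: ask upstream again
        self._entries[name] = (answer, now + self._ttl)
        return answer

def round_robin(ips):
    # Each upstream query returns the next backend, like DNS-based load balancing.
    cycle = itertools.cycle(ips)
    return lambda name: next(cycle)

def backends_seen(ttl, request_times):
    clock = {"now": 0}
    upstream = round_robin(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    cache = TTLCache(upstream, ttl, lambda: clock["now"])
    seen = []
    for ts in request_times:
        clock["now"] = ts
        seen.append(cache.lookup("app.example"))
    return seen

print(backends_seen(300, [0, 10, 20, 30]))
print(backends_seen(5, [0, 10, 20, 30]))
```

With the long TTL, every request reuses one backend. With the 5-second TTL, the same client lands on a different backend almost every time. Any per-server session state then has to be shared or pinned.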


I assume you haven't managed connectivity for very long, or for very many systems. I find it hard to believe otherwise, given that you say it happened only "once".


The meme this is referencing should be "It's always DNS config."


Mostly. But it can also be DNS request volume, DNS cache expiry, DNS response times, DNS connection contention, DNS security, DNS error handling, or DNS client configuration.

All variants of ‘DNS is working perfectly, just your expectations of how it will work in your situation are not completely correct’.


See, config. :-)

// wrote what might have been the first commercial-use dynamic DNS server for a regional ISP in the early 90s, invented an unreasonably effective geo+latency balanced anycast-like DNS for a global video delivery network in the 00s


isitdns.com


This is lame, it should say No, as you need DNS to load that page :P


People who don't understand networking like to blame DNS for every problem they experience I guess?


> Redis for 99% of the intended use-cases and companies will be just fine.

Except when Windows is the architecture.

This may be a place where Garnet is a good alternative.


Plus, in production under high load, Redis Cluster is way more common, which kind of solves the single-threaded concern.
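
For context on how Redis Cluster spreads load across single-threaded nodes, each key maps to one of 16384 hash slots. The slot is the CRC16 (XMODEM variant) of the key modulo 16384. A non-empty `{...}` hash tag makes related keys share a slot. The sketch follows that rule as described in the Redis Cluster specification. The function names are illustrative, not a client library's API.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of 16384 cluster slots, honoring {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the tag contents
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384

# Both keys share the tag "user1000", so they land on the same slot (and node).
print(hash_slot("{user1000}.following"), hash_slot("{user1000}.followers"))
```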


I've always found redis cluster to just bring problems with it.


Who would trust microsoft to not shoehorn in some telemetry, or worse?


What surprises me the most is that this project is developed in C#, while Dragonfly is developed in C++, and Redis is in C.


If you look at the store code, you see a lot of "unsafe" C# code (i.e., pointer manipulation).


Shows how flexible C# is.

You can trade high-level expressiveness for low-level control where needed and you don't have to deal with any FFI to do it.


C# and .NET are both truly underrated in the wider community -- I think -- because of some early snafus: the late adoption of an OSS model, being Windows-centric in the .NET Framework days, and being too dependent on Visual Studio for a very long time.

Nowadays, I think it's probably the most natural language and platform for teams that need to move on from TypeScript rather than Go or Rust given the similar constructs and idioms.


For a while there I had high hopes that one could pull in Java libraries via JVM interop on top of the Common Language Runtime, but it doesn't seem to have caught on, and I'm not in that ecosystem enough to know why. But yes, for the many excellent reasons you cited, the library ecosystem is nowhere near that of the JVM, and thus I haven't once considered .NET for a new project, regardless of how much I love C# the language. Well, that, and in my experience the observability and introspection components of the JVM are second to none.


> the library ecosystem

I don't really see many gaps. There tend to be fewer libraries, but the libraries available generally feel more complete and well thought out because users tend to cluster around the known libraries.

Many of the first party libraries are really, really good. EF Core is a prime example of possibly one of the best ORMs on the market right now in terms of productivity, ergonomics, and performance.


> I don't really see many gaps

Isn't that the "works on my machine" of this discussion? What's the Apache Tika for .NET, then? I don't mean PDF parsing, and I don't mean .docx parsing; I mean a framework for interacting with all their supported types <https://tika.apache.org/2.9.1/formats.html> through one surface area.


The entirety of the Apache ecosystem pretty much mandates using Java, or you are left out with no experience or, at best, a second-class-citizen one. It's not the fault of every other language, but of that particular ecosystem itself.

There are exceptions in the form of well-written community-maintained libraries, but these exist for a minority of Apache projects and vary a lot between languages.

Luckily, there are often (but not always) plenty of alternatives to whatever it is that you seek from Apache.



> https://github.com/KevM/tikaondotnet/releases/tag/v1.17.1 - Apr 3, 2018

You're right, how silly of me, I'll install some rando's 6-year-old build of it right away. But in seriousness, that README did remind me of the thing I was thinking of: http://www.ikvm.net/userguide/ikvmc.html


Maybe your question is the wrong one to ask. My inclination is that a package or library that isn't actively maintained has no utility. If that's the case, then the assumptions underlying that package are perhaps not relevant.

Personally, I'd say it is probably the case that there are better alternatives, Azure Document Intelligence being one of them. Sure, it can't handle the same variety of formats, but again, we have to come back to the underlying assumptions: .docx, .pdf, and a handful of other formats probably cover 99% of business use cases. Everything else would be considered niche.


It’s great, but you’re locked into Visual Studio if you want the most out of it (they put 95% of their tooling effort there). Do you want to use Visual Studio?


I work in .NET on the daily on an M1 MBP using primarily VS Code and occasionally Rider.

There's no need for VS for most (any?) .NET workloads these days. That's why I wrote it was an early mis-step. Nowadays, it's easy to do .NET dev on any platform. In fact, we ship our production runtime to AWS t4g (Arm64) instances.


I primarily use JetBrains Rider for my daily tasks, and I agree that while you can use VS Code or even Vim with LSP support for many projects, working with substantial ones such as the runtime or ASP.NET Core development might practically necessitate using Visual Studio. By the way, the Visual Studio Community Edition is available for free for virtually all projects I can think of, mainly for personal, non-commercial, or open-source usage. The license details can be somewhat unclear, but that's my understanding of it.


The debugger is night and day vs visual studio. Not even comparable. Like 1% vs 99%. And there’s other tooling.

It can be done, but it’s so inferior to VS that you might as well use VS.


I assume it's similar to the 1 billion row challenge with Java: C# and Java have sufficiently advanced VMs that they can compete with C/C++ performance, assuming you go out of your way to optimize for it.

https://github.com/gunnarmorling/1brc



Surprised to see a project in a garbage-collected language (C# for Garnet) beat Redis/Dragonfly.


Not all garbage-collected languages are made equal; some of them, like C# and .NET, do provide all the performance knobs needed for C++-like coding.

People only have to learn how to use them, instead of placing all garbage collected languages into the same basket.

In this specific case, MSIL and .NET were designed to support C++ as well, and languages like C# and F# do have ways to access those features, and even if some feature isn't exposed at the language grammar level, you can emit the same MSIL that C++/CLI would generate.


There's a series of blog posts by Oren Eini showing that C# can be sufficiently fast even when you don't really optimize anything. Beating Redis in every benchmark is a whole other level, of course.

https://ayende.com/blog/197412-B/high-performance-net-buildi...


The investment in optimizing the CLR or JVM can be huge, because it benefits millions of applications, whereas each C/C++ codebase has to be hand-optimized.

Also, the limit on how many people can write optimal C code in a given time, compared with C#, will tend to make managed code better.


Couldn't you just replace the CLR and JVM with LLVM and GCC, then replace C with assembly, and arrive at the same conclusion?


I don’t think so, because it is the runtime JIT (Just-In-Time) optimizer that is the critical speed advantage that allows the CLR and JVM to beat C and C++.

The inlining of virtual calls is the critical optimization that enables this. Because C/C++ is optimized statically and never at runtime, it is unable to optimize results of function pointer lookups (in C, and thus also virtual calls in C++). However, the JITs can inline through function pointer lookups.

In sufficiently complex programs, where polymorphism is used (i.e. old code that calls new code without knowing about it), this yields an unsurpassed speed advantage. Polymorphism is critical to managing complexity as an application evolves (even the Linux kernel, written in C, uses polymorphism, e.g. see struct file_operations).
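
Inlining through a virtual call typically relies on a guard like the one sketched below. The call site remembers the receiver type it saw last, and while the type matches, it skips the method lookup. Python cannot show the actual inlining, so this is only a conceptual model of a monomorphic inline cache. In a real JIT, the guarded fast path is where the target's body can be inlined. The class names are hypothetical.

```python
class InlineCachedCallSite:
    """Monomorphic inline cache for a single call site."""
    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_target = None
        self.hits = 0
        self.misses = 0

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        if receiver_type is self.cached_type:
            self.hits += 1  # guard passed: reuse the already-resolved target
        else:
            self.misses += 1  # guard failed: do the full (virtual) lookup
            self.cached_type = receiver_type
            self.cached_target = getattr(receiver_type, self.method_name)
        return self.cached_target(receiver, *args)

class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side

class Rect:
    def __init__(self, w, h):
        self.w, self.h = w, h

    def area(self):
        return self.w * self.h

site = InlineCachedCallSite("area")
print([site.call(s) for s in [Square(2), Square(2), Square(2), Rect(2, 3)]])
print(site.hits, site.misses)
```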


Commenting beyond my expertise now: I understand the two ecosystems have developed orthogonally. C#/Java assume code is higher-level and "dumb", with optimization living in the libraries and the CLR/JVM, while C/C++ assume lower-level code, with the developer making sure it is optimized.

Now, I could be talking out of my ass; I haven't done much C/C++ coding in over a decade.


You can write poorly optimized C#.

And the Intel C++ compiler has delivered significant performance gains for decades through optimization, without hand-tuning.


I'm not the person you are replying to, but I believe the pattern of course holds if you keep shifting it down, i.e., using a faster CPU will speed up all programs running on it, while each (already optimized) ASIC has to be optimized further individually.


Plus that it's easier to have unmanaged code in C# than in Java.


Is it possible to have unmanaged code in Java at all… or do you mean linking libs and exposing them through a Java API in the language?


Ah, I meant Java code manipulating off-heap memory. For Java I don't think there's a way to mix managed and unmanaged code like you do in C#/C++ CLR.


In C# you usually don't want to mix in C++. Manipulating off-heap memory is indeed easy - nowadays pointers to both object interiors, stack and unmanaged memory can be represented as `ref T` where T is byte, int, etc.

These are then subsequently wrapped by `Span<T>` and `ReadOnlySpan<T>` respectively. This way a span can be a slice of memory that can have any origin:

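    using System;
    using System.Runtime.InteropServices;

    var fromStack1 = (stackalloc byte[32]);                 // Span<byte> on the stack
    var fromStack2 = (Span<byte>)[0x20, 0x20, 0x20, 0x20];  // C# 12 collection expression
    var fromHeap = new byte[32].AsSpan();                    // Span<byte> over a managed array
    unsafe // requires AllowUnsafeBlocks in the project file
    {
        var ptr = NativeMemory.Alloc(32);                    // unmanaged, malloc-style allocation
        var fromMalloc = new Span<byte>(ptr, 32);
        NativeMemory.Free(ptr);                              // native memory is not GC'd, so free it
    }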


Panama has helped in that direction. It isn't like C# though; for feature parity we will eventually need Valhalla, which is taking its time because they want to keep existing JARs working.

Ignoring the type systems of the Eiffel, Objective-C, Oberon and Modula-3 lineage, even though they were inspirations for Java, has proven to be a bad decision in hindsight.


True, I still look forward to having Valhalla, eventually.


Manual memory management vs garbage collection trades speed of allocation for speed of freeing memory. Freeing memory is expensive in a garbage collected language, but allocation is basically free.
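
The "allocation is basically free" point comes from bump-pointer allocation, which many generational collectors use for young objects. Allocating is a bounds check plus a pointer increment, and the cost moves to collection time. Below is a toy sketch over a fixed byte buffer. `BumpArena` is hypothetical, and no real collector is modeled: `reset` stands in for reclaiming a whole region at once.

```python
class BumpArena:
    """Toy bump-pointer allocator over a fixed buffer (no real GC)."""
    def __init__(self, capacity):
        self.buffer = bytearray(capacity)
        self.top = 0  # offset of the next free byte

    def alloc(self, size, align=8):
        # align must be a power of two for this rounding trick to work.
        start = (self.top + align - 1) & ~(align - 1)
        end = start + size
        if end > len(self.buffer):
            raise MemoryError("arena full: a real runtime would collect here")
        self.top = end  # the whole allocation is this pointer bump
        return memoryview(self.buffer)[start:end]

    def reset(self):
        # Bulk "free": everything allocated so far is discarded at once.
        self.top = 0

arena = BumpArena(64)
block = arena.alloc(10)
print(len(block), arena.top)
```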

Modern garbage collectors are very good and handle many common use cases rather efficiently, with minimal or zero pauses, such as freeing short-lived objects. Many GC operations are done in parallel threads without stopping the application (you just pay the CPU overhead cost).

Also, JITs in both the CLR and the JVM perform optimizations such as escape analysis, which lets them stack-allocate objects that never escape a function’s scope. These objects thus do not have to be GC’d.

So really with a GC’d language, you mostly have to worry about pauses and GC CPU overhead. Most GCs can be tuned for a predictable workload. (A bigger challenge for a GC is a variable workload.)


Correction: JVM implementations perform escape analysis, in particular, because Java does not have structs. .NET does not perform escape analysis for objects, and all attempts to experiment with it so far have shown a greater impact on compilation speed with little to no benefit in allocation rate. However, you will find average allocation traffic much lower in .NET because C# already puts way more data on the stack through structs, non-boxing operations (where Java boxes), stack-allocated buffers, non-copying slicing with spans, and object pooling (which is common in Java too).


Thank you, I missed the stack allocation design doc stating it’s on the roadmap. (https://github.com/dotnet/runtime/blob/main/docs/design/core...)

Appreciate the detail about the stack allocated bits in .NET.


Yeah, it kind of is. Quite a few experiments are run to see whether an idea shows promise as a prototype; if it does, it is taken further for proper integration.

Unfortunately, object stack allocation was not one of them. Even though the DOTNET_JitObjectStackAllocation configuration knob exists today, enabling it makes zero impact, as it almost never kicks in. By the end of the experiment[0], it was concluded that, given how a lot of C# code is written, there is plenty of lower-hanging fruit to pick before investing effort in this kind of feature becomes profitable.

To contrast this, as a continuation of the green threads experiment, a runtime-handled tasks experiment[1], which moves async state machine handling from IL emitted by Roslyn to special-cased methods handled purely in runtime code, has been a massive success and is now being worked on for integration into a future version of .NET (hopefully 10?).

[0] https://github.com/dotnet/runtime/issues/11192

[1] https://github.com/dotnet/runtimelab/blob/feature/async2-exp...


I think it is rare, but it isn't impossible. I remember watching someone in the 2000s demo Java/JVM beating the pants off a C++ application. If I remember correctly, the JIT was doing an optimization that you would have had to write assembly for to match.


There are at least two reasons I can think of why a JIT language with GC can outperform C/C++:

1. Memory management with a GC often has higher throughput. With the downside that you can have high latency when a garbage collection occurs.

2. The JIT compilation can potentially do a better job of optimizing, because it has information about how the code has been run so far.

It is possible for C or C++ (or Rust) to get the first by using alternative memory management strategies, and the second by hand optimizing and/or using profile-guided optimization.


I've also seen GC reliant concurrent algorithms that would require a lot more synchronization to keep track of when to actually free an object. GCed languages can just forget about the object and rely on the GC later to figure out what threads have access to valid objects. C and C++ need to figure out if it's safe to free.


3. The runtime also supports C++, and one emits the same MSIL code sequence that the C++ compiler would generate.

Which is why WASM isn't that great a novelty for greybeards.


There were coding competition benchmarks at the time pitting javac against the underdeveloped gcc 2.95 of the time. The trick they used with Java programs was to allocate a region of memory only once and reuse it whenever they needed more, simulating a stack. The programs were then benchmarked as hot start and cold start, and hot-start timings were used to bench against C++ programs. If the algorithm they used was better, this sometimes resulted in head-to-head or better performance.


I remember that one too, but this one was not just the hot memory trick, which seems relatively easy. It was entirely JIT based, where the JIT happened to pick a better path than gcc.


Because it's not the language that matters here, the performance comes from the architecture of the storage among other things.

I've seen Go code beat Rust code. Really, when comparing languages you should look at the same implementation; if the design is completely different, it's not really comparable.


While true, GC tends to affect tail latencies, as it can increase variance. Obviously that's not always the case, but without sufficient planning it's likely. Languages with fewer stop-the-world problems tend to trade that for throughput; however, if the application is not CPU-bound, it doesn't matter. The same can happen in a non-GC language with a similar lack of allocation planning, but it tends to be considered more carefully there because those languages really put allocations in your face.


Seems it wins by using more sophisticated data structures & algorithms, instead of "tight code".


Garnet’s storage layer, called Tsavorite, was forked from OSS FASTER, and includes strong database features such as thread scalability, tiered storage support (memory, SSD, and cloud storage), fast non-blocking checkpointing, recovery, operation logging for durability, multi-key transaction support, and better memory management and reuse.

https://www.microsoft.com/en-us/research/blog/introducing-ga...


I followed the FASTER implementation long ago. I thought it was very promising as a persistence library for a proof-of-concept project with high performance requirements that I was working on at the time. BTW, it looks like both efforts are led by the same person [1]

[1]: https://github.com/badrishc


Yes, another OSS project by Microsoft


It is interesting to see how Microsoft and the .NET team are building some very impressive hack-your-own-infrastructure projects. YARP is a reverse proxy/API gateway/whatever you need it to be. Now Garnet for memory caches.

Seems they have tons of internal need and are willing to share.


I just wish they had something like this embedded into Azure App Service, so it wouldn't be necessary to use a remote service for caching.

For reference, something commonly used with IIS for ASP.NET apps was to have an out-of-process "session state" store, so that if the web app process restarted, users wouldn't lose their sessions and have to log in from scratch. Sure, you can put this somewhere central like SQL Server, but then every web page request sits there waiting for the session state to load before it does any processing at all. Session state is also typically locked in some way, which has all sorts of performance issues.

The typical current solution is to use Redis for both caching and session state, and this works... okay-ish. Throughput is high, sure, but Redis is a separate resource in Azure and is stupidly expensive. I really don't want to pay Oracle DB prices for something this simple. It's also a bit of a hassle to wire up.

In this article they talk about 300 microsecond response times, but that's irrelevant in any zone-redundant design because all Azure load balancers use random zone selection. So you'll have a web server picked in a random zone, which then contacts a cache server in another random zone. That server may not have your key and may have to contact yet another random zone to get your cache data! Your traffic ping-pongs between data centres. This introduces about 1-3 ms of delay, up to 10x higher than the advertised numbers for Garnet.
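
The ping-pong argument can be made concrete with a little arithmetic. This sketch assumes three availability zones and independent, uniformly random zone selection at each hop; the comment does not state the zone count. Under those assumptions, each hop crosses zones with probability 2/3. The zone names and helper functions are hypothetical.

```python
from fractions import Fraction
from itertools import product

ZONES = ("zone1", "zone2", "zone3")  # assumption: three availability zones

def cross_zone_probability(zones):
    """Fraction of (caller zone, callee zone) pairs that cross zones."""
    pairs = list(product(zones, repeat=2))
    crossing = sum(1 for a, b in pairs if a != b)
    return Fraction(crossing, len(pairs))

def expected_cross_zone_hops(hops, zones):
    # Linearity of expectation: each independent hop contributes the same probability.
    return hops * cross_zone_probability(zones)

print(cross_zone_probability(ZONES))       # web server -> cache node
print(expected_cross_zone_hops(2, ZONES))  # web -> cache -> node that owns the key
```

Under these assumptions, a two-hop lookup crosses a zone boundary more than once on average. Cross-zone round trips are what dominate the latency budget.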

The ideal scenario would be something like what Microsoft Service Fabric does: it has a "reliable collections"[1] service that runs locally on each host node and replicates to two other nodes. A web app can always read its cached values from the same physical host. The latency can be single-digit microseconds in some cases, which is thousands of times faster than any naively load balanced external service, no matter how well optimised.

I don't want 30% faster than Redis. I want 3,000x faster.

[1] https://learn.microsoft.com/en-us/azure/service-fabric/servi...


This is (one reason why) I love k8s with Cilium. I can set up a service to always go to the local service (or any other routing topology). It is great for any kind of DNS or application cache.


Can you point to any resources to read up on this?



Thanks!


A drop-in Redis replacement with rather impressive latency and throughput benchmark figures. I wonder what it's like to operate in a non-Azure stack in the real world.


Are you sure it is drop-in? I don't see any indication of support for Redis Streams (the X* commands).


The title does say that it can work with existing redis clients but it's unclear if they mean full compatibility.



> Garnet being multi-threaded, `MSET` is not atomic. For an atomic version of `MSET`, you would need to express it as a transaction (stored procedure).

I am having trouble understanding this. Why wouldn't they wrap that in a transaction internally for you and make the command atomic? What other atomicity "gotchas" are there?


Because it would mean a performance penalty for everyone who does not need MSET to be atomic without being able to opt out of that transaction. On the other hand, if you want to be a drop-in replacement for Redis, then this is an issue as Redis guarantees atomicity. Maybe you could have a configuration option that lets you select between compatibility and performance, at least if being a drop-in replacement is a design goal.
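
The difference being discussed can be shown with a toy store. It is hypothetical and does not reflect Garnet's API or internals. A non-atomic MSET writes keys one at a time, so a reader scheduled between writes can observe a half-applied update. The transactional version applies all keys under one lock acquisition. The `after_each` hook is an assumption added to make the interleaving deterministic.

```python
import threading

class Store:
    """Toy multi-threaded key-value store (hypothetical)."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def mget(self, keys):
        # Reads all keys under one lock acquisition: a consistent snapshot.
        with self._lock:
            return [self._data.get(k) for k in keys]

    def mset_nonatomic(self, pairs, after_each=None):
        # Each key gets its own lock acquisition; readers can run in between.
        for key, value in pairs.items():
            with self._lock:
                self._data[key] = value
            if after_each:
                after_each()  # simulates another client being scheduled here

    def mset_atomic(self, pairs):
        # All keys under a single lock acquisition, like a transaction.
        with self._lock:
            self._data.update(pairs)

store = Store()
seen = []
store.mset_nonatomic({"a": 1, "b": 2}, after_each=lambda: seen.append(store.mget(["a", "b"])))
print(seen)  # the first snapshot shows "a" set but "b" still missing
```

The atomic path serializes every multi-key write behind one lock, and that serialization is the performance cost the reply describes.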


Definitely impressive. Microsoft Research comes out with some impressive projects from time to time; it must be fun getting paid to do R&D. I wish big companies did more R&D-style projects that benefit the industry in general. I sure hope a good company takes over HashiCorp if they're on the market to be bought.


This is excellent news for people who need to run Redis (or something compatible, in this case) directly on Microsoft Windows Server without relying on WSL2. Previously, there was a Redis port available [1] (which is now in archive status) that had memory usage issues (mainly because of memory-mapped files, AFAIK) and, of course, is no longer supported.

It's also quite intriguing for me to see it's written in C#, as that's my native tongue. I'd be keen on dedicating some time to delve into the code.

[1]: https://github.com/microsoftarchive/redis


I'm looking forward to seeing where they are using this in production.

"After thousands of unit tests and a couple of years working with first-party teams at Microsoft deploying Garnet in production (more on this in future blog posts!), we felt it was time to release it publicly" https://microsoft.github.io/garnet/blog


Judging from the comments here, I guess no one uses memcached anymore?


I remember last using it in 2016 or so and it didn't seem particularly easy to configure (we wanted an LRU cache and it had a bunch of options that weren't very well explained, and we couldn't work out why it kept evicting items). We swapped it out for Redis and it was easy to configure and worked how we expected.
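
For reference, the LRU policy described here can be sketched with Python's `collections.OrderedDict`. Reads move a key to the most-recently-used end, and inserts past capacity evict from the least-recently-used end. This is a generic illustration with a hypothetical item-count capacity. It does not model how memcached or Redis actually evict: both are driven by a memory limit, and Redis approximates LRU by sampling.

```python
from collections import OrderedDict

class LRUCache:
    """Item-count-bounded LRU cache (illustrative only)."""
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # touching "a" makes "b" the eviction candidate
cache.set("c", 3)  # over capacity: "b" is evicted
print(cache.get("b"), cache.get("a"), cache.get("c"))
```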


:'(


<3

Thanks for your continuing work on memcached! I'd be very curious how Garnet's benchmarks compare with memcached.


<3 and Thank You :)


This looks really good. I hope ultimately this replaces the “Azure Cache for Redis” resource. It’s slow, it’s a fork of Redis made to run on Windows, and it takes nearly an hour to create an instance of it.


I don't know why they wouldn't just run Redis on Linux.


If you have a Windows container for your app, you can only orchestrate that together with other Windows containers on one dev box. So if you need to spin up a cache as a sidecar, it has to be Windows.


Thanks for your interest in Azure Cache for Redis! The Enterprise SKUs (E-series) takes less provisioning time. Around 7 min. Hosted in Linux. Would that fit your requirements?


So what's the catch? Where does this not perform well? It would be neat to see benchmarks on smaller instance types too; 72 vCPUs is quite a chunky boi.


> Garnet’s storage layer, called Tsavorite, was forked from our prior open-source project FASTER

Would be interesting to know why it was forked, why the changes can't be incorporated, and whether FASTER continues to be developed.


I tried to compare the source trees. They are nearly identical except for replacing Faster with Tsavorite which makes a direct comparison much harder as they renamed directories and files.


“Our” may be the key word here. In other words, it may be political; they wanted the freedom to make changes to their original project without having to deal with PRs to the current maintainers.


Orleans will love being bundled with this


Each Orleans node could have a Garnet node, I can see the Aspire configuration now


I am just happy there is a native Redis-like available on Windows. I believe there is another RavenDB replacement but this one is 'more' official!

Is it possible to use Aspire locally or is it just a cloud only 'framework'?


Neat project, but I will be sticking with Redis.

I trust Redis to not do something weird with their licensing or pricing in the future.

Plus Redis has billions of production hours under its belt.

It's easier to install and understand.


Redis changed their license a day after your comment.

See https://news.ycombinator.com/item?id=39772562


Tech has all the attention span of a goldfish https://techcrunch.com/2019/02/21/redis-labs-changes-its-ope...


> I trust Redis to not do something weird with their licensing or pricing in the future.

Aged like milk.


This is a project from Microsoft Research, so I would worry a lot less about licensing and pricing than about lack of updates (either features, maintenance, or security ones)


I see this becoming an Azure service. But you are right, half of the Microsoft Research projects fizzle out within a year - unless used internally.


Which this one is:

> Garnet has been of sufficiently high quality that several first-party and platform teams at Microsoft have deployed versions of Garnet internally for many months now.

https://microsoft.github.io/garnet/docs


The licensing has been through a number of changes in the last few years, depending on which distribution and modules you use.

https://redis.com/legal/licenses/


100% on that. I think Garnet (and, similarly, YARP) is useful if you build your own tweaked version of it. Otherwise, off-the-shelf or PaaS offerings are more useful.

Also, this is Microsoft Research. This thing is code sharing, not a product.

The real interesting part is what Azure will do with it.


Maybe. Patents are certainly a question, because of the license. Microsoft has definitely been aggressive about patents.


After the aborted/abandoned attempt to port Redis to Windows [0], this feels like a second try at the same thing, but first party.

Of course, as a research project it doesn't have the same stability / support, etc. as redis, but I could easily imagine this rolling into a real product if it's popular.

...and since it's MIT-licensed, if nothing else, the code is a fun read. :)

[0] - https://github.com/microsoftarchive/redis?tab=readme-ov-file...


MIT license, but with CLA.

It's bizarre. What more rights could they possibly want on top of MIT to warrant a CLA?


It's probably a blanket requirement by the MSFT legal team so they can just release everything as "(c) Microsoft" instead of "(c) Microsoft and 999 more contributors, one of which has actually forgot to put their name in the AUTHORS.txt and thus we're technically violating the terms of the license"


Yep, the CLA only applies if you want to merge your changes into the project.


IIRC, it’s to allow them to relicense contributions in the event they decide to move away from MIT?



