> [Go] also has managed memory, so it looks safer and easier to use than C/C++.
But most of the post describes a sophisticated way to work around the garbage collector, totally reliant on a specific implementation detail of the current Go GC (skipping of pointer-free data types), documented in a GitHub issue. It seems easier to not have, or to not use, the GC in the first place for this specific project if most of the engineering effort is going to go into an elaborate workaround for it. In particular, I don't understand the reason for rejecting offheap: the article suggests "a cache which relied on those functions would need to be implemented", but surely this is less complex to implement than the cache that relies on hiding what are effectively pointers behind offsets so the GC won't think to scan them.
(This isn't a suggestion to not use Golang at all, to be clear. Nor do I mean to suggest that GCs are bad things in general.)
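For anyone curious, the trick the article describes boils down to roughly this (my own sketch, not their code; names are illustrative, and eviction/wrap-around handling are left out). The index map holds only integers, so the GC never scans its buckets, and the actual entries live in one big pre-allocated `[]byte`:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// offsetCache hides entries behind integer offsets instead of pointers.
type offsetCache struct {
	index    map[uint64]uint32 // key hash -> offset into data; nothing for the GC to follow
	data     []byte            // length-prefixed entries in a single allocation
	writePos uint32
}

func newOffsetCache(size int) *offsetCache {
	return &offsetCache{index: make(map[uint64]uint32), data: make([]byte, size)}
}

func (c *offsetCache) set(hash uint64, entry []byte) {
	// length prefix + payload appended at the current write position
	binary.LittleEndian.PutUint32(c.data[c.writePos:], uint32(len(entry)))
	copy(c.data[c.writePos+4:], entry)
	c.index[hash] = c.writePos
	c.writePos += 4 + uint32(len(entry))
}

func (c *offsetCache) get(hash uint64) ([]byte, bool) {
	off, ok := c.index[hash]
	if !ok {
		return nil, false
	}
	n := binary.LittleEndian.Uint32(c.data[off:])
	return c.data[off+4 : off+4+n], true
}

func main() {
	c := newOffsetCache(1 << 20)
	c.set(42, []byte("hello"))
	if v, ok := c.get(42); ok {
		fmt.Println(string(v))
	}
}
```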
However, without a GC you have to handle all memory issues yourself, in a way that convinces the compiler your code won't ever blow up.
It would be very interesting to have such a comparison, so we could see whether it's easier to work around the GC, or easier to write bullet-proof code with manual memory management.
I'd expect the Rust version to be more robust with regard to performance, but the question is: would it also pay off in terms of code complexity?
But I do think that the code complexity of a trivial hash table with striped locking can't possibly be higher than the complexity of serializing everything into a giant byte array and maintaining offsets into that array in a map that you make sure doesn't contain any pointers.
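For comparison, the striped-locking version I mean is roughly this much code (a sketch only, no eviction; these maps do contain pointers, so the GC will scan them, which is exactly the trade-off I think is acceptable):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const numShards = 64

// shard is a plain map guarded by its own lock, so writers hitting
// different shards don't contend with each other.
type shard struct {
	mu sync.RWMutex
	m  map[string][]byte
}

type stripedMap struct {
	shards [numShards]*shard
}

func newStripedMap() *stripedMap {
	s := &stripedMap{}
	for i := range s.shards {
		s.shards[i] = &shard{m: make(map[string][]byte)}
	}
	return s
}

// shardFor hashes the key and picks the shard that owns it.
func (s *stripedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return s.shards[h.Sum32()%numShards]
}

func (s *stripedMap) Set(key string, val []byte) {
	sh := s.shardFor(key)
	sh.mu.Lock()
	sh.m[key] = val
	sh.mu.Unlock()
}

func (s *stripedMap) Get(key string) ([]byte, bool) {
	sh := s.shardFor(key)
	sh.mu.RLock()
	v, ok := sh.m[key]
	sh.mu.RUnlock()
	return v, ok
}

func main() {
	m := newStripedMap()
	m.Set("k", []byte("v"))
	v, _ := m.Get("k")
	fmt.Println(string(v))
}
```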
Although of all mainstream languages, C# is probably the closest in spirit to Modula-3.
Project is here for those interested: https://github.com/Maascamp/fohlc
How would you compare your cache to open source data grids that also provide off-heap storage (Infinispan, Geode/GemFire, Ignite), or to other general cache solutions (Redis/Memcached)?
We actually experimented with grid caches (Ignite in particular, since it offers off-heap in-memory storage as well), but the performance simply isn't there. At the volume we're writing, even millisecond latency is a non-starter. We did explore both Memcached and Redis, but we need strict FIFO and both of those solutions provide nondeterministic LRU.
I do agree with you that perhaps Golang was not the right choice for this project since they spent much of the time working around features of the language.
The requirements are literally <100 LOC of wrapping an HTTP interface around Redis using TTLs, on top of something that has been battle tested for years and is both rock solid and ridiculously fast.
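Roughly this, assuming a client library like go-redis (the route, TTL, and error handling here are placeholders):

```go
package main

import (
	"context"
	"io"
	"net/http"
	"time"

	"github.com/redis/go-redis/v9" // assumed client library; any Redis client works
)

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	ttl := 10 * time.Minute // illustrative TTL; tune to the eviction window you need

	http.HandleFunc("/cache/", func(w http.ResponseWriter, r *http.Request) {
		ctx := context.Background()
		key := r.URL.Path[len("/cache/"):]
		switch r.Method {
		case http.MethodPut:
			body, _ := io.ReadAll(r.Body)
			if err := rdb.Set(ctx, key, body, ttl).Err(); err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
			w.WriteHeader(http.StatusNoContent)
		case http.MethodGet:
			val, err := rdb.Get(ctx, key).Result()
			if err == redis.Nil {
				http.NotFound(w, r)
				return
			} else if err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
			w.Write([]byte(val))
		default:
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		}
	})
	http.ListenAndServe(":8080", nil)
}
```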
Public service announcement: Don't write your own data store. Repeat after me: Don't write your own data store, unless you want to experimentally find out how to build data stores. It's arbitrarily hard and gets even more so at every layer. Plus, you leave behind an unmaintainable mess for the people after you. There's already a great OSS data store optimized for every use case and storage medium I could possibly imagine.
1) An efficient, high-performance distributed persistent store for arbitrarily large CRDTs.
2) A Kafka-like highly-available distributed binary log with cheap topics, that doesn't require external coordination and doesn't lose acknowledged writes (which I'll happily give up any kind of linearizability guarantee for).
Unless you have some sort of replication or consensus, this is still just best-effort delivery. For example, the message could be acknowledged & fsynced followed by drive death. At scale, this would happen.
However, with asynchronous best-effort replication, broker death would be much less likely to lead to a loss of messages.
In general, it's all about playing the odds. Acknowledged writes can still be lost in the best of systems if the entire DC catches fire.
If you tune Kafka settings, you essentially get this behavior. You have to ensure ZK is up, but it will allow all topics to be published to so long as a single broker is alive.
The closest I can get is allowing unclean leader election, but that will definitely, definitely lose writes that have been ACK'd by a stale leader which comes back and has to become a follower. So I can get Kafka to basically accept all writes regardless of the level of cluster degradation, but I can't keep it from silently dropping some of those writes on the floor.
Did you test this or is it just a general view? What sort of write rate are you looking for, per topic?
I care that as long as even one node in the system is alive, it will accept writes, and those writes will eventually propagate and converge on the union set of all writes ever observed, without obliterating any ACK'd observed writes, as the system as a whole moves toward a less degraded state.
I'd want topics themselves to be able to be basically "free" to create, and the write latency to remain stable over arbitrarily large datasets. Write rate isn't really much of a consideration since it's almost embarrassingly low (hundreds per second).
Fair nuff, I asked because I was curious if you were assuming that eventual consistency was necessary for high availability.
> Write rate isn't really much of a consideration since it's almost embarrassingly low (hundreds per second).
Yup, looks like my concern above is the case. A single consensus replica set will handle that no problem. I'd use Raft if you're in a single geo location, or EPaxos otherwise (the EPaxos source is a little early and needs eyeballs/production diversity testing, but the algorithm is absolutely solid). There's no reason to borrow all the troubles of CRDTs when they're not needed.
Example from the epaxos paper, this is latency vs throughput for 3 replicas in one geo location: http://charap.co/wp-content/uploads/2015/10/latency_over_thr...
You seem extremely concerned about a fault tolerant system experiencing more faults than the majority, which I think is quite easily dealt with, and utterly blithe about making inconsistency your best case.
At this kind of request rate I'd honestly just shove things in postgres and call it good. There's just no reason to build out a bunch of CRDT complexity when you don't _need_ it.
Does that give you a better perspective for my availability concerns?
I also don't have "on demand" infrastructure. I have to deploy to a fixed footprint of bare metal hardware.
I'm "utterly blithe" about inconsistency being my best case for a variety of reasons:
1) I know my use case extremely well.
2) I know that within that use case there are several places where availability is the #1, #2, and #3 priority because the system being unavailable for any time at all has a non-trivial direct financial impact.
3) I know within that use case where there are necessary points of consistency and how to move them out of the critical path that requires extreme availability.
4) I know exactly the problem that CRDTs solve for me, and exactly why I need them in this case, and how to avoid them with simpler immutable/idempotent write patterns where possible.
5) I have to be able to tolerate a very high proportion of the infrastructure experiencing temporary and/or permanent failure, because the places I have to deploy into are nowhere near the level of reliability of even a low-to-mid tier data center in the US. Janky telcos, natural disasters, malicious operators, and flimsy power delivery are the norm.
Edit: I see you edited in a longer comment after my question above. I was just curious, not looking to pick a fight. Suffice it to say we have very different perspectives on how to attack those requirements. I'm bowing out now, cheers!
1) Anything that is suggested which avoids data loss via consistency models is going to reduce my availability to unacceptable levels.
2) Anything that is suggested which keeps my availability up where it needs to be and doesn't use CRDTs, and/or a sufficiently similar semantic guarantee, is going to silently lose data due to "lost-updates" (ACK'd writes that get obliterated during conflict merges and are now gone forever).
At a minimum I need sibling behavior for conflicting writes, so that some half-baked LWW strategy that's barely better than "pick at random" doesn't kill my data (which has very high regulatory and audit requirements around it). Taking that a step further, CRDTs, implemented correctly, give me a nice interface for interacting with multiple version conflicts and resolving them in coherent ways, without forcing me, or my other developers, to think up case-by-case resolution strategies and verify their correctness.
There really isn't a tremendous amount of complexity to using CRDTs. That's the point of them. They're easy to use. The complexity I'd really like to avoid, if avoiding CRDTs made it possible, is the cognitive burden my team needs to have around distributed data and all its horrible nuances to satisfy our use case: the knowing when and why to use them, not so much the how.
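To make the "easy to use" point concrete, the simplest CRDT, a grow-only set, is only this much Go (a toy sketch, not our actual data model):

```go
package main

import "fmt"

// GSet is a grow-only set: Add commutes and Merge is set union, so replicas
// that apply adds and exchange state in any order converge without losing
// acknowledged writes.
type GSet struct {
	elems map[string]struct{}
}

func NewGSet() *GSet { return &GSet{elems: make(map[string]struct{})} }

func (s *GSet) Add(e string)           { s.elems[e] = struct{}{} }
func (s *GSet) Contains(e string) bool { _, ok := s.elems[e]; return ok }

// Merge folds another replica's state into this one; union is commutative,
// associative, and idempotent, so repeated or out-of-order merges are safe.
func (s *GSet) Merge(other *GSet) {
	for e := range other.elems {
		s.elems[e] = struct{}{}
	}
}

func main() {
	a, b := NewGSet(), NewGSet()
	a.Add("write-1") // accepted on replica a while partitioned
	b.Add("write-2") // accepted on replica b while partitioned
	a.Merge(b)
	b.Merge(a)
	fmt.Println(a.Contains("write-2"), b.Contains("write-1")) // true true
}
```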
"At this kind of request rate I'd honestly just shove things in postgres and call it good." is very, very far away from a workable solution because my problem isn't request rate. The number of ops/sec could be hundreds or millions, and it doesn't really change my actual problem. No matter what the request rate is I could just write to a bunch of postgres instances that have a chash ring overlaid onto them... except now I have to deal with my actual problem still, which is redundancy, and local consistency in any postgres replica doesn't guarantee me much, I can't introduce a strong coordinator like a consensus protocol over the top of them, because it dumps my availability, and so I have to figure out how to safely deal with merge conflicts between the replicas... now I'm stuck figuring out how to build CRDTs on top of postgres anyway. So... what exactly did I win?
For another it would be really weird for Spanner to expose something CRDT shaped since Spanner is a strongly-consistent distributed database which depends heavily on global order and read/write locks driven by meticulously coordinated multi-site global wall-clock time and consensus groups.
In some ways it's almost the opposite of what I want... with exception of the fact that it's also distributed. I want extreme availability with lazy, weak coordination.
No, it's slightly more lazy than that. It does timestamp coordination, but retrospectively.
I run things on a lean VPS, and I had a large amount of data I tried to stuff into a trie. It worked fine on my MacBook with plenty of memory, but it ran out of memory on the VPS. Using Bolt helped in this case, thanks to its memory-mapped file.
> Considering the first point we decided to give up external caches like Redis, Memcached or Couchbase mainly because of additional time needed on the network.
Even mysql+innodb can easily handle 10k read/write queries against a simple pk table with "millions of entries" on desktop-grade hardware.
If the article was "here's a fun experiment to make a cache server in Go", fair enough.
I wish more people had this attitude. The number of times I have seen people re-implementing something that PhDs have been perfecting since the '70s... it's cringeworthy.
One thing I've noticed is an extreme demand for making internal services available over HTTP. It has its benefits, but the obvious downside is that the overhead and complexity of HTTP are total overkill for something like a key-value store of this nature.
Of course, Redis and Memcached support Unix domain sockets, which do not use the network and do not suffer from the overhead of TCP. The authors do not address this at all, suggesting they weren't aware of UDS, nor of the fact that TCP within one Linux host does not touch the network at all.
Adding to the confusion is that even given an in-memory cache library which overcame their objections to the "network" based ones, they still elected to write their own low-level cache.
So the comment about time needed on the network was either spurious or misinformed. And one thing it was not: measured.
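For what it's worth, pointing a Go client at the socket instead of TCP loopback is a one-line change; with go-redis (assumed here, and assuming the socket is enabled in redis.conf) it looks roughly like:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9" // assumed client; most Redis clients expose the same option
)

func main() {
	// Requires something like "unixsocket /tmp/redis.sock" in redis.conf (path is illustrative).
	rdb := redis.NewClient(&redis.Options{
		Network: "unix",
		Addr:    "/tmp/redis.sock",
	})
	fmt.Println(rdb.Ping(context.Background()).Result())
}
```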
> When the server and client benchmark programs run on the same box, both the TCP/IP loopback and unix domain sockets can be used. Depending on the platform, unix domain sockets can achieve around 50% more throughput than the TCP/IP loopback (on Linux for instance). The default behavior of redis-benchmark is to use the TCP/IP loopback.
Haven't measured myself though.
This has the added benefit of being able to take advantage of other nginx features. For example, you could set different permissions on `set` and `get` routes.
Since you obviously ran some quick benchmarks and concluded that running it locally over a unix socket (confused why you would mention "time needed on the network"... you tested with local sockets, right?) was too slow, you should at least let Antirez know you've run into a new mysterious performance bug ;) Writing a cache service can be a fun side project, but I doubt you gained anything by doing so except another homegrown part to maintain.
I generally prefer Memcached unless you have a super locked-down infrastructure (i.e., no engineers around to deploy a KEYS operation that takes down a shard, and all the systems that rely on the data inside it, until it finishes). Multithreaded plus a simpler API is great for multitenancy when you have to provide infrastructure to engineers who don't want to learn about infrastructure.
As others pointed out, Go is also a very poor choice if you need to work around every part of the language.
Fair point. Except they are a small consulting company.
Uh... we regularly get performance better than what the OP described as their "needs" with Redis. It can run in memory, on the same machine, and has plenty of HTTP frontends you can build on (not a lot of networking).
No need. Someone already wrote Redis.
The reason why Go doesn't (currently?) have a super-performant HTTP server and a super-performant JSON library is that the designers probably didn't envision someone using the language like this.
"is that the designers probably didn't envision someone using the language like this"
They didn't envisage that the HTTP server would be used to, you know, serve HTTP requests?
In this case, while that is a microbenchmark, it's a relevant one; both things are basically measuring "what's the minimal cost for a web request"? I could still quibble around the question of routing, but generally "within 2x of nginx" is good enough for most uses, and generally, for any non-trivial use of either nginx or Go's web server, you're going to dominate the cost of the HTTP request with your processing. (Considering how nasty the inside of nginx is and how nice the Go HTTP server looks, that's actually surprisingly good performance. And I don't mean that "nasty" as a criticism; it is what it is for good reason.)
(That said, if my back was against the wall performance-wise, I'd seriously consider looking at my incoming requests, seeing if there's a strong enough pattern in what's going on, and writing myself a pseudo-HTTP server that isn't actually an HTTP server but just looks like one, skipping as much parsing as I can on the way in, and emitting a lot more hard-coded stuff as headers on the way out. I've never had to do this yet, but it's an option I'm keeping in my pocket.)
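If I ever do reach for that, the skeleton would look something like this: a raw TCP loop that reads the request line, drains the headers without interpreting them, and writes hard-coded response headers (sketch only, no timeouts, keep-alive, or error responses):

```go
package main

import (
	"bufio"
	"net"
	"strconv"
	"strings"
)

// A "pseudo-HTTP" responder: just enough of the protocol for a known,
// well-behaved client, skipping general header parsing entirely.
func main() {
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		panic(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go func(c net.Conn) {
			defer c.Close()
			r := bufio.NewReader(c)
			line, err := r.ReadString('\n') // e.g. "GET /key/foo HTTP/1.1"
			if err != nil {
				return
			}
			for { // drain headers until the blank line
				h, err := r.ReadString('\n')
				if err != nil || h == "\r\n" {
					break
				}
			}
			parts := strings.Fields(line)
			if len(parts) < 2 {
				return
			}
			body := "value-for-" + parts[1] // stand-in for the real lookup
			// Hard-coded response headers; Content-Length must match the body.
			c.Write([]byte("HTTP/1.1 200 OK\r\nContent-Length: " +
				strconv.Itoa(len(body)) + "\r\nConnection: close\r\n\r\n" + body))
		}(conn)
	}
}
```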
As for JSON, well, people generally conflate "parsing" and "marshalling" with JSON. JSON parsing is so drop-dead easy that one skilled in the art can write a decent parser in just a day or two; it's a good format that way. However, the task of converting a parsed JSON representation into local data types is actually surprisingly subtle, and any mature language will almost certainly have at least two if not more JSON marshallers that work in fundamentally different ways.
There will generally be at least one built for raw parsing speed, but it will always hand you back a very generic data structure that has none of your application-specific types in it. There will be one built for really, really convenient marshalling and unmarshalling of your application-specific types, but it'll probably be significantly slower, and it will make certain decisions that mean it can't be used safely on arbitrary JSON... i.e., if there's something that may be a string but may be an object, this library will range from inconvenient to impossible to use. And there are other valid cost/benefit points: the JSON marshaller that loads the JSON into memory and parses it with nearly zero additional overhead by intelligently re-using the original byte buffer, the JSON marshaller that can build up specialized parsers with code generation at compile time even if you're in a dynamic language, the JSON parser that for better or worse permits a certain amount of sloppiness to deal with sloppy emitters, etc.
Go's default encoding/json is the one built for convenience of application-specific types that can't handle or emit arbitrary JSON easily. As a sideline it can also do the generic raw parsing if you pass it the correct type to start with, but I believe it's paying some overhead vs. something custom written for that. I'm pretty sure they know they made this choice; all that stuff I described was pretty clear by the time Go was being written, that it is basically impossible to write the JSON library, so you might as well choose which one you're looking to ship. I think for a standard library it was the right choice, because it's a solid middle-of-the-ground choice... most JSON can be marshalled by it, because most JSON is still well-enough behaved for that to work. It's mostly good enough for most uses. But if you need the ultimate speed, or the ultimate flexibility, or the ultimate anything-else, you'll need to pick something else.
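As a concrete example of the "may be a string, may be an object" case with encoding/json (the types here are made up for illustration), json.RawMessage lets you defer decoding that field until you know its shape:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event has a payload that is sometimes a string and sometimes an object.
type Event struct {
	Name    string          `json:"name"`
	Payload json.RawMessage `json:"payload"` // decoded later, once we know its shape
}

func main() {
	docs := []string{
		`{"name":"a","payload":"just a string"}`,
		`{"name":"b","payload":{"id":42}}`,
	}
	for _, doc := range docs {
		var e Event
		if err := json.Unmarshal([]byte(doc), &e); err != nil {
			panic(err)
		}
		var s string
		if json.Unmarshal(e.Payload, &s) == nil {
			fmt.Println(e.Name, "string payload:", s)
			continue
		}
		var obj map[string]interface{}
		if json.Unmarshal(e.Payload, &obj) == nil {
			fmt.Println(e.Name, "object payload:", obj)
		}
	}
}
```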
Perhaps I'm prejudiced because Varnish has proven to be such an awesome caching mechanism (we still use Redis as a key/value store), but this seems like NIH.
This implementation is far from fast (it would need two orders of magnitude better performance to be credible as "very fast"), and it is non-idiomatic, specifically doing things to avoid the benefits of Go.
As an aside -- HTTP and serialization are both costly. In many, many cases where I've seen them in effect, they were a significant expense for little to no architectural gain.
But they can be done very fast (introducing very little latency) and are essentially pure operations, so they can be parallelized very well (for throughput). Of course, this doesn't account for dev costs, and doesn't make your architectural point invalid.
That sounds like a model use case for good old C. I love Go and am currently actively learning it, but why take a memory-safe, garbage-collected language and then build your own ad-hoc memory management on top of it to avoid the GC?
I mean, if they've paid you to write this, I presume you have reasonable hardware to run it on. Bog standard BSD sockets TCP on Linux is down around the 10 microsecond range now. What on earth are you doing with the other 4.99ms?
As I understand it, while that does apply to pointers-to-structs, slices, and probably strings, it isn't true for naked structs and naked arrays. Those both behave as value types, like int.
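Concretely, something like this gives the GC nothing to chase (my own illustration, sizes and names made up):

```go
package main

import "fmt"

// Entry contains no pointers; a fixed-size array of such structs is one
// contiguous, pointer-free block, so the GC has nothing inside it to trace
// (unlike []*Entry, or a map keyed by strings).
type Entry struct {
	Key   uint64
	Value [32]byte // inline bytes, not a string or slice header
}

var table [1 << 20]Entry // ~1M entries the collector can treat as opaque data

func main() {
	table[0] = Entry{Key: 42}
	fmt.Println(table[0].Key)
}
```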
Edit: Apparently the offheap package does use mmap if you pass a path to their Malloc.