It's mostly "done" actually. It's already looking really promising, especially considering that it can do things that other servers keep proprietary, if they do it at all (for example, NTLM proxying, or coordinated automation of TLS certs in a cluster).
If you want to get involved, now's a great time while we're still in beta! It's a fun project that the community is really coming together to help build.
Any thought or plans for some kind of back-pressure? Health checks and response times are useful, to a degree, but there are a number of workloads where they don't actually capture the cost of the work involved, and they can also really trip you up something nasty under certain failure conditions :D
edit: by way of example, I used to work for a service that customers would upload files to, that's all that the traffic was. There was wild variability in the size, processing cost, and upload speed of each request. None of the standard load-balancing approaches really balance "load" from a service perspective. While things worked, it was rarely optimal.
I think most proxies are planning to move logic like this to the control plane. Envoy's gRPC stuff has some ways to dynamically throttle traffic to backends.
Load balancers really need to become programming runtimes, imo. Config languages aren't very expressive, and almost everyone needs their own logic at the LB level.
I _just_ put together a demo of latency based load balancing using HAProxy + awk and it's neat, but still very rudimentary compared to what I could express in, say, JavaScript: https://github.com/superfly/multi-cloud-haproxy
Caddy 2 has an embedded scripting language that allows this. We have to flesh it out some more but it's looking really good.
In some of our early testing on basic workloads, we found that it's up to 2x faster than NGINX+Lua, largely because it does not require a VM. (This is a broad generalization, and we need to specifically optimize for these cases -- but this approach holds promise.)
F5 BIG-IP can be programmed in Tcl. While it is a programming language, I have only seen it programmed by non-developers, with copy-pasted code, repeated string constants all over the place, and no unit tests.
I agree that load balancers do need the expressiveness of programming languages, but ideally only with some typing and the ability to easily unit test.
Yes! And a good API surface. OpenResty (nginx + lua) is reasonably powerful, but you're really limited by the events they give you.
I'm really hopeful about deno (https://github.com/denoland/deno) for this. TypeScript is nice, the deno TCP perf is good, all it needs is some good proxy libraries.
Yes, aside from multiple load balancing policies, Caddy 2 has a circuit breaker that will automatically adjust the load balancing before latency to a particular backend begins to grow out of tolerances.
Both the load balancing policies and the circuit breakers are extensible, meaning it is easy to change their behavior and add new ones as needed.
I could also imagine a specific load balancing policy that adds up a cost for each request using headers such as transfer size; i.e. dynamic weights. This would be a great contribution to the project if you are interested!
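Roughly what I have in mind, as a sketch only (the `Backend` type and field names here are made up for illustration, not Caddy's actual module API):

```go
package lb

import (
	"net/http"
	"sync/atomic"
)

// Backend is a hypothetical upstream with a running "cost" counter,
// e.g. bytes currently in flight to it.
type Backend struct {
	Addr string
	cost int64 // outstanding cost, updated atomically
}

// selectByCost picks the backend with the least outstanding cost and
// charges it for the incoming request, using Content-Length as a crude
// estimate of how expensive the request will be.
func selectByCost(backends []*Backend, r *http.Request) *Backend {
	var best *Backend
	var bestCost int64
	for _, b := range backends {
		c := atomic.LoadInt64(&b.cost)
		if best == nil || c < bestCost {
			best, bestCost = b, c
		}
	}
	if best == nil {
		return nil
	}
	charge := r.ContentLength
	if charge < 1 {
		charge = 1 // unknown or empty bodies still count as some load
	}
	atomic.AddInt64(&best.cost, charge)
	// The caller would subtract the same charge once the response completes.
	return best
}
```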
Caddy 2 also has an embedded scripting language that can make this kind of logic scriptable and dynamic, but that's still a WIP.
Seems like most of the work is done by the `ReverseProxy` type from `net/http/httputil`, and this code is more about health checking.
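For reference, the stdlib piece doing the heavy lifting is roughly this (a bare-bones sketch, not the article's code):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// One upstream; the article's load balancer keeps a list of these
	// and picks one per request.
	target, err := url.Parse("http://localhost:8081")
	if err != nil {
		log.Fatal(err)
	}
	// ReverseProxy does the actual forwarding, header rewriting, etc.
	proxy := httputil.NewSingleHostReverseProxy(target)

	// Everything hitting :8080 is forwarded to the upstream.
	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```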
Nice to see how simple it is now though. Go is definitely a great choice for low-level networking, and .NET Core has recently become a great option as well.
I’m also a bit confused about where .NET Core comes into the picture. It’s becoming faster and faster, but it’s definitely not the best tool for load balancers.
Why not? The latest techempower results show ASP.NET Core saturating a 10GbE network card with 7M+ req/sec while being a full-featured framework, compared to custom C++ web servers. [1]
The memory management, type safety, and high-level productivity make it great for building load balancers and other infrastructure components.
You'd have to know how much memory and CPU was being used vs the other solutions. They're all very close in terms of the key metric, but if Rust is using 50% of the memory, then it's not much of a competition. Also, what sort of VM tuning, etc.
True, but you can dive into the benchmarks more if you visit the TechEmpower site. They run everything on the same standardized hardware, and there .NET is very competitive on CPU and RAM.
It's nice to see a walkthrough of what goes into a load balancer and how simple it is to build on in Go.
One nitpick is that the author reversed the meaning of active and passive health checks. Active generates new traffic to the backends just to determine their healthiness; passive judges this based on the responses to normal traffic.
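To make the distinction concrete, an active check generates its own probe traffic on a timer, something like this sketch (the `Backend` type, `/healthz` path, and interval are all made up for illustration):

```go
package lb

import (
	"net/http"
	"sync/atomic"
	"time"
)

// Backend is a minimal stand-in for the article's backend type.
type Backend struct {
	Addr  string
	alive atomic.Bool
}

func (b *Backend) SetAlive(v bool) { b.alive.Store(v) }

// activeHealthCheck probes each backend on its own timer, independent of
// real client traffic. A passive check would instead flip the same flag
// based on errors seen while proxying normal requests.
func activeHealthCheck(backends []*Backend, interval time.Duration) {
	client := &http.Client{Timeout: 2 * time.Second}
	for range time.Tick(interval) {
		for _, b := range backends {
			alive := false
			resp, err := client.Get(b.Addr + "/healthz") // hypothetical health endpoint
			if err == nil {
				alive = resp.StatusCode == http.StatusOK
				resp.Body.Close()
			}
			b.SetAlive(alive)
		}
	}
}
```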
Still seems like a problem to me. It breaks encapsulation.
The mutex is only used inside of SetAlive() and isAlive(); they're the only things that need to handle locking and unlocking. You don't want anything external to that calling the methods on RWMutex.
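In other words, something like this sketch of the pattern (not necessarily the article's exact struct):

```go
package lb

import "sync"

// Backend keeps its mutex unexported, so callers can only touch the alive
// flag through SetAlive/IsAlive; nothing outside the type can lock or
// unlock it directly.
type Backend struct {
	URL   string
	mu    sync.RWMutex
	alive bool
}

func (b *Backend) SetAlive(alive bool) {
	b.mu.Lock()
	b.alive = alive
	b.mu.Unlock()
}

func (b *Backend) IsAlive() bool {
	b.mu.RLock()
	defer b.mu.RUnlock()
	return b.alive
}
```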
Oh of course, if you're not using it then don't expose it.
I haven't read the code so I can't verify if that's the case here, but I read OPs post as being worried about method clobbering (which is really a non-issue, if the popularity of embedding mutexes shows us anything).
With talk of proxy and go. I am surprised gobetween hasn’t been mentioned yet https://github.com/yyyar/gobetween . Last time I looked at the source it was very approachable.
There are probably thousands on GitHub made for cutting your teeth, like in the blog post. I've made one. You just lose interest once you've implemented the easy/naive stuff and need to actually use a real solution in production.
Apache's httpd is perfectly good as a web server, and passable as a proxy. I can't say that it'd be my first choice, but I'm not aware that there's any special reason not to use it.
The Byzantine config file syntax was a common complaint shared by the early defectors to nginx. If I never have to edit that conf file again it will be too soon.
Several of those are ingress routers, and I’ve seen no claims of them being able to run standalone (i.e. without Kubernetes specifically), so they are technically load balancers, except they can’t operate standalone. Say I wanted to run one per server instead of NodeJS cluster mode. I could not substitute those, right?
One is F5 which nobody says anything nice about anymore, and was positioned as a hardware solution for most of its good years.
I’d also note that a number of these are pretty young, indicating that there was a power vacuum that is now filling in.
I'm thinking of a "load balancer" that charges for basic features like querying which downstream servers are in service.
HAProxy on the other hand has been doing some really fantastic stuff lately that is in the open source. It makes me a bit sad that the former is the "go to" and HAProxy doesn't get the usage it deserves.
To your specific point though, I think it's just super tricky to get all the features people expect at the performance they expect as well. You essentially need to implement a very efficient HTTP server, also apparently not trivial, in order to get the expected Layer 7 features. Embedded scripting language support is quickly becoming the de facto expectation. Simple traffic proxying/forwarding is the easy part, but getting something competitive together feature- and performance-wise feels way beyond teeth cutting?
EDIT: Using golang as an example. You would need to either piggy-back on the best HTTP server available for your language platform or write your own. The stdlib Golang http server is not competitive performance-wise. The reasons for this are pretty well known and seem to be largely accepted by the core team (AFAICT). This won't impact most app and service developers too much, but for an LB service the number of cycles it leaves on the table may not be acceptable at all.
I’m dealing with some people that have some sort of existential dread of haproxy and I can’t get them to admit why they won’t use it. I’d send them to a therapist but I’m not their boss.
Yup. It's also famously incompatible with the wider ecosystem because it's not a drop in replacement for net/http. There are other criticisms I haven't really looked into as well.
Not meant to be my own criticism, just an observation. I believe the net/http interface ossified its own design decisions, making it hard/impossible to be compatible with?
This is pretty cool. But I think an implementation that avoids the mutexes (mutices?) when allocating the backends and uses channels instead would probably perform better.
2 channels needed, 1 for available backends and 1 for broken ones.
On incoming request, the front end selects an available backend from channel 1. On completion, the backend itself puts itself either back onto channel 1 on success, or channel 2 on error.
Channel 2 is periodically drained to test the previously failed backends to see if they're ready to go back onto channel 1.
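Roughly, as an untested sketch (hypothetical types; assumes both channels are buffered to at least the number of backends so the sends never block):

```go
package lb

import "time"

// Pool implements the two-channel scheme described above.
type Pool struct {
	available chan string // channel 1: healthy backend addresses
	broken    chan string // channel 2: backends that returned an error
}

// Acquire blocks until a healthy backend can be taken off channel 1.
func (p *Pool) Acquire() string { return <-p.available }

// Release puts the backend back on channel 1 on success, channel 2 on error.
func (p *Pool) Release(addr string, err error) {
	if err != nil {
		p.broken <- addr
		return
	}
	p.available <- addr
}

// recheck periodically drains channel 2 and re-tests each backend,
// promoting the ones that pass back to channel 1.
func (p *Pool) recheck(interval time.Duration, healthy func(string) bool) {
	for range time.Tick(interval) {
		var pending []string
	drain:
		for {
			select {
			case addr := <-p.broken:
				pending = append(pending, addr)
			default:
				break drain
			}
		}
		for _, addr := range pending {
			if healthy(addr) {
				p.available <- addr
			} else {
				p.broken <- addr
			}
		}
	}
}
```

One thing to watch: while a backend is sitting on the broken channel it's simply out of rotation, so with enough failures Acquire can block with nothing available.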
I haven't really read it but I'll take a stab (with pseudocode). I think NextIndex() is incrementing s.current and modding it with servers.length.
To do this, there are three operations: SET s.current to s.current + 1, then GET s.current so that it can be MODDED with servers.length.
If that SET and GET are not coordinated between threads, then a race could sour the index. For example, if two threads call this method at the same time, they could both SET the +1 before either GETs the value, then they will both get the same value +2 from where it started instead of +1 for each caller.
Thanks for the explanation. Forgive my lack of knowledge, but can it not be solved by using parallelism instead of concurrency/threads? I thought Go has first class primitives to be able to do this?
> I thought Go has first class primitives to be able to do this?
TBH I'm only 1 week into learning Go myself (coming from Java/Scala/JS/etc). But it looks like the article used what Go offers. They used the atomic package which says, "provides low-level atomic memory primitives useful for implementing synchronization algorithms."
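For what it's worth, the pattern looks roughly like this (a sketch; the field names are guesses, not copied from the article):

```go
package lb

import "sync/atomic"

// ServerPool is a minimal stand-in for the pool type in the article.
type ServerPool struct {
	backends []string // upstream addresses
	current  uint64   // round-robin counter
}

// NextIndex advances the counter. AddUint64 does the read-increment-write
// as a single atomic operation, so two goroutines calling this at the same
// time can never observe the same value the way the racy SET/GET sequence
// described above could.
func (s *ServerPool) NextIndex() int {
	return int(atomic.AddUint64(&s.current, 1) % uint64(len(s.backends)))
}
```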
Channels aren't a way to avoid mutexes/locking, they are just meant to be simpler to write & reason about. Channels are implemented using mutexes under the hood.
When you have 2 threads running in parallel and the outcome of the computation depends on the order in which they do their job first, that's a race condition.
> After playing with professional Load Balancers like NGINX I tried creating a simple Load Balancer for fun.
And while nginx[0] certainly can perform in this role, another production quality load balancer is HAProxy[1]. Both can do more than this, of course.
Reinventing solutions "for fun" certainly can be educational and help others learn key concepts, but the author should clearly state what they are doing is not meant to replace production quality solutions.
Besides, even if a beginner somehow wanders into the project, mis-IDs it, and uses it on their beginner server, then it sounds like a good learning experience for them, even though the README didn't give them permission to learn by neglecting to include the word "educational", god forbid.
Is this the catastrophic scenario you have in mind?
I was trying to be explicit as to my reasoning. If that came across as "hair-splitting" then I suppose I failed to adequately do so.
The whole point I was trying to make is that people find code in all sorts of ways. And my opinion is that if a public repo, such as GitHub, has a project which could easily be both desired (due to need) and misused (due to intent), then it might be a good idea to put a simple declaration in the project's README.
Given all the blow-back this concept has incurred, I would think the concept is either wholly immaterial or now proven as needed.
This seems to be a strange hill to die on. For all intents and purposes there is a declaration in the readme. And anyone who knows enough to want to operate with this will see that it's fairly basic. Others will likely just reach for a more generic or battle tested solution.
In any case, people should be able to do what they want with their repos and code, assuming legality of course.
> If you race through chains of synonyms, changing between definitions along the way, you can get almost anywhere. Why does this matter?
I am not racing through chains of synonyms.
What I was trying to elucidate was that the use of "dumbest", when the repo is ".../simplelb", and the very first line of the README says the project "is the simplest Load Balancer ever created" might, just might, lead people to think that "dumbest" is being used in this context as a synonym for "simplest".
Which might, just might, cause people to consider it for uses this project is not intended to satisfy.
I would really love to meet the person that knows the English language well enough to understand that dumb can sometimes mean simple, but who would also interpret that sentence to mean “world’s simplest load balancer”.
Very strange axe you have to grind for a very unlikely hypothetical. Just because something is possible doesn’t mean it is probable.
> > but the author should clearly state what they are doing is not meant to replace production quality solutions.
> If you know you need a load balancer you should already know this.
If you know you need a load balancer, you go and look for one. If you happen upon one in a programming language you use, then it may be more appealing.
But if you don't do any research to find out if it is production ready, you aren't a production ready dev anyway. A disclaimer isn't going to save them.
Yes, if you will use anything that doesn’t say “NEVER USE THIS IN PRODUCTION. SERIOUSLY THIS IS FOR EDUCATIONAL PURPOSES ONLY” then you are going to get burned by a lot more than a LB.
If you like this kind of thing, we are developing a very powerful and flexible reverse proxy with load balancing into Caddy 2: https://github.com/caddyserver/caddy/wiki/v2:-Documentation#...