
The New HAProxy Data Plane API: Two Examples of Programmatic Configuration - phil21
https://www.haproxy.com/blog/new-haproxy-data-plane-api/
======
hoov
This is so great! While at Adobe, to solve some technical issues, I wrote a
microservice that sat alongside HAProxy, providing a REST interface to
manipulate the configuration, validate it, and then do a graceful reload. The
fact that this
is now built-in is amazing, and a lot less clunky. I wish I had this in 2012;
it would have saved me and my team a lot of work. And, I'm sure this solution
is much better than the band-aid we built.

------
nickramirez
It was fun writing this blog post because the API covers a lot of ground in
terms of what can be done programmatically with the configuration. I've wanted
something like this for integrating into a CD pipeline, for example.

------
cheriot
Are there any fundamental incompatibilities between the HAProxy data plane api
and Envoy's?

I wonder if something could sit in between and translate. It would be great
for all the existing control planes to work with multiple data plane proxies.

~~~
bndw
Envoy's xDS APIs work with snapshots of configuration data that are compiled
externally and sent to Envoy.

HAProxy's API looks like more of a transactional RPC API and HAProxy manages
the configuration internally.

~~~
aiharos
Transactions are an optional part of the API that allow changes across
multiple API resources to be applied atomically.

Getting an entire snapshot of configuration data is quite compatible with
this, since then the explicit use of transactions is not necessary and the
dataplane can take care of it on its own.
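As a sketch of what that transaction flow looks like: start a transaction, attach each change to it, then commit everything with a single call. The endpoint paths below follow the v2 Data Plane API, but the host, port, and the placeholder transaction id are assumptions for illustration:

```python
# Sketch of the Data Plane API transaction flow. This only builds the
# (method, url) call sequence; it does not perform any HTTP requests.
# Host/port and the transaction id placeholder are made up.

BASE = "http://localhost:5555/v2/services/haproxy"

def transaction_flow(config_version, changes):
    """Return the (method, url) calls for committing `changes` atomically.

    `changes` is a list of (method, resource_path) tuples, e.g.
    ("POST", "/configuration/servers?backend=web").
    """
    # 1. Open a transaction against the current configuration version.
    calls = [("POST", f"{BASE}/transactions?version={config_version}")]
    txn_id = "<id-from-response-above>"  # placeholder for the returned id
    # 2. Attach each change to the transaction instead of applying it live.
    for method, resource in changes:
        sep = "&" if "?" in resource else "?"
        calls.append((method, f"{BASE}{resource}{sep}transaction_id={txn_id}"))
    # 3. Commit: all queued changes are validated and applied together.
    calls.append(("PUT", f"{BASE}/transactions/{txn_id}"))
    return calls

calls = transaction_flow(1, [("POST", "/configuration/servers?backend=web")])
```

If the commit fails validation, none of the queued changes take effect, which is where the atomicity comes from.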

------
hartator
Stupid question. What was wrong with hot reloading the configuration file? It
doesn’t drop requests, doesn’t seem to impact CPU and RAM by much, and it’s
straightforward to script around.
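For reference, the hot reload being described is usually scripted as a validate-then-reload pair: check the new file with `-c`, then start a new process with `-sf` so the old one finishes in-flight connections before exiting. A minimal sketch (the file paths and pid are assumptions):

```python
# Sketch of a classic HAProxy graceful reload, as two argv lists you would
# pass to a process runner. Paths and the example pid are assumptions.

def validate_cmd(cfg="/etc/haproxy/haproxy.cfg"):
    # `haproxy -c` parses the config and exits non-zero if it is invalid,
    # so you can abort before touching the running process.
    return ["haproxy", "-c", "-f", cfg]

def reload_cmd(cfg="/etc/haproxy/haproxy.cfg",
               pidfile="/run/haproxy.pid",
               old_pids=("1234",)):
    # -sf <pids>: the new process tells the old ones to finish existing
    # connections and then stop ("soft" reload).
    return ["haproxy", "-f", cfg, "-p", pidfile, "-sf", *old_pids]
```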

~~~
zaroth
Can you continue to do this if you want to?

But think about a system which is dynamically adding domains with TLS and SNI,
or constantly adjusting backend workers. Or trying to automate a clean rolling
upgrade.

You want to quickly shed load, or add new TLS certificates, etc. If you have a
process which is accepting these config change requests and controlling a
file, it has to worry about atomicity if several different things are
happening at the same time: if you script a “read config, update config, link
new config, trigger reload, poll till complete” function without locks, one
change could blow away another.

Presumably the API would protect against this and allow a simpler function to
push a given state change?

But I agree that it’s likely over-designed and not strictly necessary,
assuming the hot reload doesn’t have unanticipated side effects, e.g. what
happens to stick tables, session state, etc.?

It’s not like the REST API eliminates all these concerns either. For example,
if you make a change via the API, presumably it is persisted across reloads,
or there is at least a way to make it be?

~~~
user5994461
The configuration file allows you to reconfigure everything at once, in an
atomic manner. The configuration is also verified before reconfiguring, and
the reload can be aborted if it's not runnable.

HTTP offers none of this. No atomicity and no consistency of the full
configuration. Want to edit a service or maybe alter some hosts? Better hope
it's already set up as expected before editing, that you're doing all the
right calls in the right order, and that they all succeed. You will have to
write one hell of a state machine to ensure you get to the expected state.

The only sane use case for live-changes is enabling/disabling a server, which
is necessary to perform rolling maintenance or blue-green deployment.
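That enable/disable case doesn't even need a reload: HAProxy accepts it over the runtime stats socket. A sketch of the two-line protocol (the socket path and the backend/server names are assumptions, and the config needs a `stats socket ... level admin` line for this to work):

```python
import socket

def runtime_command(action, backend, server):
    """Build a runtime API command, e.g. "disable server web/srv1".

    "disable" puts the server into maintenance (drains it from rotation);
    "enable" brings it back. Backend/server names are examples.
    """
    assert action in ("disable", "enable")
    return f"{action} server {backend}/{server}\n"

def send_runtime_command(cmd, sock_path="/var/run/haproxy.sock"):
    # Sends one command over the admin stats socket and returns the reply.
    # The socket path is an assumption; match it to your config.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(cmd.encode())
        return s.recv(4096).decode()
```

Usage would be `send_runtime_command(runtime_command("disable", "web", "srv1"))` before taking srv1 down for maintenance, then the matching `enable` afterwards.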

~~~
nickramirez
The HAProxy Data Plane API performs a validation check: it runs HAProxy with
the -c flag before reloading to make sure that the configuration is valid. If
the configuration is invalid, the changes will not take effect and you will
see errors in your console or log. With transactions, it's not quite stateless
(as HTTP is): transactions provide a way to make multiple changes and apply
them with a single commit. The main benefit of an API is programmability (is
that a word?), in which configuring HAProxy can be controlled by "control
plane" software, slick-looking UIs, tools like Ansible, etc. in an automated
way. You can also generate client-side code from the API, such as for
Go/Python/[insert language] coding. It goes beyond the use case of "I'm a
human who wants to control HAProxy manually or with templates".

------
sporkland
My company uses haproxy managed by synapse extensively. As a developer I'm
super impressed at how fast Willy Tarreau and co are moving and changing
haproxy to keep up with things like envoy and friends.

One issue we have with our internal instances is that while the inbound
sockets are passed across reloads, the connection/session state has to be
re-initialized on every reload. We also use mTLS, terminated between services
by haproxy itself on both sides.

So we have some services using haproxy as a client-side load balancer where
there are hundreds of backends for the service, and that list changes somewhat
frequently, triggering reloads. This leads to heavy CPU usage on all the
backends, which have to deal with clients reloading and re-negotiating mTLS
sessions.

I haven't been able to tell from the various marketing messages, but does the
new data plane API avoid restarting the process, and hence this
renegotiation?

------
sandGorgon
Is this available in the open source version or only some commercially
supported version? Because this stuff is only available in the paid version of
nginx.

This pretty much makes building k8s ingresses very easy.

~~~
nickramirez
The HAProxy Kubernetes Ingress Controller
([https://github.com/haproxytech/kubernetes-ingress](https://github.com/haproxytech/kubernetes-ingress))
uses the same Go library that the API is layered on top of: the
"client-native" library
([https://github.com/haproxytech/client-native](https://github.com/haproxytech/client-native)).

~~~
sandGorgon
This is so cool. Time to rewrite my 5-year-old nginx configs to haproxy.

P.S. In typical haproxy vein, it's a bit hard to get started. For example, the
Docker page doesn't have a trivially runnable version for haproxy 2.0
([https://hub.docker.com/_/haproxy/](https://hub.docker.com/_/haproxy/)).

~~~
nickramirez
Try the haproxytech images, which are updated for 2.0:
[https://hub.docker.com/search?q=haproxytech&type=image](https://hub.docker.com/search?q=haproxytech&type=image).
There is also information on using the ingress controller here:
[https://www.haproxy.com/documentation/hapee/1-9r1/traffic-management/kubernetes-ingress-controller/](https://www.haproxy.com/documentation/hapee/1-9r1/traffic-management/kubernetes-ingress-controller/)

------
nwmcsween
Ugh, why? Why does every unix tool slowly grow until it includes an HTTP
server, gRPC, etc.? UCSPI would be ideal here.

~~~
jrockway
I do not understand your complaint. It's a layer 7 HTTP load balancer, so it
has to understand HTTP.

As for why people use APIs for configuring things, it's because infrastructure
changes frequently enough that encoding it in a static file isn't practical.
Imagine you are rolling out a new software release. There are 100 replicas.
You could start up a new replica, see if it passes health checks, edit the
haproxy config to configure that as an endpoint, wait for haproxy to start
sending it traffic, check that it's working, edit the config file again to
remove the old version's replica, wait for haproxy to acknowledge that change,
shut down the old replica... and finally do that again 99 more times. Nobody
wants to do that, so there's an API to inform your frontend load balancer of
where your workers are. When it needs new endpoints, you tell it.
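That rolling replacement is basically one loop. Sketched below with a plain set standing in for the load balancer's endpoint list, and `start_replica`/`stop_replica`/`healthy` as hypothetical hooks for whatever orchestration you use:

```python
# The rolling update described above, as a loop. `lb` is a set standing in
# for the load balancer's endpoint list; start_replica, stop_replica and
# healthy are hypothetical hooks, not a real API.

def rolling_update(lb, old_replicas, start_replica, stop_replica, healthy):
    """Replace each old endpoint with a freshly started, health-checked one."""
    for old in old_replicas:
        new = start_replica()            # boot a replica of the new version
        if not healthy(new):             # bail out rather than break traffic
            stop_replica(new)
            raise RuntimeError(f"replica {new} failed health check")
        lb.add(new)                      # start sending it traffic
        lb.discard(old)                  # then drain the old one
        stop_replica(old)
```

The point of the API is that `lb.add` / `lb.discard` become cheap calls instead of "edit a file, reload, wait, verify" on every iteration.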

I guess an even better example is TLS certificates. They expire every 3
months. What do you do when a new one is released? Drain traffic from one load
balancer, copy the certificate and key over, wait for the load balancer to
restart, and then undrain traffic? No, that's a huge pain. You just have your
load balancer connect to your secret discovery service and have the same
program that renews the certificate send your load balancers the new
certificate. Then you don't have to drain; when a new connection comes in, you
present the new certificate.

All these config APIs make it possible to operate software at scale. You
probably don't need it for your personal website. But you could probably
eschew servers entirely and just nc -l on port 80 and type the response in
your terminal when a request comes in and still have a mostly-working website.

~~~
nwmcsween
The issue isn't that it has to understand HTTP, the issue is that HTTP is now
a baked-in configuration method that must be supported forever. Using UCSPI
one could just spit out text, template with awk, chain to a TCP server, and do
whatever your heart desires.

~~~
zaarn
On the other hand, you can trivially handle the API with jq and curl, both
very simple unix tools that do what awk can do, just better, since the data is
structured.

