Hacker News
How we fine-tuned HAProxy to achieve 2M concurrent SSL connections (2017) (freecodecamp.org)
82 points by piyushgupta27 40 days ago | 13 comments

I have a stupid question: there are at most 65536 ports, and some of them are not usable. There is also a file descriptor limit. How can you have that many concurrent connections?


There are at most 65536 ports per remote IP.

TCP connections are basically identified with an (ip, port) tuple.

Also you can set the file descriptor limits to whatever you want.
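For instance, a process can inspect and raise its own descriptor limit at runtime (a minimal sketch; an unprivileged process can only raise the soft limit up to the hard limit — going past that needs root, limits.conf, or a systemd override):

```python
import resource

# Current soft/hard limits on open file descriptors for this process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")

# Raise the soft limit as far as we can without privileges.
# (On some systems the hard limit reports as RLIM_INFINITY but the kernel
# still caps NOFILE, so fall back to the current soft value in that case.)
target = hard if hard != resource.RLIM_INFINITY else soft
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```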

To further clarify: even if you only have one host behind a proxy, you can create multiple listeners for that application on different ports or IPs. I do this for a few of my servers to avoid FIN-WAIT / TIME-WAIT assassination attacks.

i.e. have Apache listen on 127.0.0.* and set up ifcfg-lo-range accordingly.

Or have the server behind haproxy listen on a few hundred ports.
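A toy version of the multi-port trick (Python, hypothetical 3-port demo where the comment suggests a few hundred; each extra listening port multiplies the number of distinct connection tuples available):

```python
import socket

# One backend process accepting on several ports. Every additional
# (dst ip, dst port) pair opens up another full range of source ports.
socks, ports = [], []
for _ in range(3):
    s = socket.socket()
    s.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    s.listen(16)
    socks.append(s)
    ports.append(s.getsockname()[1])
print(sorted(ports))
```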

Aren't TCP connections more accurately identified by the (src ip, src port, dst ip, dst port) tuple? This is somewhat significant, as you can easily have tons of IPs on a single box.

Yes, but after the initial SYN/ACK, the daemon will allocate an outgoing port number for the connection. So if you have a single IP address and a burst of hundreds of thousands of requests, you will run into problems.

Yes, but the local port and IP are usually fixed when running an HTTP server (unless you're binding to multiple IPs on one machine).


> TCP connections are basically identified with an (ip, port) tuple.

I believe the full identifier is a quad: (source ip, source port, dest ip, dest port).
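You can see the quad at work with two sockets (a minimal local sketch; the listener just stands in for the HTTP server):

```python
import socket

# A tiny local listener standing in for the server (OS-chosen port)
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(8)
dst = srv.getsockname()

# Two clients to the *same* (dest ip, dest port): the kernel tells them
# apart because their source ports differ, completing distinct 4-tuples.
c1 = socket.create_connection(dst)
c2 = socket.create_connection(dst)
print(c1.getsockname(), c2.getsockname())
```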

Also note that DNS will load balance the IP presented. For instance 'dig'ing yahoo.com gives me the following:

  yahoo.com.		92	IN	A
  yahoo.com.		92	IN	A
  yahoo.com.		92	IN	A
  yahoo.com.		92	IN	A
  yahoo.com.		92	IN	A
  yahoo.com.		92	IN	A
So you can now multiply the quad space by the number of public IPs you load into your DNS record.

> Also you can set the file descriptor limits to whatever you want.

Isn't it limited by the amount of memory?

One quibble I would make with the article is that you are not actually limited to ~65k connections from a client. It is only ~65k per IP address on the client (given that they're all talking to a single remote port).

You can add NICs or virtual IPs and bind your client instances to specific IP addresses instead of INADDR_ANY.
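Binding the client to a specific source address looks like this (a sketch assuming Linux, where the whole 127/8 loopback range is routable by default; on other OSes 127.0.0.2 may need an explicit alias):

```python
import socket

# Local stand-in for the remote server
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
dst = srv.getsockname()

# Instead of letting the kernel pick via INADDR_ANY, pin the client's
# source IP explicitly; port 0 still lets the kernel choose the port.
c = socket.create_connection(dst, source_address=("127.0.0.2", 0))
print(c.getsockname())
```

Each distinct source IP you bind this way gets its own ~65k ephemeral port range toward the same destination.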

It is addressed in the previous article of the series.

hot damn!

I really appreciate the walkthrough of the Apache Bench (ab) results and the learning process, even though it didn't get them to their objective. I've been thinking about using ab myself, and these are great things to know.

I took down a few notes while reading the article:

- mentions use of apache bench ( ab ) command for load testing

- mentions use of ganglia tool

- mentions configuring HAProxy for multi-core using nbproc setting

- mentions the 'parallel' tool for running commands in parallel

- simulate long-running requests by having the server add a small delay, rather than the client (a workaround for ab deficiencies)

- have the server also send different response lengths back simulating varying load

- pdsh tool for remote parallel shell (ssh) sessions

- vegeta tool - that got them to their scalability / tipping point objective

- nodejs (used for their backends) had a default request timeout of 2 mins

- used dmesg to learn that HAProxy was running out of memory (at around 1.2M conns)

- pdsh to run vegeta tool on multiple machines (acting as clients) - script included in article

- mentions haproxy maxconn setting - and verification by checking proc fs limits for the haproxy pid
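The server-side delay/varying-length trick from the notes above can be sketched in a few lines (a hypothetical Python stand-in, not the article's Node.js backend):

```python
import http.server
import random
import threading
import time
import urllib.request

class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Delay each response and vary its length, simulating long-running
    requests of varying load on the server side instead of the client."""
    def do_GET(self):
        time.sleep(random.uniform(0.0, 0.2))        # simulated server-side work
        body = b"x" * random.randint(1, 4096)       # varying response length
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, fmt, *args):              # keep the demo quiet
        pass

srv = http.server.ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()

body = urllib.request.urlopen(
    f"http://127.0.0.1:{srv.server_address[1]}/").read()
print(f"got {len(body)} bytes")
```

Pointing ab (or vegeta) at a server like this exercises many concurrent in-flight connections even though the load tool itself can't hold requests open.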

