It frustrates me when people use ASCII instead of packed bitmaps for things like this: a packet transmitted once a second from potentially hundreds or thousands of nodes, which each frontend proxy has to parse into a binary form anyway before using it. Maybe it's a really small amount of CPU, but it's just one of many things that could easily be more efficient.
Some of this stuff is so simple and useful it's a wonder it wasn't there before:
"mod_ssl can now be configured to share SSL Session data between servers through memcached"
"mod_cache is now capable of serving stale cached data when a backend is unavailable (error 5xx)."
"Translation of headers to environment variables is more strict than before to mitigate some possible cross-site-scripting attacks via header injection."
"mod_rewrite Allows to use SQL queries as RewriteMap functions."
"mod_ldap adds LDAPConnectionPoolTTL, LDAPTimeout, and other improvements in the handling of timeouts. This is especially useful for setups where a stateful firewall drops idle connections to the LDAP server."
"rotatelogs May now create a link to the current log file."
Very nice that they rewrote the mod_rewrite and caching guides with more examples and ease of use in mind. Here are the API changes: http://httpd.apache.org/docs/2.4/developer/new_api_2_4.html
I think that's a premature optimization. If it becomes a performance problem, optimize it then. Otherwise, I doubt it's worth the cost of humans not being able to read the information on the wire, and deciding on and implementing a binary format to represent the information.
If you're writing a piece of software with one specific function, and doing it efficiently doesn't affect anything else or take any more time, just do it efficiently the first time. That way you don't have to come back later and rewrite it (and by the time a rewrite is required, somebody has already written something dependent on your crappy design, which now has to be fixed too, etc.)
100% agreement on "if it's not showing up in your profiler, don't optimize it".
Yeah, it's probably insignificant, but they could have just done it that way the first time. There'd be no optimizing needed thereafter, and no side effects, as it's only used in two places for a limited function. It's actually smaller, easier, and faster to write AND to run. Make the right choices at design and implementation time and you come out with better code which doesn't need to be optimized.
It parses that string and reads it into a struct at a rate of about 3 million per second on a single core on my Macbook Pro. That means for your example there of 850 machines in a cluster, we're talking about roughly a quarter of a millisecond of one core of CPU time. Even if that's off by an order of magnitude, it doesn't matter.
THERE IS ABSOLUTELY NO REASON TO OPTIMIZE THIS CASE
The enemy of optimization is hard-to-maintain code, not small inefficiencies.
Using text in this case makes the wire protocol easier to debug and generally easier to maintain for forward compatibility. Need to add an extra argument? It'll be both self-documenting and backwards compatible, instead of some versioned mess of bitflags.
Could I be wrong? Sure. There's an easy way to prove it: show me the profiler output. Your eagerness for micro-optimizations strongly hints at inexperience.
The wire protocol you're talking about is a heartbeat protocol. This doesn't need to be human-readable because there is no input from a human, nor is it intended for a human to read. Debugging it would be as complicated as a "printf". Oh the horror. We wouldn't want to add debugging to our application - better make people read the raw data with a packet sniffer (which you can't run unless you have root, so good luck getting your application debugged quickly, developers).
Adding an extra argument would require appending an extra field and incrementing the version. Oh noes the mess! Since both solutions are versioned, forward compatibility is just updating code in the server, and if you're going to change things you have to update code anyway, so this doesn't sound like a good reason to oppose it.
Realistically, will more than a handful of shops have a big enough cluster for any of this to matter? No. But even your sscanf example is faster than Apache's code (http://pastie.org/3430085) while being 100% compatible with the rest of the code, and it still goes to show that doing it right the first time beats slapping something together and waiting until you have to optimize.
So do we need to use an unreadable, simpler solution? No. But it would work just as well as anything else and take just as much time to write - if not less.
Then why add string serialization to something that doesn't remotely need it?
Using text in this case makes the wire protocol easier to debug and generally easier to maintain for forward compatibility.
If anything, the use of an unspecified string format makes debugging potentially harder here (UDP truncation...).
If you find yourself saying that, then your work is done. There's nothing to optimize. Optimized code is non-idiomatic, usually clever, often obtuse, and as a result, more buggy. Sure, copying the data off the network is easier, but you still have to figure out an encoding on the sender side, document it, implement it, and probably in the future extend it. There is a very real cost to keeping "optimized" code around, which is why you had better be sure that the performance gains outweigh the software engineering costs.
Parsing strings is most certainly not the default in C.
And since the data being communicated between the heartbeat client and server is computer-generated and automatic, "text" is not just unrequired, it is an unnecessary step in communication. Apache is implemented in C. C is good at accessing, copying, transmitting, and inspecting raw memory. It is not good at parsing and comparing strings. Adding text to a protocol nobody will ever interface with adds an unnecessary layer which is not only more complex than my suggestion but is itself more prone to security flaws and bugs unless proper care is taken when parsing the strings.
Really I think this is about people's comfort levels. You feel more comfortable looking at a string and knowing what's in it. Either way will work; it doesn't matter how you skin this cat. But I think it's disingenuous to suggest that comparing raw bits is somehow more likely to blow up than comparing the raw bits after you parsed them from a string.
Further, I think you are underestimating the benefits of people and other programs being able to read the messages without having to read up on, and reimplement such a format.
This is a protocol which transmits data that changes dynamically with the load on the application servers, implemented by services in a language which makes it easy to communicate raw memory without parsing, passing a tiny low-latency message over an unreliable transport layer with no concept of whether messages are ordered properly or not.
Forget the common developer. Forget compatibility. Forget usability. Forget reliability. This thing only has one function: tell the frontend proxy I'm alive and how many slots are active or busy. The only thing that matters to the backend server is how fast you can spit out a packet, and to the frontend how many machines are alive. Shit, there might not even be a valid checksum on the UDP packet. And you're worried about how much effort it takes to implement this? It probably takes less time to write the code than it does to add the source into your makefiles and make a unit test. The existing function that parses these packets is one small function, which without parsing a string would be one memcpy (or the slightly-less-flexible sscanf as shown by wheels earlier).
There is no crime in writing apps in a way that fits their use. Not all apps are the same, and not every "default" design decision is appropriate. Sometimes you just do what you're familiar with. In this case the Apache guys used an HTTP query string for a heartbeat packet. I would have used something more akin to a network protocol packet. It doesn't really matter as long as it meets the application's requirements and works.
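For what it's worth, here's a sketch of what "more akin to a network protocol packet" could mean: a fixed-layout struct in network byte order, where the receiver's parse really is one memcpy plus byte-order fixups. The field layout, struct name, and helpers are assumptions for illustration, not anything Apache ships:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical fixed-layout heartbeat: 6 bytes on the wire,
 * all fields in network byte order. */
struct heartbeat {
    uint16_t version;
    uint16_t ready;
    uint16_t busy;
};

/* Sender: pack the counts into a wire buffer. Returns bytes written. */
static size_t hb_pack(unsigned char *wire, uint16_t ready, uint16_t busy)
{
    struct heartbeat hb = { htons(1), htons(ready), htons(busy) };
    memcpy(wire, &hb, sizeof hb);   /* the "one memcpy" */
    return sizeof hb;
}

/* Receiver: memcpy off the wire, then byte-order fixups. */
static int hb_unpack(const unsigned char *wire, uint16_t *ready, uint16_t *busy)
{
    struct heartbeat hb;
    memcpy(&hb, wire, sizeof hb);
    if (ntohs(hb.version) != 1)
        return -1;
    *ready = ntohs(hb.ready);
    *busy  = ntohs(hb.busy);
    return 0;
}
```

The version field is what carries the "add a field later" story here: a new field means a new struct and a bumped version number, which is the versioned-bitflags trade-off being argued about above.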
But I still like my way better. :)
Consumers should handle new variables besides busy and ready, separated by '&', being added in the future.
I think that last line is why they avoid a binary format. Everyone who's consuming this packet already has a function available which can parse url query parameters with arbitrary name/value pairs, and that's the format being used here. I'm also not sure that the user needs to turn this into a binary format before using it; a string comparison against "75" and "0" is just as capable and nearly as fast as a numeric comparison, especially if you're using a scripting language for your heartbeat monitor rather than C.
I can buy more hardware, but operating and maintaining systems that have been optimized to hell and back for no compelling reason is a massive ongoing human resources cost, not even counting the initial unnecessary development time.
My original point was not to encourage over-optimization, but to design better. But deploy whatever you want, I don't care.
This is a textbook premature optimization. You're designing for academic purity instead of the real world.