


Some of this stuff is so simple and useful it's a wonder they weren't there before:

http://httpd.apache.org/docs/2.4/mod/mod_auth_form.html
http://httpd.apache.org/docs/2.4/mod/mod_session_dbd.html
http://httpd.apache.org/docs/2.4/mod/mod_buffer.html
http://httpd.apache.org/docs/2.4/mod/mod_data.html
http://httpd.apache.org/docs/2.4/mod/mod_ratelimit.html
http://httpd.apache.org/docs/2.4/mod/mod_lua.html

"mod_ssl can now be configured to share SSL Session data between servers through memcached"

"mod_cache is now capable of serving stale cached data when a backend is unavailable (error 5xx)."

"Translation of headers to environment variables is more strict than before to mitigate some possible cross-site-scripting attacks via header injection."

"mod_rewrite Allows to use SQL queries as RewriteMap functions."

"mod_ldap adds LDAPConnectionPoolTTL, LDAPTimeout, and other improvements in the handling of timeouts. This is especially useful for setups where a stateful firewall drops idle connections to the LDAP server."

"rotatelogs May now create a link to the current log file."

Very nice that they rewrote the mod_rewrite and caching guides with more examples and ease of use in mind. Here are the API changes: http://httpd.apache.org/docs/2.4/developer/new_api_2_4.html

It frustrates me when people use ASCII instead of packed bitmaps for things like this (packet transmitted once a second from potentially hundreds or thousands of nodes, that each frontend proxy has to parse into a binary form anyway before using it). Maybe it's a really small amount of CPU but it's just one of many things which could easily be more efficient.

I think that's a premature optimization. If it becomes a performance problem, optimize it then. Otherwise, I doubt it's worth the cost of humans not being able to read the information on the wire, and deciding on and implementing a binary format to represent the information.


That's a fine philosophy for the just-ship-it model of startup webapp development, but this is a piece of server software which is supposed to scale high and far and reduce its resource use as much as possible (part of the emphasis of the release as noted).

If you're writing a piece of software with one specific function, it doesn't affect anything else, and it doesn't take any more time to make it efficient, just do it efficiently the first time. This way you don't have to come back later and rewrite it (by which time somebody has already written something dependent on your crappy design, which now has to be fixed too, etc.).


It might be the philosophy of "the just-ship-it model of startup webapp development", but it's not why I have it. It comes from the philosophy of high performance systems research where you can spot inefficiencies all over the system, you just don't have enough time to optimize them all, and any extra complexity you put into the system better be worth it. Every "optimization" you put into your system better have performance numbers associated with it; if you can't experimentally demonstrate that the optimization is worth it, it's not.


Moreover, as someone who's spent a significant chunk of my career basically being a professional optimizer, one of the worst things to fight against is code infected with a plethora of micro-optimizations. Micro-optimizations tend to be based around fragile, case-specific assumptions and when they're used, a smarter, deeper optimization to a more generic version of the problem isn't reused because somebody decided that "every bit counts".

100% agreement on "if it's not showing up in your profiler, don't optimize it".


I agree that if you try to optimize everything before you have proof that you need it, you'll be wasting a lot of time. But the case I'm talking about is a function in mod_heartmonitor.c which calls apr_strtok, strchr, and ap_unescape_url (twice) on each of three iterations and returns the values in a table. And that function is duplicated in mod_lbmethod_heartbeat.c. For a cluster of 850 servers that's 12 calls per packet, or 10,200 function calls every second, when they could have just copied a raw network-order bit string into a struct. Or something.

Yeah it's probably insignificant, but they could have just done it that way the first time. There'd be no optimizing needed thereafter, and no side-effect as it's only used in two places for a limited function. It's actually smaller and easier and faster to write AND to run. You make the right choices at design and implementation time and you come out with better code which doesn't need to be optimized.
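For concreteness, here is a minimal sketch of what "copy a raw network-order bit string into a struct" could look like. The wire layout (three 32-bit fields) and the names hb_wire and unpack_heartbeat are assumptions made for illustration; nothing like this exists in Apache's code.

```c
#include <arpa/inet.h>  /* ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout: three 32-bit unsigned fields, each sent in
 * network (big-endian) byte order. Not Apache's actual format. */
struct hb_wire {
    uint32_t version;
    uint32_t ready;
    uint32_t busy;
};

/* Copy a received datagram into the struct and convert each field to
 * host byte order. Returns 0 on success, -1 if the datagram is not
 * exactly the expected size. */
static int unpack_heartbeat(const unsigned char *buf, size_t len,
                            struct hb_wire *out)
{
    if (len != sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);
    out->version = ntohl(out->version);
    out->ready   = ntohl(out->ready);
    out->busy    = ntohl(out->busy);
    return 0;
}
```

Note that even this version needs a length check and a byte-order conversion per field, so it is not quite a bare memcpy.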


Sorry, but you're just wrong on this. Here's an example of parsing a string like the one in there:


It parses that string and reads it into a struct at a rate of about 3 million per second on a single core on my Macbook Pro. That means for your example there of 850 machines in a cluster, we're talking about roughly a quarter of a millisecond of one core of CPU time. Even if that's off by an order of magnitude, it doesn't matter.
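The code from the comment above did not survive in this copy of the thread. As a stand-in, here is a minimal sketch of sscanf-based parsing. It assumes the packet looks like "v=1&ready=75&busy=0" with the fields always in that order; the struct and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical struct; field names are illustrative, not Apache's. */
struct heartbeat {
    int version;
    int ready;
    int busy;
};

/* Parse a packet of the assumed fixed form "v=1&ready=75&busy=0".
 * Returns 0 on success, -1 if the string does not match that layout. */
static int parse_heartbeat(const char *packet, struct heartbeat *hb)
{
    int n = sscanf(packet, "v=%d&ready=%d&busy=%d",
                   &hb->version, &hb->ready, &hb->busy);
    return n == 3 ? 0 : -1;
}
```

At the quoted rate of about 3 million parses per second, 850 packets take 850 / 3,000,000 s, or roughly 0.28 ms of one core per second. The fixed format string is also why this approach is less flexible than a general key/value parser: it rejects reordered fields.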


The enemy of optimization is hard to maintain code, not small inefficiencies.

Using text in this case makes the wire protocol easier to debug and generally easier to maintain for forward compatibility. Need to add an extra argument? It'll be both self-documenting and backwards compatible, instead of some versioned mess of bitflags.

Could I be wrong? Sure. There's an easy way to prove such: show me the profiler output. Your eagerness for micro-optimizations significantly hints at inexperience.


"The enemy of optimization is hard to maintain code"

Then why add string serialization to something that doesn't remotely need it?

"Using text in this case makes the wire protocol easier to debug and generally easier to maintain for forward compatibility."

If anything, the use of an unspecified string format makes debugging potentially harder here (UDP truncation...).


Assuming that it'd perform that way in the real world, you are right that my solution would have no significant performance increase over the sscanf you use. But I don't totally agree with some of your supporting arguments.

The wire protocol you're talking about is a heartbeat protocol. This doesn't need to be human-readable because there is no input from a human, nor is it intended for a human to read. Debugging it would be as complicated as a "printf". Oh the horror. We wouldn't want to add debugging to our application - better make people read the raw data with a packet sniffer (which you can't run unless you have root, so good luck getting your application debugged quickly, developers).

Adding an extra argument would require appending an extra field and incrementing the version. Oh noes the mess! Since both solutions are versioned, forward compatibility is just updating code in the server, and if you're going to change things you have to update code anyway, so this doesn't sound like a good reason to oppose it.

Realistically, will more than a handful of shops have a big enough cluster for any of this to matter? No. But even your example of using sscanf is faster than Apache's code (http://pastie.org/3430085) while being 100% compatible with the rest of the code, and still goes to show that doing it right the first time is better than just slapping something together and waiting until you have to optimize.

So do we need to use an unreadable, simpler solution? No. But it would work just as well as anything else and take just as much time to write - if not less.


I would happily co-sign on wheels' post, but I wanted to say something about this: "Yeah it's probably insignificant."

If you find yourself saying that, then your work is done. There's nothing to optimize. Optimized code is non-idiomatic, usually clever, often obtuse, and as a result, more buggy. Sure, copying the data off the network is easier, but you still have to figure out an encoding on the sender side, document it, implement it, and probably in the future extend it. There is a very real cost to having around "optimized" code, which is why you had better be sure that the performance gains outweigh the software engineering costs.


You must be confused. I never suggested optimizing the code. I suggested using an initial design which is inherently efficient. They could have also made the heartbeat packets into XML documents. They didn't because they're not totally insane.


You are suggesting an optimization over the default "just use text." Your initial design is more complex and error prone - in my experience, people screw up simple binary formats much more often than simple text formats.


"the default 'just use text.'"

Parsing strings is most certainly not the default in C.


It is for interprocess communication for Unix processes, which are often implemented in C.


For applications which deal with users, yes, but this is like telling NFS it needs to transmit permission bits in English. Computers do not need English to communicate with one another - only people do.

And since the data being communicated between the heartbeat client and server is computer-generated and automatic, "text" is not only not required, it is an unnecessary step in communication. Apache is implemented in C. C is good at accessing, copying, transmitting, and inspecting raw memory. It is not good at parsing and comparing strings. Adding text to a protocol nobody will ever interface with is adding an unnecessary layer which is not only more complex than my suggestion but is itself more prone to security flaws and bugs unless proper care is taken when parsing the strings.

Really I think this is about people's comfort levels. You feel more comfortable looking at a string and knowing what's in it. Either way will work; it doesn't matter how you skin this cat. But I think it's disingenuous to suggest that comparing raw bits is somehow more likely to blow up than comparing the raw bits after you parsed them from a string.


I think you are underestimating the complexity of serializing data. Designing, implementing, documenting and extending a format for transmitting raw data is, at the least, troublesome. In my experience, people get confused more easily when they have to remember how the endianness of the processor versus the network affects what order data are in - and you have to start thinking about these things once you have decided to transmit raw data. My claim is that this is more effort than just using text.
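To make the endianness point concrete, here is a minimal sketch of one common way to sidestep host byte order: write and read each byte explicitly instead of copying an integer's memory. The helper names put_u32_be and get_u32_be are hypothetical.

```c
#include <stdint.h>

/* Write a 32-bit value into buf in big-endian (network) order,
 * regardless of the host CPU's byte order. */
static void put_u32_be(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Read a 32-bit big-endian value back out of buf. */
static uint32_t get_u32_be(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Sender and receiver both have to agree on this and on the field order, which is the documentation and maintenance cost being argued about here.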

Further, I think you are underestimating the benefits of people and other programs being able to read the messages without having to read up on, and reimplement such a format.


Once I rewrote a packet sniffer (without libpcap) to support multiple architectures for fun. I never took the time to learn about endianness until I had that project to play with and I'm glad I did. Would I like to program like that all the time and implement every new protocol as a bitstring? Hell no. But sometimes you're handed an edge case which is just perfect.

This is one: a protocol which transmits data that dynamically changes according to the use of the application servers, which is implemented by services using a language which makes it easy to communicate raw memory and not need to parse it, passing a tiny low-latency message over an unreliable transport layer without even a concept of whether messages are ordered properly or not.

Forget the common developer. Forget compatibility. Forget usability. Forget reliability. This thing only has one function: tell the frontend proxy I'm alive and how many slots are active or busy. The only thing that matters to the backend server is how fast you can spit out a packet, and to the frontend how many machines are alive. Shit, there might not even be a valid checksum on the UDP packet. And you're worried about how much effort it takes to implement this? It probably takes less time to write the code than it does to add the source into your makefiles and make a unit test. The existing function that parses these packets is one small function, which without parsing a string would be one memcpy (or the slightly-less-flexible sscanf as shown by wheels earlier).

There is no crime in writing apps in a way that fits their use. Not all apps are the same, and not every "default" design decision is appropriate. Sometimes you just do what you're familiar with. In this case the Apache guys used an HTTP query string for a heartbeat packet. I would have used something more akin to a network protocol packet. It doesn't really matter as long as it meets the application's requirements and works.

But I still like my way better. :)


Every 1 second, this module generates a single multicast UDP packet, containing the number of busy and idle workers. The packet is a simple ASCII format, similar to GET query parameters in HTTP.

An Example Packet


Consumers should handle new variables besides busy and ready, separated by '&', being added in the future.

I think that last line is why they avoid a binary format. Everyone who's consuming this packet already has a function available which can parse url query parameters with arbitrary name/value pairs, and that's the format being used here. I'm also not sure that the user needs to turn this into a binary format before using it; a string comparison against "75" and "0" is just as capable and nearly as fast as a numeric comparison, especially if you're using a scripting language for your heartbeat monitor rather than C.
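A minimal sketch of a consumer that tolerates new variables, assuming a packet such as "v=1&ready=75&busy=0". The names hb_counts and parse_pairs are hypothetical, and unlike Apache's parser this one does no URL-unescaping.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result struct for the two fields the consumer cares about. */
struct hb_counts {
    int ready;
    int busy;
};

/* Parse "name=value" pairs separated by '&', in any order, ignoring names
 * it does not recognize. Modifies the buffer in place. Returns the number
 * of recognized fields that were stored. */
static int parse_pairs(char *packet, struct hb_counts *out)
{
    int found = 0;
    char *pair = packet;
    while (pair != NULL && *pair != '\0') {
        char *next = strchr(pair, '&');
        if (next != NULL)
            *next++ = '\0';   /* terminate this pair, step to the next one */
        char *eq = strchr(pair, '=');
        if (eq != NULL) {
            *eq = '\0';       /* split name from value */
            if (strcmp(pair, "ready") == 0) {
                out->ready = atoi(eq + 1);
                found++;
            } else if (strcmp(pair, "busy") == 0) {
                out->busy = atoi(eq + 1);
                found++;
            }
            /* any other name (e.g. "v" or a future field) is skipped */
        }
        pair = next;
    }
    return found;
}
```

An older consumer built this way keeps working when a sender starts appending fields, which is the forward-compatibility property the documentation asks for.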


Developers that are so obsessed with micro-optimizations that they can't stand to have a plain-text heartbeat packet don't get to have their software deployed on my watch.

I can buy more hardware, but operating and maintaining systems that have been optimized to hell and back for no compelling reason is a massive ongoing human resources cost, not even counting the initial unnecessary development time.


It actually took more time to write it plain-text than it would have to unpack a bitstring into a struct. With or without optimization it cost you more human resources (in terms of time and money) to do it their way. As you add more hardware (in cluster nodes), your CPU time on each proxy node needed to perform this operation every second will increase.

My original point was not to encourage over-optimization, but to design better. But deploy whatever you want, I don't care.


It's harder to debug and extend a bitmap, and in the real world single service instances don't scale infinitely; more layers are added. You will never have enough back-ends behind a single proxy for the time spent parsing to be measurable.

This is a textbook premature optimization. You're designing for academic purity instead of the real world.

