Well, if AMD had taken the need to get their GPGPU offering out the door seriously, or if they had later taken the need for CUDA compatibility seriously, then Nvidia would have had to compete.
Alas, we're talking about a company that doesn't even think it's important for their installer to reliably replace existing drivers. The result: I get to pay through the nose for my predecessor's CUDA lessons. Yay.
I love awk. I once had to search a multi-megabyte hunk of data that was made up of 25-bit data items packed into 32-bit words. Instead of doing bit packing and unpacking, I converted the words into 32-character strings of 1's and 0's. I ended up with a string 300,000,000 (that's three hundred million) characters long!!! Awk had no problems handling it.
To build the string, I had to concatenate 1024 of the 32-character strings into an intermediate string, and then concatenate these into the final monster, because concatenating just the 32-character strings took too long: a reallocation after every concatenation.
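The two-level buffering described above can be sketched roughly as follows. This is a Python illustration, not the original awk; it assumes the input is a sequence of unsigned 32-bit integers, and the function name and `chunk_size` parameter are made up for the example.

```python
def words_to_bitstring(words, chunk_size=1024):
    """Build one long string of '0'/'1' characters from 32-bit words.

    Pieces are gathered into intermediate chunks of chunk_size words and
    the chunks are joined once at the end, so the big string is not
    rebuilt after every 32-character piece. (Hypothetical helper.)
    """
    chunks = []   # finished intermediate strings
    pending = []  # 32-character pieces for the current chunk
    for word in words:
        # Render the low 32 bits as a zero-padded binary string.
        pending.append(format(word & 0xFFFFFFFF, "032b"))
        if len(pending) == chunk_size:
            chunks.append("".join(pending))
            pending = []
    if pending:
        # Flush the last, possibly shorter, chunk.
        chunks.append("".join(pending))
    return "".join(chunks)
```

In Python a single `"".join(...)` over all the pieces would likely be enough; the explicit intermediate chunks are kept here only to mirror the approach described.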
I believe this is an example of the sort of thing that the essay author complained about.
Bit-packing is simple. You spent a lot of time working around problems that shouldn't have existed in the first place. Even sticking with your approach, here is Python code that does what you described:
>>> byte_to_bits = dict((chr(i), bin(i)[2:].zfill(8)) for i in range(256))
>>> as_bits = "".join(byte_to_bits[c] for c in open("benzotriazole.sdf").read())
>>> chr(int(as_bits[:8], 2))
>>> chr(int(as_bits[8:16], 2))
This keeps everything in memory, since 300MB is not a lot of memory. If it were in the GB range then I would have written to a file instead of building an in-memory string.
The run-time was small enough that I didn't notice it.
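For comparison, pulling the 25-bit items straight out of the 32-bit words, with no string conversion, might look something like the sketch below. The thread does not say how the items were laid out, so this assumes they are packed back to back, most significant bit first, and may straddle word boundaries; the function and parameter names are hypothetical.

```python
def unpack_items(words, item_bits=25, word_bits=32):
    """Yield fixed-width items from a stream of packed words.

    Assumption (not stated in the thread): items are packed contiguously,
    most significant bit first, and may cross word boundaries. Trailing
    bits that do not fill a whole item are dropped.
    """
    item_mask = (1 << item_bits) - 1
    word_mask = (1 << word_bits) - 1
    acc = 0    # bit accumulator holding unread bits
    nbits = 0  # number of valid bits currently in acc
    for word in words:
        # Append the new word below the bits that are still unread.
        acc = (acc << word_bits) | (word & word_mask)
        nbits += word_bits
        while nbits >= item_bits:
            nbits -= item_bits
            yield (acc >> nbits) & item_mask
        # Keep only the bits that have not been consumed yet.
        acc &= (1 << nbits) - 1
```

Searching then becomes an ordinary loop or generator expression over `unpack_items(words)`, for example `any(item == target for item in unpack_items(words))`.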
The thing is, you succeeded in solving the problem, and are justly proud of your success. This is how a lot of scientists feel. But a lot of CS people look at the wasted work when there are simpler, better, more maintainable ways.