Hacker News new | past | comments | ask | show | jobs | submit login
Understanding the power of bitwise operators (deusinmachina.net)
138 points by Decabytes on May 11, 2023 | hide | past | favorite | 44 comments



I like introducing people to new concepts, and I will assume this entry is for non-technologists. This is an excellent start.

If I may suggest, stop insulting your readers (e.g., several places implying reader does or does not do something 'normally', or not know something).

In cases where an error is well known, describe the error so it is understood you are using it as an example, and not an actual error (e.g., Null character (�) a black diamond with white question mark in the middle).

Single bit is the most often used type of data (vs "single bit is almost completely useless"). Your lights are on, or off; you are dead or alive; it is today or not, vast majority of network flags in various protocols are single bits.

There are a lot of weasel words throughout the post. If you plan to educate, uncertainty reduces comprehension and retention. There are some absolute statements are made too, but might rightfully questioned (e.g., [b]inary is the only language computers understand). Make sure your absolutes are truly absolutes.

Keep up the good work.


> (e.g., Null character (�) a black diamond with white question mark in the middle).

The description in the linked article is wrong, that's not a null character (U+0000) or ASCII's NUL. That black diamond symbol is U+FFFD the Unicode Replacement Character, it means "Something went wrong, so instead here is this symbol". For example if your decoder algorithm gets some gibberish and you can't or won't accept errors in decoding, a correctly designed decoder should produce U+FFFD each time, this is what Rust's String::from_utf8_lossy and String::from_utf16_lossy are doing.

U+FFFD was carefully chosen because it's not anything. It's not a letter, or a digit, it's not some ASCII control character, it's not a separator in any known writing system or protocol, it has no defined pre-existing purpose, which means if bad guys can trick your bad software into injecting U+FFFD into some data, chances are they achieved nothing of value.


Tangential aside we've collectively made a mistake using A-F for HEX representation. Alphabetical order seemed obvious at the time, but there's a far more literal option that's just pleasing on a visceral level. LHTIFE. The horizontal lines of each letter are literally encoding binary information. True there's no letter encoding 3 in this block style but that can either be invented or ignored. Or you can fudge the pattern slightly with a diagonal such as z or a curved lower case such as b. It would have mapped so cleanly to 7 seg type displays. Even the name "HEX" is 2/3rds of the way to being self descriptive. A perfect little numeral grouping of chars.


That would be a trivia to have to know, otherwise it just looks like a mess of arbitrarily chosen letters. The binary patterns shown on the chosen letters do not match the actual binary patterns those letters represent. And then you don't try to do the same with 0 to 9 which makes the whole effort half-baked.


Just going to point out real quick that the lower left vertical on a 7 segment display does in fact come extremely close to directly encoding even/odd. Write them all out and see for yourself. Only the number 4 breaks the pattern. That would only leave two bits not directly encoded in segments as far as anyone has noticed.


The I is a bad choice since it's similar to 1, using the letter Z instead would be much better. The letter T can't be mapped to a 7-segment display either, so if that's your goal you would need to use EFGHLP.


I'm reluctant to grab "encoding letters" out of the curvy set, simply because there's nearly enough of them to make a complete binary-lettering alternative on their own. uhbDPB. That one is also frustratingly one digit missing from completion. If they were complete sets, you could treat curvy/rigid as a bit and thus have an easy and obvious system for translating between half-bytes and English orthography.

I'm not getting sucked back into this. There comes a point when you're just staring at the alphabet letters and thinking "TF am I DOING?"


Also, yeah obviously T can't be mapped, but I still put T in the same category as all the other square intersecting grid letters. There are other, obviously repeating groupings that would work better on a different segmented basis. Some examples: diagonals {Z N X Y A V K 7 4 W M}, circle based {O G C Q D}, loopy: {B P b d q J S 8 9 3 }, 7 seg: {L H T I F E}


I've seen UVWXYZ. Less seriously, I've seen GBNJFL (Great Big Numbers Just For Laughs).


Bitwise operators are the beginners introduction to SIMD programming.

A 64 bit register doing XOR is really 64x parallel XOR happening in parallel each of size 1 bit.

A shift, rotate are just fancy movement operators and exist in the SIMD space.

Good bitwise programming IMO requires a full embrace of this parallel mindset. See the Chess Programming Wiki for all sorts of things that 64x parallel bits can represent.

In particular, a 8x8 chessboard maps well to 64 bits.

-------

No really. Do remember that one of the OG SIMD computers, the CM2 (connection machine 2) was a 1-bit core but SIMD across thousands of cores (bits). And a lot of research and understanding of modern parallel concepts comes from the Connection Machine.


For folks that are interested in learning more about bitwise operations, here is another article (slightly more nuanced) about XORs that you may enjoy reading:

All About XOR: https://accu.org/journals/overload/20/109/lewin_1915/


This is excellent, thank you.


Really weird to start with 'that one section that most of us skip' and then use an _instruction set emulator_ as the real world example. Who exactly is both frightened of the & symbol and wants to learn about instruction encoding?


I was hoping for something really unusual after the introduction

"While at first, they might seem obscure, unhelpful, or tools for people who write in low level programming languages, they do serve a purpose. "

But then... Instructions set, assembly language...


If you like this, I recommend Hacker's Delight. The first chapter goes into bit twiddling in depth. It ends with some explanation of what all can be expressed in bit-wise operations, a sort of Fundamental Theorem of Bit Twiddling.


I was just doing some bit shifting yesterday. I wanted to pass a drawing function a Uint16 that would represent a 4x4 black and white pixel pattern. It was a little hack to create early MacOS-like pixel patterns in SDL. SDL has primitives for line and rectangle drawing with solid colors but little else.

So I wrote some straight C, some nested for loops for row/column of the area to be patterned. I used row % 4, column % 4 to map to the appropriate bit within the Uint16 "pattern". Some bit shifting, masking and I knew whether to fill the pixel with foreground or background colors.

I always enjoy bitwise operations. (There's I guess the classic programmer problem of swapping the values of two numbers in two registers in place.)


Try the generalized problem of in-place arbitrary 2x2 matrix transforms.


> While at first, they might seem obscure, unhelpful, or tools for people who write in low level programming languages, they do serve a purpose.

Firstly, no one says that bitwise operators don't serve a purpose. Secondly, this is an interesting way to start an article that then goes on to explain how useful bitwise operators are for low level programming.


> right shifting adds zeros to the most significant side. So, using the same number we get…

1110 >> 4 -> 0000 1110


Holy crap, hopefully a typo


One of my earliest exposures to the power of bitwise operators happened when I was learning to write chess engines. This was right around the time 64 bit processors were hitting the markets, which allowed storing one bit of information per square, making it ideal for efficient processing via bitwise operators. There were some clever operations that allowed finding sliding piece collisions in only a few assembly instructions (with prodigious use of BSR and BSF) and a small amount of precomputed memory.

For anyone interested in binary chess math:

https://www.chessprogramming.org/Bitboards#General_Bitboard_...


There's a fairly famous bit twiddling hacks site hosted at Stanford but I find this interpretation is more accessible, and has a better breakdown, and more accessible code examples. Here's the page that's all about using the bitwise operations to query the status of the kth bit in a binary sequence:

https://www.techiedelight.com/bit-hacks-part-2-playing-kth-b...

Incidentally, deciphering some unknown bitwise expression is a task that ChatGPT excels at. For example, try asking it to do a stepwise breakdown and summary of:

unsigned int a, b, mask, r;

r = a ^ ((a ^ b) & mask);


i would recommend using "raised to the power of" (or similar) in place of "^" earlier on in the article, as it may get confused with the xor later on.

and use fixed fonts for all the numbers.


Is it an autocorrect issue or the writer only-heard-never-read that the "^" character is called a “carrot”?


The "^" character is correctly named the "caret".


Common: hat; control; uparrow; caret; official name: circumflex. Rare: xor sign, chevron; to the (‘to the power of’); fang; pointer (in Pascal).

(http://catb.org/jargon/html/A/ASCII.html)


As someone who is used to the French pronunciation of “caret”, “carrot” is hilarious.

Also, in INTERCAL the character is called “shark” or “sharkfin”.


Exactly! Hence the guesses: is it autocorrect or that the article author has heard the word but never seen it written that made them write it as “carrot”?


Was hoping for some magic at the end! Bit masking is one thing, but Bloom filters [0] are one of the coolest things to use when explaining why XOR is all over the place in computer science. It's behaviour is just unexpected and surprising enough to be a fun puzzle, but not too complex to make explaining it need more than 3 min and a white board.

[0] https://llimllib.github.io/bloomfilter-tutorial/


This is a baffling comment, since Bloom filters don't use XOR. Indeed, the link you provide doesn't mention it. I honestly don't know what role XOR would play in an explanation of Bloom filters.

The real reason for the ubiquity of XOR in computer science is that it corresponds to addition of two bits in the field GF(2). If that sounds too complicated for your purposes, sorry.

https://en.wikipedia.org/wiki/GF(2)


> If that sounds too complicated for your purposes, sorry.

Ehhh, GF(2) and even GF(5) are actually pretty easy to describe.

GF(2) is arithmetic modulo 2.

0+0 == 0

0+1 == 1

1+0 == 1

1+1 == 10, except we can only carry one bit in GF(2). Like an odometer, we can only keep the bottom bit, so the "1" drops off. 1+1 == 0.

Hey look, its XOR. The end.

----------

A "Field" in Abstract mathematics is simply a number-system that has add, additive-inverse (aka: subtraction), multiply, and multiply-inverse (aka: division). Note that "subtraction" and "division" are just simplifications of the concept, to be a true inverse, all inputs and outputs (domains-and-ranges) must be in the field.

So I proved that XOR is GF(2)'s addition. Funny note, 1 is also -1 (negative 1), aka the additive inverse of 1. And it turns out that XOR is _also_ the additive inverse (aka: subtraction). You see, -1 + 1 == 0, which happens to be 1+1 in GF(2). Ain't modulo math fun?

-----------

Multiplication is just AND btw.

0 * 0 == 0

0 * 1 == 0

1 * 0 == 0

1 * 1 == 1

The multiplicative inverse of 1 is... 1. That is, 1 * 1 equals 1. This one is a bit of a tautology, but it matches the technical definition of multiplicative-inverses (aka: division). This is why I prefer GF(5), because we get a less-trivial multiplicative-inverse.

----------

GF(5) is how I prefer to teach Galois fields. GF(2) and GF(3) are too simple. GF(5) is complex enough that the student actually has to learn Galois fields to understand it, but its simple enough that you can probably teach it to someone within 20 minutes.


Who is this article for exactly? It starts off by trying to relate to the reader by presenting the point about learning a new programming language, but then goes on to explain one of the most fundamental concepts of computing, as if the reader is a complete novice. I would imagine that nearly every person with programming experience, whether that be formal or not, would have at least some familiarity with binary representation and bitwise operations.


> would have at least some familiarity with binary representation and bitwise operations.

There's a whole generation of programmers that aren't interested in, and don't need, "low level" bitwise operations. I would claim that's a "good thing", since it means they're letting someone else do the "boring stuff" by using libraries, allowing them to spend more time solving higher level problems.

When I was going to school, some sort of assembler was the first language you learned. These days, it's usually Python or Java. With a "not real programmers" silliness aside, I hope the tedium continues to decline, and is left for those interested in it, increasing productivity for everyone.


> There's a whole generation of programmers that aren't interested in, and don't need, "low level" bitwise operations.

But it's worth understanding them even if you never need to use them. Much like understanding how a computer works at a low level, that kind of knowledge helps contextualize higher-level things and puts additional tools in your toolbox.


This seems to be addressed to people who want to hack game emulators for early game machines. So they cover the basics and then go on to a game emulator example.

In modern programming, bit-banging can usually be avoided. C++ and Rust both have bit fields in structs, and generics for sets of bits. If you're working on some data structure with AND, OR, and shift operations, that's kind of retro today. The compiler is probably better at doing that stuff than you are.


Web devs? The path that goes:

GFX -> HTML&CSS -> JS -> WTF!

They can be great programmers in JS even but have little to no computer science behind it and more a design background.


What is GFX? A language?


I think gp means GFX like graphic design as opposed to university computer science. At the UI layer those skills are more use than understanding bitwise operations.


Graphics effects.


OT but his mention of the calculator reminded me of discovering just yesterday that the built in Mac calculator has a Programmer mode (apparently Windows too) for hex, etc.

I’d been using the RPN/scientific mode for ever but never new about Programmer mode.

I’m actually doing a fair amount of hex conversions and bit fiddling these days (webassembly bytecode) so it was actually something useful.


And don't miss that clicking the displayed bits will flip them. Can be handy at times, debugging flags for example.


Can someone please explain why malloc(7) allocates only 2 extra bytes in the first slider? It has more than 7 bytes available but only 2 are allocated.


> "Typical of apple to hide this useful feature from us"

It's right there in Calculator's menu: View -> Programmer


It is art. Lovely. In commercial world, it is for low level people -- think --: Hope they do all magic for us :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: