Avoid Meaningless Binary Labels (bobbiechen.com)
98 points by bobbiechen on Jan 9, 2022 | 57 comments



I kind of disagree. Our brains are very good at learning new words given the correct circumstances. Learning words is an effortless process that occurs through seeing them in context, and the nuance of having specific terminology is often extremely useful.

As a concrete example, if you say "I'm thinking slowly right now", someone might assume you have brain fog or some sort of trouble concentrating unless they've read a particular book. If you say you are using System 2 thinking, they either get it because they've read that book, or they understand that they don't and can save themselves a lot of time and confusion and ask what you mean.

Using a simpler word for endianness does not spare you from having to understand endianness, but it might mislead you into thinking you do when you don't.

Re-using simple words can be confusing, and understanding that you do not understand a word can actually be a feature, especially in technical writing.

As a good counterexample, the word 'service' is used way too much, in too many different ways. If someone says 'the service isn't working', they could mean the entire web application is down, the Kubernetes Service abstraction is misconfigured, what it's running is broken, or some piece of code ending in "Service" doesn't compile.


> Re-using simple words can be confusing, and understanding that you do not understand a word can actually be a feature, especially in technical writing.

I would add something that is counterintuitive but unsurprising to people who have children: learning long/complicated words is actually not a problem at all. Kids love learning and saying long words with interesting sounds in them. Sometimes it’s even easier to learn long/complicated words because they are easier to distinguish from shorter words.

Ask any 4 year old whether they would rather say “Tyrannosaurus Rex” or “dino”.


> they either get it because they've read that book, or they understand that they don't and can save themselves a lot of time and confusion and ask what you mean.

I've read that book and several other works on the subject and I still can't remember which one is System 1 and which is System 2. I get why they use those labels (because they don't want to presuppose the characteristics of what they're investigating). But I do find them frustratingly undescriptive.


Easy mnemonic: “thinking fast and slow” lists them in order.


Other mnemonic: on any problem you're thinking about, system 1 responds first, system 2 is second.

System 2 is always second because it's slower. (Because it's more methodical, etc.)


Many understand the concepts perfectly well, but will still not always remember the arbitrary difference in naming. If memorization of arbitrary words were so effortless with repeated context, then no one would mix up common words like effect and affect.


Effect/affect is mostly a problem for people with English as their first language: they learn the language by sound, and in pronunciation there (usually) isn't a difference between them. Same with "could of", "we're/were", etc.


Binary has nothing to do with it, insofar as the space is adequately described by two labels. The problem is solely that the labels are unclear, and the article's argument is harmed by bringing binary into it.

I could just as easily extend the colored functions[1] analogy by bringing into it yellow functions that are also annoying to call and have their own incompatibilities[2], and now we have three labels, red/blue/yellow which tell me nothing at all; honestly I don't even remember which ones are red functions and which ones are blue. Of course in the case of the original article the labels are supposed to be arbitrary and don't matter, but responses to it continue to use the colors and thus attach semantic significance to them.

[1] https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...


(Author here) Thanks, that's fair criticism.

My frustration with the binary case specifically is that there are only two options, so they should be easier to name for that reason. I didn't really explain that in the post, though, which is my fault.

Yes, these are a subset of unclear naming. I do think naming is such a broad topic that talking about specific subtypes of bad naming is useful, to sharpen our awareness around those specific pitfalls.

In colored functions, yeah, it's unfortunate that some readers continue to extend the (meaningless) colors metaphor, which helped remove preconceptions about sync/async to make a point. I wish people wouldn't do that. See also the fictitious "Nacirema": https://en.wikisource.org/wiki/Body_Ritual_among_the_Nacirem... .


I think you are correct, and I'm confused as to why you're being voted down without explanation.


IMO too, it's a stronger argument. The duplicity (which would have been a great word to use in the article) is not the point: open and closed are suitable metaphors for systems. It's the semantic gap/lack of relation between concept and name, and that applies to any other arity as well. It's just that binary separations are very frequent: we really like to split the world in two.


Excellent point. Other examples of easily invertible binary labels:

- "Little-endian" vs. "big-endian": Does "little-endian" mean the little part is at the end or at the beginning? Why not "minor-first" and "major-first"?

- "Type 1 diabetes" vs. "type 2 diabetes": Why not "insulin deficiency" and "insulin impotence"?


"Little-endian" and "big-endian" aren't completely meaningless - they originally came from a description of the history/war between two nations in Gulliver's Travels who disagreed about which end of an egg to open first (the big end or the little end)[1]. Bearing that in mind, it feels fairly natural to think of little-endian as meaning that you start (open) the transmission with the little part of the word.

[1] https://en.wikipedia.org/wiki/Lilliput_and_Blefuscu


In my opinion, these are bad names and we would all have saved a lot of brain power if simple and clearly meaningful names were chosen.

The cultural reference in the names preserves the opinion that the choice of byte order is not worth fighting about. But this is an extra piece of information that's not needed.

Moreover, the choice of the word "end" is unfortunate, since it means both "boundary" (as in the 2 ends of an egg) and "conclusion" (as in the end of a transmission).

To make matters worse, they chose to have "little endian" refer to the case where the big (most significant) byte is at the end of a transmission.

Better names might be something like "smallest-first byte order".

Edit: If you are confused, think of it this way:

Little-endian = little end first

Big-endian = big end first
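
A quick sketch in Python (just an illustration, using the standard library's int.to_bytes) makes the two orders concrete:

    # 0x0A0B0C0D written out byte by byte in each order
    n = 0x0A0B0C0D
    print(n.to_bytes(4, "little").hex(" "))  # 0d 0c 0b 0a  <- little end first
    print(n.to_bytes(4, "big").hex(" "))     # 0a 0b 0c 0d  <- big end first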


Agreed, they're not the clearest of names, particularly in a global cultural context. But they're not totally arbitrary like "type 1 error" and "type 2 error".


The thing that this article seemed to be getting at was not just whether the names have meaning — I interpreted it to be specifically about whether two names are easily swappable. It's a dichotomy, but which way does it go?

"Little-endian" is the perfect example: it clearly has something to do with little and big, but it's impossible to tell from the words which one means LSB-first and which one means MSB-first.


Little endian: Least significant byte at Lowest address. It's definitely not meaningless. I also tend to think of it as the Logical one (the significance grows with the address.) All that alliteration aids acceptance and allows accurate agreement.


“endian” … end is the last part of something. It would imply that big/little is describing the last address, not the first.

When it comes to how to store numbers, in English, we read the highest value from the right to the lowest value on the left. Your “logical” is counter to how English-speaking people write numbers.


> When it comes to how to store numbers, in English, we read the highest value from the right to the lowest value on the left.

Huh? When I read 24, "twenty-four", the highest value is on the left, followed by lower values on the right, and spoken in that order.

It's actually different in other languages; in Dutch, for example, I would say "vierEnTwintig", or "four and twenty" translated directly, saying the lowest value first. (However, this is only for the last two digits; above that we say "honderdVierEnTwintig", or "hundred four and twenty".)


I mixed up right and left.


> Your “logical” is counter to how English-speaking people write numbers.

As evidenced by the MM/DD/YYYY ("middle-endian") date format that a significant part of the English-speaking world uses, "logical" and "traditional" can mean different things.

The fact that I can't think of any arbitrary-precision library which stores its numbers in big-endian format is evidence that LE is the logical one. Yes, it looks odd in a hexdump, but that's not something you need to deal with often and if you do, you get used to it anyway.
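
To illustrate (a toy sketch of my own, not from any particular library): with the least significant "limb" first, addition walks both arrays from index 0 and a final carry simply appends, with no re-alignment needed:

    BASE = 10  # tiny base for readability; real libraries use 2**32 or 2**64

    def add(a, b):  # a, b: lists of digits, least significant first
        result, carry = [], 0
        for i in range(max(len(a), len(b))):
            s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
            result.append(s % BASE)   # digit at position i has weight BASE**i
            carry = s // BASE
        if carry:
            result.append(carry)      # growing the number is just an append
        return result

    print(add([9, 9], [2]))  # 99 + 2 -> [1, 0, 1], i.e. 101

Stored big-endian, the same loop would need both inputs right-aligned first, and a carry out of the top would force shifting every limb.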


Actually, MM/DD/YYYY is in the minority worldwide. AFAIK, only the US and Canada use it.

All of Europe uses DD/MM/YYYY (Germany uses dots instead of slashes), and I believe most (if not all) of South America does too.

China and Japan effectively use YYYY-MM-DD because of how it's written normally: YYYY年MM月DD日. A lot of the tech and business world also uses YYYY-MM-DD because it's an ISO standard, and it makes sorting easier if you assume left-to-right reading order.
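
To illustrate the sorting point with a toy snippet of my own: because the most significant field comes first, plain lexicographic string sorting agrees with chronological order.

    dates = ["2022-01-09", "2021-12-31", "2022-01-10"]
    print(sorted(dates))  # ['2021-12-31', '2022-01-09', '2022-01-10']
    # The same trick fails for DD/MM/YYYY or MM/DD/YYYY strings.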

However, there are two interesting observations:

* Most of the world reads left-to-right, but uses Arabic numerals, which run right-to-left. So most of the world uses big-endian Arabic numerals even though in the original language they were little-endian. It's also strange that when we say "Arabic numerals", we don't use their numerals, only their system...

* The difference in US date formats compared to the rest of the English-speaking world boils down to how the dates are spoken. In US English it's typical to say e.g. "January 10", whereas in British English this is practically unheard of; that explains why "MM/DD" feels more natural to an American. In Britain people might sometimes say "January the 10th", but "10th of January" is far more natural, which is why we prefer "DD/MM": it's a natural abbreviation. You can still see vestiges of the earlier British-style usage in US English, e.g. "4th of July". Ironically, the date that should be least British is pretty much the only one still said the traditional British way!


Oh, so naturally big endian is most significant byte at largest address?


Heads I win, tails you lose.


Shaka, when the walls fell.


It is very clearly a reference to this story about a societal schism arising from a meaningless choice. But it doesn’t really make it clear which is which.

You say clearly little endian means you start with the little part. But why couldn’t it be that little endian means the end is the little part?

Endian is not a word that means anything apart from the Gulliver’s Travels story or the order of bytes.


The two types of diabetes are sometimes called "insulin-dependent" and "insulin-resistant".


One that my brain always had a hard time with was "callee-saved" and "caller-saved" registers. I wish more people used "call preserved" and "call clobbered" instead.


Naming things remains hard, and not only in software development. I suspect it's a human weakness, giving in to oversimplification, cultural preferences, or just bad habits. We could name examples ad infinitum.


I suspect aesthetics has more to do with it than people think.

Everyone loves a secret code, and Hollywood has made an art of technobabble.

When you say "It's fine, we're on a class 2 supply" you sound really cool and high tech and professional.

It's almost like sprinkling in a latin phrase or some legalese, but instead of conveying a very precise meaning, where even the informal general implications are well known, it adds confusion.

There's always a sense of mystique to these "Domain specific natural languages", but not that many reasons to use them, so people are probably kinda drawn to slip them in.


This. I call the drive to sprinkle one's language with technical terms, acronyms, and industry/specialist jargon "going Scientologist". I use this term because, if you've ever known one, they attach specialized meanings to a lot of terms, and that creates an "in crowd" feeling for their members and an exclusionary-club sensation for those on the outside peering in. It's ordinary high-school popularity-clique dynamics, which works perfectly well on far too many "adults".


Nice article.

Red state vs. blue state in two-party US politics is an inappropriately meaningless binary label, while red node vs. black node in a binary tree is an appropriately meaningless binary label.

I feel like that makes sense to me.


I just read your comment about red/black, and I must confess, until this point I would have agreed that it was meaningless. However, I just looked over the definition of the data structure, and I think I can propose better names.

Black nodes are 'counted' nodes, whereas red ones are 'uncounted'. Uncounted nodes may sit sandwiched between counted nodes, but are not allowed to follow each other, because relaxing that rule could lead to an explosion of uncounted nodes.

Remember that the length of the path to every leaf, in terms of counted nodes, is the same. Because uncounted nodes may be sandwiched between those counted nodes, this gives us the property we want: the longest path of actual nodes, counted and uncounted, is never more than twice the length of the shortest path. The longest path can have every layer be a counted-uncounted sandwich, while the shortest will have only counted nodes.
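
To make that concrete, here's a rough sketch (my own code and naming, checking only these two rules) of the invariants as a validator:

    class Node:
        def __init__(self, counted, left=None, right=None):
            self.counted = counted  # True = "counted" (black), False = "uncounted" (red)
            self.left = left
            self.right = right

    def counted_height(node):
        """Counted-node height of the subtree; raises if an invariant is violated."""
        if node is None:
            return 0  # empty leaves contribute nothing
        if not node.counted:
            for child in (node.left, node.right):
                if child is not None and not child.counted:
                    raise ValueError("two uncounted nodes in a row")
        left, right = counted_height(node.left), counted_height(node.right)
        if left != right:
            raise ValueError("paths differ in counted height")
        return left + (1 if node.counted else 0)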

I felt like changing the name added something.


In the US, the first color electoral map to be televised used blue for Republican.

In 2000, red was used for Republicans, and since that year involved lots of viewing of the maps due to the protracted counting, it stuck.


In 2000 and earlier elections, different networks used different colors, without a lot of consistency from election to election. It was only after that election that a strong, immutable consensus emerged on which color was which.


Not to mention that the rest of the world uses "red" for various sorts of socialist / social democrat / center-left parties (e.g. "traffic light coalition", "watermelon coalition"), whereas the US uses it for the right - in the map context only.

Red/black makes me think of the card suits.


Just north, in Canada, red means Liberal and blue means Conservative.


AFAIK, in all the non-US world, red means "left", ever since it was the color used by socialists and communists.

I think blue was picked in opposition to this, but that's just an uninformed assumption about history on my part.


Red vs. blue was also a thing in the American Revolutionary War. Not quite sure how (or if) it relates to other uses, but figured I'd throw it out there in case someone else can make the connection.


I was surprised by this:

"Worst of all, imagine this: you've just started an internship at a tech company with a microservices-based architecture. When service A initiates a message to service B, one of them is called the "upstream", and the other is the "downstream". Which one is which? There's no inherent direction, so even within the industry, people don't agree!"

For me, upstream always means "toward the original caller", or "toward the customer/person with the web browser". I had never heard his linked example of someone thinking they were "calling an upstream function". If you're the caller, you are upstream from what you're calling.

Is that just me?


If I'm understanding you right, I think I had equal confidence in the opposite direction. In my mind, responses always flow downstream. It's obviously not just you, but at this point I think the right conclusion is that the concept is too conflicted to be used unless one is speaking to a small group that has already agreed on a local definition.

What I'm not sure about yet is whether the terminology is equally conflicted when talking about open-source projects. If one project depends on another, do we agree that the depending project is downstream of the one it depends on? That is, does everyone still agree that the official Linux kernel is upstream of all Linux distributions? Or do some people view the patches to the kernel as being sent downstream?


So it sounds like you're using the service communication to mean the stream? Although "toward the original caller" suggests that two intercommunicating services would be flip-flopping for ownership of "upstream": Aardvark calls Bear, Aardvark is upstream; Bear calls Aardvark, Bear is upstream.

Stream could also refer to the data (upstream = original/raw, downstream = processed/final), the overall process stage (upstream = prior stage, downstream = next stage; this requires knowledge of the system architecture, and the stage boundaries are likely business-defined), or some business term (e.g. in energy, upstream = producer, downstream = consumer). A stream and the messaging direction may not match.


I agree with that, but in this case, the author had already scoped it down to services calling other services in a "microservices" context. And even if the call graph is a->b->c->a->b, I can still say up/down if I give the context of which invocation. I'm hoping people don't do that much in real life though, too easy to recurse.


The main use I've seen for "upstream" vs. "downstream" is in reference to dependencies between projects - often one or more open source "upstream" projects feeding into a combined/enhanced for-pay "downstream" product. For that reason alone, I think using the same terms in a messaging context is a bad idea.


> "upstream and downstream" in microservice architecture.

Those terms are used in relation to dependencies and who consumes what. It has nothing to do with microservice architectures.

This article seems to talk a lot about... nothing. I don't really see the point it's trying to make - we use technical terms not to be human friendly but to be technical. We're not writing for children or for laymen all the time.


But we are writing for humans, and so ease of recollection is a key result that our technical terms are measured against.

Do not make the mistake of thinking that technical terms need to be inscrutable, for by hooking onto ideas that the student or new practitioner may have encountered in their life, we can give them a peg on which to hang their understanding of the concept we are trying to explain.

Consider, for example, these technical terms: durability, integrity, power, confidentiality, friendly, field, cross.


> we use technical terms not to be human friendly but to be technical.

What does that even mean? Arbitrary useless names are better because they are so devoid of meaning? Making things needlessly hard is “professional”? “Professionals” are superior “technical” people that do not respond to inferior human mechanisms such as “making sense”?


That it conveys more information to a smaller group of people.

If that smaller group is the intended audience, that's better than less information. If the intended audience is broader, less information (layman's terms) might be better.

A doctor can say to me 'we think there's a clot and we need to get you to theatre asap' and that's fine, but I hope they say something more technical (that conveys where it is, the nature of it, whatever) to the surgeon!


> Those terms are used in relation to dependencies and who consumes what.

I think your response exactly illustrates the point of the post: if upstream and downstream had clearer names, you wouldn't need to define them. And if they indeed mean what you say, wouldn't producer and consumer be clearer?

> I don't really see the point it's trying to make - we use technical terms not to be human friendly but to be technical.

I don't think I can agree that technical terms aren't meant to be human friendly. After all, languages exist to convey ideas to other humans; words aren't opcodes intended to be executed by machines.

If technical terms aren't meant to be human friendly, why did humans invent them at all? And if they are intended to be human friendly, why not optimize them for clarity and precision?


> wouldn't producer and consumer be clearer?

Absolutely not. It makes no sense. "There was a producer outage." means nothing. Neither does "I'll have to file a bug with the producer dependency."


Is there a type 3 error? If not, or if the question is as meaningless as I suspect, why would anyone ever want to use "type 1 error"/"type 2 error" rather than "false positive" and "false negative" (or the other way around, I refuse to look it up)? Is someone golfing research papers, trying to save two characters here and there?


Type 3 error is "giving the right answer to the wrong question" [1] and there are uses for a Type 4 error.

This digs into the realm of the "not even wrong" statements. [2]

[1] https://en.m.wikipedia.org/wiki/Type_III_error

[2] https://rationalwiki.org/wiki/Not_even_wrong


"The wrong question" is subjective though. I meant in the technical sense.


In the world of boolean logic this could be expressed as:

Type I: true

Type II: false

Type IV: FileNotFound


Ironically, of course, one such label is the adjective "meaningless", which is rather unnecessarily binary in nature.


"Shift left", my current favourite IT buzz word blunder.


Ah yes, boolean blindness.



