Base Ten for Almost Everything (randomascii.wordpress.com)
51 points by ingve on Feb 14, 2016 | 82 comments

I personally prefer base 2 units for file size. A 2GB file will take 2GB of memory when I load it in. Hard drive sector sizes are also base two: 512 bytes for most drives and 4096 bytes (4K) for newer large-capacity drives.

I also prefer base 2 for file transfer speeds, just so it matches the file-size convention. Fortunately most programs either default to base 2 or let you choose in the preferences.

Network line speeds are in base 10 (100Mbit Ethernet is actually 100,000,000 bits/s), but there's so much protocol overhead and so many other factors that comparing transfer speeds and line speeds is moot in most situations.

You may be the first commenter who has actually given a reason for preferring base 2 units for file sizes. I'm not sure it justifies exposing base-2 sizes to non-geek consumers, but it is a valid reason.

Be careful, this may equally well be a reason to advertise RAM sticks as having 8.589GB capacity :)
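The number in that joke checks out; a quick Python sketch of the conversion:

```python
# An "8 GB" RAM stick holds 8 * 2**30 bytes; expressed in base-10
# gigabytes, that capacity is slightly larger than the label suggests.
binary_bytes = 8 * 1024**3          # 8 GiB
decimal_gb = binary_bytes / 10**9   # same quantity in base-10 GB
print(decimal_gb)                   # 8.589934592
```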

I'm not sure why "non-geek customers" should care about files or file sizes anymore.

They really shouldn't. I work in IT, and typically describe file sizes to my clients using the metric prefixes; if they're curious I'll explain the difference (the "1000" is actually 1024 at each level), but the difference is usually negligible, and modern operating systems round it away when displaying simplified sizes to the user.

I grew up with computers, so thinking in binary is not an issue for me. Storage capacity has always been weird. A 1.44MB floppy actually has 2,880 512-byte sectors. Base 2 and base 10 in the same measurement!

Using base-2 units is fine by me as long as you use the KiB MiB notation so I can immediately understand that you're using base-2.

"Kibi" and "Mebi" is moron speak. A kilobyte is 1024 bytes, a megabyte is 1024×1024, and a gigabyte is 1024×1024×1024. Period.

Anyone who is using base 10 is simply wrong.

Or lying to sell you a smaller storage unit.

Whatever confusion there exists, it was created by dishonest mass storage manufacturers. Prior to 1990-something, there was no confusion. As larger and larger storage devices became available on the mass consumer market, the sneaky bastards at Seagate, Quantum et al started to use base 10 to make their products appear more capacious.

The solution for that is to simply take back the kilobyte.

Possibly, legislation could help: make it unlawful for a storage device which holds 1,000,000,000 bytes to be called a gigabyte device.

Of course that's historically true. The point is it doesn't make sense so it should be changed. Kilo means 1,000 for every unit of measure except bytes. Mega means 1,000,000 for every unit of measure except bytes.

It doesn't even make sense... 1024 isn't a base-two number, it's a base-10 number! If you really wanted to represent 1024 in base 2, it would be 10000000000. Kibi and Mebi are not moron speak, they are the correct terms for these quantities in base 2: https://en.wikipedia.org/wiki/Binary_prefix
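To make the point concrete, a quick Python check: 1024 written in binary is a one followed by ten zeros, since 1024 = 2^10:

```python
print(bin(1024))   # 0b10000000000 -- a one and ten zeros, i.e. 2**10
assert int("10000000000", 2) == 1024
```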

Using a base 10 prefix for a base 2 number is just historical weirdness. There's no sensical formal framework in which it fits with other uses of these prefixes. It's good to try to preserve old standards wherever possible. No need to reinvent the wheel. But in the long term in computer science we are trying to get to a world where ideas make sense together and form an easy to understand system. In the long term it's better if people don't have to learn this arcane idea that prefixes are base-10 everywhere except digital storage.

The "kilo", "mega", and "giga" prefixes are part of the SI system of units. They have been around far longer than computers, and are all defined in terms of base 10. A "megawatt" is one million watts, and has been for over a century. These units don't belong to the computer industry, and computer people can't define them to mean something else just because "byte" is the next word.

If you want to insist that 1KB is 1024 bytes, then you aren't talking about real "kilos", you are talking about fake, computer-nerd "kilos". Stealing these units was wrong, and should have never happened in the first place. The SI isn't being moronic in introducing KiB, MiB, and friends, it's just asking us to return what we've stolen.

computer people can't define them to mean something else

Well, the "computer people" did in fact do that, through natural language extension, which is often irregular and inconsistent.

It is this committee-designed "kibi", "mebi" and "gibi" which are prescriptive. And for that reason they are broadly rejected in the industry.

"Mega" is just a foreign prefix that can flexibly serve any purpose you want, just like a Greek letter in mathematics.

Just like the Greek λ can be "anonymous function", "eigenvalue", or "air/fuel ratio", the Greek "mega-" can be 1000000 or 1048576.

Nobody had any problem with this whatsoever, until storage manufacturers started lying in order to sell you a 781 MB drive as 800 MB.

I like to mock the drive-maker's kilobyte too, but in their defense the tradition is older than standardized byte and word sizes. There was a time when drive sizes were measured in bits, and they naturally used SI prefixes for those -- nobody has bit-addressable memory.

That said, they aren't measuring them in bits any more, and using gigabyte in the SI way is definitely misleading.

> Possibly, legislation could help: make it unlawful for a storage device which holds 1,000,000,000 bytes to be called a gigabyte device.

Even tapes, which, along with every other "piece of string" device, have been base 10 since before this cp/m / *dos mistake existed?

Why should they now be forced to be wrong?

(also, be careful what you ask for. If this were to be legislated, it'd be perfectly logical for them to fall inline with NIST - who don't agree with you on this one)

> Anyone who is using base 10 is simply wrong.

You can say that, but that doesn't make it true, or reasonable. The facts on the ground are that OSX and drive manufacturers use base-10 MB/GB.

And, as I point out in the article, CPU frequencies, GPU frequencies, thumb drive capacities, and hard drive capacities are all actually base 10, so it's not like base 10 is going to go away.

Why should the capacity of memory chips affect how we describe file sizes and drive sizes? Please explain?

Frequencies are irrelevant. Yes, frequencies are base 10 all the way back to the dawn of RF in early radio before electronic computing. So what?

Nobody wants base 10 to go away. Just away from the prefixes in kilobyte, megabyte, gigabyte!

"Giga" and "mega" do not mean exactly a million or billion; they are the root of words like "megalomania" and "gigantic".

Mega: From Ancient Greek μέγας ‎(mégas, “great, large, mighty”), from Proto-Indo-European meǵh₂s ‎(“great”). Cognate with Latin magnus, and with Germanic words: Gothic 𐌼𐌹𐌺𐌹𐌻𐍃 ‎(mikils), Old English micel, Middle English muchel, English much, Old High German mihhil, Old Norse mikill, Danish meget.* [Source: https://en.wiktionary.org/wiki/mega-]

Giga: From Ancient Greek γίγας ‎(gígas, “giant”); cognate to giant. [https://en.wiktionary.org/wiki/giga-]

Nothing about a million or billion!

Kilo is the one that means one thousand, from Greek. (χίλιοι / khílioi).

> Why should the capacity of memory chips affect how we describe file sizes and drive sizes?

So that, for example, a gigabyte of swap space corresponds exactly to a gigabyte of RAM. Doh? Who wants the description of size to change when data is moved between RAM and disk?

Speaking of "memory chips", flash storage is memory chips too. We should use the same units for memory chips that we use for ... memory chips.

> Frequencies are irrelevant.

No they're not. They're very relevant for anything dealing with networking and communications.

I recently programmed an ATmega chip for a homebrew project and the frequency of the SPI bus was very relevant.

They are not relevant to measures of information storage. A megahertz is as irrelevant in this debate as a centimeter, or kilopascal --- except to the extent that it bolsters the argument that mega means million in all kinds of units, which nobody is disputing.

> thumb drive capacities

I don't recall ever seeing a thumb drive advertised base 10. Have you seen base 10 for thumb drives often?

I have never seen a thumb drive advertised using anything other than base 10. You have probably assumed that since thumb drives and SD cards typically have capacities of 8/16/32/64 GB that the GB are actually GiB. But that assumption is wrong.

Check out the actual capacity. As I mentioned in the blog post, every flash drive and SD card that I have checked has used the base-10 GB. The extra capacity is there, but it is kept in reserve for bad blocks and headroom needed for efficient use of flash. It is entirely invisible outside of the flash drive.

Base 10 is far more prevalent in computing than most computer experts realize.

Using the ibi/ebi notation is fine when it is written, especially when you have base 2 and 10 in the same document.

Speaking it, however, makes you sound like you have a speech impediment, and non-geeks will think you're a moron. You can deduce the base of the unit from context most of the time, and when you can't, I can ask you for clarification right away, unlike when reading.

I always find this to be fascinating logic. Where else in computing is it perfectly normal to hear "I prefer to be wrong"?

I hereby define 1 megawatt to equal 7^7 watts, because I find that more convenient. Any light bulbs not sold in base 7 units are part of a conspiracy by light bulb manufacturers.

How is this wrong?

Sticking with the GB example: GB is the SI unit symbol for gigabyte. The SI prefix giga denotes 1000³. The proper way to denote a drive size in terms of 1024³ bytes would be gibibytes, or GiB.

The byte notation doesn't make any sense to me if you're using SI. It's kind of like saying Giga Inches or KiloCM, just seems wrong

I'd consider a byte to be like a mole: it's the base unit that we actually use, and it's proper to apply an SI prefix to, but it's defined as a specific multiple of a smaller unit (8 bits vs 6.02 × 10^23 particles).

"Giga inches" sounds wrong for two reasons: first, that larger distances like that are customarily expressed in units more suited to their scale, and second, that that measurement system generally doesn't use SI-style prefixes.

"KiloCM" sounds wrong because you're applying two SI prefixes to one unit and using the wrong capitalization for both "centi" and "meter". kcm could technically be represented by the "dam" unit, although it's more common to show things in that scale just as meters.

KiloCM does sound wrong, but it is interesting that we happily say "1,000 km", when we could say "1 Mm". Fashion and familiarity in that case.

I need to remember to say "it's 3.8 Mm from Seattle to New York".

Well, kilomole is pretty weird too.

IDK, I just think going with bits is neater.

Because a 2GB file will use 2GB of memory and a 2GiB file will use 2GiB of memory, but a 2GB file will not use 2GiB of memory, just because memory size is usually represented in base-2. It's all about nomenclature.

Oh. Well that's a Microsoft problem, really. Most free software has been using the correct abbreviation for base 2 (KiB not KB) for some time now. Further, the annoying habit of using base 10 for line speeds and disk space is finally dying. For example, if Deluge tells me that I'm downloading a 4 GiB file at 4 MiB/s, I know it will finish in 1024 seconds and take up 4 GiB either on disk or in memory.
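The Deluge arithmetic above works out exactly because both units share the 1024 factor; a quick Python sketch:

```python
file_size = 4 * 1024**3   # 4 GiB in bytes
speed = 4 * 1024**2       # 4 MiB/s in bytes per second
print(file_size / speed)  # 1024.0 -- a tidy power-of-two number of seconds
```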

Because frankly, opinions and standards are different things. Line speed is base 10, has always been base 10, predating hard drives or computers. So networks, pretty much the only thing we all have in common, are base 10. Modems are base 10 (14.4k is 14,400 bits/s, not 14,745). Almost everything is base 10. It's not opinion. It .. is.

Memory is base2 because that's how we address arrays. It's the odd one out. Preferring to believe otherwise just doesn't seem very geek to me.

The only real problem is when 10-base prefixes are used to represent 2^10-base values (or vice versa, if that happens). Everything else is just preference, as quite clearly evident here:

"even when dealing with non-technical customers"

Bruce, your implicit assumption is that non-technical customers actually care about units in absolute terms. They don't, that's why some devices (e.g. Nintendo consoles) deal with "blocks" instead of megabytes or mebibytes. Non-technical users care about relative terms. Whether these are represented in 2- or 10-based units, ferrets or chimichangas is irrelevant. My mother wants to know whether data on her 2GB pendrive can be stored on HDD with 20GB space. 2<20, so the answer is yes, case closed.

OS (or ecosystem in a broader sense) should be consistent, that's all there is to it (and since it isn't and won't be, it should at least be honest). Whether 1000 or 1024 "makes more sense", to "some or all" parts of the computer, "visible or not", is entirely a matter of taste. "Makes sense" is as meaningless a metric as "is natural". Everyone can (and will) twist it any way they like by providing various examples that adhere to their preference.

It is true that non-technical customers often don't care. Although, they do care when they buy a 1 TB drive and get told by Windows (but not OSX) that it is only 931 GB.

Imagine a customer with a 7.7 GB file (as reported by Windows) who purchases an 8 GB thumb drive to hold it. The inconsistent units mean that it won't fit. We should make our units consistent and we should use base-10 because people.
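The mismatch in that scenario is easy to verify; a quick Python sketch (assuming Windows's "7.7 GB" means 7.7 GiB and the drive label means 8 billion bytes):

```python
file_bytes = 7.7 * 1024**3       # "7.7 GB" as reported by Windows is 7.7 GiB
drive_bytes = 8 * 10**9          # an "8 GB" thumb drive: 8 billion bytes
print(file_bytes - drive_bytes)  # ~268 million bytes too big to fit
```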

Your argument is good for showing the value of consistent units.

But it could be base 2 or base 16.

Consistency would be better, whichever way was chosen (base 2 or 10).

Although even with base-10 being used all around, I might still end up explaining why an 8 billion byte file won't fit on an 8 billion byte USB key, while a disk image of the same size would because of file system overhead.

> My mother wants to know whether data on her 2GB pendrive can be stored on

... four "500MB" drives? Because it won't fit.

Using correct units, 4×500MB=2GB. But 4×500MiB=1.95312GiB. That's an excellent argument for 1000-based units.
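A quick Python sketch of that arithmetic:

```python
print(4 * 500 * 10**6 / 10**9)      # 2.0 -> four 500 MB drives hold exactly 2 GB
print(4 * 500 * 1024**2 / 1024**3)  # 1.953125 -> four 500 MiB drives fall short of 2 GiB
```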

Q: "Why does my computer list the size of my drive as being smaller than what the box says it is?"

A: "Because drive manufactures chose to use a different way of counting to make you think the drive is bigger than it actually is."

Problem solved.

Consistency trumps correctness. If some computer specs are in binary and not others, the result will be confusion because most people won't know what base is being used at any given moment. There will be far less confusion if we pick one base and stick to it.

Until hard-drive manufacturers started trying to deceptively market smaller drives with larger sounding base-ten capacities, everything in computer-land was base-2 and things were simple. Now I can't fully trust any number put before me without digging into the specs. How is this a step forwards?

Let's draw a parallel with physics. Draw an electrical circuit. The current is measured to be traveling to the right on a wire. Which way are the electrons going? Left. When current was discovered, it was assumed charge carriers were positive. They turned out to be negative (most of the time). That was a mistake, but did we fix it? Nope. It would have caused far more confusion than it was worth. If physics can stick with current flowing the wrong way for centuries for the sake of consistency, we can stick with base 2 hard drive capacities.

> Consistency trumps correctness.

Then why does computing need a special case and redefine standard prefixes? A kilosomething is always 1000 something. (Unless it's reported by Microsoft Windows, apparently.)

> Until hard-drive manufacturers started trying to deceptively market smaller drives with larger sounding base-ten capacities, everything in computer-land was base-2 and things were simple.

Ah yes, like the IBM 53FD, a "1.2 MB" floppy that had 1200×1024 bytes of storage?

That was in 1977, by the way. All later floppies also didn't adhere to the arbitrary prefix redefinition. Nor did hard disks. Nor optical disks. Nor do USB sticks. Nor do bandwidths. Nor do SSDs. RAM incidentally does, but only because it naturally happens to have power-of-two capacities. NVRAMs probably won't.

What kind of "consistency" are you trying to defend by pretending that kilo=1024?

> How is this a step forwards?

Once everyone unfucks themselves and we use binary IEC and metric SI prefixes like we should, everything will be less ambiguous than at any arbitrary point in the past 50 years.

> everything in computer-land was base-2 and things were simple

This has never been true though. Processors have never been "kibihertz". Nor busses, nor modems. Just ram.

The board I'm building at the moment has ram specified as 10ns. That's 10x10^-9 seconds, not 10x2^-20 seconds. Which sounds silly to have to even say, but that's how I feel every time someone wants to pretend that just because something was commonly held when they were at school, does not mean it fits today's standards.

And hard drives, the one everyone obsessed about because their OS incorrectly shows them base 2. No. No they weren't. The famous pictures of an IBM 305 being lifted out of a plane: that's a 5 MB drive. It holds 5 million characters. 5,000,000 characters.

It's embarrassing enough that this fallacy is older than half our readers. It's even more embarrassing that it was never true in the first place.

I agree that consistency is critical. It was always confusing that "kilo" meant 1000 everywhere else but for some reason it meant 1024 when it came to storage sizes (but not bandwidth speeds). To eliminate ambiguity in one fell swoop we should use the KiB MiB units.

> "kilo" meant 1000 everywhere else but for some reason it meant 1024 when it came to storage sizes

Only when using storage. They sell storage by 1000 to inflate the number shown to customers.

It was never confusing.

Bandwidth speed is in bits. Storage is in bytes.

When the unit is anything-bytes, it is powers of 1024.

Note that the byte itself is a power of two already.

This obnoxious KiB MiB nonsense is a non-solution in search of a problem.

Wikipedia on MiB: The binary prefixes have been accepted by all major standards organizations and are part of the International System of Quantities. Many Linux distributions use the unit, but it is not widely acknowledged within the industry or media.

Basically, we only need this type of unit in a context where dishonest and ignorant people are involved, in order to protect the latter from the former.

> Storage is in bytes.

Floppy disk storage is in bytes, and doesn't use 1024.

CD storage is in bytes, and doesn't use 1024.

DVD storage ditto.

BluRay storage ditto.

Hard disk storage ditto.

SSD storage ditto.

Memory card and USB stick storage ditto.

The sole exception is RAM. (But not NVRAM. Because that would be silly.)

> a non-solution in search of a problem

Really now.

Which floppy disk?

My Apple II floppy drives had 256 byte sectors, groups of which were called a kilobyte.

Wikipedia quote: An MFM-based, "high-density" format, displayed as "HD" on the disks themselves and typically advertised as "1.44 MB" was introduced in 1987; the most common formatted capacity was 1,474,560 bytes.

(Why is it 1.44 MB? That's a bit of silliness: it's the result of dividing 1,474,560 by 1024, and then just moving the decimal point by three places instead of dividing by 1024 once again. The beginning of idiocy.)
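That mixed-base derivation, written out as a quick Python sketch:

```python
capacity = 2880 * 512          # 2,880 sectors of 512 bytes each
print(capacity)                # 1474560 bytes
print(capacity / 1024)         # 1440.0 -- one base-2 division gives "1440 KB"
print(capacity / 1024 / 1000)  # 1.44 -- then a base-10 division gives "1.44 MB"
```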

All the storage examples you're giving are due to dishonest mass storage manufacturers who confused things sometime in the 1990's simply by lying.

All other storage uses 1024. A gigabyte of RAM is 512 megabytes times two.

RAM is not the "exception"; RAM is fundamental. A computer can operate without mass storage, but not without some RAM.

A good reason to use powers of two sizes whenever bytes are involved is that the byte itself is a power of two unit already: it has historically settled on 8 bits.

A kilobit of data already does not correspond to a power of ten byte value.

If you want metric byte storage sizes, then first redefine a byte as 10 bits. Then a bit can be a decibyte, a byte can be a decabit. Ten megabits will be exactly a megabyte, and so forth.

> If you want metric byte storage sizes, then first redefine a byte as 10 bits.

I think this is the funniest rationale for using base-2 that I have come across. It makes no sense.

Memory works conveniently with base-2 because of the address lines. n address lines can select 2^n addresses. The size of the bytes stored at those addresses is completely irrelevant.

Your logic would suggest that US gallons, which contain exactly 256 tablespoons, should use binary units, as in "%1010 gallons of gas please" or "that tanker truck can carry 11.3 KGallons (where K equals 1,024)".

Base-2 definitely works most neatly for memory capacity, because memory chips are exactly a power of two, always. There's no need to invoke the (irrelevant) size of the byte.

> Which floppy disk?

IBM 53FD, 1977. Apparently the "idiocy" started really early…

> All other storage uses 1024.

What "all" other? Only RAM.

> A kilobit of data already does not correspond to a power of ten byte value.

That's not the problem of the SI prefixes, which are used for all units. IT does not operate in a vacuum.

> If you want metric byte storage sizes

Bullshit, nobody wants that. That's what IEC binary prefixes are for.

What quantity has two SI units such that one is 8 times the other, and are kilo-, mega-, etc used for both?

Neither bit nor byte are SI units.

What, exactly, is the problem with using IEC binary prefixes for multiples of 1024?

Flash drives are base 2. The memory chips inside are base 2 capacity. The flash control chip just reserves some space for wear leveling and other internal functions.

Flash drives are sold in base-10 units. Nobody gets to see the base-2 nature of the internals.

Three flash drives I have on hand:

  64 GB:
  Disk /dev/sdi: 60 GiB, 64424509440 bytes, 125829120 sectors
  Units: sectors of 1 * 512 = 512 bytes
  32 GB:
  Disk /dev/sdi: 28.9 GiB, 31037849600 bytes, 60620800 sectors
  Units: sectors of 1 * 512 = 512 bytes
  8 GB:
  Disk /dev/sdi: 7.2 GiB, 7743995904 bytes, 15124992 sectors
  Units: sectors of 1 * 512 = 512 bytes
Apparently they're just kinda sorta close to what they're sold as, and they're not accurate in base 10 or base 2.
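A quick Python sketch quantifying how far those three drives fall from their advertised sizes under each convention (using the byte counts from the fdisk output above):

```python
drives = [(64, 64424509440), (32, 31037849600), (8, 7743995904)]
for label, actual in drives:
    base10 = label * 10**9    # advertised size read as base-10 GB
    base2 = label * 1024**3   # advertised size read as GiB
    print(f"{label} GB: {actual / base10:.1%} of base-10, "
          f"{actual / base2:.1%} of base-2")
```

Interestingly, the "64 GB" stick is exactly 60 GiB (64424509440 = 60 × 2^30), so neither convention describes it honestly.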

Thanks for sharing those numbers. Definitely closer in base 10, but not as close as mine were.

The 32 GB one is disturbingly off. 31 GB would have been more honest. If the spare sectors required go up more then they'll have to adjust their marketing numbers and maybe we will see 30 GB, 60 GB, or 120 GB flash drives sold.

Actually, there is little consistency in the advertised size of flash drives.

USB sticks are predominantly sold in base-2 sizes (32, 64, 128 GB etc). In fact, I don't think I've ever seen a 30GB or 100GB usb disk.

How SSD sizes are advertised depends entirely on the manufacturer, and even manufacturers don't use a single base: Samsung's EVO series are advertised in multiples of 10 (250, 500GB), while its PRO series use sizes in multiples of 2 (256, 512GB). Sandisk uses 120 GB, 240 GB sizes. I'm not even sure if all those sizes are supposed to represent base-10 or base-2 prefixes.

But those "32", "64", etc., numbers are times 1,000,000,000 bytes.

That isn't the case with some of the ones that I have on hand. This "128GB" stick is "128GB" as a multiple of 1,069,285,120. Presumably, it's 128GiB less an allocation of spare blocks and the filesystem overhead.

MLC (http://searchsolidstatestorage.techtarget.com/definition/mul..., http://birchtree.me/blog/nand-flash-party/), which is used quite a bit in consumer flash, is base n>2.

Those links claim devices use n=4, 8, 16,... but apart from practicalities, I don't see a reason why n=3, 4, 5,... wouldn't work.

Also, these devices may get produced in power-of-two size units, but they all use error-correcting codes that turn their effective size into something non-power-of-two.

The memory chips are base 2 capacity, yes.

But in what sense are the flash drives base 2? Their capacity isn't. You have to justify your claims.

> But in what sense are the flash drives base 2?

In the sense that the memory chips inside are base 2 capacity and the sole reason you are seeing less is because the flash control chip just reserves some space.

What's the point of pretending you don't understand someone's comment and asking for clarification?

Fundamentally, address decoders are made to operate with binary inputs. That constrains the scope of what they address to powers of 2. With large capacity storage that becomes less relevant as they are overprovisioned with some redundant capacity, but in reality such devices are implementing Nx2^M storage, not Nx10^M.

Compact Flash, SD, thumb drives are all advertised base 2. SSD are advertised usually base 10 but I've seen some advertised base 2. No consistency. But that's marketing for you.

The number of memory cells will be as they need to be addressable...

Yeah well maybe if someone suggested a reasonable base-2 prefix we could use that. Nobody is going to say "mebibyte" or "gibibyte" though. It sounds too ridiculous.

That's just outrageous nonsense. I would agree if the argument was about nomenclature: it should be GiB instead of GB, yes — but things are "naturally base-2" in computers. Yes, you can make an SSD base-10 or whatever, the same way a frequency really could be π^10 Hz or whatever, but your hardware without software is nothing but a pile of garbage, and your software has really no choice but to deal with stuff that is base-2. RAM is not base-10 and registers are not base-10; the address space for anything is not base-10. Your HDD (or SSD, or some virtual block device) will have a filesystem on it, which is divided into blocks that are not base-10. In fact, the very flash memory of yours has pages that are 2 or 4 KiB, and not 4000 B. If anything, a disk couldn't be "base-10" anyway, as bytes (or, if we ditch them as well, bits) are base-2. Even if you made a device that was perfectly base-10 in its storage capacity, addressing it would then be a problem.

And as a post scriptum, that manner of dividing people by "geeks" and "not geeks" is stupid and annoying by itself. It doesn't really matter how ignorant your user is, the reality won't bend to match his expectations: instead, when dealing with something new he would need to adapt his thinking to match reality, or just ignore any discrepancies that occur, if he doesn't care that much. When things can be more "user-friendly" — they should. If they can't because things are what they are — well, that's it.

> things are "naturally base-2" in computers

No they aren't. Some things are naturally base-2 in computers. Far from all. Read the article.

Frequencies already are base 10. And while your HDD is not a tidy power of ten, it also isn't a tidy power of two. Being a huge multiple of 4-KB doesn't make it a power of two.

Your SSD isn't a tidy power of two either, for different reasons.

Computer specs are non-binary in a surprising number of ways.

> Read the article.

I already did, and already commented that this is ridiculous nonsense.

> Frequencies already are base 10

I guess I explicitly mentioned it above, but it seems that I have to repeat myself. Frequencies are base nothing. "Frequency" is just how many times something happens in a given period of time. The period of time may be anything: a minute, which doesn't contain 100 seconds, as you are surely aware; the time light in a vacuum takes to travel exactly 1 foot, which also wouldn't be a power of 10 compared to a second — anything. How many ticks your processor makes during this time is completely arbitrary as well: you can make it 1 tick more or 77.5 ticks less, it won't care. The reason why you think it's "base 10" is just that it's expressed in Hz, as Hz is a unit of SI, which uses powers of ten for everything. Which makes sense when we are talking about physics. You can "make" the frequency of your processor "base-something-else" without even physically changing anything: just use your own period of time instead of 1 second.

How many bytes you have on your hard drive is not so much about physics, however. You could construct your own storage device with a completely arbitrary number of storage cells, each with a completely arbitrary number of states — that's not the important part. I think you understand that it's an absolutely trivial thing to do: in a sense, you can invent a storage device with such a property yourself in a few hours — or even minutes, if you are fine with making that storage device "virtual". What is the important part is that the "storage capacity" of your device means absolutely nothing before it's plugged into your computer and used with existing software, which runs on processors that are base-2, using filesystems with address spaces that are base-2. It's not about the hardware, it's about how we treat it in our software, which is inherently base-2. Calculating sizes for that in base-3 or base-10 is not impossible, of course, but it is just alien. It adds complexity when the whole point of whining in that blogpost is to reduce the complexity. The only way to actually reduce it here is to acknowledge that things are what they are: even if your HDD has a capacity of exactly 3333 bits, your data is in some sense "base-2", even if not literally. Even if your HDD doesn't care "base-what" it is — your RAM and your processor do. So it makes more sense to not use 2 standards and just measure everything information-capacity-related in powers of 2. And if the size of your file is expressed in MiB, your HDD should be too, even if it holds a fractional number of MiBs.

Small, medium and large, that might do it, maybe with X-Large and XX-Large for the American market. Works for pants and plenty of fast moving consumer goods. Cuts out having to have detailed specs, plus it is not exactly as if a computer is something that has to be sized to a person's anatomy. The same could be applied to screen sizes, instead of clumsy acronyms like 'CGA', 'SVGA' etc.

I am actually a traditionalist, for me, bytes matter, as to megabytes and gigabytes, not forgetting terabytes. I like the consistency of Base 2, Base 10 for disk sizes always makes me feel short changed. But I am on HN, I cut my teeth on 6502 and I know my two times table. But my uncle, with his top of the range iPhone whatever? He takes lots of photos and has a huge iTunes collection, for him 'large' might be the thing that informs his choice. My auntie? She doesn't do music but she plays those Angry Birds games, 'medium' will do nicely for her. My nephew? Small should do him fine (he always loses his phone so never has much on it).

Of course people in-between 'developer' and 'techno-phobic' might need a little guidance in choosing size, but right now these megabyte and gigabyte things are just plain confusing. 'Best for photos - medium' might be the strap line to sell the product.

Of course, like clothing sizes could be on a per-brand basis. Much like how you can be a 'Large' in Nike jacket sizes, you could be an 'X-Large' in some equivalent Italian leisure-sports-wear brand. In the world of gadgets you could therefore be 'Large' in Apple device-land and 'X-Large' in Samsung gadget-land. Even bicycles that were once measured in distance from bottom-bracket to top-tube are now simply sold as 'Small/Medium/Large'.

This simple sizing could make things a lot simpler for all involved, side-stepping this Base 2/10 nonsense.

I read your post as a huge piece of sarcasm, but the sad reality is that with sales/marketing people ru(i)ning the asylum, the tech world is really turning into what you've described.

Mandatory XKCD: https://xkcd.com/394

While we're at it, why are networks rated in bits per second but network transfers in bytes per second?

Tradition. In early times you were only able to signal one bit at a time on one wire, and at that time there was also no consensus on the size of a byte in bits.

Fun fact: 1kbit/s is always 1000 bits per second and never 1024 bits per second.

fun fact: and this is why you need to divide by 8 to get the theoretical speed, or use the mental shortcut of dividing by 10 to approximate protocol overhead.
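
Both conversions in one place, using a nominal 100 Mbit/s link as an example (a sketch; the 20% overhead figure is just the rule of thumb the comment describes):

```python
# Rough throughput estimates from a nominal line rate, in bits per second.
# Network rates use base-10 prefixes: 100 Mbit/s = 100,000,000 bit/s.
line_rate_bps = 100_000_000

# Theoretical maximum in bytes per second: divide by 8.
theoretical_Bps = line_rate_bps // 8   # 12,500,000 B/s = 12.5 MB/s

# The mental shortcut: divide by 10, folding in roughly 20% protocol overhead.
realistic_Bps = line_rate_bps // 10    # 10,000,000 B/s = 10 MB/s
```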

Different layers. Wifi is always marketed using the scammy layer-1 speed, which is why your 54G never went above half the claimed speed.

The explanation that "base 2 is natural for memory" makes absolutely no sense.

Base 2 prefixes make sense for memory because memory chips have a power-of-two capacity. Base 2 prefixes make sense for address space because n bits can identify 2^n different addresses. Page sizes are base 2 because it allows for easy bit masking to select the page number and the address within the page. Bit masking is, in fact, one of the main advantages of base 2. So yeah, base 2 has its place.
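
A minimal sketch of that masking trick, assuming the common 4 KiB page size (the address is just an example value):

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT    # 4096, i.e. 2**12
PAGE_MASK = PAGE_SIZE - 1      # 0xFFF: the twelve low bits set

addr = 0x1234ABC               # arbitrary example address

page_number = addr >> PAGE_SHIFT   # high bits select the page
offset = addr & PAGE_MASK          # low bits select the byte within the page

# Shifting the page number back and OR-ing in the offset recovers the address.
assert (page_number << PAGE_SHIFT) | offset == addr
```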

We can make hard drives that have a base 2 capacity. We can make flash drives with that capacity. The reality is that we can make them pretty much any capacity we want. The fact that it's convenient to calculate with base 2 capacities on a machine where it is most convenient to calculate with base 2 numbers comes as no surprise, and it applies equally to any form of memory or storage medium.

That doesn't make memory "naturally base 2", of course. memcpy doesn't stop working when I specify a size that isn't a multiple of sizeof(int). We could work with non-base 2 memory capacity perfectly well while losing none of the performance.

So the author here ends up making post hoc rationalizations for why some things ended up base 2 and others base 10, when really the benefits he ascribes to base-2 memory apply to any amount.

(this is speculation only but I suspect it's right) When the individual memory chips on a RAM stick have base-2 capacity, it is easy to make a request get serviced by the right chip. You just take some higher bits of the address as the chip ID and the lower bits as the offset within that chip. With non base-2 chips, you'd have to do many comparisons and a subtraction, or deal with discontinuous memory. It probably also helps with the internal design of the memory chips for similar reasons. I'm guessing the same reasoning doesn't apply to hard drives because of much different performance and technology (disks have possibly multiple microcontrollers inside which don't have much problem with a bit of integer arithmetic).
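
To illustrate that speculation, a sketch with hypothetical sizes (four 1 GiB chips per stick; the numbers are made up for the example):

```python
CHIP_BITS = 30                    # hypothetical: 2**30 bytes = 1 GiB per chip
CHIP_MASK = (1 << CHIP_BITS) - 1

def route(addr):
    """Split an address into (chip ID, offset within chip) by bit slicing."""
    chip_id = addr >> CHIP_BITS   # the higher bits pick the chip
    offset = addr & CHIP_MASK     # the lower bits index into that chip
    return chip_id, offset

# With a non-power-of-two chip capacity, this would instead need a division,
# or a chain of comparisons and a subtraction, as the comment says.
```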

You're reading more into that sentence than I intended. I meant that base 2 is natural for memory capacity because that is how chips are made, so it makes things work more neatly. It lets us say "8 GB of RAM" instead of "8.59 GB". All RAM chips have a power-of-two capacity and while that might be multiplied by a small non-power-of-two (sometimes three) when multiple chips are put in a system, powers-of-two make talking about memory easier. Ditto for 4-K pages, 2-MB pages, 64-KB caches, etc.
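
That 8.59 figure falls straight out of the prefix mismatch: 8 GiB of RAM expressed in decimal gigabytes is

```python
ram_bytes = 8 * 2**30        # "8 GB" of RAM is really 8 GiB
print(ram_bytes)             # 8589934592 bytes
print(ram_bytes / 10**9)     # about 8.59 decimal gigabytes
```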

When allocating buffers and using memcpy there is (modulo cache effects) no reason to prefer base 2. char buf[1000] works about as well as char buf[1024].

> We can make hard drives that have a base 2 capacity.

Sure, but it's a needless constraint. Whereas I'm not aware of anybody ever making a RAM chip that isn't a base 2 capacity.

base 2 is natural for memory capacity because that is how chips are made

I think this is a fundamental misconception. How they're made is irrelevant except for why they're made that way. And that's because of memory addressing: a memory address is signaled by using a number of parallel bit lanes. The power of 2 comes from the fact that every lane has two states. The same goes for aggregating multiple addresses (like page size and cache size): the most efficient (and therefore only viable) way to aggregate blocks of contiguous memory is to mask out the least significant bits of the address, which is why you also get base-2 multiples for memory aggregation.

Therefore, the fundamental expression of memory capacity is two raised to the number of usable address lines, multiplied by the minimum addressable unit (a byte for RAM, 512 bytes for old HDDs, 2048 bytes for CD-ROM, 4096 bytes for new-format disks).

As for hard disks, they use the same method of addressing as memory does! In fact, the fundamental addressable unit is already a power of 2, not 10, as I mentioned above. The main difference between volatile and non-volatile memory is that the number of address lines is set by standard (it's still 48, I believe), so the available capacity must be probed some other way. And as hard disks are slow anyway, manufacturers can get away with adding a new layer of indirection (firmware) on their disks, which hides the address implementation.

But in my view, there is no valid technical reason for storage sizes to be expressed in multiples of 10.

A memory chip has a capacity that is an exact power of two. Hard drives - not so much. Check out this reference:


The drive they discuss has 6 read/write heads, 6,810 cylinders, and 122 to 232 sectors per track (presumably more on the outer tracks).

None of these numbers are the slightest bit base-2-ish, which is why capacities are rarely near powers of 2. So, there is no valid technical reason for storage sizes to be expressed in powers of 2.
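
Running the numbers from that reference (assuming the usual 512-byte sectors, and treating the sector counts as a per-track range, so these are only capacity bounds):

```python
heads, cylinders = 6, 6810
sectors_min, sectors_max = 122, 232   # sectors per track, inner to outer
sector_bytes = 512                    # assumed; standard for drives of that era

tracks = heads * cylinders
low = tracks * sectors_min * sector_bytes    # lower bound on capacity
high = tracks * sectors_max * sector_bytes   # upper bound on capacity

# Neither bound lands anywhere near a power of two
# (2**31 = 2,147,483,648 and 2**32 = 4,294,967,296 bracket the range).
print(low, high)
```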

Hum, nope. Solid state memory is inherently built in powers of 2. Since addressing is a big part of the chip, and addressing is inherently in powers of 2, making chips any other way would be wasteful.

Hard disks aren't inherently divided into cylinders, sectors and platters. Those come in several different distributions, without the power-of-2 bias. Flash memory comes in powers of 2, but flash drives reserve some of it, thus flash drives are inherently a little bit smaller than a power of 2.

Idea: since gallons are inherently binary (there are 3*2^8 teaspoons in a gallon) we should use Ki for them. Instead of "I put twenty gallons of gas in my car" we can say "I put 15 KiTeaspoons of gas in my car".

Gallons are more binary than hard drives (eight ninths of the factors are two) so my logic is unassailable.
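
For what it's worth, the arithmetic checks out:

```python
TSP_PER_GALLON = 3 * 2**8              # 768 teaspoons in a US gallon

gallons = 20
teaspoons = gallons * TSP_PER_GALLON   # 15360
ki_teaspoons = teaspoons // 1024       # exactly 15

print(f"{gallons} gallons = {ki_teaspoons} KiTeaspoons of gas")
```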

I am most upset at this continued insistence on using the Bel as a unit of data.
