I also prefer base 2 for file transfer speeds, just so it matches the file size convention. Fortunately most programs either default to base 2 or let you choose in the preferences.
Network line speeds are in base 10 (100 Mbit Ethernet is actually 100,000,000 bits/s), but there's so much protocol overhead and so many other factors that comparing transfer speeds to line speeds is moot in most situations.
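A quick back-of-envelope sketch in Python of what a "100 Mbit/s" line rate looks like in file-transfer units, before any protocol overhead is even counted:

```python
# Line rate is defined in base 10: 100 Mbit/s = 100,000,000 bits/s.
line_rate_bits = 100_000_000

bytes_per_sec = line_rate_bits / 8        # 12,500,000 bytes/s
mb_per_sec = bytes_per_sec / 1000**2      # decimal megabytes
mib_per_sec = bytes_per_sec / 1024**2     # binary mebibytes

print(f"{mb_per_sec:.2f} MB/s")           # 12.50 MB/s
print(f"{mib_per_sec:.2f} MiB/s")         # 11.92 MiB/s
```

Real TCP throughput lands below either figure once framing, headers, and ACKs are paid for.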
Anyone who is using base 10 is simply wrong.
Or lying to sell you a smaller storage unit.
Whatever confusion there exists, it was created by dishonest mass storage manufacturers. Prior to 1990-something, there was no confusion. As larger and larger storage devices became available on the mass consumer market, the sneaky bastards at Seagate, Quantum et al started to use base 10 to make their products appear more capacious.
The solution for that is to simply take back the kilobyte.
Possibly, legislation could help: make it unlawful for a storage device which holds 1,000,000,000 bytes to be called a gigabyte device.
It doesn't even make sense... 1024 isn't a base-2 number, it's written in base 10! If you really wanted to write 1024 bytes in base 2, it would be 10000000000 bytes. Kibi and mebi are not moron-speak; they are the correct terms for these base-2 quantities: https://en.wikipedia.org/wiki/Binary_prefix
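Easy to check in Python:

```python
# 1024 written in base 2 is a 1 followed by ten zeros.
print(bin(1024))      # 0b10000000000
assert 1024 == 2**10  # kibi
assert 1000 == 10**3  # kilo, the actual SI prefix
```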
Using a base-10 prefix for a base-2 number is just historical weirdness. There's no sensible formal framework in which it fits with the other uses of these prefixes. It's good to try to preserve old standards wherever possible; no need to reinvent the wheel. But in the long term in computer science we are trying to get to a world where ideas make sense together and form an easy-to-understand system. In the long term it's better if people don't have to learn the arcane idea that prefixes are base 10 everywhere except digital storage.
If you want to insist that 1KB is 1024 bytes, then you aren't talking about real "kilos", you are talking about fake, computer-nerd "kilos". Stealing these units was wrong, and should have never happened in the first place. The SI isn't being moronic in introducing KiB, MiB, and friends, it's just asking us to return what we've stolen.
Well, the "computer people" did in fact do that, through natural language extension, which is often irregular and inconsistent.
It is this committee-designed "kibi", "mebi" and "gibi" which are prescriptive. And for that reason they are broadly rejected in the industry.
"Mega" is just a foreign prefix that can flexibly serve any purpose you want, just like a Greek letter in mathematics.
Just like the Greek λ can be "anonymous function", "eigenvalue", or "air/fuel ratio", the Greek "mega-" can be 1,000,000 or 1,048,576.
Nobody had any problem with this whatsoever, until storage manufacturers started lying in order to sell you a 781 MB drive as 800 MB.
That said, they aren't measuring them in bits any more, and using gigabyte in the SI way is definitely misleading.
Even tapes, which, along with every other "piece of string" device, have been base 10 since before this CP/M / *DOS mistake existed?
Why should they now be forced to be wrong?
(Also, be careful what you ask for. If this were legislated, it'd be perfectly logical for them to fall in line with NIST, who don't agree with you on this one.)
You can say that, but that doesn't make it true, or reasonable. The facts on the ground are that OSX and drive manufacturers use base-10 MB/GB.
And, as I point out in the article, CPU frequencies, GPU frequencies, thumb drive capacities, and hard drive capacities are all actually base 10, so it's not like base 10 is going to go away.
Why should the capacity of memory chips affect how we describe file sizes and drive sizes? Please explain?
Nobody wants base 10 to go away. We just want it kept away from the prefixes in kilobyte, megabyte, and gigabyte!
"Giga" and "mega" do not mean exactly a million or billion; they are the root of words like "megalomania" and "gigantic".
Mega: From Ancient Greek μέγας (mégas, “great, large, mighty”), from Proto-Indo-European meǵh₂s (“great”). Cognate with Latin magnus, and with Germanic words: Gothic 𐌼𐌹𐌺𐌹𐌻𐍃 (mikils), Old English micel, Middle English muchel, English much, Old High German mihhil, Old Norse mikill, Danish meget. [Source: https://en.wiktionary.org/wiki/mega-]
Giga: From Ancient Greek γίγας (gígas, “giant”); cognate to giant. [https://en.wiktionary.org/wiki/giga-]
Nothing about a million or billion!
Kilo is the one that means one thousand, from Greek. (χίλιοι / khílioi).
> Why should the capacity of memory chips affect how we describe file sizes and drive sizes?
So that, for example, a gigabyte of swap space corresponds exactly to a gigabyte of RAM. Doh? Who wants the description of size to change when data is moved between RAM and disk?
Speaking of "memory chips", flash storage is memory chips too. We should use the same units for memory chips that we use for ... memory chips.
No they're not. They're very relevant for anything dealing with networking and communications.
I recently programmed an ATmega chip for a homebrew project and the frequency of the SPI bus was very relevant.
I don't recall ever seeing a thumb drive advertised base 10. Have you seen base 10 for thumb drives often?
Check out the actual capacity. As I mentioned in the blog post, every flash drive and SD card that I have checked has used the base-10 GB. The extra capacity is there, but it is kept in reserve for bad blocks and headroom needed for efficient use of flash. It is entirely invisible outside of the flash drive.
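A sketch with hypothetical numbers for a "32 GB" stick (the raw chip capacity, and the idea that the entire difference is reserve space, are illustrative simplifications, not from any datasheet):

```python
raw_bytes = 32 * 2**30         # 34,359,738,368 -- power-of-two NAND capacity
advertised_bytes = 32 * 10**9  # 32,000,000,000 -- decimal label on the box

reserve = raw_bytes - advertised_bytes
print(reserve)                                 # 2359738368 bytes
print(f"{reserve / raw_bytes:.1%} held back")  # 6.9% held back
```

That invisible slice is plausible headroom for bad blocks and wear leveling, which is consistent with the comment's observation.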
Base 10 is far more prevalent in computing than most computer experts realize.
Speaking it, however, makes you sound like you have a speech impediment, and non-geeks will think you're a moron. You can deduce the base of the unit from context most of the time, and when you can't, unlike when reading, I can ask you for clarification right away.
"Giga inches" sounds wrong for two reasons: first, that larger distances like that are customarily expressed in units more suited to their scale, and second, that that measurement system generally doesn't use SI-style prefixes.
"KiloCM" sounds wrong because you're applying two SI prefixes to one unit and using the wrong capitalization for both "centi" and "meter". kcm could technically be represented by the "dam" unit, although it's more common to show things in that scale just as meters.
I need to remember to say "it's 3.9 Mm from Seattle to New York".
IDK, I just think going with bits is neater.
Memory is base2 because that's how we address arrays. It's the odd one out. Preferring to believe otherwise just doesn't seem very geek to me.
"even when dealing with non-technical customers"
Bruce, your implicit assumption is that non-technical customers actually care about units in absolute terms. They don't, that's why some devices (e.g. Nintendo consoles) deal with "blocks" instead of megabytes or mebibytes. Non-technical users care about relative terms. Whether these are represented in 2- or 10-based units, ferrets or chimichangas is irrelevant. My mother wants to know whether data on her 2GB pendrive can be stored on HDD with 20GB space. 2<20, so the answer is yes, case closed.
The OS (or ecosystem in a broader sense) should be consistent, that's all there is to it (and since it isn't and won't be, it should at least be honest). Whether 1000 or 1024 "makes more sense", to "some or all" parts of the computer, "visible or not", is entirely a matter of taste. "Makes sense" is as meaningless a metric as "is natural". Everyone can (and will) twist it any way they like by providing various examples that adhere to their preference.
Imagine a customer with a 7.7 GB file (as reported by Windows) who purchases an 8 GB thumb drive to hold it. The inconsistent units mean that it won't fit. We should make our units consistent, and we should use base 10 because that's how people count.
But it could be base 2 or base 16.
Although even with base-10 being used all around, I might still end up explaining why an 8 billion byte file won't fit on an 8 billion byte USB key, while a disk image of the same size would because of file system overhead.
... four "500MB" drives? Because it won't fit.
Using correct units, 4×500 MB = 2 GB. But 4×500 MiB = 1.953125 GiB. That's an excellent argument for 1000-based units.
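Spelled out in Python:

```python
MB, GB = 10**6, 10**9
MiB, GiB = 2**20, 2**30

print(4 * 500 * MB / GB)    # 2.0       -> a round 2 GB
print(4 * 500 * MiB / GiB)  # 1.953125  -> an awkward number of GiB
```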
A: "Because drive manufactures chose to use a different way of counting to make you think the drive is bigger than it actually is."
Until hard-drive manufacturers started trying to deceptively market smaller drives with larger sounding base-ten capacities, everything in computer-land was base-2 and things were simple. Now I can't fully trust any number put before me without digging into the specs. How is this a step forwards?
Let's draw a parallel with physics. Draw an electrical circuit. The current is measured to be traveling to the right on a wire. Which way are the electrons going? Left. When current was discovered, it was assumed charge carriers were positive. They turned out to be negative (most of the time). That was a mistake, but did we fix it? Nope. It would have caused far more confusion than it was worth. If physics can stick with current flowing the wrong way for centuries for the sake of consistency, we can stick with base 2 hard drive capacities.
Then why does computing need a special case and redefine standard prefixes? A kilosomething is always 1000 something. (Unless it's reported by Microsoft Windows, apparently.)
> Until hard-drive manufacturers started trying to deceptively market smaller drives with larger sounding base-ten capacities, everything in computer-land was base-2 and things were simple.
Ah yes, like the IBM 53FD, a "1.2 MB" floppy that had 1200×1024 bytes of storage?
That was in 1977, by the way. All later floppies also didn't adhere to the arbitrary prefix redefinition. Nor did hard disks. Nor optical disks. Nor do USB sticks. Nor do bandwidths. Nor do SSDs. RAM incidentally does, but only because it naturally happens to have power-of-twos capacities. NVRAMs probably won't.
What kind of "consistency" are you trying to defend by pretending that kilo=1024?
> How is this a step forwards?
Once everyone unfucks themselves and we use binary IEC and metric SI prefixes like we should, everything will be less ambiguous than at any arbitrary point in the past 50 years.
This has never been true though. Processors have never been "kibihertz". Nor buses, nor modems. Just RAM.
The board I'm building at the moment has RAM specified as 10ns. That's 10x10^-9 seconds, not 10x2^-30 seconds. Which sounds silly to even have to say, but that's how I feel every time someone wants to pretend that just because something was commonly held when they were at school, it must fit today's standards.
And hard drives, the ones everyone obsesses about because their OS incorrectly shows them in base 2? No. No they weren't. The famous pictures of an IBM 305 being lifted out of a plane: that's a 5 MB drive. It holds 5 million characters. 5,000,000 characters.
It's embarrassing enough that this fallacy is older than half our readers. It's even more embarrassing that it was never true in the first place.
Bandwidth speed is in bits. Storage is in bytes.
When the unit is anything-bytes, it is powers of 1024.
Note that the byte itself is a power of two already.
This obnoxious KiB MiB nonsense is a non-solution in search of a problem.
Wikipedia on MiB: The binary prefixes have been accepted by all major standards organizations and are part of the International System of Quantities. Many Linux distributions use the unit, but it is not widely acknowledged within the industry or media.
Basically, we only need this type of unit in a context where dishonest and ignorant people are involved, in order to protect the latter from the former.
Floppy disk storage is in bytes, and doesn't use 1024.
CD storage is in bytes, and doesn't use 1024.
DVD storage, ditto.
Blu-ray storage, ditto.
Hard disk storage, ditto.
SSD storage, ditto.
Memory card and USB stick storage, ditto.
The sole exception is RAM. (But not NVRAM. Because that would be silly.)
> a non-solution in search of a problem
My Apple II floppy drives had 256 byte sectors, groups of which were called a kilobyte.
Wikipedia quote: An MFM-based, "high-density" format, displayed as "HD" on the disks themselves and typically advertised as "1.44 MB" was introduced in 1987; the most common formatted capacity was 1,474,560 bytes.
(Why is it 1.44 MB? That's a bit of silliness: it's the result of dividing 1,474,560 by 1024, and then just moving the decimal point by three places instead of dividing by 1024 once again. The beginning of idiocy.)
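The floppy arithmetic is easy to verify:

```python
capacity = 1_474_560           # formatted bytes on an "HD" 3.5-inch floppy

print(capacity / 1024)         # 1440.0  -> "1440 K"
print(capacity / 1024 / 1000)  # 1.44    -> the advertised "1.44 MB"
print(capacity / 10**6)        # 1.47456 -> consistent decimal MB
print(capacity / 2**20)        # 1.40625 -> consistent binary MiB
```

The advertised figure matches neither consistent definition.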
All the storage examples you're giving are due to dishonest mass storage manufacturers who confused things sometime in the 1990's simply by lying.
All other storage uses 1024. A gigabyte of RAM is 512 megabytes times two.
RAM is not the "exception"; RAM is fundamental. A computer can operate without mass storage, but not without some RAM.
A good reason to use powers of two sizes whenever bytes are involved is that the byte itself is a power of two unit already: it has historically settled on 8 bits.
A kilobit of data already does not correspond to a power of ten byte value.
If you want metric byte storage sizes, then first redefine the byte as 10 bits. Then a bit can be a decibyte, and a byte can be a decabit. Ten megabits will be exactly a megabyte, and so forth.
I think this is the funniest rationale for using base-2 that I have come across. It makes no sense.
Memory works conveniently with base-2 because of the address lines. n address lines can select 2^n addresses. The size of the bytes stored at those addresses is completely irrelevant.
Your logic would suggest that US gallons, which contain exactly 256 tablespoons, should use binary units, as in "%1010 gallons of gas please" or "that tanker truck can carry 11.3 KGallons (where K equals 1,024)".
Base-2 definitely works most neatly for memory capacity, because memory chips are exactly a power of two, always. There's no need to invoke the (irrelevant) size of the byte.
IBM 53FD, 1977. Apparently the "idiocy" started really early…
> All other storage uses 1024.
What "all" other? Only RAM.
> A kilobit of data already does not correspond to a power of ten byte value.
That's not the problem of the SI prefixes, which are used for all units. IT does not operate in a vacuum.
> If you want metric byte storage sizes
Bullshit, nobody wants that. That's what IEC binary prefixes are for.
What, exactly, is the problem with using IEC binary prefixes for multiples of 1024?
    Disk /dev/sdi: 60 GiB, 64424509440 bytes, 125829120 sectors
    Units: sectors of 1 * 512 = 512 bytes

    Disk /dev/sdi: 28.9 GiB, 31037849600 bytes, 60620800 sectors
    Units: sectors of 1 * 512 = 512 bytes

    Disk /dev/sdi: 7.2 GiB, 7743995904 bytes, 15124992 sectors
    Units: sectors of 1 * 512 = 512 bytes
The 32 GB one is disturbingly off. 31 GB would have been more honest. If the spare sectors required go up more then they'll have to adjust their marketing numbers and maybe we will see 30 GB, 60 GB, or 120 GB flash drives sold.
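Restating the three fdisk byte counts in both unit systems (the advertised retail sizes are my assumption, inferred from typical labels):

```python
drives = [("64 GB", 64_424_509_440),
          ("32 GB", 31_037_849_600),
          ("8 GB", 7_743_995_904)]

for label, nbytes in drives:
    # Decimal GB vs binary GiB for the same byte count.
    print(f"{label}: {nbytes / 10**9:.2f} GB decimal, {nbytes / 2**30:.2f} GiB binary")
```

The "32 GB" stick is the outlier: it holds only 31.04 decimal GB, while the other two exceed their decimal labels.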
USB sticks are predominantly sold in base-2 sizes (32, 64, 128 GB etc). In fact, I don't think I've ever seen a 30GB or 100GB usb disk.
How SSD sizes are advertised depends entirely on the manufacturer, and even manufacturers don't use a single base: Samsung's EVO series are advertised in multiples of 10 (250, 500GB), while its PRO series use sizes in multiples of 2 (256, 512GB). Sandisk uses 120 GB, 240 GB sizes. I'm not even sure if all those sizes are supposed to represent base-10 or base-2 prefixes.
Those links claim devices use n=4, 8, 16,... but apart from practicalities, I don't see a reason why n=3, 4, 5,... wouldn't work.
Also, these devices may be produced in power-of-two sizes, but they all use error-correcting codes that turn their effective size into something that's not a power of two.
But in what sense are the flash drives base 2? Their capacity isn't. You have to justify your claims.
In the sense that the memory chips inside are base 2 capacity and the sole reason you are seeing less is because the flash control chip just reserves some space.
What's the point of pretending you don't understand someone's comment and asking for clarification?
And as a post scriptum, that manner of dividing people into "geeks" and "non-geeks" is stupid and annoying in itself. It doesn't really matter how ignorant your user is; reality won't bend to match his expectations. Instead, when dealing with something new he needs to adapt his thinking to match reality, or just ignore any discrepancies that occur if he doesn't care that much. When things can be more "user-friendly", they should be. If they can't, because things are what they are, well, that's it.
No they aren't. Some things are naturally base-2 in computers. Far from all. Read the article.
Frequencies already are base 10. And while your HDD is not a tidy power of ten, it also isn't a tidy power of two. Being a huge multiple of 4 KB doesn't make it a power of two.
Your SSD isn't a tidy power of two either, for different reasons.
Computer specs are non-binary in a surprising number of ways.
I already did, and already commented that this is ridiculous nonsense.
> Frequencies already are base 10
I guess I explicitly mentioned it above, but it seems I have to repeat myself. Frequencies are base nothing. "Frequency" is just how many times something happens in a given period of time. The period may be anything: a minute, which doesn't contain 100 seconds, as you are surely aware; or the time light travels exactly 1 foot in vacuum, which also isn't a power of 10 compared to a second. Anything. How many ticks your processor makes during that period is completely arbitrary as well: you could make it 1 tick more or 77.5 ticks fewer, and it wouldn't care. The reason you think it's "base 10" is just that it's expressed in Hz, and Hz is an SI unit, and the SI uses powers of ten for everything, which makes sense when we are talking about physics. You can "make" the frequency of your processor "base-something-else" without physically changing anything: just use your own period of time instead of 1 second.
How many bytes you have on your hard drive is not so much about physics, however. You could construct your own storage device with a completely arbitrary number of storage cells, each with a completely arbitrary number of states; that's not the important part. It's a trivial thing to do: you could invent a storage device with such a property yourself in a few hours, or even minutes if you are fine with making it "virtual".

The important part is that the "storage capacity" of your device means absolutely nothing before it's plugged into your computer and used with existing software, which runs on processors that are base 2, using filesystems whose address spaces are base 2. It's not about the hardware; it's about how we treat it in our software, which is inherently base 2. Calculating sizes in base 3 or base 10 is not impossible, of course, but it is alien. It adds complexity, when the whole point of the whining in that blog post is to reduce complexity. The only way to actually reduce it here is to acknowledge that things are what they are: even if your HDD has a capacity of exactly 3333 bits, your data is in some sense "base 2", even if not literally. Even if your HDD doesn't care what base it is, your RAM and your processor do. So it makes more sense not to use two standards, and to measure everything information-capacity-related in powers of 2. If the size of your file is expressed in MiB, your HDD's capacity should be too, even if it comes out to a fractional number of MiB.
I am actually a traditionalist: for me, bytes matter, as do megabytes and gigabytes, not forgetting terabytes. I like the consistency of base 2; base 10 for disk sizes always makes me feel short-changed. But I am on HN, I cut my teeth on a 6502 and I know my two times table. But my uncle, with his top-of-the-range iPhone whatever? He takes lots of photos and has a huge iTunes collection, so for him 'large' might be the thing that informs his choice. My auntie? She doesn't do music but she plays those Angry Birds games, so 'medium' will do nicely for her. My nephew? 'Small' should do him fine (he always loses his phone so never has much on it).
Of course people in-between 'developer' and 'techno-phobic' might need a little guidance in choosing size, but right now these megabyte and gigabyte things are just plain confusing. 'Best for photos - medium' might be the strap line to sell the product.
Of course, like clothing, sizes could be on a per-brand basis. Much like how you can be a 'Large' in Nike jacket sizes, you could be an 'X-Large' in some equivalent Italian leisure-sportswear brand. In the world of gadgets you could therefore be 'Large' in Apple device-land and 'X-Large' in Samsung gadget-land. Even bicycles that were once measured by the distance from bottom bracket to top tube are now simply sold as 'Small/Medium/Large'.
This simple sizing could make things a lot simpler for all involved, side-stepping this Base 2/10 nonsense.
Base-2 prefixes make sense for memory because memory chips have a power-of-two capacity. Base-2 prefixes make sense for address space because n bits can identify 2^n different addresses. Page sizes are base 2 because that allows easy bit masking to select the page number and the offset within the page. Bit masking is, in fact, one of the main advantages of base 2. So yeah, base 2 has its place.
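The masking trick is easy to sketch (the 4096-byte page size is a conventional assumption, and the address is arbitrary):

```python
PAGE_SIZE = 4096           # 2**12
PAGE_MASK = PAGE_SIZE - 1  # 0xFFF: the low 12 bits, a contiguous mask

addr = 0x7F3A1C2B

offset = addr & PAGE_MASK  # byte offset within the page
page = addr >> 12          # page number
assert page * PAGE_SIZE + offset == addr

# A base-10 "page" of 1000 bytes offers no such shortcut:
# 1000 - 1 = 999 = 0x3E7, which is not a contiguous bit mask.
```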
We can make hard drives that have a base-2 capacity. We can make flash drives with that capacity. The reality is that we can make them pretty much any capacity we want. The fact that it's convenient to calculate with base-2 capacities on a machine where it is most convenient to calculate with base-2 numbers comes as no surprise, and applies equally to any form of memory or storage medium.
That doesn't make memory "naturally base 2", of course. memcpy doesn't stop working when I specify a size that isn't a multiple of sizeof(int). We could work with non-base 2 memory capacity perfectly well while losing none of the performance.
So the author here ends up making post hoc rationalizations for why some things ended up base 2 and others base 10, when really the benefits he ascribes to base-2 memory apply to any amounts.
When allocating buffers and using memcpy there is (modulo cache effects) no reason to prefer base 2. char buf[1000] works about as well as char buf[1024].
> We can make hard drives that have a base 2 capacity.
Sure, but it's a needless constraint. Whereas I'm not aware of anybody ever making a RAM chip that isn't a base 2 capacity.
I think this is a fundamental misconception. How they're made is irrelevant except for why they're made that way. And that's because of memory addressing: a memory address is signaled by using a number of parallel bit lanes. The power of 2 comes from the fact that every lane has two states. The same goes for aggregating multiple addresses (like page size and cache size): the most efficient (and therefore only viable) way to aggregate blocks of contiguous memory is to mask out the least significant bits of the address, which is why you also get base-2 multiples for memory aggregation.
Therefore, the fundamental expression of memory capacity is 2 raised to the number of usable address lines, multiplied by the minimum addressable unit (a byte for RAM, 512 bytes for old HDDs, 2048 bytes for CD-ROM, 4096 bytes for new-format disks).
As for hard disks, they use the same method of addressing as memory does! In fact, the fundamental addressable unit already is a multiple of 2, not 10, as I mentioned above. The main difference between volatile and non-volatile memory is that the number of address lines is set by standard (it's still 48, I believe), so the available capacity must be probed some other way. And as hard disks are slow anyway, manufacturers can get away with adding a new layer of indirection (firmware) on their disks which hides the address implementation.
But in my view, there is no valid technical reason for storage sizes to be expressed in multiples of 10.
The drive they discuss has 6 read/write heads, 6,810 cylinders, and 122 to 232 sectors per track (presumably more on the outer tracks).
None of these numbers are the slightest bit base-2-ish, which is why capacities are rarely near powers of 2. So, there is no valid technical reason for storage sizes to be expressed in powers of 2.
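A rough sanity check using the geometry quoted above (averaging the per-track sector count is my simplification; a real zoned drive would sum per zone):

```python
heads, cylinders = 6, 6810
sectors_per_track = (122 + 232) // 2  # crude midpoint: 177
sector_size = 512                     # the one power of two in the product

capacity = heads * cylinders * sectors_per_track * sector_size
print(capacity)  # 3702896640 -- roughly 3.7 billion bytes
```

Nothing close to a power of two: 2^31 is about 2.1 billion and 2^32 about 4.3 billion.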
Hard disks aren't inherently divided into cylinders, sectors and platters. Those come in several different distributions, without the power-of-2 bias. Flash memory comes in powers of 2, but flash drives reserve some of it, thus flash drives are inherently a little bit smaller than a power of 2.
Gallons are more binary than hard drives (eight ninths of the factors are two) so my logic is unassailable.