
Base Ten for Almost Everything - ingve
https://randomascii.wordpress.com/2016/02/13/base-ten-for-almost-everything/
======
Avernar
I personally prefer base 2 units for file size. A 2GB file will take 2GB of
memory when I load it in. Hard drive sector sizes are also base two: 512 bytes
for most drives and 4096 bytes (4K) for newer large-capacity drives.

I also prefer base 2 for file transfer speeds just so it matches the file size
convention. Fortunately, most programs either default to base 2 or let you
choose in the preferences.

Network line speeds are in base 10 (100Mbit Ethernet is actually 100,000,000
bits/s), but there's so much protocol overhead and so many other factors that
comparing transfer speeds with line speeds is moot in most situations.
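To put numbers on the mismatch, here is a minimal Python sketch that converts a 100 Mbit/s line rate into bytes per second and then into both decimal MB/s and binary MiB/s. It deliberately ignores protocol overhead, so it gives an upper bound, not an expected transfer speed.

```python
# 100 Mbit/s Ethernet line rate; the network "mega" is decimal (10**6).
LINE_RATE_BITS_PER_S = 100 * 10**6
BITS_PER_BYTE = 8

# Raw line rate in bytes per second, before any protocol overhead.
bytes_per_s = LINE_RATE_BITS_PER_S / BITS_PER_BYTE

print(f"{bytes_per_s / 10**6:.2f} MB/s")   # decimal megabytes: 12.50
print(f"{bytes_per_s / 2**20:.2f} MiB/s")  # binary mebibytes: 11.92
```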

~~~
brucedawson
You may be the first commenter who has actually given a _reason_ for
preferring base 2 units for file sizes. I'm not sure it justifies exposing
base-2 sizes to non-geek consumers, but it is a valid reason.

~~~
mehrdada
I'm not sure why "non-geek customers" should care about files or file sizes
anymore.

~~~
zeta0134
They really shouldn't; I work in IT, and typically describe file sizes to my
clients using the metric prefixes; if they're curious I'll explain the
difference (1000 is actually 1024 at each level) but the difference is usually
negligible, and modern operating systems round the difference away when
displaying the simplified sizes to the user.
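How negligible the difference is depends on the prefix, because the 1024/1000 ratio compounds at every level. A quick Python sketch of the gap:

```python
def binary_gap_percent(level: int) -> float:
    """How much larger 1024**level is than 1000**level, in percent."""
    return (1024**level / 1000**level - 1) * 100

names = ["kilo/kibi", "mega/mebi", "giga/gibi", "tera/tebi"]
for level, name in enumerate(names, start=1):
    print(f"{name}: {binary_gap_percent(level):.1f}%")
# kilo/kibi: 2.4%, mega/mebi: 4.9%, giga/gibi: 7.4%, tera/tebi: 10.0%
```

At the kilo level the gap really is small; at the tera level it is about a tenth of the drive.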

------
DominikD
The only real problem is when base-10 prefixes are used to represent base-2^10
values (or vice versa, if that happens). Everything else is just preference,
as is quite clearly evident here:

"even when dealing with non-technical customers"

Bruce, your implicit assumption is that non-technical customers actually care
about units in absolute terms. They don't; that's why some devices (e.g.
Nintendo consoles) deal with "blocks" instead of megabytes or mebibytes.
Non-technical users care about relative terms. Whether these are represented in
base-2 or base-10 units, ferrets or chimichangas is irrelevant. My mother wants
to know whether the data on her 2GB pendrive can be stored on an HDD with 20GB
of free space. 2 < 20, so the answer is yes, case closed.

The OS (or ecosystem in a broader sense) should be consistent; that's all there
is to it (and since it isn't and won't be, it should at least be honest).
Whether 1000 or 1024 "makes more sense" to "some or all" parts of the computer,
"visible or not", is entirely a matter of taste. "Makes sense" is as
meaningless a metric as "is natural". Everyone can (and will) twist it any way
they like by providing examples that fit their preference.

~~~
brucedawson
It is true that non-technical customers often don't care. They do, however,
care when they buy a 1 TB drive and get told by Windows (but not OS X) that it
is only 931 GB.
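The two conventions side by side, as a minimal Python sketch. It assumes the drive exposes exactly 10^12 bytes (real drives often expose slightly more). Windows divides by 2^30 but labels the result "GB", while OS X (since 10.6) divides by 10^9.

```python
DRIVE_BYTES = 10**12  # assumption: a "1 TB" drive exposing exactly 10**12 bytes

def decimal_gb(n_bytes: int) -> float:
    """Size in gigabytes as drive makers (and OS X since 10.6) count them: 10**9 bytes."""
    return n_bytes / 10**9

def binary_gib(n_bytes: int) -> float:
    """Size in gibibytes (2**30 bytes); Windows shows this figure but labels it "GB"."""
    return n_bytes / 2**30

print(f"{decimal_gb(DRIVE_BYTES):.2f} GB")   # 1000.00 GB
print(f"{binary_gib(DRIVE_BYTES):.2f} GiB")  # 931.32 GiB
```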

Imagine a customer with a 7.7 GB file (as reported by Windows) who purchases an
8 GB thumb drive to hold it. The inconsistent units mean that it won't fit. We
should make our units consistent, and we should use base-10 because _people_.
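The thumb-drive scenario comes down to comparing two numbers that look alike but use different units. A minimal Python sketch, assuming the Windows "7.7 GB" means 7.7 × 2^30 bytes and the drive's "8 GB" means 8 × 10^9 bytes. It ignores filesystem overhead, which would only make the fit worse.

```python
def fits(file_size_gib: float, drive_size_gb: float) -> bool:
    """Compare a binary-GiB file size (shown as "GB" by Windows) with a decimal-GB drive."""
    file_bytes = file_size_gib * 2**30   # 7.7 -> about 8.27 * 10**9 bytes
    drive_bytes = drive_size_gb * 10**9
    return file_bytes <= drive_bytes

print(fits(7.7, 8))  # False
print(fits(7.4, 8))  # True
```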

~~~
paulddraper
Your argument is good for showing the value of consistent units.

But it could be base 2 or base 16.

------
tallanvor
Q: "Why does my computer list the size of my drive as being smaller than what
the box says it is?"

A: "Because drive manufacturers chose to use a different way of counting to
make you think the drive is bigger than it actually is."

Problem solved.

------
beloch
Consistency trumps correctness. If some computer specs are in binary and not
others, the result will be confusion because most people won't know what base
is being used at any given moment. There will be far less confusion if we pick
one base and stick to it.

Until hard-drive manufacturers started trying to deceptively market smaller
drives with larger sounding base-ten capacities, everything in computer-land
was base-2 and things were simple. Now I can't fully trust any number put
before me without digging into the specs. How is this a step forwards?

Let's draw a parallel with physics. Draw an electrical circuit. The current is
measured to be traveling to the right on a wire. Which way are the electrons
going? Left. When current was discovered, it was assumed charge carriers were
positive. They turned out to be negative ( _most_ of the time). That was a
mistake, but did we fix it? Nope. It would have caused far more confusion than
it was worth. If physics can stick with current flowing the wrong way for
centuries for the sake of consistency, we can stick with base 2 hard drive
capacities.

~~~
Flimm
I agree that consistency is critical. It was always confusing that "kilo"
meant 1000 everywhere else, but for some reason it meant 1024 when it came to
storage sizes (but not bandwidth speeds). To eliminate the ambiguity in one fell
swoop, we should use the KiB and MiB units.

~~~
kazinator
It was never confusing.

Bandwidth speed is in bits. Storage is in bytes.

When the unit is anything-bytes, it is powers of 1024.

 _Note that the byte itself is a power of two already._

This obnoxious KiB MiB nonsense is a non-solution in search of a problem.

Wikipedia on MiB: _The binary prefixes have been accepted by all major
standards organizations and are part of the International System of
Quantities. Many Linux distributions use the unit, but it is not widely
acknowledged within the industry or media._

Basically, we only need this type of unit in a context where dishonest and
ignorant people are involved, in order to protect the latter from the former.

~~~
creshal
> Storage is in bytes.

Floppy disk storage is in bytes, and doesn't use 1024.

CD storage is in bytes, and doesn't use 1024.

DVD storage ditto.

Blu-ray storage ditto.

Hard disk storage ditto.

SSD storage ditto.

Memory card and USB stick storage ditto.

The sole exception is RAM. (But not NVRAM. Because that would be silly.)

> _a non-solution in search of a problem_

Really now.

~~~
kazinator
Which floppy disk?

My Apple II floppy drives had 256-byte sectors, groups of which were called a
kilobyte.

Wikipedia quote: _An MFM-based, "high-density" format, displayed as "HD" on
the disks themselves and typically advertised as "1.44 MB" was introduced in
1987; the most common formatted capacity was 1,474,560 bytes._

(Why is it 1.44 MB? That's a bit of silliness: it's the result of dividing
1,474,560 by 1024, and then just moving the decimal point by three places
instead of dividing by 1024 once again. The beginning of idiocy.)
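The "1.44 MB" arithmetic can be checked directly. A short Python sketch, using the 1,474,560-byte formatted capacity from the quote:

```python
FLOPPY_BYTES = 1_474_560  # formatted HD floppy capacity, from the quote above

kib = FLOPPY_BYTES / 1024            # 1440.0 KiB
marketed_mb = kib / 1000             # 1.44: one binary division, then one decimal one
true_mib = FLOPPY_BYTES / 1024**2    # 1.40625 MiB if divided by 1024 twice
true_mb = FLOPPY_BYTES / 1000**2     # 1.47456 MB if divided by 1000 twice

print(kib, marketed_mb, true_mib, true_mb)
```

The marketed figure matches neither a consistent binary nor a consistent decimal reading.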

All the storage examples you're giving are due to dishonest mass storage
manufacturers who confused things sometime in the 1990s simply by lying.

All other storage uses 1024. A gigabyte of RAM is 512 megabytes times two.

RAM is not the "exception"; RAM is fundamental. A computer can operate without
mass storage, but not without some RAM.

A good reason to use powers of two sizes whenever bytes are involved is that
_the byte itself is a power of two unit already_ : it has historically settled
on 8 bits.

A kilobit of data already does not correspond to a power of ten byte value.

If you want metric byte storage sizes, then first redefine a byte as 10 bits.
Then a bit can be a centibyte, a byte can be a decabyte. Ten megabits will be
exactly a megabyte, and so forth.

~~~
creshal
> Which floppy disk?

IBM 53FD, 1977. Apparently the "idiocy" started really early…

> All other storage uses 1024.

What "all" other? Only RAM.

> A kilobit of data already does not correspond to a power of ten byte value.

That's not the problem of the SI prefixes, which are used for _all_ units. IT
does not operate in a vacuum.

> If you want metric byte storage sizes

Bullshit, nobody wants that. That's what IEC binary prefixes are for.

~~~
kazinator
What quantity has two SI units such that one is 8 times the other, and are
kilo-, mega-, etc used for both?

~~~
creshal
Neither bit nor byte are SI units.

What, exactly, is the problem with using IEC binary prefixes for multiples of
1024?

------
Avernar
Flash drives are base 2. The memory chips inside are base 2 capacity. The
flash control chip just reserves some space for wear leveling and other
internal functions.
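The claim can be quantified with a hypothetical example. Assume, purely for illustration, a drive built from 16 GiB of raw flash that exposes exactly 16 × 10^9 bytes. Real controllers reserve varying amounts, so these numbers do not describe any specific product.

```python
RAW_FLASH_BYTES = 16 * 2**30   # hypothetical: 16 GiB of raw NAND (power-of-two chips)
EXPOSED_BYTES = 16 * 10**9     # hypothetical: capacity exposed to the host as "16 GB"

# Space the controller would keep back for wear leveling and other internal use.
reserved = RAW_FLASH_BYTES - EXPOSED_BYTES
print(f"reserved: {reserved:,} bytes ({reserved / RAW_FLASH_BYTES:.1%} of raw)")
```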

~~~
brucedawson
The memory chips are base 2 capacity, yes.

But in what sense are the flash drives base 2? Their capacity isn't. You have
to justify your claims.

~~~
qb45
> But in what sense are the flash drives base 2?

In the sense that _the memory chips inside are base 2 capacity_ and the sole
reason you are seeing less is because _the flash control chip just reserves
some space_.

What's the point of pretending you don't understand someone's comment and
asking for clarification?

------
IshKebab
Yeah well maybe if someone suggested a _reasonable_ base-2 prefix we could use
that. Nobody is going to say "mebibyte" or "gibibyte" though. It sounds too
ridiculous.

------
krick
That's just outrageous nonsense. I would agree if the argument were about
nomenclature: it should be GiB instead of GB, yes. But things _are_
"naturally base-2" in computers. Yes, you can make an SSD base-10 or whatever,
the same way a frequency really could be π^10 Hz or whatever, but your
hardware without software is nothing but a pile of garbage, and your
software has really no choice but to deal with stuff that is base-2. RAM is
not base-10, registers are not base-10, and the address space for anything is
not base-10. Your HDD (or SSD, or some virtual block device) will have a
filesystem on it, which is divided into blocks that are not base-10. In fact,
the flash memory itself has pages that are 2 or 4 KiB, not 4000 B.
If anything, a disk couldn't be "base-10" anyway, as bytes (or, if we ditch them
as well, bits) are base-2. Even if you made a device that was perfectly
base-10 in its storage capacity, addressing it would then be a problem.

And as a postscript, that manner of dividing people into "geeks" and "non-geeks"
is stupid and annoying by itself. It doesn't really matter how ignorant
your user is; reality won't bend to match his expectations. Instead, when
dealing with something new, he needs to adapt his thinking to match
reality, or just ignore any discrepancies that occur if he doesn't care that
much. When things can be more "user-friendly", they should be. If they can't,
because things are what they are, well, that's it.

~~~
brucedawson
> things are "naturally base-2" in computers

No they aren't. _Some_ things are naturally base-2 in computers. Far from all.
Read the article.

Frequencies already _are_ base 10. And while your HDD is not a tidy power of
ten, it also isn't a tidy power of two. Being a huge multiple of 4-KB doesn't
make it a power of two.
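The distinction between "a multiple of 4 KiB" and "a power of two" is easy to check. A small Python sketch, assuming a drive of exactly 10^12 bytes:

```python
def is_power_of_two(n: int) -> bool:
    # A power of two has exactly one set bit, so clearing the lowest set bit leaves 0.
    return n > 0 and n & (n - 1) == 0

one_tb = 10**12
print(one_tb % 4096 == 0)       # True: 10**12 = 4096 * 5**12
print(one_tb // 4096)           # 244140625 blocks of 4 KiB
print(is_power_of_two(one_tb))  # False
```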

Your SSD isn't a tidy power of two either, for different reasons.

Computer specs are non-binary in a surprising number of ways.

~~~
krick
> Read the article.

I already did, and already commented that this is ridiculous nonsense.

> Frequencies already are base 10

I believe I mentioned it explicitly above, but it seems I have to repeat
myself. Frequencies are base _nothing_. "Frequency" is just how many times
something happens in a given period of time. The period may be anything: a
minute, which doesn't contain 100 seconds, as you are surely aware; the time it
takes light in a vacuum to travel exactly 1 foot, which also isn't a power of
10 relative to a second; anything. How many ticks your processor makes during
that time is completely arbitrary as well: you can make it 1 tick more or 77.5
ticks fewer, and it won't care. The only reason you _think_ it's "base 10" is
that it's expressed in Hz, and Hz is an SI unit, and SI uses powers of ten
for everything. That makes sense when we are talking about physics. You can
"make" the frequency of your processor "base-something-else" without even
physically changing anything: just use your own period of time instead of 1
second.

How many bytes you have on your hard drive is not so much about physics,
however. You could build your own storage device with a completely arbitrary
number of storage cells, each with a completely arbitrary number of states;
that's not the important part. I think you understand that it's an absolutely
trivial thing to do: in a sense, you could invent a storage device with such a
property yourself in a few hours, or even minutes if you are fine with
making that storage device "virtual". What _is_ important is that the
"storage capacity" of your device means absolutely nothing until it's plugged
into your computer and used with existing software, which runs on processors
that are base-2, using filesystems with address spaces that are base-2. It's
not about the hardware, it's about how we treat it in our software, which is
inherently base-2. Calculating sizes for that in base-3 or base-10 is not
impossible, of course, but it is just _alien_. It _adds_ complexity, when the
whole point of the whining in that blog post is to _reduce_ complexity. The
only way to actually reduce it here is to acknowledge that things are what
they are: even if your HDD has a capacity of exactly 3333 bits, your _data_ is
in some sense "base-2", even if not literally. Even if your HDD doesn't care
"base-what" it is, your RAM and your processor do. So it makes more sense not
to use two standards and just measure everything related to information
capacity in powers of 2. And if the size of your file is expressed in MiB, your
HDD's capacity should be too, even if that works out to a fractional number of
MiB.

------
Theodores
Small, medium and large, that might do it, maybe with X-Large and XX-Large for
the American market. Works for pants and plenty of fast moving consumer goods.
Cuts out having to have detailed specs, plus it is not exactly as if a
computer is something that has to be sized to a person's anatomy. The same
could be applied to screen sizes, instead of clumsy acronyms like 'CGA',
'SVGA' etc.

I am actually a traditionalist; for me, bytes matter, as do megabytes and
gigabytes, not forgetting terabytes. I like the consistency of Base 2; Base 10
for disk sizes always makes me feel short-changed. But I am on HN, I cut my
teeth on the 6502 and I know my two times table. But my uncle, with his
top-of-the-range iPhone whatever? He takes lots of photos and has a huge iTunes
collection; for him, 'large' might be the thing that informs his choice. My
auntie? She doesn't do music but she plays those Angry Birds games; 'medium'
will do nicely for her. My nephew? Small should do him fine (he always loses
his phone so never has much on it).

Of course, people in between 'developer' and 'technophobic' might need a
little guidance in choosing a size, but right now these megabyte and gigabyte
things are just plain confusing. 'Best for photos - medium' might be the
strapline to sell the product.

Of course, like clothing sizes, these could be on a per-brand basis. Much like
how you can be a 'Large' in Nike jacket sizes but an 'X-Large' in some
equivalent Italian leisure-sports-wear brand, in the world of gadgets you
could be 'Large' in Apple device-land and 'X-Large' in Samsung
gadget-land. Even bicycles, once measured by the distance from bottom
bracket to top tube, are now simply sold as 'Small/Medium/Large'.

This simple sizing could make things a lot simpler for all involved,
sidestepping this Base 2/10 nonsense.

~~~
TeMPOraL
I read your post as a huge piece of sarcasm, but the sad reality is that with
sales/marketing people ru(i)ning the asylum, the tech world is really turning
into what you've described.

------
Animats
Mandatory XKCD: [https://xkcd.com/394](https://xkcd.com/394)

------
wmf
While we're at it, why are networks rated in bits per second but network
transfers in bytes per second?

~~~
Flimm
Fun fact: 1kbit/s is always 1000 bits per second and never 1024 bits per
second.

~~~
rasz_pl
Fun fact: this is why you need to divide by 8 to get the theoretical speed in
bytes, or use the mental shortcut of dividing by 10 to approximate protocol
overhead.
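Both rules of thumb as a minimal Python sketch. The divide-by-10 figure is only a rough allowance for overhead; actual overhead depends on the protocol stack and link conditions.

```python
def throughput_estimates(line_rate_bps: float) -> tuple[float, float]:
    """Return (theoretical, rule_of_thumb) throughput in bytes per second."""
    theoretical = line_rate_bps / 8      # 8 bits per byte, zero overhead
    rule_of_thumb = line_rate_bps / 10   # rough shortcut that folds in protocol overhead
    return theoretical, rule_of_thumb

print(throughput_estimates(100_000_000))  # (12500000.0, 10000000.0)
```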

------
revelation
The explanation that "base 2 is natural for memory" makes absolutely no sense.

_Base 2 prefixes make sense for memory because memory chips have a power-of-two
capacity. Base 2 prefixes make sense for address space because n bits can
identify 2^n different addresses. Page sizes are base 2 because it allows for
easy bit masking to select the page number and the address within the page.
Bit masking is, in fact, one of the main advantages of base 2. So yeah, base 2
has its place._
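The bit-masking advantage in the quote fits in a few lines of Python. This sketch assumes 4 KiB pages (a common size, not the only one) and splits an address into page number and offset with a shift and a mask. With a non-power-of-two page size such as 4000 bytes, the same split would need a real division (`divmod`).

```python
PAGE_SHIFT = 12               # assumption: 4 KiB pages, i.e. 2**12 bytes
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1   # 0xFFF: the low 12 bits select the byte within a page

def split_address(addr: int) -> tuple[int, int]:
    """Split a byte address into (page number, offset within page) with a shift and a mask."""
    return addr >> PAGE_SHIFT, addr & OFFSET_MASK

print(split_address(0x12345))  # (18, 837), i.e. page 0x12, offset 0x345
```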

We can make hard drives that have a base 2 capacity. We can make flash drives
with that capacity. The reality is that we can pretty much make them any
capacity we want. The fact that it's convenient to calculate with base 2
capacities on a machine where it is most convenient to calculate with base 2
numbers comes as no surprise, and is equally valid for any form of memory or
storage medium.

That doesn't make memory "naturally base 2", of course. memcpy doesn't stop
working when I specify a size that isn't a multiple of sizeof(int). We could
work with non-base 2 memory capacity perfectly well while losing none of the
performance.

So the author here ends up making post hoc rationalizations for why some
things ended up base 2 and others base 10, when really the benefits he
ascribes to base 2 memory apply to any amount.

~~~
brucedawson
You're reading more into that sentence than I intended. I meant that base 2 is
natural for memory capacity because that is how chips are made, so it makes
things work more neatly. It lets us say "8 GB of RAM" instead of "8.59 GB".
All RAM chips have a power-of-two capacity and while that might be multiplied
by a small non-power-of-two (sometimes three) when multiple chips are put in a
system, powers-of-two make talking about memory easier. Ditto for 4-K pages,
2-MB pages, 64-KB caches, etc.

When allocating buffers and using memcpy there is (modulo cache effects) no
reason to prefer base 2. char buf[1000] works about as well as char buf[1024].

> We can make hard drives that have a base 2 capacity.

Sure, but it's a needless constraint. Whereas I'm not aware of anybody _ever_
making a RAM chip that isn't a base 2 capacity.

~~~
tremon
_base 2 is natural for memory capacity because that is how chips are made_

I think this is a fundamental misconception. How they're made is irrelevant
except for _why_ they're made that way. And that's because of memory
addressing: a memory address is signaled by using a number of parallel bit
lanes. The power of 2 comes from the fact that every lane has two states. The
same goes for aggregating multiple addresses (like page size and cache size):
the most efficient (and therefore only viable) way to aggregate blocks of
contiguous memory is to mask out the least significant bits of the address,
which is why you also get base-2 multiples for memory aggregation.

Therefore, the fundamental expression of memory capacity is the number of
addresses the usable address lines can express (2^n for n lines) multiplied by
the minimum addressable unit (a byte for RAM, 512 bytes for old HDDs, 2048
bytes for CD-ROM, 4096 bytes for new-format disks).

As for hard disks, they use the same method of addressing as memory does! In
fact, the fundamental addressable unit is already a power of 2, not 10, as
I mentioned above. The main difference between volatile and non-volatile
memory is that the number of address lines is set by a standard (it's still
48, I believe), so the available capacity must be probed by some other means.
And since hard disks are slow anyway, manufacturers can get away with adding a
new layer of indirection (firmware) to their disks, which hides the address
implementation.
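The capacity rule above (2^n addresses times the addressable unit) as a small Python sketch. The 48-bit figure corresponds to ATA's 48-bit LBA; 512-byte sectors are assumed here.

```python
def max_capacity(address_bits: int, unit_bytes: int) -> int:
    """Largest capacity reachable with the given number of address bits and addressable unit."""
    return 2**address_bits * unit_bytes

lba48 = max_capacity(48, 512)          # 48-bit sector addresses, 512-byte sectors
print(lba48 == 2**57)                  # True
print(lba48 / 2**50, "PiB")            # 128.0 PiB
print(max_capacity(32, 1) / 2**30, "GiB")  # 4.0 GiB: a 32-bit byte-addressed space
```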

But in my view, there is no valid technical reason for storage sizes to be
expressed in multiples of 10.

~~~
brucedawson
A memory chip has a capacity that is an exact power of two. Hard drives - not
so much. Check out this reference:

[http://www.pcguide.com/ref/hdd/geom/geomLogical-c.html](http://www.pcguide.com/ref/hdd/geom/geomLogical-c.html)

The drive they discuss has 6 read/write heads, 6,810 cylinders, and 122 to 232
sectors per track (presumably more on the outer tracks).

None of these numbers are the slightest bit base-2-ish, which is why
capacities are rarely near powers of 2. So, there is no valid technical reason
for storage sizes to be expressed in powers of 2.
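Since the exact per-zone sector counts aren't given, a Python sketch can only bracket the capacity of that drive. It pretends every track has the minimum (122) or the maximum (232) number of sectors, and assumes 512-byte sectors. Neither bound is a power of two.

```python
def is_power_of_two(n: int) -> bool:
    # A power of two has exactly one bit set.
    return n > 0 and n & (n - 1) == 0

HEADS, CYLINDERS = 6, 6_810
MIN_SPT, MAX_SPT = 122, 232   # sectors per track, from the PC Guide example
SECTOR_BYTES = 512            # assumption: 512-byte sectors

tracks = HEADS * CYLINDERS                  # 40,860 tracks
low = tracks * MIN_SPT * SECTOR_BYTES       # every track at the minimum sector count
high = tracks * MAX_SPT * SECTOR_BYTES      # every track at the maximum sector count

print(f"{low:,} to {high:,} bytes")
print(is_power_of_two(low), is_power_of_two(high))  # False False
```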

------
brucedawson
Idea: since gallons are inherently binary (there are 3*2^8 teaspoons in a
gallon) we should use Ki for them. Instead of "I put twenty gallons of gas in
my car" we can say "I put 15 KiTeaspoons of gas in my car".

Gallons are more binary than hard drives (eight ninths of the factors are two)
so my logic is unassailable.
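The gallon arithmetic does check out. A Python sketch using US customary units (4 quarts per gallon, 2 pints per quart, 2 cups per pint, 16 tablespoons per cup, 3 teaspoons per tablespoon):

```python
TEASPOONS_PER_GALLON = 4 * 2 * 2 * 16 * 3  # 768 = 3 * 2**8

def prime_factors(n: int) -> list[int]:
    """Prime factorization by trial division, smallest factors first."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

factors = prime_factors(TEASPOONS_PER_GALLON)     # [2, 2, 2, 2, 2, 2, 2, 2, 3]
share_of_twos = factors.count(2) / len(factors)   # 8/9
kiteaspoons = 20 * TEASPOONS_PER_GALLON / 1024    # 15.0 "KiTeaspoons" in 20 gallons

print(factors, share_of_twos, kiteaspoons)
```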

------
liw
I am most upset at this continued insistence of using a Bel as unit of data.

