OpenBSD removes support for non-UTF8 locales (marc.info)
236 points by ingve on Aug 14, 2015 | 180 comments

I wonder what the pros and cons weighed in the discussion were.

Clearly not supporting Unicode text in non-UTF-8 locales (except through, like, some kind of compatibility function, like recode or iconv) is the Right Thing. One problem that I have is that current UTF-8 implementations typically are not "8 bit clean", in the sense that GNU and modern Unix tools typically attempt to be; they crash, usually by throwing an exception, if you feed them certain data, or worse, they silently corrupt it.

Markus Kuhn suggested "UTF-8B" as a solution to this problem some years ago. Quoting Eric Tiedemann's libutf8b blurb, "utf-8b is a mapping from byte streams to unicode codepoint streams that provides an exceptionally clean handling of garbage (i.e., non-utf-8) bytes (i.e., bytes that are not part of a utf-8 encoding) in the input stream. They are mapped to 256 different, guaranteed undefined, unicode codepoints." Eric's dead, but you can still get libutf8b from http://hyperreal.org/~est/libutf8b/.

I'm willing to bet a large amount that non-UTF-8 encodings were broken and nobody cared enough to bother fixing them.

OpenBSD does not hesitate to nuke legacy stuff that gets broken. I feel that is ultimately for the best, because half-assed support that barely functions is often worse than no support at all.

It was in fact intentionally broken to find out where removing single-byte locales hurts our users most.

We have a hackathon coming up with devs committed to making UTF-8 work in more base utilities. If that works out, and the most sore points of latin1/koi-8/etc users have been adequately addressed, 5.9 will ship with only the UTF-8 locale (and of course the default "C" locale -- ASCII).

If this approach turns out to be wrong because we cannot get regressions fixed, 5.9 will ship like 5.7 and 5.8 (with UTF-8 and single byte locales).

My first thought was "what about the C locale?", so it's good to see that question already answered.

I really wish there was some sort of standard "U" locale that would be the same as "C" but UTF-8, and ISO rather than US format dates.

That locale pseudo-exists. It's called "don't call the evil setlocale function, write in C90 as much as possible, do your own UTF-8 encoding and decoding, and implement the exact default date format you want with your own strftime string or whatever."
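A minimal Python sketch of the do-it-yourself approach described above, used here only as an illustration: a hand-rolled UTF-8 encoder plus a fixed strftime pattern built from numeric directives, so neither depends on the current locale. The function names are illustrative, not from any library.

```python
import time


def utf8_encode_codepoint(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 by hand (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:                       # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),    # 11110xxx + three continuation bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_encode(text: str) -> bytes:
    return b"".join(utf8_encode_codepoint(ord(ch)) for ch in text)


# A fixed, ISO-style date format using only numeric directives,
# so locale-specific month and day names never appear.
ISO_FORMAT = "%Y-%m-%d %H:%M:%S"
epoch_text = time.strftime(ISO_FORMAT, time.gmtime(0))  # "1970-01-01 00:00:00"
```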

That doesn't exactly help me as a user, and possibly makes things worse as some things respect locale and some don't.

There has been some talk both in glibc and musl of shipping such a "C-but-UTF-8" locale.

Oh, I didn't realize you weren't removing "C"! Thank you for explaining!

If I had to guess, using my mental model of OpenBSD:

(a) most non-UTF-8-or-UTF-16 locales will choke (crash or corrupt data) in the rare case that they try to encode text outside their encoding range (the mirror image of the problem UTF-8B fixes in UTF-8);

(b) codecs have to be fast and handle untrusted strings of somewhat unpredictable lengths, making them a likely source of security holes;

(c) possible subtle bugs in a codec enable "cloaking attacks" where different parts of a system parse the same string differently; these have existed in the past with UTF-8, but would have to be rooted out of every codec;

(d) encoding text with one codec and decoding it with another also corrupts it.

So there are lots of good reasons to require the system to default to UTF-8 and use other codecs only in special cases involving backwards compatibility.
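Points (a) and (d) are easy to reproduce with Python's built-in codecs. Python is only an illustration here; libc's locale machinery differs in detail. Latin-1 simply has no euro sign, and UTF-8 bytes read back as Latin-1 turn into mojibake:

```python
def fits_latin1(s: str) -> bool:
    """Point (a): can this text be represented in a single-byte Latin-1 locale?"""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Point (d): encode with one codec, decode with another.
original = "caf\u00e9"                                  # "café"
mojibake = original.encode("utf-8").decode("latin-1")   # "cafÃ©"
```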

I hope you can still get reasonable performance and sensible ordering by setting LC_COLLATE=C.

Having sat in on a BUG meeting where this was discussed by one of the devs responsible, I believe it was basically "UTF-8 won, it's time to not pretend otherwise, we're going to move forward with this."

For the benefit of others (the link is nonobvious), here's Markus Kuhn's presentation of UTF-8B:


The tl;dr is to map an invalid UTF-8 byte n to code point U+DC00 + n, which puts it in the code point range reserved for the second part of a surrogate pair. (In UTF-16, a 16-bit value between D800 and DBFF followed by a 16-bit value between DC00 and DFFF is used to encode a code point that cannot fit in 16 bits. Since these "surrogate pairs" happen only in that order, there is room to extend UTF-16 by assigning a meaning to a DC00-DFFF value seen without a D800-DBFF before it.) Since the surrogate code points are defined as not "Unicode scalar values" and cannot exist in well-formed "Unicode text", and therefore cannot be decoded from well-formed UTF-8, there's no risk of confusion.
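To make the mapping concrete, here is a small hand-written UTF-8B decoder and encoder in Python. It is a sketch, not Kuhn's or libutf8b's code. Well-formed sequences decode normally. Any byte that does not start a well-formed sequence becomes U+DC00 + byte: a bad lead byte, a truncated or overlong sequence, an encoded surrogate, or a value above U+10FFFF. The encoder writes those code points back out as the original bytes. Such bytes are always 0x80 or above, so the escapes fall in U+DC80..U+DCFF.

```python
def utf8b_decode(data: bytes) -> str:
    """Decode UTF-8, mapping each byte that is not part of a well-formed
    sequence to the lone surrogate U+DC00 + byte."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                       # ASCII: always valid
            out.append(chr(lead))
            i += 1
            continue
        # Sequence length and payload bits carried by the lead byte
        if 0xC2 <= lead <= 0xDF:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF4:
            length, cp = 4, lead & 0x07
        else:                                 # stray continuation byte or invalid lead
            length, cp = 0, 0
        ok = length > 0 and i + length <= len(data)
        if ok:
            for k in range(1, length):
                cont = data[i + k]
                if (cont & 0xC0) != 0x80:     # not a 10xxxxxx continuation byte
                    ok = False
                    break
                cp = (cp << 6) | (cont & 0x3F)
        if ok:
            shortest = {2: 0x80, 3: 0x800, 4: 0x10000}[length]
            # Reject overlong forms, encoded surrogates and values past U+10FFFF
            if cp < shortest or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                ok = False
        if ok:
            out.append(chr(cp))
            i += length
        else:
            out.append(chr(0xDC00 + lead))    # escape just this byte, then resync
            i += 1
    return "".join(out)


def utf8b_encode(text: str) -> bytes:
    """Inverse of utf8b_decode: escaped bytes go back out unchanged."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(cp - 0xDC00)
        else:
            out += ch.encode("utf-8")         # raises for any other lone surrogate
    return bytes(out)
```

Escaping only the lead byte and resynchronising on the next one keeps the scheme simple. Every byte of a broken sequence ends up escaped individually, which is what makes the round trip exact.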

There are some similarities with the extension of UTF-8 encoding that is sometimes called "WTF-8" https://simonsapin.github.io/wtf-8/. WTF-8 lets unchecked purportedly-UTF-16 data be parsed as a sequence of code points, encoded into an extension of UTF-8, and round-tripped back into the original array of uint16s. UTF-8B lets unchecked purportedly-UTF-8 data be parsed as a sequence of code points, encoded into an extension of UTF-16, and round-tripped back into the original array of uint8s. They're not quite compatible, because WTF-8 would encode U+DC80 as a three-byte sequence (ED B2 80), and UTF-8B would decode that into three code points (U+DCED U+DCB2 U+DC80) since U+DC80 isn't a Unicode scalar value. But if a system wanted to support both of these robust encodings simultaneously, I think you could handle this fairly clear special case.
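The clash described above can be checked with Python's built-in error handlers under two stand-in assumptions. For a lone surrogate, `surrogatepass` produces the same three bytes WTF-8 would. `surrogateescape` is Python's implementation of UTF-8B.

```python
lone = "\udc80"                                   # an unpaired low surrogate

# WTF-8-style (generalized UTF-8) encoding of the lone surrogate
wtf8 = lone.encode("utf-8", errors="surrogatepass")            # b"\xed\xb2\x80"

# UTF-8B sees three invalid bytes and escapes each one separately
utf8b = wtf8.decode("utf-8", errors="surrogateescape")         # "\udced\udcb2\udc80"

# The bytes still round-trip, but the text is three code points, not U+DC80
same_bytes = utf8b.encode("utf-8", errors="surrogateescape")
```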

Agreed, except that UTF-8B also lets you round trip UTF-8B → UCS-4 → UTF-8B safely, not just via UTF-16.

Kuhn's idea is also used in Python 3, so that garbage bytes can (optionally!) be decoded to Unicode strings and later losslessly turned back into the same bytes, which ensures (e.g.) that filenames that can't be decoded can still be used: https://www.python.org/dev/peps/pep-0383/
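A short illustration of the PEP 383 behaviour, using the `surrogateescape` error handler directly. On POSIX systems `os.fsdecode` and `os.fsencode` apply the same handler to file names. The file name below is a made-up example of Latin-1 bytes that are not valid UTF-8.

```python
raw_name = b"r\xe9sum\xe9.txt"      # "résumé.txt" as written by a Latin-1 system

# Strict decoding refuses the name outright...
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while surrogateescape turns each bad byte into U+DC80..U+DCFF
name = raw_name.decode("utf-8", errors="surrogateescape")   # "r\udce9sum\udce9.txt"
restored = name.encode("utf-8", errors="surrogateescape")    # identical bytes
```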

Interesting; I implemented exactly the same thing in the TXR language. I can read an arbitrary file in /bin/ as UTF-8 to a string, and when that string is converted to UTF-8, it reproduces that file exactly. All invalid bytes go to DCXX, including the null character. The code U+DC00 is called "pnul" (pseudo-null) and can even be written like #\pnul in the language as a character constant. Thanks to pnul, you can easily manipulate data that contains nulls, like /proc/<pid>/environ, or null-delimited strings from GNU "xargs -0". The underlying C strings are nicely null-terminated with the real U+0000 NUL, and everything is cool.
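The pseudo-null trick can be approximated in Python on top of `surrogateescape`. This illustrates the idea and is not TXR's implementation; the helper names are hypothetical. `surrogateescape` never produces U+DC00 because bytes below 0x80 are always valid, so NUL can claim that code point without ambiguity.

```python
PNUL = "\udc00"  # pseudo-null, named after TXR's #\pnul


def decode_with_pnul(raw: bytes) -> str:
    # Invalid bytes -> U+DC80..U+DCFF, then NUL -> U+DC00,
    # so the resulting string never contains a real U+0000
    return raw.decode("utf-8", "surrogateescape").replace("\x00", PNUL)


def encode_with_pnul(text: str) -> bytes:
    # Undo both mappings to get the exact original bytes back
    return text.replace(PNUL, "\x00").encode("utf-8", "surrogateescape")


environ = b"HOME=/root\x00LANG=C\x00"     # shaped like /proc/<pid>/environ
s = decode_with_pnul(environ)
entries = [e for e in s.split(PNUL) if e]
```

Splitting on `PNUL` handles NUL-delimited input such as `xargs -0` lists in the same way.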

You and the Python guys should get together and make your hack compatible, and then pressure everyone else to standardize on it, instead of the horrible nightmare where every string input and output operation potentially corrupts data or crashes.

Same in Haskell (GHC Haskell, at least).

It was sort of darkly funny to be reading along as you're quoting the guy then all of a sudden hit so matter-of-factly, "He's dead, but you can still get the thing from...." A real splash of cold water.

What would be great would be if someone would take up UTF-8B again. I mentioned his death because otherwise you might think he lost interest in the project, but no, he lost interest in living.

> he lost interest in living

I get what you're trying to accomplish with the parallel construction, but that's a pretty callous way to describe it :/

I'm sorry I upset you. I didn't mean to.

>I'm sorry (if) I upset you. I didn't mean to.

The poster didn't express that he was upset. He expressed an opinion that your description was callous.

I wasn't upset, and I certainly wasn't trying to imply I was. But that doesn't change that it makes me a little sad to see people refer to suicide with such a lack of empathy.

I am surprised that you read my remark as lacking empathy, but I have deleted the remarks I had written here about my actual feelings about the event, because this doesn't seem like a very promising conversation.

He died from a massive drug overdose right?

Yes, although we didn't find out for sure until several months later. It was too massive to be an accident.

In a roundabout way, this is because I wasn't able to push through an isprint() workaround diff to ls. http://marc.info/?l=openbsd-misc&m=142540203528315&w=2

Reminds me of Go strings: they usually store UTF-8 but they're actually 8-bit clean:

"It's important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes."


It's only 8-bit clean if you don't poke it very hard. Try either of the last two loop examples in that page after adding "\xff\x80" to the string; you get two (indistinguishable) U+FFFD REPLACEMENT CHARACTERs in the iteration. So the loop destroys data, which UTF-8B specifically does not.
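The loss is easy to see in Python too. This is an analogy, not Go code. The `replace` error handler does what Go's `range` loop does and yields one U+FFFD per invalid byte, so different inputs collapse to the same string. `surrogateescape` (UTF-8B) keeps them apart.

```python
first = b"\xff\x80"
second = b"\xfe\xfe"

# Lossy: both inputs become two REPLACEMENT CHARACTERs
lossy_first = first.decode("utf-8", errors="replace")
lossy_second = second.decode("utf-8", errors="replace")

# Lossless: invalid bytes become distinct escape code points
kept_first = first.decode("utf-8", errors="surrogateescape")    # "\udcff\udc80"
kept_second = second.decode("utf-8", errors="surrogateescape")  # "\udcfe\udcfe"
```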

Also, it's a little disappointing that Go doesn't have a type-level way to say that a string is in fact UTF-8, not Latin-1 or something, and preferably that all values that inhabit that type are guaranteed to be valid and well-formed UTF-8. This is the cause of plenty of subtle bugs in Python 2, C, etc., which are all technically the result of programmer error, but in this decade, type systems should be helping us avoid common, subtle programmer errors.
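Python has no such guarantee at the type level either. The usual workaround is a "smart constructor": a wrapper type that validates once when it is built, so code receiving it never has to re-check. `Utf8Bytes` is a hypothetical name, and the check happens at run time, not compile time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utf8Bytes:
    """Bytes that are guaranteed to be well-formed UTF-8 (checked once, at construction)."""
    data: bytes

    def __post_init__(self) -> None:
        # bytes.decode raises UnicodeDecodeError on any ill-formed sequence
        self.data.decode("utf-8")

    def text(self) -> str:
        return self.data.decode("utf-8")
```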

If you're juggling a bunch of different types of strings and want to keep them straight, Go does support defining separate types for them. An example is the HTML type from the template library [1].

The question is which sanitized string types are worth defining in the standard library. Presumably UTF-8 sanitized strings didn't make the cut.

Not sure about UTF-8B. Suppose the input is already UTF-8B? Do you double-escape it somehow?

It looks like DecodeRuneInString returns RuneError if it can't decode something and RuneError is defined as U+FFFD. The example uses a hard-coded string where it can't happen, so technically it's not a bug that it doesn't check for the error. But a linter might want to flag it.

[1] http://golang.org/pkg/html/template/#HTML

> One problem that I have is that current UTF-8 implementations typically are not "8 bit clean", in the sense that GNU and modern Unix tools typically attempt to be; they crash, usually by throwing an exception

Crashing on invalid data sounds like a great idea. Leaving garbage through doesn't.

Crashing in the "/* oops! */ exit(1)" sense is great. Crashing in the buffer overflow sense is not. OpenBSD treats all of the latter as potential security vulnerabilities.

No objection there; "crashing" in my comment was meant as "stops processing and reports an input data error to the user", the Erlang sense of crashing, if you will.

Aborting might be the better term.

Is it really garbage? If we want to be true to UNIX's (questionable) "Write programs to handle text streams, because that is a universal interface" ethos, our definition of "text" has to admit all possible byte strings to be "universal". And the so-called C locale historically did.

> Is it really garbage?

Invalid UTF-8 when valid UTF-8 was expected? Yes.

> "Write programs to handle text streams, because that is a universal interface" ethos, our definition of "text" has to admit all possible byte strings to be "universal"

Random bytes are not text. The Unix ethos is "communicate via arbitrary binary streams", but programs which only understand text understand text, not random-bytes-which-are-not-text. It seems sensible for programs to have more restrictions on their input than the general-purpose communication protocol does: would you expect jq to try and process input which is not JSON in any way, shape or form despite it being billed as a JSON processor? Because that's not what it's going to do, at least not by default.

Although it was pretty common up to the mid-1990s to run into 8-bit-cleanness problems and arbitrary buffer sizes, IMHO, the Unix ethos is not for `wc`, `tr`, `dd`, `sort`, `uniq`, `read`, `diff`, `patch`, or `split` to crash with certain input data, to silently corrupt that data, or to spew warning messages about its contents. They are building blocks for your programs; it is not their business to impose unnecessary expectations on your data. They can and should correctly handle arbitrary data. When they don't, they limit the programs you can write with them, with no compensating increase in any other virtue.
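As a sketch of what "8-bit clean" means in practice, here is a `wc`-like counter in Python that never decodes its input. Invalid UTF-8, NUL bytes or Latin-1 text pass through without errors or changes. Splitting words on ASCII whitespace only is an assumption, roughly matching `wc` in the "C" locale.

```python
def wc_counts(data: bytes) -> tuple:
    """Return (lines, words, bytes) like `wc`, treating the input as raw bytes."""
    # bytes.split() with no argument splits on ASCII whitespace only
    return data.count(b"\n"), len(data.split()), len(data)
```

Fed from `sys.stdin.buffer.read()`, it accepts any byte stream.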

You're the reason I can't use an "'" in my password on Citibank's web site, aren't you? I finally caught you, you bastard.

As a French-speaking person, I cannot tell you how much the announcement[0] that after 5.8, basic utilities, including mg(1), will be UTF-8 ready pleases me. I'm a huge Emacs fan, but I like to use mg(1) for quick edits and this is very exciting news for me!

[0] http://undeadly.org/cgi?action=article&sid=20150722182236

AIUI, one of the big problems with using UTF-8 universally is that it's rather unfriendly to Asian character sets. e.g. apparently UTF-8 is three times bigger than TIS-620 for Thai characters (from http://www.micro-isv.asia/2009/03/why-not-use-utf-8-for-ever...).
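The size difference is easy to measure. A quick Python check, assuming Python's bundled `tis_620` and `shift_jis` codecs stand in for the legacy encodings:

```python
thai = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"   # "สวัสดี" ("hello"), 6 code points
japanese = "\u65e5\u672c\u8a9e"                 # "日本語", 3 code points

sizes = {
    "thai_utf8": len(thai.encode("utf-8")),             # 3 bytes per character
    "thai_tis620": len(thai.encode("tis_620")),         # 1 byte per character
    "ja_utf8": len(japanese.encode("utf-8")),           # 3 bytes per character
    "ja_shift_jis": len(japanese.encode("shift_jis")),  # 2 bytes per character
}
```

The 3:1 ratio only holds for text that is almost entirely Thai letters. Spaces, digits and ASCII markup cost one byte in both encodings.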

I spoke to several Japanese OpenBSD developers about this. They told me UTF-8 is not perfect but it's the least of available evils. They'd be more happy with a UTF-8 capable base system than with the current state of things where with few exceptions UTF-8 is supported only by 3rd party applications installed from ports.

And note that this is about system locales which mostly concerns libc APIs. Applications are still free to support additional character sets via other means (e.g. iconv).

For some problems a locale may not be the best answer. For instance, during these conversations I learned that Japanese android phones expose filenames as Shift-JIS which cannot be listed by ls(1) when the phone's filesystem is mounted in OpenBSD. In my opinion what's needed is not a system locale that switches everything to Shift-JIS but a translation layer which presents filenames as UTF-8 to the rest of the system. Perhaps a fuse filesystem module which links to libiconv in userspace to perform the necessary translation, and presents the result at an auxiliary mount point.
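The core of such a translation layer is just a re-encoding step. A minimal Python sketch with hypothetical function names. It only covers the encoding step, not a FUSE module. A real layer would also need a policy for names that are not valid Shift-JIS; here they simply raise.

```python
def sjis_name_to_utf8(raw: bytes) -> bytes:
    """Name as stored on the phone (Shift-JIS) -> name shown to the system (UTF-8)."""
    return raw.decode("shift_jis").encode("utf-8")


def utf8_name_to_sjis(name: bytes) -> bytes:
    """Reverse direction, for creating or renaming files through the layer."""
    return name.decode("utf-8").encode("shift_jis")
```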

Funny thing about Emacs and OpenBSD...

...Emacs is the only package in the entire ports tree that can't use ASLR.

That can't possibly be true, the Go port also doesn't use ASLR.

I'm just quoting what an OpenBSD dev mentioned to me. I guess there could be more.

Why is it?

Not sure, but I can tell you that Emacs is weird inside, in a lot of ways. It runs an interpreter for a language that nobody else uses, which has been heavily optimized because otherwise Emacs would be too slow. Then they use tricks to speed up executable loading, like loading Emacs and then writing the contents of memory to disk, so it can be loaded more quickly the next time.

Yes, from what I hear, basically it's Emacs' issue.

I dream of a world where everything is UTC, UTF-8, and metric.

UTC is (as everyone knows) a bit problematic due to leap seconds. Different software systems handle leap seconds somewhat differently, and handling them is actually quite difficult if you want to get it absolutely correct. In 99% of cases the problems are just ignored (e.g. "it doesn't matter if the chart looks slightly odd at the moment of the leap second").

There's also the problem that every piece of software using UTC must be updated at least once every six months, because leap seconds are only announced about six months ahead. That may be a lesser problem these days, but it is still relevant, especially in some industries.

I'd probably go with TAI and just convert the dates to the "human readable" format in the UI. Of course, that's not trivial either.

For those who don't know about TAI:


International Atomic Time (TAI, from the French name Temps Atomique International[1]) is a high-precision atomic coordinate time standard based on the notional passage of proper time on Earth's geoid.[2] It is the basis for Coordinated Universal Time (UTC), which is used for civil timekeeping all over the Earth's surface, and for Terrestrial Time, which is used for astronomical calculations. As of 30 June 2015 when the last leap second was added,[3] TAI is exactly 36 seconds ahead of UTC. The 36 seconds results from the initial difference of 10 seconds at the start of 1972, plus 26 leap seconds in UTC since 1972.

Time coordinates on the TAI scales are conventionally specified using traditional means of specifying days, carried over from non-uniform time standards based on the rotation of the Earth. Specifically, both Julian Dates and the Gregorian calendar are used. TAI in this form was synchronised with Universal Time at the beginning of 1958, and the two have drifted apart ever since, due to the changing motion of the Earth.
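The arithmetic in that quote (10 s initial offset + 26 leap seconds = 36 s) is simple, but only for a fixed interval. Converting arbitrary dates needs the full historical leap-second table, which is the burden other commenters here complain about. A deliberately limited Python sketch with a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

# TAI - UTC = 10 s (offset at the start of 1972) + 26 leap seconds = 36 s.
# Only valid from 2015-07-01 00:00:00 UTC until the next leap second;
# that upper bound is NOT checked here.
TAI_MINUS_UTC = timedelta(seconds=10 + 26)
VALID_FROM = datetime(2015, 7, 1, tzinfo=timezone.utc)


def utc_to_tai(utc: datetime) -> datetime:
    """Return the TAI reading (as a naive datetime) for an aware UTC datetime."""
    if utc < VALID_FROM:
        raise ValueError("earlier dates need a historical leap-second table")
    return (utc + TAI_MINUS_UTC).replace(tzinfo=None)
```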

Easy, we just kill off the idea of leap seconds. Someone please convince Russia and UK to agree so we can do it.

> Easy, we just kill off the idea of leap seconds.

OK, now everyone who cares how much time has passed in terms of the Earth's rotation needs to keep a time separate from everyone else. Astronomers come to mind, for example.

If anyone needs to know about the Earth's rotation that precisely, they probably:

- don't rely on their laptop's system clock,
- don't use UTC either, because of minor rotational noise/drift and the discontinuity around the leap second, and
- find the whole 24-hour clock thing a little useless, not being seasonally adjusted, etc.

I bet they'd prefer TAI to UTC (or even "Google time"), because they've probably got their own timekeeping systems that will probably interact more smoothly (heh) with it.

More to the point, there are approximately zero astronomers on Earth, and approximately seven billion non-astronomers. Even if astronomers do prefer UTC, it's better to make them have their own systems to add or subtract twenty-something seconds from TAI than to force all of my timekeeping devices to have a database of historical leap seconds and an internet connection to hear about new ones.

Not to mention the fact that I can't write down what the time will be in UTC in 86400 * 1000 seconds... Absolutely ridiculous.

In terms of Earth's rotation, you start out hopelessly off because of time zones, so worrying about a few seconds either way is not a problem.

No thank you. Just store times on computers as $epoch+$offset and then convert to UTC when displayed to the user.

That way, only the display routine has to care about leap seconds and it won't break anything.

If you mean "UNIX time", you'll be disappointed: UNIX time is not the number of seconds since the epoch, it is also leap-second adjusted. It stops when a leap second happens, to make sure there are exactly 86400 UNIX seconds between consecutive midnights (so you can do divmod calculations on timestamps).
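This is easy to demonstrate with Python's `calendar.timegm`, which converts a UTC time tuple to a POSIX timestamp. The leap second inserted at 2015-06-30 23:59:60 UTC has no timestamp of its own. The two instants around it differ by one POSIX second even though two SI seconds elapsed, and `divmod` by 86400 still lands exactly on midnight.

```python
import calendar
from datetime import date, timedelta

before = calendar.timegm((2015, 6, 30, 23, 59, 59))
after = calendar.timegm((2015, 7, 1, 0, 0, 0))
gap = after - before          # 1, although 2 SI seconds elapsed

# Every POSIX day is exactly 86400 seconds, so divmod gives (days, seconds-into-day)
days, secs = divmod(after, 86400)
day = date(1970, 1, 1) + timedelta(days=days)
```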

Maybe you were suggesting an improvement to UNIX time, though, in which case I think the grandparent and I would get behind your proposal.

Yeah, I know about Unix time, and that isn't what I meant. It never made sense to me. Just have a monotonic timer and an epoch, and worry about UTC and leap seconds in the view.

I'd be happy just to get rid of DST.

DST is great and the reasons I think that are nicely summed up here: http://www.leancrew.com/all-this/2013/03/why-i-like-dst/

> If we stayed on Standard Time throughout the year, sunrise here in the Chicago area would be between 4:15 and 4:30 am from the middle of May through the middle of July.

> If, by the way, you think the solution is to stay on DST throughout the year, I can only tell you that we tried that back in the 70s and it didn’t turn out well. Sunrise here in Chicago was after 8:00 am, which put school children out on the street at bus stops before dawn in the dead of winter.

What is Google time?

I don't have a source at hand but, IIRC, instead of having leap seconds, Google slightly dilates each second to compensate for that. The time eventually corrects itself.
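A sketch of the smearing idea, under assumptions that are mine rather than Google's published parameters. It assumes a linear smear and a window of 86,400 clock seconds that has to absorb one inserted leap second. Over the window, 86,401 real (SI) seconds are shown as 86,400 clock seconds, so each clock second lasts about 11.6 µs longer than an SI second.

```python
def smeared_elapsed(si_elapsed: float, window: float = 86400.0) -> float:
    """Clock seconds displayed after `si_elapsed` real (SI) seconds of a smear
    window that absorbs one inserted leap second.

    The window lasts window + 1 SI seconds but advances the clock by only
    `window` seconds, so every displayed second is stretched by the same factor.
    """
    si_elapsed = min(max(si_elapsed, 0.0), window + 1.0)   # clamp to the window
    return si_elapsed * window / (window + 1.0)
```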

Personally, I wish everyone used the 24:00 clock. Maybe the military has messed me up, but I really prefer seeing something like 18:22 over 6:22pm. It just seems simpler.

The real beauty of the 24:00 clock is that it never actually shows that; at most 23:59. Then it goes to 0:00.

The 12:00 clock would be more respectable if it went from 0:00 (midnight/a.m.) to 11:59 a.m. then to 0:00 (noon/p.m.) and to 11:59 p.m. and never showed 12:00. (Let alone continuously flash such a thing as a demand that the time be set.)
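The difference between the conventional US 12-hour labels and the 0-based scheme proposed above comes down to one expression. A sketch with hypothetical function names:

```python
def format_us(hour: int, minute: int) -> str:
    # Conventional style: hour 0 shows as 12 a.m., hour 12 as 12 p.m.
    suffix = "a.m." if hour < 12 else "p.m."
    return f"{hour % 12 or 12}:{minute:02d} {suffix}"


def format_zero_based(hour: int, minute: int) -> str:
    # Proposed scheme: 0:00 a.m. ... 11:59 a.m., then 0:00 p.m. ... 11:59 p.m.
    suffix = "a.m." if hour < 12 else "p.m."
    return f"{hour % 12}:{minute:02d} {suffix}"
```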

That's what 12:00 clocks display in Japan, fwiw, and it's much less confusing (because, really, jumping from 11:59am to 12:00pm is nonsensical for people not familiar with that quirk).

Yes, couldn't agree more with that.

Also dates in numeric order, i.e. yyyy/mm/dd, like all the other numbers we deal with, not dd/mm/yyyy or the crazy mm/dd/yy.

Use minus instead of slash and you have ISO 8601: yyyy-mm-dd

I had - I think - independently come up with this format for naming log files, and I was so very, very happy when I found out it is an ISO standard.

A little disappointed, too, because every single time I have a great idea like this, I find out that somebody else had it before me. But still.
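The property that makes yyyy-mm-dd good for log file names is that plain string sorting matches chronological order, which mm-dd-yyyy does not. A small Python sketch; the file-name pattern is an assumption.

```python
from datetime import date


def log_name(day: date) -> str:
    # ISO 8601 calendar date, zero-padded: 2015-08-14.log
    return f"{day.isoformat()}.log"


def us_log_name(day: date) -> str:
    # mm-dd-yyyy for comparison: zero-padded too, but fields in the wrong order
    return day.strftime("%m-%d-%Y") + ".log"


days = [date(2015, 8, 14), date(2014, 12, 1), date(2015, 1, 9)]
chronological = sorted(days)
```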


ISO 8601 also gives you proper week numbering (which Europeans tend to like), and weeks start on Monday, after the weekend.
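Python's standard library exposes ISO 8601 week numbering directly via `date.isocalendar()`, which returns (ISO year, week number, weekday with Monday = 1). The ISO year can differ from the calendar year near New Year:

```python
from datetime import date

posted = date(2015, 8, 14).isocalendar()      # (2015, 33, 5): week 33, a Friday
new_year = date(2014, 12, 29).isocalendar()   # (2015, 1, 1): already ISO week 1 of 2015
```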

In some countries the weekend isn't Saturday and Sunday though. So the start of the week is kind of arbitrary.

Saudi Arabia used to have its weekend on Thursday & Friday. Recently they've switched to Friday & Saturday.

Well it's all the weekend. Saturday is the last end of the week and Sunday is the front end of the week.

In the US perhaps, not in Europe

In the UK too traditionally, although that’s changing (by convention).

Yet, in America, people say things like "do you have plans for the weekend?" Or, "What did you do last weekend?"

Nobody ever says, "Do you have plans for the upcoming two days which, respectively, constitute the end of this week and the start of the next one?"

So, Americans and Brits are inconsistent. They have "the weekend" which is a block of two days when salaried people with regular working hours don't work; and they have Sunday as not the week end, but rather the beginning; or the "front end" of the next week. Which means that the two days cannot be the weekend; they are two different ends of two different weeks.

> that's changing (by convention)

It's changing because people have to confront the above reasoning and realize that a week beginning in the middle of something that they have been calling "the weekend" for decades is silly.

Neither does anybody say "What are you doing for the holidays constituted of Christmas Day and selected other days around it?"

Christmas has 3 days.

I believe in my entire life I've encountered exactly one situation in which the day that is defined as the first day of the week actually made a difference to anything: my kids' swim school schedule, where Week N of the term runs from Sunday through to the following Saturday inclusive.

Since the working week (and the school week) here starts on Monday regardless of whether Sunday or Monday is regarded as the first day of the week, it seems to have always been a distinction without a difference to me.

Yes. Putting the year first and with 4 digits is the only safe way because, afaik, nobody anywhere uses yyyy/dd/mm. It's nice that it's also in order of significance but the main advantage is unambiguousness.

You'd think people would have learnt from Y2K, but somehow we still see 2-digit years, which make dates like 03/04/05 impossible to even guess at. It's slightly better after 2012, since a 2-digit year can no longer also be a month, but we'll have to wait till 2032 for 2-digit years to unambiguously mean a year.

This is an area where I believe localization makes things worse, not better. If every website showed dates with the year first, people would easily understand, regardless of whatever silly local convention they have. As it is, whenever I see a ##/##/## date, I have to think about what the website might be trying to do (do they know what country I'm from? What country I'm in now? Are they using their own local convention?) and what possible dates it might mean. "I think that happened around August, so 08/10/14 is probably not the 8th of October." Localized dates just make no sense at all on the internet.
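The guessing game can be made explicit with a small Python helper (a hypothetical name). It tries the three conventions mentioned in this thread, yy/mm/dd, dd/mm/yy and mm/dd/yy, and keeps every reading that is a real calendar date. It assumes 2-digit years mean 20yy.

```python
from datetime import date

# Positions of (year, month, day) within the three slash-separated fields
CONVENTIONS = {
    "yy/mm/dd": (0, 1, 2),
    "dd/mm/yy": (2, 1, 0),
    "mm/dd/yy": (2, 0, 1),
}


def readings(s: str) -> dict:
    """Map each convention to the date it yields, skipping impossible ones."""
    fields = [int(part) for part in s.split("/")]
    out = {}
    for name, (yi, mi, di) in CONVENTIONS.items():
        try:
            out[name] = date(2000 + fields[yi], fields[mi], fields[di])
        except ValueError:        # e.g. month 13 or day 32
            pass
    return out
```

For "08/10/14" all three conventions give a valid date. That includes 2014-08-10 under mm/dd/yy, so context is the only way to pick.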

I cannot tell you the number of times in my life that a 12-hour clock has led to a _major_ fuckup. I run all my personal clocks on 24hr to prevent said fuckups, but somehow I still feel weird telling people a meeting is at 15:00 in an email.

>but somehow I still feel weird telling people a meeting is at 15:00 in an email.

It feels weird because you think people will think it's weird. You're probably right.

Personally I use a 24-hour representation and convert it to 12 when communicating with others.

11:59AM -> 12:00PM, 11:59PM -> 12:00AM caused me mental discomfort for a long time. A 24-hour clock made a lot more sense, but it doesn't seem like most other people have a problem with 12.

They'd rather you used "normal" time. If you want to fight that battle go for it. I believe you're right.

Next: Date format!

I have not served in the military (we still had the draft in Germany when I finished school, but I was a conscientious objector), but I greatly prefer the 24-hour time format as well.

Maybe digital clocks/watches have influenced this preference, but I still think it is a good idea, because it is unambiguous.

Plus people format 12 hour times inconsistently:

  6:00 PM
  6:00 P.M.
  or just 6:00

There is something to be said for the French style, 18h00: when you see the "h" you know it's 24-hour time. With the colon you aren't sure whether it's 12- or 24-hour time.

And ISO 8601. Sweet, sweet ISO 8601...

Or the even stricter RFC3339 date format, which is ISO8601 without the dumb bits: http://www.ietf.org/rfc/rfc3339.txt
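For what it's worth, Python's `datetime` can emit RFC 3339-compatible timestamps without any locale involvement. A small sketch:

```python
from datetime import datetime, timezone

stamp = datetime(2015, 8, 14, 12, 30, 0, tzinfo=timezone.utc)

with_offset = stamp.isoformat()                 # "2015-08-14T12:30:00+00:00"
# The literal "Z" is only correct because `stamp` is already in UTC
with_z = stamp.strftime("%Y-%m-%dT%H:%M:%SZ")   # "2015-08-14T12:30:00Z"

# The offset form parses back to the same instant
parsed = datetime.fromisoformat(with_offset)
```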


I wonder if anyone collects a list of famous standards that have been superseded by compatible, better standards. Another example would be that many people know of ISO 639 language codes, but BCP 47 is a clearer, more relevant standard.

> I wonder if anyone collects a list of famous standards

If you really want to know, ask Wikipedia. There are some lists there that make me think somebody has a major case of OCD. There is even a List of Lists on Wikipedia, so ... wait, there is even a List of lists of lists...

I'd strongly consider voting for a US presidential candidate solely on if moving to metric was a big part of their platform. So much time is wasted in school on the confusing mess that is the imperial system.

The amount of time and money to shift at this point would be astronomical wouldn't it? It would take a few generations as we would continue to have to teach imperial so people could convert when they ran across any existing media referring to the imperial system measurements.

A hundred years of machines that use imperial: speedometers, odometers, books, tools, software, air conditioning, heating, manufacturing, construction, plumbing, including regulatory codes for each of those.

Not saying it's impossible, and it's definitely my knee-jerk reaction, but at this point I can't imagine it being feasible.

It's really not that bad. Canada did it in the 1970s and it's still around, and we didn't revert to the stone age or anything.

SI units are already in wide use in the US, for instance for electricity (Amperes, Watts, Volts). For the most part, international trade has pushed US industries to adopt metric anyway: metric fasteners are used all over, in cars for instance. Also, countries that have switched still use inch-pattern stuff; in Canada we still use NPT threads and imperial size pipes with no ill effects; even in Europe they still use some imperial-threaded stuff.

My favourite is car tyre sizes. 215/60R16 means the maximum width is 215 millimetres, the aspect ratio of the cross section is 60 percent, R is both a separator and a code for radials, and it's made for 16 inch rims.
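The mixed units still combine cleanly. A short Python sketch computes the nominal overall diameter; the helper is hypothetical and handles only the simple "width/aspectRrim" pattern shown above. Sidewall height is width × aspect ratio, and the wheel contributes the rim diameter converted at 25.4 mm per inch.

```python
import re

TYRE_SIZE = re.compile(r"(\d{3})/(\d{2})R(\d{2})")
MM_PER_INCH = 25.4


def overall_diameter_mm(size: str) -> float:
    """Nominal outer diameter of a tyre such as "215/60R16", in millimetres."""
    match = TYRE_SIZE.fullmatch(size)
    if match is None:
        raise ValueError(f"unrecognised tyre size: {size!r}")
    width_mm, aspect_pct, rim_in = (int(g) for g in match.groups())
    sidewall_mm = width_mm * aspect_pct / 100       # 215 mm * 60 % = 129 mm
    return rim_in * MM_PER_INCH + 2 * sidewall_mm   # 406.4 + 258 = 664.4 mm
```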

Even in the Netherlands, a staunchly metric country, bicycle tyre diameters are measured in inches. (Wouldn't know about car tyres.)

We were almost there in the 1970s.

As for "a few generations", yes, certainly. We have only to look to the mishmash of unit use in the UK.

But what's wrong with taking a few generations to get there?

Turning it around, we got rid of a lot of specialized units - hogshead, chain, furlong, peck, etc. (If Peter Piper picked a peck of pickled peppers, how many more are needed to make a bushel?) So it's clearly possible.

Also, metric countries still use non-metric terms for some cases, eg, "inches in a screen", the market price of a "barrel" of oil, and food energy in "[food] calories". This tells me that the transition can occur piecemeal.

>metric countries still use non-metric terms for some cases, eg, "inches in a screen"

This is an example of American influence. A decade ago it was in centimetres.

For example, in this 2004 Siemens commercial for Russia, the CX65's screen size is given as 13 square centimetres:


If Imperial keeps getting used, it will get even harder to get rid of it.

What kind of argument is this? "It would really hurt to amputate my leg, so let's just wait and let the gangrene go further for now."

> What kind of argument is this?

"it's definitely my knee jerk reaction"

I think it might not be as bad as we imagine. All speedometers I've seen show both miles-per-hour and kilometers-per-hour, all thermostats I've seen have an option for switching to Celsius, all scales I've owned can display both pounds and kilograms (my current one even has an option for stones), all measuring cups I've seen in a long time have both systems displayed, etc. Even lumber wouldn't necessarily be a huge issue. After all, the "2 by 4" is actually 1.5" by 3.5", which equates to 38mm by 89mm. Would it be a huge deal to round that to 40mm by 90mm?

And, in any case, most people have a mobile phone now which can quickly do unit conversions. I really think it actually is feasible, if you could get the public behind it (good luck).

You're not thinking about all the designs, tooling, manufacturing facilities, etc to build all that stuff though. That's where the real cost is.

> Would it be a huge deal to round that to 40mm by 90mm?

It actually would be a big deal to round things like that I think. Whole designs would need to be updated to take into account the new dimensions of things.

Any manufacturing component which cannot easily be adjusted to a slightly different measurement is poorly designed, in my opinion. I would expect that much of these tools could be tweaked to produce things in metric measurements. If not, the dimensions can simply be converted and re-labeled. If something manufactures a square piece of metal that is 3" by 3", is it a huge deal to just document it as being 76.2mm by 76.2mm?

> Any manufacturing component which cannot easily be adjusted to a slightly different measurement is poorly designed, in my opinion.

Imagine all of the parts in a car engine. Everything fits together perfectly. Engine mounts line up in the right places. It all has to be very precise.

Now imagine you take all those parts and round them off a bit. Nothing much, just a mm here and there. If you try to put the engine together with these parts, it's not going to work at all.

Also consider that you can't just simply convert the units. Take a 5/16" socket wrench for example. Nobody makes a 7.9375mm socket wrench.

While I think the rounding would work with lumber, I definitely don't think it would work with things like car engines. For such precisely-measured components, converting to a metric measurement with the correct significant figures should be fine.

Repair shops can maintain tools for both systems (they already have to). For converting, I mean the dimensions of a component, not tools and fasteners that need to work with tools. Those would likely take a "metric only going forward" approach.

Rounding from 38 to 40 is a difference greater than 5%. Some applications might support that, but I'm sure there are plenty that won't. For example, wood joints have very small tolerances.

More manufacturing is done in China, and the factories are used to the different international markets because they're often parts suppliers for foreign companies that have their designing done in various countries. So they're well equipped to make things in either inches or mm.

You're overstating it; there are only 2 other countries in the world that use imperial units.

> You're not thinking about all the designs, tooling, manufacturing facilities, etc to build all that stuff though. That's where the real cost is.

A lot of them are already metric.

If you changed the sizes of lumber, I'm pretty sure that someone would come out with "new inch" tapes that were labeled with inches and feet but accounted for the change in size (spreading the extra millimeters out across the distance).

The dimensions in use are something that people are very used to working with and they know how to do the necessary mental arithmetic to work with them.

Luckily, the easy mental arithmetic is a main benefit of the metric system. I really doubt people would create "new inch" tapes. They would probably hang on to existing measuring tape for personal projects though.

The imperial choice of using highly composite numbers is at least as convenient for mental arithmetic as decimals.

What's 1/10th of 3 3/8"?

27/80. Asking HN seems like a roundabout way of getting the answer, though.
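
Python's standard `fractions` module does the same computation exactly, so you can use it to check the answer:

```python
from fractions import Fraction

length = 3 + Fraction(3, 8)  # 3 3/8 inches as an exact fraction
tenth = length / 10          # one tenth of it, still exact

print(tenth)          # 27/80
print(float(tenth))   # 0.3375
```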

I-19 in southern Arizona has its distance signs in meters and kilometers as part of a test during the '70s. There have been plans to change it for a number of years but local opposition has stalled it.


If I remember correctly, raw lumber here in Finland is in inches and planed lumber in millimeters :)

We did it in the UK. It's fine.

Apart from some things like miles and gallons which would require a massive synchronised change that is.

I'd also like it if we drove on the RHS here as well so we can get decent import vehicles.

> Apart from some things like miles and gallons which would require a massive synchronised change that is.

It's less the synchronisation and more the expense of replacing all road signs throughout the kingdom, making it a political non-priority.

I'd happily help foot the bill.

So how much is that 1000 mile journey going to cost?

1000/44.9*4.456*1.119 = where's my calculator?
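
The joke works because a UK fuel-cost estimate has to chain several units: miles, miles per imperial gallon, litres, and a per-litre pump price. Here is a sketch of that chain. The fuel economy (44.9 mpg) and pump price (1.119 per litre) are illustrative assumptions. The only fixed constant is the exact definition 1 imperial gallon = 4.54609 L.

```python
LITRES_PER_IMPERIAL_GALLON = 4.54609  # exact by definition

def trip_cost(miles: float, mpg_imperial: float, price_per_litre: float) -> float:
    """Fuel cost of a trip when economy is quoted in mpg but fuel is sold by the litre."""
    gallons = miles / mpg_imperial                   # imperial gallons burned
    litres = gallons * LITRES_PER_IMPERIAL_GALLON    # same fuel in litres
    return litres * price_per_litre                  # cost at the pump

# Illustrative values only: 1000-mile trip, 44.9 mpg, 1.119 per litre
cost = trip_cost(1000, 44.9, 1.119)
print(f"{cost:.2f}")  # 113.30
```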

Of course it's going to be hard. Changing legacy code always is. But this kind of argument seems to be presented in the US for everything from gay marriages to gun control laws.

If you really want to, you can do it. And it's not as painful as presented especially since many countries have done it (https://en.wikipedia.org/wiki/Metric_system). And the longer you wait the more painful it becomes.

Isn't the United States in practice a worst of both worlds approach where you have both imperial and metric measurements inconsistently mixed together?

Doubt it.

The separation is mostly distinct, at least to me. We mostly use imperial and in science class and such we use mostly metric.

The worst of both worlds I've seen was England.

They're mostly metric until you drive their car, and then they decide to use miles per hour. That's pretty random...


The article doesn't say it outright, but if you look closely at the UK speed sign image and the caption below it, you will see they are in fact in MPH.


found an article about it:


Yes. The mix in the US is a disaster: http://www.cnn.com/TECH/space/9909/30/mars.metric.02/

The total cost to switch has gotta be in the hundreds of billions of dollars. Has anybody done such a calculation?

My estimate is "approximately zero". For example, people talk about the cost of converting road signs as though we have to pick a Tuesday in March and switch the entire country that afternoon. How about we adopt a policy like "when we install new signs or replace old ones as routine maintenance, we make them metric - and here's a conversion chart so we're all using the same mile-to-km rounding."
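
One way to pin down "the same mile-to-km rounding" is to generate the chart from the exact definition 1 mile = 1.609344 km. In this sketch, rounding to the nearest whole kilometre and the particular sample distances are illustrative choices, not any agency's actual standard.

```python
KM_PER_MILE = 1.609344  # exact by international definition

def miles_to_km(miles: float) -> int:
    """Distance in km, rounded to the nearest whole kilometre for signage (assumed rule)."""
    return round(miles * KM_PER_MILE)

# Sample sign distances in miles (illustrative)
chart = {m: miles_to_km(m) for m in (1, 5, 10, 25, 50, 100)}
for miles, km in chart.items():
    print(f"{miles:>4} mi -> {km:>4} km")
```

Publishing a single rule like this keeps replacement signs consistent with each other, even if they go up years apart.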

The Australian experience (as told in the "Metrication Matters" video) was that a "direct metrication" where everybody switches at the same time and doesn't try to maintain two systems at the same time is fast (about a year) and cheap (often saving money).

Road signs can be prepared in advance and covered. The covers can be removed quickly. Another option is to put stickers on the road signs on Metric Sunday.

> My estimate is "approximately zero"

You do realize there is more to switching to metric than just updating the road signs, right?

Think of all the military and space technology that uses inches.

What's the total cost of not switching? I'm willing to bet it's in the tens of billions each year.

All your major trade partners use the metric system: EU, China, India, Australia.

Thing is, we use both systems now. What huge cost is involved by saying "Starting in 2020, the metric system will be the official unit of measurement in the US." Manufacturing companies can switch as soon as they want to. Most consumers' toolsets already have both systems (wrenches, sockets, etc).

"Metrication Matters" (Google TechTalk from 2007)


It starts a bit slow but stick with it. I'm currently rewatching it after several years so I'm not sure there is a calculation. I think there is. A lot of the production in the US already is metric so it shouldn't be that expensive.

Remember that everybody else also had to change their (many) measuring systems over to metric. It went remarkably well.

There's really no benefit in switching for "everyday" things like temperature, speed limits, weight... it would be a huge expense that doesn't really have any payoff.

Where metric units are important, e.g. science and engineering, they are already used.

1. it does have the benefit that it's cross-cultural, you can actually talk to people outside your cultural bubble

2. it's got the advantage that you're using the same measurements as all your trade partners so people don't need two production lines anymore

3. it's got the advantage that people going into science and engineering don't need to build a whole new set of unit references because they've got the one which already works

4. the "huge expense" is pretty much made out of whole cloth for the purpose of saying you can't switch, the UK's metrication cost basically nothing except for road sign replacements which is why those are still imperial

This should be part of TTIP and TPP, actually.

Use SI and ISO standards for everything.

ISO A4 paper, ISO time (2015-08-14 23:52 UTC+2), Metric, etc
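
For reference, here is how the timestamp in that comment looks in full ISO 8601 form with an explicit UTC offset. This sketch uses only Python's standard `datetime` module:

```python
from datetime import datetime, timedelta, timezone

# 23:52 on 14 August 2015, at UTC+2
ts = datetime(2015, 8, 14, 23, 52, tzinfo=timezone(timedelta(hours=2)))

print(ts.isoformat())                           # 2015-08-14T23:52:00+02:00
print(ts.astimezone(timezone.utc).isoformat())  # same instant in UTC: 2015-08-14T21:52:00+00:00
```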

> 1. it does have the benefit that it's cross-cultural, you can actually talk to people outside your cultural bubble

We already do that.

> 2. it's got the advantage that you're using the same measurements as all your trade partners so people don't need two production lines anymore

Two production lines? For what?

> 3. it's got the advantage that people going into science and engineering don't need to build a whole new set of unit references because they've got the one which already works

Yup, failure to use the metric system in everyday life is why the US has the worst scientists and produces the least scientific output. Oh, wait.

> 4. the "huge expense" is pretty much made out of whole cloth for the purpose of saying you can't switch, the UK's metrication cost basically nothing except for road sign replacements which is why those are still imperial

I agree with this one. It probably wouldn't be terribly expensive to implement, though I would question the priorities of anyone who is really hung up about it (like the grandparent post who started this whole discussion).

> Yup, failure to use the metric system in everyday life is why the US has the worst scientists and produces the least scientific output. Oh, wait.

Failure to use the metric system everywhere cost NASA a $125 million Mars orbiter just 16 years ago, and yet here you are, insisting that this is not a problem, and throwing in a non sequitur to justify the position.

> Failure to use the metric system everywhere cost NASA a $125 million Mars orbiter

No, bureaucratic failure to address the concerns of people who spotted the error well in advance of the launch cost NASA $125M. The investigation report makes that clear, especially when it goes on to make recommendations for avoiding future mishap; nobody recommended that the engineers needed to brush up on their units.

And yet here you are, insisting that the issue was that we didn't switch over to the metric system, and throwing in some unsupported claims to justify the position.

Using metric everywhere would have avoided this particular mistake. Although indeed, the bureaucracy would probably have let some other error slip through (like mistaking cm for mm).

Science and engineering have already made the switch to metric. NASA and JPL aren't using imperial when designing their probes and rockets anymore.

> 1. it does have the benefit that it's cross-cultural, you can actually talk to people outside your cultural bubble

> We already do that.

Do you really? Whenever I hear an American telling the temperature of the weather, I have no idea what they mean. I have to guess from the context if it's hot or cold, and even what units they're using because they rarely mention the "Fahrenheit" part.

Conversely, how many Americans would recognize that "35 degrees" is blisteringly hot while "15 degrees" means you'll need a jacket and "40 degrees" could kill you if you don't find shelter quickly?
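
The conversion behind those examples is F = C × 9/5 + 32. A minimal sketch:

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit: F = C * 9/5 + 32."""
    return c * 9 / 5 + 32

for c in (15, 35, 40):
    print(f"{c} degrees C = {celsius_to_fahrenheit(c):g} degrees F")
```

This gives 59 °F for a jacket day, 95 °F for blisteringly hot, and 104 °F for dangerous heat.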

That's what they thought before solar time was replaced by time zones. I think the US going metric would be a greater boost for the US and world economy than the supposed benefits of TPP, TTIP, and other "trade" agreements combined.

The US does not use the Imperial system. "Imperial" refers to the reformed volume measures introduced in Britain in about 1820, which have a slight metric flavour. The US uses the Queen Anne volume measures, avoirdupois weights, and the international inch (except when it uses the survey inch).

I'd prefer to skip metric and redefine the foot as the distance light travels in a nanosecond ~ 11.8 inches. The length of the path travelled by light in vacuum during a time interval of 1/299792458 of a second seems a bit arbitrary to me.
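
The "~11.8 inches" figure follows from two exact definitions: c = 299,792,458 m/s, and 1 in = 0.0254 m. A quick check:

```python
C_M_PER_S = 299_792_458  # speed of light, exact by definition of the metre
M_PER_INCH = 0.0254      # exact by definition of the inch

light_ns_m = C_M_PER_S * 1e-9         # distance light travels in 1 ns, in metres
light_ns_in = light_ns_m / M_PER_INCH  # same distance in inches

print(f"{light_ns_m:.6f} m = {light_ns_in:.2f} in")  # 0.299792 m = 11.80 in
```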

I would also vote for any candidate that would ban anything but powers of 2 in the definition of computer storage.

I don't get the fascination with powers of two in storage. RAM I understand, because you're counting address lines. But storage? What sense does that make?

I have an SD card with 16.0 GB of space available, and I'm recording video at 9.00 Mbit/s. How many hours of footage? Well, (16.0e3 MB) * (8 bit/B) / (9.00 Mbit/s) = 1.42e4 s, or 3.95 hours.

Now do the computation with binary units.

I have an SD card with 14.9 GiB of space available, and I'm recording video at 9.00 Mbit/s. How many hours of footage? Well, (14.9 GiB) * (1e-6 * 1024^3 MB / GiB) * (8 bit/B) / (9.00 Mbit/s) = ...

Why make people do extra math? Shouldn't we choose units that make things easier to calculate, not harder?
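
Here are both versions of that SD-card calculation side by side. This sketch assumes a constant 9.00 Mbit/s (decimal mega) bitrate, as in the comment. Both paths land on the same ~3.95 hours; the binary version just carries an extra 2**30 factor along the way.

```python
BITRATE_BPS = 9.00e6  # 9.00 Mbit/s, decimal mega

def hours_of_video(capacity_bytes: float, bitrate_bps: float = BITRATE_BPS) -> float:
    """Recording time in hours for a given capacity and constant bitrate."""
    return capacity_bytes * 8 / bitrate_bps / 3600

decimal = hours_of_video(16.0e9)       # 16.0 GB  = 16.0 * 10**9 bytes
binary = hours_of_video(14.9 * 2**30)  # 14.9 GiB = 14.9 * 2**30 bytes

print(f"{decimal:.2f} h vs {binary:.2f} h")
```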

If it's 200 MB in RAM, it should be the same on disk. Also, a sector is 512 bytes or 4096 bytes, so labeling it in powers of 10 actually makes the math worse.


> Also, a sector is 512 bytes or 4096 bytes, so labeling it in powers of 10 actually makes the math worse.

Sector sizes are different between drives anyway. So using sector as a label is just bad practice.

And... how does it make the math worse? My hard drive partition has a size of 250140434432 bytes (actual value)... how much is that in GB? Easy, 250.1 GB. For some reason, I was able to do that calculation without the aid of a computer or a calculator.

Humans don't count sectors. We like to shift decimal points around. Computers are good at calculation, so we give them the task of multiplying by 512 for us, and displaying measurements in units suitable for human society.
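
Here are the different views of the partition size quoted above, computed in Python. The decimal figure just shifts the point nine places. The binary figure and the sector counts need real division, which is the computer's job.

```python
size = 250_140_434_432  # partition size in bytes, from the comment above

print(f"{size / 10**9:.1f} GB")    # decimal: 250.1 GB
print(f"{size / 2**30:.2f} GiB")   # binary: 232.96 GiB
print(size // 512, size // 4096)   # sector counts for 512 B and 4096 B sectors
print(size % 4096 == 0)            # a whole number of 4096-byte sectors: True
```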

> Sector sizes are different between drives anyway.

Yeah, 512 bytes or, now, 4096 bytes - neither is a power of 10

The math is worse because the computer (address space) and hard drive are actually base 2 and vendors are selling in base 10.

This worship of base 10 in every aspect of our lives, even when it doesn't make sense and cheats us out of money, is sad. Computers are base 2, memory is base 2, and storage should be base 2, or else it's just a batch of lies. I guess I should be glad humanity didn't have 11 fingers.

The computer is showing you a number, calculating using base 2 will not be any stress on a human since it doesn't show every byte anyway.

A kibibyte is a fictional unit invented to make computer scientists sound like their lips were still asleep after a dentist visit.

All in the service of 10 worshipers. There is a tribe in the US (Yuki) who got it right and counts in octal because they count the spaces in between the fingers (you can hold 8 bottles).

I have tried to convince people to change the metric system to binary, i.e. have a kilogram be 1024 grams, and a kilometer be 1024 meters. Which would finally force hard drive manufacturers to be honest about their devices' capacities.

So far, I have not been very successful, though. :(

I still wonder if you could convince a Congressperson to slip a line in a bill to make the base 2 definition the US Standard.

It's not arbitrary at all, it's for backwards compatibility. Previously the meter was defined in terms of the emission lines of krypton-86, which was a distance that we could measure most precisely with an interferometer. Before that it was based on the size of the earth, which could be measured with astronomical calculations. And originally it was based on the length of a pendulum whose swing took one second, from which the length can be calculated using a scale, a reference mass and a clock (there was another definition, but this one was more precise I believe).

Each definition gives you something pretty close to what we now define as a meter, but the precision to which they could be measured at the time differed. The length of a "meter" was more or less the same distance in everybody's mind then as now. But if 10,000 scientists sat down and performed experiments to measure the exact length in 1700 versus today, the mean of the values they reported would be about the same, while the statistical uncertainty in 1700 would be much, much higher.

The redefining of the meter has typically occurred when some new process was invented that was more precise than the previous method. For instance, measuring a laser in a vacuum today has something like 1/3 the uncertainty of the old method using an interferometer. So the uncertainty is smaller, but crucially the mean value is still basically the same or very close. That's the reason for the weird 1/299792458 of a second. We can define seconds very, very precisely (from atomic clocks), and that's the amount of time it takes light in a vacuum to travel the same distance as the previous most precise known value for the length of a meter.

If it wasn't done this way, every time we invented a more precise method to measure distance and want to improve the precision of the meter it'd be like defining a whole new unit. Defining a foot as the distance light travels in a nanosecond is fine now, but if we discover an even more precise way to measure distance than a laser in a vacuum we'd end up in the same position, where a "foot" would be some strange fraction of a reference value that makes it work out to agree with the old most precise known value.

This sort of stupidly accurate measurement doesn't matter on the day-to-day scale, as the length of a meter is known to within ~(10^-9)%, which is about ten picometers. However, that means that if you're fabricating silicon at the nanometer scale, the actual exact length of a nanometer is only known to within about 1% of a nanometer (if I did my math right, might be off by a factor of 10). That's much more significant.

"It's not arbitrary at all, it's for backwards compatibility."

The original definition of the meter was one ten-millionth of the distance from the equator to the North Pole. Sure, we got more precise, but it's still arbitrary.

The only interesting thing about metric is the relation of length, volume, mass. But a liter is not a cubic meter, nope, it's a cubic decimetre: another arbitrary decision.

The units that have the best claim to being non-arbitrary tend to be of rather inconvenient orders of magnitude: https://en.wikipedia.org/wiki/Natural_units#Systems_of_natur...

A certain amount of arbitrariness is inevitable at human scale.

Indeed, what units are convenient depends on context. Metric/SI is set up so that most of the un-prefixed units are a convenient size on human scale. But doing this means that some derived units will have values that are not human scale (like the Pascal, atmospheric pressure is ~100000 pascal). The nice thing about SI is that they subdivide easily in powers of ten, so even if the Pascal is inconvenient in our day-to-day lives it's easy to talk in kPa as 1000 Pa.

This is contrasted with something like Imperial units, where every division of e.g. distance is supposed to be roughly based on some physical object. That's why you end up with 12 in/ft, 5280 ft/mi etc. Or alternatively you end up with metric-imperial hybrid units like the kilopound.

Not to mention, what units are convenient vary depending on what you do. For instance one of the SI alternative units for energy is the electron volt (eV), the work done to move an electron through a 1 volt potential. This is a tiny amount of energy on human scale- a common analogy is that 1 MeV (10^6 eV) is enough energy to make a single grain of sand twitch a little bit. But, if you're a nuclear physicist (or maybe a chemist) then eV are typically much more convenient than say Joules.

I'd argue that the unit liter was chosen as a nice power of 10 relation to a cubic meter (the obvious initial choice) as being the best for 'every day' common person transactions.

Take a bottle of water: the most commonly sold sizes are 0.5 liter (16.9 US fl oz) and 1 liter bottles; a liter is also pretty close to a quart.
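
Those comparisons can be checked against the exact US definitions. The US gallon is 3.785411784 L; a US fluid ounce is 1/128 of that, and a quart is 1/4. A small sketch:

```python
ML_PER_US_FL_OZ = 29.5735295625  # exact: 3785.411784 mL / 128
L_PER_US_QUART = 0.946352946     # exact: 3.785411784 L / 4

print(f"0.5 L = {500 / ML_PER_US_FL_OZ:.1f} US fl oz")  # 16.9
print(f"1 L = {1 / L_PER_US_QUART:.3f} US quarts")      # 1.057
```

So a liter is about 5.7% more than a US quart.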

As another example, it is common to find soda pop sold in bottles of 1, 2, and 3 liters (depending on the brand).

Yes, the original definition is arbitrary of course. I was talking about the random constant fraction of a second. Units are always arbitrary, by definition they're just a convention. They're a way of relating an abstract concept (a number) to some intuitively meaningful thing.

Wait, wouldn't you still know the length of a nanometer to within (10^-9)% of a nanometer?

No, because defined this way, the absolute precision to which distance can be defined is ~10^-11 meters. So 1 meter is known within +/- (10^-11) meters. If you convert (10^-11) meters to nanometers that's 10^-2 nanometers, which is 1% of a nanometer.

You don't divide the uncertainty when you scale it because the error is in the actual definition of distance itself. The uncertainty in, for example, 1 meter is the same as the uncertainty in 1 nanometer (=10^-11 meters), which is a correspondingly much larger fraction. It's kind of weird and perhaps counter-intuitive, but that's how it works.

Phrased differently, the idea of a meter is exact and it's our ability to measure distance that is uncertain. The error isn't in saying "a meter is some specific fraction of the distance light travels in a second", it's in determining what physical distance in the world is represented by our definition of the meter.

Please do also remember that the diameter of a hydrogen atom is 10^-10 m.

So, the distances we're talking about are actually at the level where quantum mechanics and the Heisenberg Uncertainty Principle matter.

Ah yes, this gets at an interesting side note, which is the difference between epistemic and aleatory uncertainty. QM uncertainty is aleatory, while issues of measurement precision are epistemic. The distinction being that epistemic errors can be reduced with better tools and with an ideal measurement device are eliminated, whereas aleatory uncertainties are always present even with a perfect tool.

Assuming QM's predictions are valid, we will never ever be able to improve our measurement beyond the point where errors from the uncertainty principle dominate. Our current definition of the meter is pretty close to this limit, so personally I don't expect the meter is going to be redefined any time soon because we're already near the sort of scales where the idea of "distance" starts to get kind of fuzzy.

You still have plenty of arbitrary-ness in your definition of a (nano-)second. :)

I'll take one level over two levels. Besides, memorizing that metric fraction is just wrong. Plus, do we really want to go into space with a unit of measure whose origins are tied to the size of the Earth? Let's at least free part of the definition from an Earth-centric bias. I fear a Mars-centric meter might develop.

Unlikely. The metric units have a long history of never having been redefined in a way that breaks previous usage. As far as I'm aware, all redefinitions have just improved the precision, so they don't hurt previous users. Your proposal would make all the old feet wrong and anybody seeing "foot" would have to think about which type of foot it might be.

Oh, I don't know, I bet the future Mars independence movement will try to throw off all the shackles of Earth and switch their units of measure. I jest just a bit, but it's not like we humans don't pull stunts like this all the time in the spirit of independence.

My proposal is basically the engineering estimate of a foot. I would probably name it something different if I were Emperor / Very Powerful Politician. It's small enough to derive the mass and volume units directly.

My dream also includes the Hanke-Henry calendar.


My pick of the numerous proposals. Mostly because I can understand it.


Interesting! This one keeps the 7-day week, which means it has a chance of succeeding. But wouldn't inserting an entire extra week every few years cause problems with things like monthly salaries and mortgage payments?

The financial year assumes 30 days per month, 12 months per year, 360 days per year. That’s the basis for your salary calculation.
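
The 30-days-per-month, 360-days-per-year idea is what finance calls a 30/360 day-count convention. Several variants exist, and which one (if any) applies to a given salary depends on the employer and country. This sketch implements one common variant, 30E/360 (the "Eurobond" basis), purely as an illustration:

```python
from datetime import date

def days_30e_360(start: date, end: date) -> int:
    """Day count under the 30E/360 ("Eurobond") convention.

    Every month counts as 30 days and every year as 360; a 31st is
    treated as the 30th.
    """
    d1 = min(start.day, 30)
    d2 = min(end.day, 30)
    return 360 * (end.year - start.year) + 30 * (end.month - start.month) + (d2 - d1)

print(days_30e_360(date(2015, 1, 1), date(2015, 2, 1)))  # January: 30
print(days_30e_360(date(2015, 2, 1), date(2015, 3, 1)))  # February: also 30
print(days_30e_360(date(2015, 1, 1), date(2016, 1, 1)))  # a full year: 360
```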

Indeed I'm paid the same amount every month. Yet I'm paid on the same day of every 28/29/30/31-day month, i.e., with slightly varying intervals. The differences between the months are currently small enough that it doesn't really matter much, and it always evens out the next month. But if suddenly a wild week appears, I will want to get paid for that week. Either that, or everyone will need to keep to a parallel traditional month system for periodic payments.

Why would you want to get paid for that week?

You don’t get paid for January 31st, March 31st, May 31st, July 31st, August 31st, October 31st or December 31st either. (Actually, you get paid for 2 of them. Still leaves 5 days a year without pay, 6 days in leap years).

So, why should you get more? Just because you feel like you deserve it?

As far as I know, I am paid for those 31st days, but the pay is evened out over the year. You may live in a different country than me. It is actually pretty presumptuous of you to claim that you know on what basis my salary is calculated, unless you have comprehensive information about the entire Western world.

I have expenses during that week. I happen to earn enough to be able to save a bit during the rest of the year, so it won't be a problem for me, but not everyone has that luxury. There is no reason to make life harder for hard-working poor people. Also, entrepreneurs/employers are earning money that week, so it won't be a problem for them to pay.

If you are paid monthly, does your wage differ from January to February?

If no, then you do not get paid for these days directly, and you won’t get paid for the leap week.

If yes, well, then you’d also get paid for that leap week.

You don't give up, do you? As far as I know, my salary is based on slightly less than 30.5 days per month. Anyway, that was not my point and I don't care about that.

You are not addressing my real argument, which maybe originally I did not articulate in a way that you understood. Which is that in the proposed calendar, unless special arrangements are made, at some point people will have to wait 37 days for their salary and will only get ~30 days' salary at that moment. That is the thing that's currently not much of a problem. Whether that gets resolved by reducing the salary over the normal months by a small percentage and introducing an extra payment for that week, or by spreading out the payments over a 30/31-day schedule that ignores the new official months (which I think would be confusing and complicated), or by some other method, I'm fine with that. I did not say I wanted extra money, but it needs to be spread out evenly over the year.

It would be nice if graphics cards / VM BIOSes extended text video modes (perhaps new VESA modes) to support Unicode, preferably UTF-32 to make buffer offsets easy to compute. Sure, it means more fonts, and not every codepoint can squeeze into a few pixels, but most additional codepoints will be good enough and better than seeing garbage on the screen.
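
The point about buffer offsets is that UTF-32 spends a fixed 4 bytes per code point, while UTF-8 uses 1 to 4. The sketch below models a hypothetical 80-column UTF-32 text buffer where any cell's offset is plain arithmetic. It also compares encoded sizes for a few characters. It ignores combining characters and double-width glyphs, which a real text mode would still have to deal with.

```python
COLS = 80  # hypothetical text-mode width

def utf32_cell_offset(row: int, col: int, cols: int = COLS) -> int:
    """Byte offset of a cell in a hypothetical UTF-32 text buffer (4 bytes per cell)."""
    return (row * cols + col) * 4

# 'A', e-acute, euro sign, and an emoji outside the Basic Multilingual Plane
samples = ("A", "\u00e9", "\u20ac", "\U0001F600")
for ch in samples:
    print(f"U+{ord(ch):04X}: utf-8 {len(ch.encode('utf-8'))} bytes, "
          f"utf-32 {len(ch.encode('utf-32-le'))} bytes")

print(utf32_cell_offset(24, 79))  # last cell of an 80x25 screen: 7996
```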

IPA would be nice, too.

Metric's nice, but each day being 10 hours in a 10 day week just doesn't work for me. But at least it's Fiveday, and I have a 20 hour weekend to look forward to.

You should use the International Fixed Calendar https://en.wikipedia.org/wiki/International_Fixed_Calendar

The failure to adopt the IFC is the proof that human beings will forever be shackled to their ancestral beliefs.

Re parent: ~365.25 does not relate at all well to bases of 10, or much else.

We can actually thank existing social patterns including religions for locking us in to a '7 day week' which is what many calendar systems try to promote.

With (roughly) every 4 years also containing an extra day of correction (the solar orbital period is not quite 365.25 days) the concept of 'leap days' is necessary anyway.

    360: 2 2 2 3 3 5 << Oooh shiny
    361: 19 19 << Annoying
    362: 2 181 << Annoying
    363: 3 11 11 << Less Annoying, but weeks are too long
    364: 2 2 7 13 << Might be workable, there's a 7 in here.
    365: 5 73
The IFC picks the 364 route, and puts the leap day in the middle of the printed year. The decision of where to put that date is pretty arbitrary, but probably based around a 'summer holiday'.
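
The factor list can be reproduced with a simple trial-division factorizer. This is a sketch, fine for numbers this small:

```python
def prime_factors(n: int) -> list[int]:
    """Prime factors of n by trial division, smallest first."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:  # whatever remains is itself prime
        factors.append(n)
    return factors

for days in range(360, 366):
    print(days, *prime_factors(days))
```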

360 is 5 away from the closest integer orbital period of earth. Therefore 5 'extra days' (likely holidays) would need to be added. 5 is also a factor. It would either make sense to have an extra 5 day period as one long holiday set, or an extra holiday spread out through the year 5 times, though arguments for otherwise could be made.

7 and 360 get along poorly though. We'd probably also want to keep '12 months' for sanity/existing contractual structures (esp since we can't be in units of 10), and having 'months' close to current months seems to be an advantage.

Taking 2 * 2 * 3 out of the factor set, we're left with 2 * 3 * 5 for each month.

Stepping aside for a moment, let's examine a 'perfect' 28 day month in the IFC/current calendars. 2/7ths (0.285714) of the time is 'weekend' time.

I'd propose a 10 day 'week' in the new time, I also think 2 days in a row off of work is advantageous, and I think that this time should be time 'normal workers' can expect to share off. I think that this number might actually grow over time as we increasingly approach a more Utopian society based around automation and abundance.

The 10 day 'week' would begin with the following structure.

    * 2 days of work
    * 4 day period of 3/4th work (on each of these days about 1/4th of workers would have an 'errands' day)
    * 2 days of work
    * 2 days off work - weekend
On this schedule half the workers would get a 2+5 or 5+2 and the other half would get 3+4 or 4+3; all workers on a 'standard' shift would also see 0.3 weekend time, a slight increase.
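
The 2+5 / 5+2 / 3+4 / 4+3 breakdown can be checked by simulating one 10-day week for each possible errand day. This sketch assumes days are numbered 1 to 10, that days 3 to 6 form the errand period, and that each worker gets exactly one errand day off per week.

```python
from fractions import Fraction

WEEK_LENGTH = 10
WEEKEND = {9, 10}          # days 9-10: off for everyone
ERRAND_DAYS = range(3, 7)  # days 3-6: each worker takes one of these off (assumed)

def work_runs(errand_day: int) -> list[int]:
    """Lengths of consecutive working stretches in one 10-day week."""
    runs, current = [], 0
    for day in range(1, WEEK_LENGTH + 1):
        if day in WEEKEND or day == errand_day:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs

for d in ERRAND_DAYS:
    print(f"errand day {d}: work stretches {work_runs(d)}")

days_off = Fraction(len(WEEKEND) + 1, WEEK_LENGTH)  # weekend plus one errand day
print(days_off, float(days_off), float(Fraction(2, 7)))  # vs. today's 2/7
```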

The extra 5 holiday days could either be divided somehow over the year, or used up all at once as a burst half-week holiday. The necessary leap year correction would be added to one of those periods as a holiday as well.

Since this is another example of a 'standards proposal', someone must have thought of this before...


This link seems to do a decent job of discussing some of the other aspects of converting to a 10 day calendar (many of which also exist for the IFC) http://www.scientificamerican.com/article/is-it-time-to-over...

And \n line delimiters and \s+ tabs!

Heh, I initially read it as "improves", and was wondering why they'd bother. Removing it is surprising, but makes sense.

How does locale work on the keyboard side, then? What determines whether text entry is right to left or left to right?

Exactly the same as before -- your programs just expect UTF8 codepoints as input.

Keyboard layout is independent of the locale used, and so is the directionality of the text, which is a property of the characters themselves.
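
You can see this in Python's standard `unicodedata` module. Each code point carries a bidirectional class from the Unicode Character Database, regardless of locale or keyboard layout. Class L is left-to-right, R is right-to-left, AL is Arabic letter, and EN is European number.

```python
import unicodedata

# Latin a, Hebrew alef, Arabic beh, digit one
for ch in ("a", "\u05d0", "\u0628", "1"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}")
```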

Input methods don't need to align with encoding.

Presumably if you press the RTL key on your keyboard, or if you enter the sequence of codes that converts it from LTR to RTL.
