Hacker News | mids_hn's comments

Because it's a real word that is applied in this case.


Assembly is a skill learned through practice, much like any other language.


Under the OSI's definition of open source, and under the definition of free software, anyone can buy your software and redistribute it to others for free. This effectively limits you to earning money only off the labor of your work, not the work itself, and makes a business model based on selling copies effectively useless. This is such a huge difference, yet the FSF, as always, confuses people (maybe even intentionally) by saying that free software can be commercial. No, it cannot. The services you provide can be, the labor you provide can be, but the software - no.

You can remove that freedom from the user. After all, the source is still available, so it can still be tested for malware, but your work won't be open source.


How does open source get money? According to the original GNU Manifesto from the 1980s:

> All sorts of development can be funded with a Software Tax:

The GNU Manifesto arguably concedes that OSS won't make money:

> In the long run, making programs free is a step toward the postscarcity world, where nobody will have to work very hard just to make a living. People will be free to devote themselves to activities that are fun, such as programming, after spending the necessary ten hours a week on required tasks such as legislation, family counseling, robot repair and asteroid prospecting. There will be no need to be able to make a living from programming.

https://www.gnu.org/gnu/manifesto.en.html


While we're trying to change the status quo in mathematics around the world, let's also all switch to the World Calendar and start using Base 12.


You could start by calling it base twelve (because 10 would be too confusing.)


Or, to really obfuscate, call it base 10 (in base 12) :))


60 is a better base than 12. Even 30 would be better.

And we should drop the stupid Earth-centric metric system and go to some kind of Planck units.


Base sixty would be much worse for humans. There are too many digits to memorize, and the multiplication table is enormous. It does give you a factor of 5, but 5 is only so common because we use base ten.

Base twelve is better than thirty because you can represent 1/4 (0.3) and 3/4 (0.9) with a single digit.
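The single-digit claim is easy to check by expanding a fraction digit by digit in a given base. A rough sketch, where `frac_digits` is just an illustrative helper (not from any library) that returns digit values as integers so it works for bases above ten:

```python
from fractions import Fraction

def frac_digits(value: Fraction, base: int, max_digits: int = 8) -> list[int]:
    """Digits after the radix point of a fraction in [0, 1), in the given base.

    Stops early when the expansion terminates; otherwise truncates at max_digits.
    """
    digits = []
    for _ in range(max_digits):
        if value == 0:
            break
        value *= base
        digit = value.numerator // value.denominator  # integer part is the next digit
        digits.append(digit)
        value -= digit
    return digits

print(frac_digits(Fraction(1, 4), 12))  # [3]      -> 0.3 in base twelve
print(frac_digits(Fraction(3, 4), 12))  # [9]      -> 0.9 in base twelve
print(frac_digits(Fraction(1, 4), 30))  # [7, 15]  -> two digits in base thirty
```

In base thirty a quarter needs two digits (7 and 15), which is the point being made about 1/4 and 3/4.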


We do, for physics. Very often c is treated as 1. The "meter" is really just a conveniently human-scaled unit, equivalent to the "degree" in trigonometry.


> 60 is a better base than 12.

And it proved itself useful on this Earth for probably longer than the decimal system.


My rule of thumb is to partition my data sets, to make sure the conditional has the same result in as long a row as possible.
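To make that concrete, here is a toy sketch. The names `outcome_flips` and `partition` and the threshold of 128 are made up for illustration. Python won't show the actual speedup; the number of times the conditional's outcome changes is only a stand-in for how hard the branch would be to predict in compiled code.

```python
def outcome_flips(values, predicate):
    """Count how often the predicate's result changes between neighbouring items."""
    results = [predicate(v) for v in values]
    return sum(a != b for a, b in zip(results, results[1:]))

def partition(values, predicate):
    """Stable partition: items failing the predicate first, then items passing it."""
    failing = [v for v in values if not predicate(v)]
    passing = [v for v in values if predicate(v)]
    return failing + passing

def is_large(v):
    return v >= 128  # hypothetical condition inside the hot loop

data = [200, 3, 150, 7, 255, 12, 129, 64]

print(outcome_flips(data, is_large))                       # 7
print(outcome_flips(partition(data, is_large), is_large))  # 1
```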


They can, however, still coerce you to give away your text-based password. I agree, though, that slicing one's finger off is simpler for all parties.


You don’t even need to slice, just manhandle. It’s easy and mechanical, can be done in a few minutes in a dark alley. Whereas a code, any code, requires a degree of cooperation that the individual can choose never to grant.

(Obviously a lot of people will grant it, but a sufficiently motivated person - human-rights activist, political dissenter, journalist, etc etc - might not)


If we’re getting that deep into the hypotheticals, couldn’t said person just not set up biometric logins?

For the rest of us, where physical coercion to unlock the phone isn’t in the threat model, it really does improve on the trade-offs between security and convenience.

Feels disproportionate to say we should not do the latter because of the former.


Your threat model likely involves simple robbery; making you “look here” is quick and painless, and increases the loot value significantly. And yes, biometrics in general are not good. The best protection remains a pin or passcode.


If they're going to physically threaten you into unlocking your phone, a PIN or passcode won't change that.


Pickpockets minimize any physical contact. Biometrics protect against them. PINs do not. Anecdotally, I know about an iPhone that got unlocked after a theft at a party. It was protected by a PIN. The owner thinks that the pickpocket learned the PIN by looking over his shoulder.

On the other side burglars can get into a house and force people to unlock their phones or reveal their passwords, if they care to. There is no protection against that unless those people value their secrets more than the harm the burglars will do to them.

All considered, I unlock with my fingerprints.


We're not talking about the same adversaries here. If the police unlock people's phones by pointing them at their faces, it'll be done with impunity and in a widespread manner. Less so if they start cutting off fingers.


In the U.S. at least, passwords are protected under the 5th amendment but you can be ordered to unlock a phone with a fingerprint or a face since it's something you are and not something you know.


> passwords are protected under the 5th amendment

This is not the case; at least, the law is not very settled in that direction. There has been at least one famous case [0] where an appeals court found that a defendant could be held in contempt of court and imprisoned for refusing to provide his password.

[0] https://www.theregister.com/2017/03/20/appeals_court_contemp...


State actors would have zero issues lifting a fingerprint off a phone, then making a prop for the sensor. Alternatively all they need is a minor tranquilizer and there you go, provided the human asset is available.


Sorry I was unclear, I meant "cutting fingers" as an example of torture meant to extract passwords, not in the sense that they'd use the finger to unlock the phone. Face-ID and fingerprints share the same issues compared to a password.



The wrench can be resisted by determined individuals; whereas fingerprints and face-recog cannot.


The wrench does not represent physical torture, but metaphorical, basically representing a pressure point tailored to each individual.


People resist torture all of the time. The problem with torture is that there's often little reason to think that the torture would end if you give up the password, other than the word of the torturer.


Like I said, the wrench is not physical torture. By pressure point I meant things like blackmail, threats against your family or loved ones, etc.

Everyone has something valuable to lose.


> the ability to synchronize a byte stream picked up mid-run, with less that one character being consumed before synchronization

Can somebody explain or link to an explanation on how UTF-8 allows for this?


Note that it says less than one character. A character in UTF-8 can be composed of multiple bytes.

The encoding scheme is laid out in the linked email. Based on the high bits it's possible to detect when a new character starts. Relevant portion:

  We define 7 byte types:
  T0 0xxxxxxx      7 free bits
  Tx 10xxxxxx      6 free bits
  T1 110xxxxx      5 free bits
  T2 1110xxxx      4 free bits
  T3 11110xxx      3 free bits
  T4 111110xx      2 free bits
  T5 111111xx      2 free bits

  Encoding is as follows.
  From hex  Thru hex      Sequence             Bits
  00000000  0000007f      T0                   7
  00000080  000007FF      T1 Tx                11
  00000800  0000FFFF      T2 Tx Tx             16
  00010000  001FFFFF      T3 Tx Tx Tx          21
  00200000  03FFFFFF      T4 Tx Tx Tx Tx       26
  04000000  FFFFFFFF      T5 Tx Tx Tx Tx Tx    32
[...]

  4. All of the sequences synchronize on any byte that is not a Tx byte.
If you are starting mid-run, skip initial Tx bytes. That will always be less than one character.
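A minimal sketch of that rule 4, assuming you have a raw chunk of bytes that may start in the middle of a character. `resync` is a hypothetical helper name, not something from the email:

```python
def resync(buf: bytes) -> bytes:
    """Drop leading Tx (10xxxxxx) bytes so the buffer starts on a character boundary."""
    i = 0
    while i < len(buf) and (buf[i] & 0xC0) == 0x80:
        i += 1
    return buf[i:]

stream = "€uro".encode("utf-8")   # b'\xe2\x82\xacuro', the euro sign takes 3 bytes
chunk = stream[1:]                # picked up mid-run, inside the euro sign

print(resync(chunk))                  # b'uro'
print(resync(chunk).decode("utf-8"))  # uro
```

Two bytes get thrown away here, which is less than the one (three-byte) character that was cut.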


Note that UTF-8 has since been restricted to at most 4 bytes (i.e. the longest sequence is `T3 Tx Tx Tx`).


So now we know who is really responsible for the whole MySQL utf8mb4 fiasco -- these 2 guys sitting in a diner, conjuring up a brilliant scheme to cover 4 billion characters, which turned out to exceed the actual requirement by more than 2000x.

September 1992: 2 guys scribbling on a placemat.

January 1998: RFC 2279 defines UTF-8 to be between 1 and 6 bytes.

March 2001: A bunch of CJK characters were added to Unicode Data 3.1.0, pushing the total to 94,140, exceeding the 16-bit limit of 3-byte UTF-8.

March 2002: MySQL added support for UTF-8, initially setting the limit to 6 bytes (https://github.com/mysql/mysql-server/commit/55e0a9c)

September 2002: MySQL decided to reduce the limit to 3 bytes, probably for storage efficiency reasons (https://github.com/mysql/mysql-server/commit/43a506c, https://adamhooper.medium.com/in-mysql-never-use-utf8-use-ut...)

November 2003: RFC 3629 defines UTF-8 to be between 1 and 4 bytes.

Arguably, if the placemat was smaller and the guys stopped at 4 bytes after running out of space, perhaps MySQL would have done the right thing? Ah, who am I kidding. The same commit would likely still happen.

EDIT: Just noticed this in the footnotes, and the plot thickens...

> The 4, 5, and 6 byte sequences are only there for political reasons. I would prefer to delete these.

So UTF-8 was indeed intended to be utf8mb3!
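For anyone who wants to see where the 3-byte limit bites, here is a quick check using Python's built-in encoder. `utf8_len` is just an illustrative helper, and the last line compares the 32-bit range of the placemat scheme with today's code space of U+0000 to U+10FFFF:

```python
def utf8_len(cp: int) -> int:
    """Number of bytes Python's UTF-8 encoder uses for one code point."""
    return len(chr(cp).encode("utf-8"))

for cp in (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)")

# Everything up to U+FFFF fits in 3 bytes; U+10000 and above needs a 4th,
# which is exactly what a 3-byte limit cannot store.

print(2**32 // 0x110000)  # 3855
```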


This is also a very simple application of the idea of a "prefix-free code" from information theory and coding (the set of codes {0,10,110,1110,11110,...,111111} is prefix-free).

I think there's also the idea that the code can "sync up" when it, say, starts in the middle of a character.
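The prefix-free property can be checked mechanically. A small sketch, with `is_prefix_free` as a made-up helper and the bit patterns taken from the seven byte types (T0, Tx, T1 to T5):

```python
from itertools import permutations

def is_prefix_free(codes: list[str]) -> bool:
    """True if no code word in the list is a prefix of a different entry."""
    return not any(b.startswith(a) for a, b in permutations(codes, 2))

# High-bit patterns of T0, Tx, T1, T2, T3, T4, T5
byte_types = ["0", "10", "110", "1110", "11110", "111110", "111111"]

print(is_prefix_free(byte_types))          # True
print(is_prefix_free(byte_types + ["1"]))  # False, "1" is a prefix of "10"
```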


Other people have answered your question but I wanted to clarify one point. The word "character" here means "Unicode code point". However, what the user thinks of as a single character can be made up of more than one code point. This presents a different problem and one UTF-8 itself can't help with.

The Unicode Consortium has a report on extended grapheme clusters[0] (i.e. user-perceived characters). Essentially, if you're processing some text mid stream, it might not be clear if a code point is the start of a new user-perceived character or not. So you may want to skip ahead until an unambiguous symbol boundary is reached.

[0]: https://www.unicode.org/reports/tr29/
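A small illustration of the gap between code points and user-perceived characters. This only counts code points and bytes; as far as I know the Python standard library has no grapheme cluster segmentation, so the actual boundary rules from the report are not implemented here. `describe` is a made-up helper.

```python
import unicodedata

def describe(s: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes)."""
    return len(s), len(s.encode("utf-8"))

accented = "e\u0301"  # 'e' + COMBINING ACUTE ACCENT, displayed as one character
family = "\U0001F469\u200D\U0001F469\u200D\U0001F467"  # woman, ZWJ, woman, ZWJ, girl

print(describe(accented))  # (2, 3)
print(describe(family))    # (5, 18)

# Normalization can merge the first case into one code point, but not the second.
print(len(unicodedata.normalize("NFC", accented)))  # 1
print(len(unicodedata.normalize("NFC", family)))    # 5
```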


It’s fairly simple, actually: leading bytes have a specific bit pattern that continuation bytes don’t. A single-byte character will have the topmost bit unset (0b0xxxxxxx), and for a multi-byte run the first byte will have the top two bits set (0b11xxxxxx) and any succeeding bytes will have the top bit set but the next bit unset (0b10xxxxxx). This means given an arbitrary byte you can always tell what it is, and you can tell when you’re at the start of the next character by looking for either of those first two bit patterns.
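In code, those three cases come down to two mask tests. A rough sketch, with `classify` as an illustrative name only:

```python
def classify(byte: int) -> str:
    """Classify one byte of a UTF-8 stream by its top bits."""
    if (byte & 0x80) == 0x00:
        return "single"        # 0xxxxxxx
    if (byte & 0xC0) == 0x80:
        return "continuation"  # 10xxxxxx
    # 11xxxxxx: first byte of a multi-byte run. Note this sketch does not
    # reject values such as 0xF8-0xFF that are invalid in today's UTF-8.
    return "lead"

print([classify(b) for b in "a€".encode("utf-8")])
# ['single', 'lead', 'continuation', 'continuation']
```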


The upper bits of the FIRST octet are used to determine the run length of the sequence. All of the other bytes in the sequence have the upper two bits set to 10 (binary 10xxxxxx, i.e. (byte & 0xC0) == 0x80) to indicate that they carry another 6 bits of data for the current character.

If synchronization is lost mid-character, by definition that interrupted character is lost. However the very next complete character will be clearly indicated by a byte beginning with either a 0 bit (a 7-bit character) OR a number of 1s indicating the octet count followed by a zero.

This is covered in the section titled:

    Proposed FSS-UTF
    ----------------
    ...
       Bits  Hex Min  Hex Max  Byte Sequence in Binary
    1    7  00000000 0000007f 0vvvvvvv
    2   11  00000080 000007FF 110vvvvv 10vvvvvv
    3   16  00000800 0000FFFF 1110vvvv 10vvvvvv 10vvvvvv
    ... Examples trimmed for mobile.
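To see how the v bits get packed, here is a hand-rolled encoder following the table's layout, compared against Python's own. It is only a sketch: `encode_code_point` is a made-up name, the 4-byte branch follows the 11110vvv row that was trimmed from the excerpt, and it does not reject surrogates or values above U+10FFFF the way a real encoder would.

```python
def encode_code_point(cp: int) -> bytes:
    """Pack a code point into 1-4 bytes: a lead byte, then 6 bits per trailing byte."""
    if cp < 0x80:
        return bytes([cp])                               # 0vvvvvvv
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),                  # 110vvvvv
                      0x80 | (cp & 0x3F)])               # 10vvvvvv
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),                 # 1110vvvv
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                     # 11110vvv
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert encode_code_point(cp) == chr(cp).encode("utf-8")

print(encode_code_point(0x20AC).hex())  # e282ac
```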


All trailing bytes and only trailing bytes are of the form 10xxxxxx. If you read such a byte you just have to iterate backwards until you find a non-trailing byte.
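The backwards variant, for when you have random access into a buffer rather than a stream. Again just a sketch with a hypothetical helper name, and it assumes the buffer holds valid UTF-8:

```python
def char_start(buf: bytes, i: int) -> int:
    """Walk back from index i to the first byte of the character that contains it."""
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i

data = "a€b".encode("utf-8")  # b'a\xe2\x82\xacb'

print(char_start(data, 3))  # 1, the last byte of the euro sign maps back to its lead byte
print(char_start(data, 4))  # 4, 'b' is already a character start
```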


It is called a self-synchronizing code: a simple and beautiful design.

https://en.wikipedia.org/wiki/Self-synchronizing_code


C was fast.

On a PDP-11.


GNU Taler doesn't have those issues. It keeps payers anonymous but not payees, which allows payees to be taxed. Since it is a payment system, any currency can be attached to it, including USD and Bitcoin.


Another approach is a source-available license.

The one I'm interested in would allow viewing, modifying, and recompiling the source, but would allow redistribution only to those who have bought a license from all owners of the source and its modifications. It would produce a nice "waterfall" effect.

It wouldn't be open-source nor free software, but most people don't really need limitless redistribution, and the source would be proof that said software isn't malicious.

