I like "squiggle". If you're writing a text rendering engine you might want to use "glyph", although you might already need that word for something else. But try to avoid "character", because that word already has far too many meanings, most of which won't be what you wanted.
A Go rune is more like one Unicode Scalar Value.
However, be careful, terminology is subtle: what is a Code Point, what a Scalar Value, etc.
clugr evokes more Freddy to me
PS Be prepared for a long read.
Go uses "Runes" which is a pretty unambiguous and memorable term.
Though in Go's case a rune can't hold a ZWJ sequence, as it's limited to 32 bits (a single code point).
I would love to see Unicode allow arbitrary combinations beyond those defined with ZWJ, for more flexibility (e.g. a blizzard could be created by something like "snowflake x 5", which produces a single character with five snowflakes, without having to define an entirely new character representing blizzard from snowflake + ZWJ + snowflake).
As an aside, my favorite ZWJ magic is black flag + ZWJ + skull and crossbones = pirate flag.
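For anyone curious what that sequence is made of, here is a minimal Python sketch. It assumes the fully-qualified form of the pirate flag, which also carries a trailing variation selector (U+FE0F); the `describe` helper is just a name I made up. Each of the four code points would be its own rune in Go.

```python
import unicodedata

# Pirate flag as a ZWJ sequence: black flag + ZWJ + skull and crossbones,
# followed by the emoji variation selector (U+FE0F).
PIRATE_FLAG = "\U0001F3F4\u200D\u2620\uFE0F"

def describe(text):
    """Return (hex code point, Unicode name) for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

for code_point, name in describe(PIRATE_FLAG):
    print(code_point, name)

# One squiggle on screen, but several units in every common encoding.
code_points = len(PIRATE_FLAG)
utf8_bytes = len(PIRATE_FLAG.encode("utf-8"))
utf16_units = len(PIRATE_FLAG.encode("utf-16-le")) // 2
print(code_points, utf8_bytes, utf16_units)
```

So "how long is this string" has at least three different answers before you even get to grapheme clusters.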
Also see https://en.wikipedia.org/wiki/Blissymbols for more symbol language fun.
Many Chinese characters are constructed with two parts of smaller characters, one of which can indicate a general concept while the other provides a vague pronunciation hint. This isn't really good enough to provide you with enough information to guess at either the meaning or the pronunciation if you don't already know the word, but if you do already know the word from spoken language, then the hints might be enough to recognize the character when you read it too.
木 = tree
森林 = forest
田 = field
力 = strength
男 = man (because he labours in the fields)
人 = person "rén"
门 = gate "mén"
们 = plurality marker "mėn"
也 = also "yě"
他 = he "tā"
他们 = them "tāmėn"
One can come up with a story for why person + gate is plural, but the inclusion of gate in the character for plurality is really more about the sound of gate.
Wiktionary is neat because it also shows you pronunciations in several different Chinese languages (Mandarin, Cantonese, Minnan, Hakka etc) and the relationship to Japanese, Korean and Vietnamese, if it's there.
There's a symbol for the torso, for the fist, for touch and for the strongness of touch. If you hit your own shoulder with a fist with the knuckles facing upwards, you rotate the symbol for the fist, paint the torso and the fist next to each other and put the touch symbol approximately where the shoulder would be.
Even with less granularity, that's still like 256 bytes per grapheme.
I think this would be great for machine translation though.
Imagine poor kids 100 years from now having to memorize all those emojis and their combinations at school. Or having to know 3000 basic emojis in order to be able to read news.
I mean, we all know what a "polar bear" is, right? If those kids know how to say "penguin", and they know how to write the parts, then they know how to write "penguin" in Chinese.
Meanwhile, the parts in "penguin" aren't used in any other word. The "pen" isn't a writing utensil. And what's a "guin"? If you didn't learn how to spell beforehand and were suddenly asked to write "penguin", maybe you'd write "pengwin" or "pengwen", or mishear it as "pengwing" (because birds have wings so it makes sense, right?). English writing is brute-force memorization of spelling with some patterns that often don't hold, just like Chinese is brute-force memorization with some patterns regarding sound or meaning that often don't hold.
And we can see that all English speakers still struggle to spell some words that they don't encounter often, and some even struggle to spell words they do use often.
An advantage of non-phonetic spelling is that it doesn't privilege any one accent over any other, so allowing a polycentric acrolect with each variety picking up accent and vocabulary from the local dialect, but maintaining mutual comprehension in writing.
Knowing a bunch of languages in order to have a proper context for English spelling is surely a higher threshold than memorizing the characters in your own language.
tučňák = noun, penguin
tučný = adjective, fat
I don't know if that helps determine the origin of the English word, but it's definitely funny to think that there’s a nation which calls penguins “fatties” :D
dang, it would be cool if mods could mark a post as "allow emoji here".
My apologies if HN's formatter strips the special characters (edit: it didn't yay!): I wrote integral sign, subscript 0, superscript 1, math italic x, thin space, math italic d, math italic x. And maybe the idea is we should use words like "the integral from zero to one of x with respect to x." IDK.
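In case the formatter does eat them somewhere else, here is the same string written as Python escapes, as a rough sketch. The NFKC step is my own addition to show what is at stake: the layout lives entirely in compatibility characters, and normalization folds them back to plain ones.

```python
import unicodedata

# The seven characters, in order: integral sign, subscript 0, superscript 1,
# math italic x, thin space, math italic d, math italic x.
INTEGRAL = "\u222B\u2080\u00B9\U0001D465\u2009\U0001D451\U0001D465"

# NFKC folds compatibility characters (subscripts, superscripts, math
# italics, thin space) down to their plain counterparts.
flattened = unicodedata.normalize("NFKC", INTEGRAL)

print(len(INTEGRAL), repr(flattened))
```

After normalization the limits are just the digits 0 and 1 sitting next to the integral sign, so any pipeline that normalizes text this way will quietly destroy the formula.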
That said unicode is not free from typesetting weirdness, see the character ﷽.
2) Layout of mathematical formulas is reasonably complicated. It doesn't make sense to force that complexity to be included in every text layout engine.
It looks like this:
∫_(-∞)^(+∞)〖exp(-a/2 x^2) ⅆx〗=√(2π/a)
I find it quite readable, even for quite complicated formulae like the above. You can also replace the Unicode symbols with LaTeX-style escape strings, like \sum or \below.
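For comparison, this is roughly how I would write the same formula in LaTeX. I am reading `-a/2 x^2` as -(a/2)x², which is the reading that makes the identity hold for a > 0.

```latex
\int_{-\infty}^{+\infty} \exp\!\left(-\frac{a}{2}x^{2}\right)\,\mathrm{d}x = \sqrt{\frac{2\pi}{a}}
```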
I imagine in a world without them existing a lot of the non-ascii paths would not be regularly used.
I think what you are looking for is stamps and png/gifs, which are also supported in most relevant chat platforms these days.
Firefox: angry face
Yes, I realize they are widely used, but they are widely used despite this stupidity, not because of it.
Here is just one random article from Emojipedia about the history of the "folded hands" emoji: https://blog.emojipedia.org/emojiology-folded-hands/ There are many more examples.
In addition to that, almost all emoji keyboards now autocomplete the emoji based on standard names, so if you search for "disappointed" on most any emoji keyboard, you will get the same face.
For reference, here is the current official emoji set, including the standard names and images showing how they render on different platforms: https://unicode.org/emoji/charts/full-emoji-list.html
Even if there was no vendor consensus, I'm not sure what the misunderstanding could be with this particular emoji. There is only one pistol emoji, and regardless of whether it is rendered as a water pistol or a revolver, it still is used to represent the concept of a pistol. There are better examples of emojis that used to be displayed with a facial expression or hand gesture that had a relatively different meaning depending on the platform. For example face with rolling eyes, person tipping hand (information desk person) and so on.
Well... yes? That's because avalys is obviously correct. The "solution" defined by the Unicode consortium is so spectacularly stupid that everyone has unanimously agreed to move away from it by synchronizing their images, because it makes no sense to send an image unless you know what it will look like.
To your point, this is an aspect where integrating and regularly adding emojis to Unicode pushed a very technical system under popular attention, and attracted a lot of people from unrelated backgrounds who now have to understand how it works and what its goals are in the first place.
It was even more dumb when they did it for CJK. Unicode is now neither fish nor fowl; you can't rely on things to look the same in different places, but you can't rely on them to mean the same either; there's no proper separation of concerns because they decided it's fine for the meaning of a character to change based on the font you're using.
I kind of get why we got here, as encodings were a quagmire for a long time, and coming to a real clean, everybody's happy solution with unicode looked completely unrealistic.
We now have way better compatibility, got to settle for one encoding in most cases, and the annoyances are for now somewhat manageable (if/when China opens more to the global net, it might be a different story). Installing fonts is still easier than adding encoding compatibilities.
This would promote an open culture of creating inline symbols. E.g. where is our "covid" symbol, and why do we have to wait for the Unicode consortium to define it for us, and app makers to implement it??
Just like we all go "bla bla" on the phone because we can choose our own sounds?
Although we've never actually kept score, through any of that.
This makes me wonder if anyone has created a version of base64 that uses the vast, sprawling space of unicode to take advantage of these glyph-count-based restrictions.
If they have, I hope they called it uuuniencode.
It can store 385 bytes per tweet. This link includes a bit more technical explanation of how Twitter counts characters towards the limit. Apparently, using the entire range of unicode characters does not improve compression because of the double weighting of emojis and other characters as described in TFA. It links to a base131072 encoding which can only store 297 bytes per tweet.
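The two numbers are easy to reproduce with back-of-the-envelope arithmetic. A sketch, with assumptions that are mine rather than from the link: a 280-unit limit, "light" characters costing 1 unit and "heavy" ones costing 2, and 11 bits per character for the 385-byte scheme (I inferred that from the figure itself).

```python
# Assumptions (mine, for illustration): a 280-unit tweet limit, with "light"
# characters costing 1 unit and "heavy" characters costing 2 units.
TWEET_LIMIT = 280

def bytes_per_tweet(bits_per_char, weight):
    """Whole bytes that fit when each character carries bits_per_char bits."""
    chars = TWEET_LIMIT // weight
    return (chars * bits_per_char) // 8

base64_capacity = bytes_per_tweet(6, 1)    # 2**6 = 64 light characters
light_capacity = bytes_per_tweet(11, 1)    # 2**11 = 2048 light characters
heavy_capacity = bytes_per_tweet(17, 2)    # 2**17 = 131072, double-weighted

print(base64_capacity, light_capacity, heavy_capacity)
```

The bigger alphabet packs more bits into each character, but paying double for every character costs more than the extra 6 bits buy back.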
I'd suggest that the same idea applies to nuanced facial expressions. You could certainly devise a set of glyphs to stand in, but they would have to be learned by everyone.
The failures we see today are attempts at rendering expressions on drawn faces -- and to be fair, even a high res photographic still image of a real human making an expression could be easy to misinterpret. Especially across cultures and subcultures. My 15-year old niece has very strong opinions on how many periods I use to end sentences..
I think "fixing" emoji is probably a lost cause. But I am biased since I don't care at all. The most interesting and amusing thing I know about emoji is that when Apple changed the "gun" from a realistic-looking Glock/etc to a toy squirt gun, all the ingrates who used the image in a threatening manner ended up looking silly to iOS users. Android followed quickly.
I support the same sillification of negative and angry facial expression emoji as well, FWIW, and I think it's probably hopeless to try to cram nuance into any of them. Fortunately, we still have words.
Apparently you cannot use emojis on HN.
et cetera et alii ad nauseam
Even as complex as emoji are becoming, they really still don't address the issues behind online miscommunication. And I really doubt they ever could.
on the other hand people still for some reason believe that "lol" only means "laughing out loud", which is lol in itself.
so maybe it's impossible?
1. specification authors want to make sure the extended grapheme cluster algorithms are widely adopted so that implementations can correctly deal with devanagari
2. they notice no one gives a shit about brown people and their writing systems
3. combining emojis requiring the use of the same underlying algorithms were popularised in order to push the adoption
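To make step 1 concrete, here is a tiny sketch of why code-point counting goes wrong for Devanagari. The syllable "ki" is my own choice of example. As far as I understand UAX #29, the extended grapheme cluster rules attach the dependent vowel sign to the preceding consonant, but Python's standard library has no segmenter, so this only shows the raw code points.

```python
import unicodedata

# "ki" in Devanagari: the consonant KA followed by the dependent vowel sign I.
KI = "\u0915\u093F"

# General categories: Lo = letter, Mc = spacing combining mark.
categories = [unicodedata.category(ch) for ch in KI]
names = [unicodedata.name(ch) for ch in KI]

print(len(KI), categories, names)
```

A reader sees one unit, `len()` says two, and anything that splits or deletes "one character" by code point will cut the vowel sign off its consonant.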
Anyone know if I'm missing anything, or is there no support for 13.1 yet? My standard routine is to just install every noto font I can find (noto-fonts noto-fonts-cjk noto-fonts-emoji noto-fonts-extra).
King - Man + Woman = Queen
Brother - Man + Woman = Sister
And some sexist variants of these too. But the article is on a different topic.
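For anyone who hasn't seen these, the arithmetic is meant literally: words are vectors, and you look for the nearest word to the result. A toy sketch with 2-D vectors I invented by hand (real embeddings are learned and have hundreds of dimensions); `analogy` is a hypothetical helper, not any library's API.

```python
# Toy 2-D "embeddings", invented for illustration only:
# axis 0 is an arbitrary "kind of word" axis, axis 1 encodes gender.
VECTORS = {
    "king":    (1.0,  1.0),
    "queen":   (1.0, -1.0),
    "man":     (0.0,  1.0),
    "woman":   (0.0, -1.0),
    "brother": (0.5,  1.0),
    "sister":  (0.5, -1.0),
}

def analogy(a, b, c):
    """Return the word nearest to a - b + c, excluding the three inputs."""
    target = tuple(x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = (w for w in VECTORS if w not in (a, b, c))
    return min(
        candidates,
        key=lambda w: sum((p - q) ** 2 for p, q in zip(VECTORS[w], target)),
    )

print(analogy("king", "man", "woman"))
print(analogy("brother", "man", "woman"))
```

With learned vectors the match is only approximate, which is also how the biased variants show up: the model reproduces whatever regularities were in its training text.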