It still pains me that Unicode decided to get in the business of curating an ever-growing clip art collection.
It’s like the Oxford English Dictionary decided that they’re actually poets; their main job is suddenly to invent brand new words that let people write with an exciting level of density and poetic license; and those new dictionary words would also be multi-color because everybody owns a pack of colored pencils, right.
Unicode needed to include written-on-paper glyphs that existed in the world. That makes sense to most people. But a lot, lot, lot of communication these days happens digitally (digital first). You can't scribble things as freely in a textbox like this one. You can certainly make suggestive combinations like `:)`, but you are pretty limited. (How do you scribble "family of three: mom, father, daughter" without any emojis?)
How do you propose that Unicode include glyph innovations that are digital-first?
Unicode doesn’t include ligature code points for every English word either. Somehow we manage to keep glyphs and words separate. Why couldn’t we do the same for inline graphics?
The copyright situation around emoji is actually worse now than if they’d been kept out of the standard. People expect emojis to look like on iPhone, but those specific graphics are owned by Apple, leaving everybody else scrambling for emojis that look close enough without infringing.
If emojis had originally been released as an open source library of SVG graphics together with some kind of standard shorthand way to refer to them without embedding an entire inline URL, we could have truly open clip art instead of this weird semi-proprietary mess.
In my ideal world, people would have a standard library of SVG emoji on their phones, they can send them around, but they can also save the emojis others use to their own library. Of course, artists can add new SVG emoji and send them around.
This way, we get an evolving library of emoji, just like how languages evolve. Unicode is stifling this evolution.
If I buy a set of stickers (little animations, much more expressive than emojis) I can send them to anyone in chats, or I can send the set as a gift to that user; in the latter case, the user can add them to their library.
Your library is the standard stickers, any free stickers you've added, anything you've bought, and anything you've been gifted.
The end result is that some people have a fairly distinct sticker-communication style because they have in their library relatively obscure ones, or just like to use the less-popular free ones.
In theory anyone can make their own stickers but it's kind of a pain, and I haven't gotten around to it:
So in my ideal world, we'd not have any emojis that can't be reduced to expressive ASCII, but we'd have a massive public library of freely available SVG Stickers with accessibility features, and all the chat apps would use them.
Well, as long as you can use these stickers in an in-line way. The graphic should be part of the sentence, not just an addition to it, although the latter should be possible too I guess.
You can’t, but there is a large set of built-in emoji-like inline graphics in addition to the normal emojis. Some of them are quite good, but the selection seems a bit random. They are privileged over keyboard emojis in that they pop up for selection when you type keywords. AFAIK users can’t create their own.
However, stickers have a culture of their own, and I don’t think it’s inferior to emojis, maybe it’s even better. You can express a lot in a sticker or two, without words, and quickly. And they’re not tiny ambiguous glyphs, they have the room they need.
So far in my Line experience I’m the only one using emojis (I should probably stop) because it seems like the standard is to use text to say specific things and use stickers to convey emotions.
I’m a foreigner and don’t have a million Line friends, but that’s how it looks to me.
> Unicode needed to include written-on-paper glyphs that existed in the world.
If it was the case, Emoji is seriously lacking a penis. I mean, seriously, give men a way to draw and this is what you will get. There are already existing "subjective combinations" like 8===D and the eggplant emoji for which it is its most common use. There is a proper penis in the hieroglyphics, but just because there is a hieroglyphic doesn't mean there is no corresponding emoji (ex: eye).
On the other hand, many emoji pass even though I have never seen anything remotely similar being scribbled ever, it is the addition to the emoji block that drove its use. Plus, because inclusivity, all its equivalents in different genders, skin tones, cultures, etc...
Clearly, Unicode acts as an arbiter here. It decides on what it thinks it is "good" rather than what is really in use. And it is not just about penises, the hangman is also absent, as is the "gun pointed on head" sign (suggesting suicide) that is commonly used in real life.
Worth noting that the first emoji were included because they were already in common use before Unicode, in Japanese phones specifically. Unicode just included them so that Japanese users could switch to Unicode without loss of functionality.
The Tahitians can deal with communication just fine with only 13 letters, so I guess we could probably trim the letters b, c, d and g out of Unicode too.
With enough optimisation we might just be able to communicate with just a boolean-based language.
(i.e. why limit written communication rather than make it richer)
That you can take something too far in one direction doesn't give reason or explanation for anything except "don't take it too far in that direction". There are probably good explanations why not to take things infinitely far in either direction but they don't really help explain where the line should be drawn.
I think the actual argument here is encoding of seemingly arbitrary colored pictorial combinations overcomplicates character encoding. If you want to display a colored drawing of an arbitrary family SVG is already a thing but if you want to textually encode an arbitrary family you should use characters in your language not expect the text encoding to pick up more arbitrary drawings.
To me, using Emoji was probably a great way to force developers' hands on supporting certain encoding features that more complicated languages use, even in cases where they only wanted to support Latin text. That said, this job is already done. We don't need to continue putting everything into ever more complex pictorial encodings via Unicode for the rest of time.
> That you can take something too far in one direction doesn't give reason or explanation for anything except "don't take it too far in that direction".
Why or how does the Tahitian language take things too far?
> How do you propose that Unicode include glyph innovations that are digital-first?
Why does Unicode specifically need to include novel "innovations"?
It's arguably a failure of the community that we have not been able to standardize interoperable higher-level rich text formats, so now everything and the kitchen sink needs to be bolted onto Unicode instead in the name of interop.
> Why does Unicode specifically need to include novel "innovations"?
Unicode chose to get into emoji because Japanese carriers were already making their own private character sets with emoji, and Unicode has a goal of being a superset of all other (relevant) character sets. (Arguably emoji support in Unicode has been a resounding success. Look at the world-wide enthusiasm around each new Unicode version that gets announced now!)
It also brought broken handling of code points beyond the BMP to the forefront, and dealing with Unicode text is correct in many more places by now. Previously the only people who noticed were those who needed obscure Han ideographs, hieroglyphs, and other things that are not terribly useful or interesting to the vast majority of people.
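For anyone wondering what "beyond the BMP" means in practice, here is a minimal Python sketch (my own illustration, not taken from any of the tools discussed here). It takes the bomb emoji, U+1F4A3, and shows that it is one code point but two UTF-16 code units, which is exactly where code that assumes "one 16-bit unit per character" goes wrong. The helper name `utf16_surrogates` is made up for this example.

```python
# Illustration only: one emoji outside the BMP, seen through different encodings.
bomb = "\U0001F4A3"  # BOMB, U+1F4A3 (above U+FFFF, so outside the BMP)

code_points = len(bomb)                           # Python strings count code points
utf16_units = len(bomb.encode("utf-16-le")) // 2  # UTF-16 needs a surrogate pair
utf8_bytes = len(bomb.encode("utf-8"))            # UTF-8 needs four bytes


def utf16_surrogates(ch):
    """Return the (high, low) UTF-16 surrogates for a single non-BMP character."""
    offset = ord(ch) - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


print(code_points, utf16_units, utf8_bytes)
print([hex(unit) for unit in utf16_surrogates(bomb)])
```

Software that indexes or truncates by UTF-16 unit can split the pair in the middle, which is the kind of bug emoji made visible to everyone.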
> How do you scribble "family of three: mom, father, daughter" without any emojis?
How often do people need to do that, with that level of fidelity, right down to the composition of the family? My suspicion is not often. A short sentence conveys this description just as well and is much less ambiguous: concise written communication that is "emoji-rich" is filled with so many details (such as the composition of the family and their skin tone) that it's often not clear which details are important for the message and which are not. In addition, because emojis are rendered differently on different devices, the meaning may be lost (e.g. when Apple changed the gun emoji to a water gun, many messages took on different meanings depending on whether a revolver or a water gun was shown).
Surprisingly, this happens. I recently started to use a lot of emojis, including ones like these, to name calendar events - because I use a watch face on my smartwatch that renders the next 12 hours worth of events on the clock face, and given the small space, many of the events can fit three or four letters of description. Emojis work as great workaround, because I can encode things like "takeoff, gate A11" in 4 visual characters, or "doctor's visit" in one.
(I prefix the event titles with emojis rather than replace the longer form completely, because some of those events are shared, and sometimes I forget what an emoji stands for anyway...)
Now, this is perhaps a unique use case, but I found myself doing this in other scenarios too, like task planners - the common thread is, "not enough space to fit full label".
A set of hieroglyphs for a formally designed and recognized language that is in use to communicate is one thing. Endless bits of arbitrary clip art is another.
I think the emoji combining isn't that useful, at least in the family example. It is enough putting them side by side without merging them. That way you have even more possible combinations, like a family of an ogre and a human and a kid.
I don't speak good emoji though; I understand the smiley face, but what does posting the family emojis mean? What is it used for?
Nobody I know or have seen online replaces random words with emoji for brevity. That's annoying. At worst I've seen ironic posts or trying to be hip outlets adding the emoji for a word after the word itself, or use them as bullet point markers.
It reminds me of a thing where it was like, "elephants are so smart because they can use this trumpet to communicate danger", and someone pointed out that humans can do that too by using words. Mind blown.
See, we invented this thing called the alphabet. You memorize some 20 symbols and combine them to express every possible emotion, instead of memorizing a gazillion symbols.
It’s no wonder hieroglyphs have fallen out of favour in the last 5k years.
How do you propose they get transmitted? In-band in the middle of text (good luck reading that with an unsupported editor), or through servers, necessitating NAT and all that good fun?
Since the images can be arbitrary you also open yourself up to bugs in the parser.
All around a much, much worse idea compared to text emojis.
> You need a "supported editor" to display Unicode emojis too, as TFA shows.
Let's give this a try. This is roughly what a sentence with an emoji looks like in an unsupported editor:
hello [] world
(with [] being the well-known "placeholder" square). Now let's try the same with an emoji, I went with 16x16 to make this kind of bearable, but realistically you'd want at least 64x64:
hello data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD/2wBDAAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wAARCAAQABADAREAAhEBAxEB/8QAFwAAAwEAAAAAAAAAAAAAAAAAAAIICv/EABwQAAIDAQEBAQAAAAAAAAAAAAQFAgMGBwEIFP/EABoBAAEFAQAAAAAAAAAAAAAAAAACAwUGCAn/xAAhEQACAwEBAAICAwAAAAAAAAADBAECBQYHCBITFAAVFv/aAAwDAQACEQMRAD8A1xbv6+xizYBjaeOtvSMunT5XmVeTDaMAP1waXJyNft7FFlPoub9OHjK5gylJQmBICnfTMgi+6PJ31n5YubHpnY4S5H1eV89d3MumSj0TmCPdLhNmR0dDQNmtpM65HTrHjKxrGlSiY4JdczZL3ppUPljuLz2OdUaRNLYxkt119xdcp1qvKjcXzcyGqFquRYJq1YKGsNMNQSsFqCg6SYX6+xjPYGDZivWjpVnTYcr06vWBtFy/0qbSlPRsMTNtbb6VnPDSJSoYrfYqHIQxsqKIkUUX+nlHyybyPS+OwjkeZ5b0F3DzL5LvQubwsMu44FFB/PNotus45UjtB/tMehoUIpeb1XC0OhJDeWu7XPa5mqJD0cjGd3En011xHZ/RVK4fN04VGKpyMADaoSHiWlmfxRJLAuQf8hP6/wCB/SXGOkaDZ8hzrbo/P9m4cPaglrLqYUMzY1bMHRKs2rm7sSauQJTQgcNqyXkL2CyC+uZwRYNw5E5634bPnnYdjtF8WzPYuM7vo9DrQPL4mrodHyWlpkufUyT2wzhfpi2Nf9hMn1MvWZvFq0Na/wBrpyHYZHoGDgotdn/juj5vJTwzpuuZS+duLJCoBTRWtrhuuV2wxxRoNCCNF5mYggvrML8fcA+kuydHRbPryFxz3BZFumdXAs2PUzB9D4qbrnlCoCrpLwyxrewKVjCmN1q8ZetVTY0wNNKYVD0NeTeF39G7Pjdsfi+Z45xXBdCj1bDh8TUz+i67TziiPl5Qp3DnfJjjMKTtkn8K8zMfWtzVH9V9h2GN59z+8ir2Udl0nSZTeKuok7lsZ+Eu2Ei7WgxbICNcLlRkmq4bXIebxWZgYvvaf//Z world
How is that any better? You can’t integrate them easily in the text, you still have to store the images and send them around, and there’s no way of normalising them, which means that this has all the inconvenience of Unicode without any of the features that help handling code points and glyphs, and even smaller probability of it being rendered correctly.
Honestly I'm happy enough that emoji are there as a carrot to upgrade their platform support for newer Unicode revisions which the US otherwise might not care about. It's only been 10 years since you couldn't reliably use the € symbol in web forms for example. Never mind things even further afield from the US.
Unicode is only curating the clip-art collection, inventing brand-new entries is left to interested third parties.
So a closer analogy would be the Oxford English Dictionary adding newly-coined words, which—of course—they do https://www.oed.com/discover/the-oed-september-2021-update/ in addition to expanding coverage of older words (analogous to the Unicode Consortium working on historical scripts).
NB if you don't like multi-color emoji, you can also use a monochrome font for them.
It’s not the same thing because Unicode is the only gatekeeper for emoji adoption. You can’t use an emoji that isn’t yet in the standard and deployed by the OS vendors. Whereas OED only adds words that have substantial real world usage.
Re: monochrome emoji rendering — an impossible proposition if you need to render any user-generated text. People simply can’t understand that their emoji might look different than when they picked it on their iPhone keyboard. The supposed rendering latitude on emojis is completely imaginary; in practice you need to get an emoji set that’s as close as possible to Apple’s without infringing on their design copyrights.
> You can’t use an emoji that isn’t yet in the standard and deployed by the OS vendors.
You can, as long as you control the font. Pick any codepoint in the Private Use Area and have your font define a picture for it. That's the whole idea behind icon fonts.
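As a rough illustration of the Private Use Area idea, this Python sketch checks whether a code point is private-use by looking at its general category (`Co`). The function name `is_private_use` and the choice of U+E001 are arbitrary picks for the example; which picture appears for that code point depends entirely on the font you ship.

```python
import unicodedata


def is_private_use(ch):
    """True if the character sits in a Private Use Area (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


# U+E000..U+F8FF is the BMP Private Use Area; U+E001 is an arbitrary pick.
custom_icon = "\uE001"

print(is_private_use(custom_icon))   # private use: meaning is defined by your font
print(is_private_use("A"))           # an ordinary assigned letter
print(is_private_use("\U000F0000"))  # Supplementary Private Use Area-A
```

The catch is that the text is only meaningful together with your font: copy it anywhere else and it shows up as a placeholder box or some other font's unrelated icon.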
> People simply can’t understand that their emoji might look different than when they picked it on their iPhone keyboard.
Android users who pick an emoji on their Google keyboard and then have something different show up in the message they sent seem to be able to cope somehow.
But if you can't get away from imitating the Apple look because of wrong users, you could still try converting it to monochrome. Maybe users will forgive the deviation if it makes sense in context, e.g. for a terminal emulator.
The analogy is that Oxford Dictionary is adding words, BUT ... you can only use those words and your words will be automatically replaced by synonyms on different platforms.
My main problem with emoji is that they look different everywhere. For example "big grin" looks like that on native Android, but looks like an angry face on other platforms. Quite a difference in meaning.
Another problem is that some emojis are horribly missing. For example during COVID it would have been nice to have a cotton swab thing.
> For example during COVID it would have been nice to have a cotton swab thing
Why? It would be next to meaningless now but have to be supported forever. This actually exemplifies why they need to be more strict about what goes in.
Funny how alphabetic text allows you to write whatever by just combining ~30 symbols in different order and quantities but for hieroglyphics you need thousands of symbols and it still is not enough, and never will be.
Yeah but... why? So social media outlets can make coming in for a test c00l and hip and down with the kids? Does a swab emoji improve the message or is it just colorful seasoning?
The Unicode Consortium's stated direction is that they want protocols to support stickers (arbitrary images) instead of emoji. But protocols don't seem to be doing that, so…
The whole emoji phenomenon is a kind of infantilizing cultural rot -- it makes serious, static documentation and tooling resemble a children's book and hinders live communication by encouraging vague single-pictogram messages and the expression of raw emotions (genuine or not) instead of mature and balanced thoughts.
None of these things are mutually exclusive the way you're implying. This is a "cultural degeneracy" argument, which are always suspect imo. You are certainly entitled to dislike the aesthetics of shifts in communication, but you're basically just assuming that the changes are inherently negative and there's no reason to think they are.
"Single-pictogram messages and expression of raw emotion" are simply not mutually exclusive with mature, well-considered, intentional communication.
I never have understood the push to emojify everything. But commit messages, options in menus, status outputs, you name it: I think we'd be better off leaving it in plain ASCII (or Unicode if your language needs that, just as long as you don't dip into the emoji).
Emoji are quite nice for messaging people and occasionally some of the more generic ones can be useful for status indicators (think the red and green circles), but I don't want to be presented with a little green worm every time somebody submits a bug fix on GitHub.
I think emojis in commit messages are very overrated.
But emojis are great in informal communication when you want to compensate for the fact that your words are completely divorced from inflection, tone, and body language.
Absolutely, emoji used spontaneously are great. When they are used as qualifiers or attributes constantly, it becomes just as bad as typing "feat:" or "chore:" or something similar in front of every commit message.
> I think we'd be better off leaving it in plain ASCII (or Unicode if your language needs that, just as long as you don't dip into the emoji).
Why such a distinction? Symbols are very useful in commit messages because they can convey a message at a glance that would otherwise require parsing the message. A big, visible symbol for bug fixes and another one for new features makes a lot of sense. Once we’re there, how is it better to use things like Chinese characters rather than emojis?
Making emojis work everywhere also has the very useful side effect that the rest of Unicode works as well. Which is kind of important for the vast majority of the people on earth who need more than ASCII to write in their mother tongue. On balance, it is better that they are supported properly even if you personally dislike them.
Not even considering the moral aspects of wanting to impose on other a way of communicating.
> Symbols are very useful in commit messages because they can convey a message at a glance that would otherwise require parsing the message.
A good commit message doesn't need symbols to tell you what it is doing. (And no, I don't write perfect commit messages.)
> Once we’re there, how is it better to use things like Chinese characters rather than emojis?
Because there are quite a lot of people in this world who speak Chinese but not English, while nobody speaks only emoji.
> On balance, it is better that they are supported properly even if you personally dislike them.
To be clear, I'm not against emoji or Unicode in general. I just dislike the way they seem to be showing up in commit messages and the like as attributes.
> Because there are quite a lot of people in this world who speak Chinese but not English, while nobody speaks only emoji.
And even more people using emojis. Fundamentally, as a native speaker of a European language, there is no meaningful distinction between using fancy image characters and fancy characters that look like symbols. The fact that someone somewhere is using them or not is not very relevant.
> To be clear, I'm not against emoji or Unicode in general. I just dislike the way they seem to be showing up in commit messages and the like as attributes.
Right, but then that’s something you solve by policing the commit messages in your projects. Not by policing how other people you’ll never meet express their ideas or emotions.
I'm on the VS Code team and maintain xterm.js which is what Hyper's frontend is based on. There are actually multiple developments happening in this area.
First, there's a contribution from the author of DomTerm which adds grapheme cluster support to xterm.js, which will correctly merge and size things like emoji that are called out in the post. This is currently based on Unicode 15. See https://github.com/xtermjs/xterm.js/pull/4519
Second, while Windows Terminal does seem to work with emoji sometimes, it doesn't all the time. I'm not 100% sure, but I think it may only work on Windows ptys, not in WSL for example. Last time I spoke with the team they said they're working on a rewrite which could lead to proper emoji support.
I'm the author of DomTerm and the above-mentioned xterm.js PR. Both use the full Unicode Grapheme Cluster Boundaries algorithm (https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundarie...). However, I haven't seen any specifications for how wide the resulting clusters should be in a mono-space context. So unless we enhance terminals to handle variable-width fonts (which I've been thinking about), we need to look at other terminals and make judgement calls. Generally, the width of a cluster is the width of the widest codepoint. Also, I decided that a "normal" character followed by emoji-presentation makes it 2 columns wide.
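The two rules of thumb in the last sentences could look roughly like this in Python. To be clear, this is a simplified sketch and not the DomTerm or xterm.js code: it assumes the text has already been split into grapheme clusters (the TR29 segmentation itself is not implemented here), it uses the East Asian Width property as the per-codepoint width, and the names `codepoint_width` and `cluster_width` are invented for the example.

```python
import unicodedata

VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


def codepoint_width(ch):
    """Column width of one code point: 0, 1 or 2 (simplified heuristic)."""
    # Combining marks, ZWJ and variation selectors take no column of their own.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # Wide and Fullwidth characters take two columns, everything else one.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def cluster_width(cluster):
    """Width of one already-segmented grapheme cluster."""
    # Rule 1: the cluster is as wide as its widest code point.
    width = max((codepoint_width(ch) for ch in cluster), default=0)
    # Rule 2: an emoji-presentation selector forces two columns.
    if VS16 in cluster:
        width = 2
    return width


print(cluster_width("a"))             # plain narrow letter
print(cluster_width("e\u0301"))       # letter plus combining accent
print(cluster_width("\u262F\uFE0F"))  # yin yang with emoji presentation
```

Results for individual symbols depend on the Unicode data bundled with your Python version, which is part of why different terminals disagree.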
In your linked-to article, you suggest 2-em dash and 3-em dash should be 3 and 4 columns respectively. That might be reasonable, but it is explicitly contrary to the EastAsianWidths specification. You also suggest that Pictographics followed by text-presentation should have width 1. That seems reasonable, though I don't implement that.
To my knowledge, there is no specification for monospace width. Applying the East Asian Width attribute as a proxy does not work reliably. So in my opinion, it is all a matter of agreeing on what's reasonable while keeping it simple. I.e. if allocating text-presentation Pictographics to a width of 1 is reasonable, that's what renderers should do. It's what users would expect.
I spent quite a bit of time coming up with an algorithm that gives me the correct width for any Unicode character, including emojis [1]. The reality is much more complicated than what wcwidth does and using the East Asian Width attribute will give wrong results, as outlined in the article.
Unfortunately, since a lot of terminals simply use wcwidth, you often have to use that same flawed algorithm in your application to make it work in those terminals.
While this is a real issue that affects terminals in real-world use, I took issue with this quote:
> This issue affects every terminal I've tested: Visual Studio Code, iTerm2, Alacritty, and Hyper.
I don't think those terminals are particularly known for being feature-complete with regards to VT/ANSI codes or new developments such as emoji. The quantity of things a VT-ish terminal must do is massive and not standardized, so many terminals just cover the common cases and nobody notices that they're incomplete.
If the author noted that emojis don't work properly in xterm (not xterm.js) or konsole, then that'd be something to write home about.
For reference, I've written a terminal emulator fairly recently, and found xterm to be most faithful and also a great source of documentation in itself. In personal use though, konsole is pleasant and seems to do everything I've seen right.
On the other hand, currently writing this from mac and iterm2 is kind of terrible, send help.
Microsoft Terminal is actually pretty good. I was surprised when I configured git bash as my main shell there and when opening it via right-click in Explorer, it correctly translated the path and opened the shell in the correct folder.
I didn't like Windows Terminal when I used it over the last few months at the last $job because my choices of software were limited and it was better than the Windows Console.
There seems to be no way to make the cursor clear and visible on all background colours that I used for CLI and text editing. I ended up having to use a difficult-to-see 50% grey cursor, as a compromise that was better than losing the cursor completely sometimes.
It wouldn't send Control-Space or Control-@ from the keyboard (they are the same character, ^@ aka NUL), which is the key used to set the editing mark in Emacs so used a lot. No key combination is mapped to that character in Windows Terminal, and I couldn't find a way to customise it to do so either. Historical real terminals like the VT100, and of course xterm etc, always did so, which is why it's an essential key in some terminal applications. I compromised by binding Control-] to set-mark-command in Emacs but this was annoying when switching between devices.
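The reason Control-Space and Control-@ end up as the same character can be shown in a few lines of Python. This sketch assumes the traditional terminal convention that holding Control keeps only the low five bits of the key's ASCII code; `control_char` is a name made up for the example, and real keyboards and terminal emulators may handle some keys differently.

```python
def control_char(key):
    """Code sent for Ctrl+key under the classic 'keep the low 5 bits' convention."""
    return ord(key) & 0x1F


# Space is 0x20 and '@' is 0x40: both lose their only set bit and become NUL.
print(control_char(" "), control_char("@"))

# ']' is 0x5D, so Ctrl-] gives 0x1D (GS), the key used as a fallback above.
print(hex(control_char("]")))

# Letters map to 1..26 regardless of case, e.g. Ctrl-A is 0x01.
print(control_char("a"), control_char("A"))
```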
Finally, line drawing characters, boxes, progress bar characters and such had ugly gaps, as though line height was incorrect for them.
All of these seem to work fine in every other terminal emulator I've used.
I ran a few tests in Windows Terminal. The bomb emoji got width 2, while the motorboat got width 1 and correct aspect ratio, though I didn't quite get to see it properly until I zoomed in like 5x. The family was rendered all as one emoji, cells wide, but left 4 blank cells before it.
So it was a bit better than the author's tests, but there's still room for improvement.
Honestly, this seems like a problem with wcwidth implementations. Most of them have been reporting almost all emojis as having width 1 (and some very non-invisible symbols as having width 0) for years, to this very day, for some unexplained reason, despite regular web rants about how this is wrong. Check out this one: [0]. It's 8 years old! The problems are still there.
I cannot even begin to fathom how you would go about fixing this more broadly. I’m sure there are many who have come to count on the current behavior for some critical use case, and would vehemently oppose any attempts to change or correct it. Hyrum’s law, etc.
Is it windows or is it golang making the difference there? Golang's implementation of wcwidth is different, and a commonly used lib has an 'emoji' table which covers that bomb (1F4A3)
Are there some well defined Unicode subsets, e.g. one without Emoji? (I'm only vaguely aware of XML specifying "Unicode without control characters" in the beginning).
Haaa yes the eternal emoji width problem when using monospaced fonts…
I added an explicit workaround for this in my logger lib, with an environment variable to be able to change the behavior.
https://github.com/xcode-actions/clt-logger/blob/12a5ebc1b00...
I think it is because those are unicode points that can be displayed as either emoji or text. (e.g., https://www.compart.com/en/unicode/U+262F). so when displayed as text they are N
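If you want to check this yourself, Python's `unicodedata` module exposes the East Asian Width property directly. The sketch below just prints the property for a few sample characters. The sample set and the `ea_width` helper are my own picks, and the values you get depend on the Unicode version bundled with your Python build.

```python
import unicodedata


def ea_width(ch):
    """East Asian Width property of one code point (e.g. 'N', 'Na', 'W', 'A')."""
    return unicodedata.east_asian_width(ch)


samples = {
    "LATIN CAPITAL A": "A",
    "CJK IDEOGRAPH": "\u4E2D",
    "YIN YANG": "\u262F",     # can be shown as text or as emoji
    "BOMB": "\U0001F4A3",     # emoji-only pictograph
}

for name, ch in samples.items():
    print(f"U+{ord(ch):04X} {name}: {ea_width(ch)}")
```

A width function that only looks at this property treats the dual-presentation symbols as narrow, even when the terminal's font draws them as full-size color emoji.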
I don't know of a single terminal that actually gets these ZWJ emoji sequences right. There are several terminals who will render them, but I haven't seen any actually get the width calculations correct. (E.g., https://github.com/kovidgoyal/kitty/issues/3810.) Part of the problem is that it really messes up the idea of a rectangular grid of base characters in memory, so you'll need some sort of indirection. Obviously I haven't tested all terminal emulators in existence, though :-)
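To illustrate the "indirection" point: a family emoji is several code points glued together with ZERO WIDTH JOINER (U+200D), so a grid that stores one base character per cell cannot hold it. One possible approach, sketched below in Python, is to let each cell hold a whole cluster string and mark the following cells as continuations. The `Cell` class and `put_cluster` function are hypothetical names for this example, not how kitty or any other terminal actually does it, and the width of 2 is simply passed in as an assumption.

```python
from dataclasses import dataclass

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# man + ZWJ + woman + ZWJ + girl: rendered as one "family" glyph when supported
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"


@dataclass
class Cell:
    text: str = " "             # a whole grapheme cluster, not one code point
    continuation: bool = False  # True for the extra columns of a wide cluster


def put_cluster(row, col, cluster, width):
    """Store a cluster at `col`, reserve `width` columns, return the next column."""
    row[col] = Cell(cluster)
    for i in range(1, width):
        row[col + i] = Cell("", continuation=True)
    return col + width


row = [Cell() for _ in range(6)]
next_col = put_cluster(row, 0, family, 2)  # assume the renderer wants 2 columns

print(len(family), family.split(ZWJ))  # five code points, three visible people
print(next_col, row[1].continuation)
```

The hard part, of course, is that the terminal and the application both have to agree on that width, otherwise the cursor ends up in the wrong column.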
I found wide character support to be surprisingly good across the terminal emulators I tried on macOS and Linux.
This testing was done earlier this year when I was building support for Unicode variable names into my $SHELL (more to support foreign languages than glyphs but the end result is the same)
It’s worth noting that wider character support needs to be implemented in both the terminal emulator AND the console applications that run on the terminal emulator.
I cannot claim to have tested every ZWJ character on every terminal emulator but if you find any ZWJ related bugs in murex shell https://GitHub.com/lmorg/murex then it will be treated as a serious bug because Murex does have international users so aims for greater compatibility with other language writing systems (albeit the documentation is presently only available in English).
IMO we should have left emojis out of everything that is rendered by convention with monospaced fonts. But of course I respect the desire for displaying rainbow farting unicorns even in consoles, so let's find a solution for this.