Here is some information about romanisation of Cantonese if you are interested:
The romanisation of Cantonese has an interesting history! The Yale romanisation system is (IMO) the most readable, and it was later refined into Jyutping, another method used in more academic contexts, which IMO is less readable (both are available in Gboard as Cantonese input methods). However, most personal and place names in HK use an older system developed in the 1880s by Christian missionaries.
When people use Cantonese romanisation in casual text chats on instant messaging or social media platforms, it's usually a mix of both systems [1, 3], without the tone information (so lots of many-to-one mappings) and mixed in with bits of English. That makes it hard to understand (even for a local Hong Kong person) without good prior context of the entire conversation.
I know Google is actively working on the Cantonese version of Google Assistant, though not sure when it'll be officially released.
Source: I'm a native speaker of one and fully fluent in the other.
And if written Cantonese is mostly informal (conversation, shop signs), it will rarely show up in multilingual text, so the approach that has worked for most languages wouldn't work here.
And it surely wouldn't work for a completely different, lossy orthography without independent training.
It's like the interesting fact that there are more Mongolians in China than in Mongolia.
This isn't a secret code or anything; it's a standard romanization that almost everyone who learns Cantonese formally will learn. The thing is, formal education in Cantonese and other non-Mandarin Chinese languages is banned in schools in China. Mainlanders who speak Cantonese as a primary language often don't even know how to use it (I talked to a Cantonese-speaking girl from Zhuhai, and she was like "Can you show me what Jyutping looks like?" Bizarre).
It's pretty smart, and a bit of a slap in the face to the establishment, which has been forcing Mandarin down people's throats for the past 50 years or so.
But translate.google.cn is not banned on the mainland, and Google services work just fine in Hong Kong.
And forcing Mandarin down people's throats is not all bad in terms of literacy, considering that 70 years ago most of the country was illiterate, not only in language but also in ideas, such as basic Western ideas in medicine. Spreading those, with doctors and literate people going to the countryside, led to a huge reduction in infant mortality.
And across the sea, the river, and the swamp was Hong Kong, which figured this out much earlier and flourished.
But.. translate.google.cn is not blocked in China.
The US equivalent to this would be forcing English (which we do) while jailing anyone who teaches Creole (which we don't).
Mandarin being the primary language of education doesn't mean that Cantonese is prohibited; it's just not mandatory, so most native Cantonese speakers aren't going to get a formal education in it unless they specifically seek it out.
Outside Hong Kong, Jyutping seems pretty much universal.
Is everybody really typing with Cangjie/handwriting? xD
For day-to-day transliterations of names etc. there's actually no standard, just some loose rules based on English pronunciation.
As an aside, this has caused colloquial Cantonese pronunciation to "shift" over the years; it's called "lazy sound" (懶音). E.g. 你 (you): the proper way is 'nei5', but most younger people say it 'lei5'. It's a big debate whether this trend ought to be stopped or not.
The step that's missing for me is how to turn the Romanisation into Chinese characters.
The examples in the article miss the tones. It's like writing pinyin without tones (even with tones it's much less clear than using characters).
This means that understanding a sentence requires a good knowledge of the spoken language, and you have to read the whole thing to extract the meaning from the overall context.
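To make the missing-tones point concrete, here is a minimal sketch of what happens when you try to go from romanisation back to characters. The tiny lookup table is illustrative only (the classic "si" tone-drill characters plus 你 from the example above), and `strip_tone` and `reverse_index` are hypothetical helper names, not part of any real input method.

```python
import re
from collections import defaultdict

# Illustrative table only: character -> Jyutping with tone digit.
# A real dictionary would have thousands of entries.
JYUTPING = {
    "詩": "si1",
    "史": "si2",
    "試": "si3",
    "時": "si4",
    "市": "si5",
    "事": "si6",
    "你": "nei5",
}


def strip_tone(syllable: str) -> str:
    """Drop the trailing tone digit (1-6) from a Jyutping syllable."""
    return re.sub(r"[1-6]$", "", syllable)


def reverse_index(table: dict, keep_tones: bool) -> dict:
    """Map each romanised syllable to every character that could produce it."""
    index = defaultdict(list)
    for char, syllable in table.items():
        key = syllable if keep_tones else strip_tone(syllable)
        index[key].append(char)
    return dict(index)


with_tones = reverse_index(JYUTPING, keep_tones=True)
without_tones = reverse_index(JYUTPING, keep_tones=False)

print(with_tones["si1"])    # a single candidate
print(without_tones["si"])  # all six "si" characters collide
```

Even in this toy table, dropping the tone digit turns a one-to-one lookup into a six-way guess, and that is before any mixing of romanisation systems or English.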
That seems to be the point: Software tools will be completely lost.
Pace the obvious and awesome power of the Chinese surveillance state, nothing is foregone.
Edit: it's true that you could eventually translate the literal stuff back into Chinese characters, I'm sure, but that doesn't mean things are predetermined in a wider sense.
Not dissimilar to calling everyone who disagrees with you a Russian bot, which was very prevalent in the States for a few years.
Sentiment extraction, semantic meaning extraction, categorization... these are all really hard problems (to do automatically) even on properly spelled and grammatically correct text. I would imagine they are even harder in Chinese, which as I understand it has several different writing systems.
The HK protesters are clearly quite clever. If they keep using different obfuscation schemes for text, I could see it forcing the mainland to use human beings to read every post. Which I'm sure they have the resources to do, but it's still more expensive than using a machine.
Some strategies I would expect to be effective:
* Using alternative phonetic encoding (i.e. what is shown in the article, using Latin letters to spell out sounds rather than words)
* Homoglyph attacks
* Using deliberately incorrect or ambiguous grammatical structure
* Using deliberately incorrect spacing and punctuation (for example "m ee t. me;? b!y th e do.,c?s a!.t; m id ni;ght" will completely bewilder all the parsing packages I'm aware of)
* Convert the text to images and post those, possibly adding graphical text which will confuse OCR packages
Mix and match for even more fun!
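A rough sketch of two of the strategies above (homoglyphs and junk spacing/punctuation), assuming a filter that does nothing smarter than whitespace-split keyword matching. All the function names here are made up for illustration, and the homoglyph table only covers a handful of Latin letters that have Cyrillic lookalikes.

```python
import random

# Latin letters mapped to visually similar Cyrillic code points.
HOMOGLYPHS = {
    "a": "\u0430",
    "c": "\u0441",
    "e": "\u0435",
    "o": "\u043e",
    "p": "\u0440",
}


def swap_homoglyphs(text: str) -> str:
    """Replace selected Latin letters with Cyrillic lookalikes."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def add_noise(text: str, rng: random.Random, rate: float = 0.3) -> str:
    """Randomly insert a space or punctuation mark after letters."""
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalpha() and rng.random() < rate:
            out.append(rng.choice(" .,;!?"))
    return "".join(out)


def naive_keyword_match(text: str, keyword: str) -> bool:
    """Stand-in for a simple filter: lowercase, split on whitespace, compare."""
    return keyword in text.lower().split()


message = "meet me by the docks at midnight"
rng = random.Random(0)  # seeded so runs are repeatable

noisy = add_noise(message, rng)
swapped = swap_homoglyphs(message)

print(noisy)
print(swapped)
print(naive_keyword_match(message, "docks"))  # True
print(naive_keyword_match(swapped, "docks"))  # False
```

Note that a filter can undo the noise version just by dropping every non-letter, and can undo the homoglyphs with a lookup table, so each trick on its own only defeats the laziest tooling. The cost goes up when they are combined and changed often.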
There are also lots and lots of steganographic techniques, but those are a lot less accessible to laypeople.
I'm not familiar with NLP tools and techniques available for Chinese, but most parsers/taggers for English aren't really written with adversarial inputs in mind. It would probably be possible to deliberately construct valid (or at least decipherable to a human) English text that would crash the common tools available.
As an aside, the articles that keep coming out about the HK protesters' tactics are starting to seem a lot like Cory Doctorow's "Little Brother", which is available for free and definitely worth a read.
1 - https://craphound.com/littlebrother/Cory_Doctorow_-_Little_B...