Also, because of the overhead here and the fact that you will want the signature to occur at regular intervals, a better compression scheme than 0=>char1 1=>char2 is needed. Combining zero-width chars and homoglyph substitution* can produce codings which hold signed usernames in only a few characters.
There are other, far more interesting ways to watermark text than this, ones that are both harder (to impossible) to detect and produce better results.
P.S. It's nice to see people publish conference papers on this stuff. I always had to hide it because we actually used it.
People could still just make a screenshot and the special character would not be visible, so they implemented hidden watermarks: 
Then the same alliance would also generate slightly different text for each person. 
I think things might have changed nowadays, as everyone is recruiting new players and making it easy for other alliances to put spies in. Maybe in some sub-groups that manage large amounts of assets?
Edit: Good old PL.
More on topic it's always shocking how many things serve to deanonymize these days.
In 0.0 it's capitals everywhere. People mine in Rorquals, people do PvE in Carriers and Supers. If you tackle a small ship doing PvE it opens a cyno and capitals jump in.
They now have fighters that have good application against smaller ships. So the good old rock paper scissors for ship size in eve is no more.
Still a diverse game, from solo to large fleet fights. I have my fun.
The last time I logged in, a couple of years back, I contemplated moving my shit to somewhere else, to where one of the few people I knew was still playing, but jump fatigue meant that would take half my life. I get the whole point of trying to limit force projection cos it was very daft with every encounter in 0.0 finishing with PL, NC or Goons dropping in their super fleets. But logistics for the common player is awkward :(
It would likely take me many days of gathering and moving shit to get into a position to actually play the game again haha. not fun.
Also just sell your stuff and rebuy it. You don't need many ships to have some fun.
Roaming is .. not that common any more. What we do is live in a Wormhole, jump into 0.0 and try to tackle something big and hope they form a defense fleet.
Roll the hole to get a new 0.0 exit. Rinse and repeat
Nor is gatecamping because some genius decided that instawarping bubble immunity (with optional covops cloak functionality) was a good idea so people could move through null in virtually perfect safety.
There have been so many changes which I find fundamentally stupid that it's hard for me to engage with the game these days. One day I'll probably play for a bit again though
Edit: Also people still gatecamp.
And every guide for the game basically starts with "Subscribe, and don't log in for 3 months while you get your newbie pilot some basic skills, or if you do log in, don't do much because you're useless anyway"
Three. Months. How can a game be basically unplayable for newbies for ninety days?
Meh. Time gated games are never fun. The only thing worse than grinding in the game is having to grind hours out of the game.
This really isn't true (at least it wasn't some years ago when I last played). What's true though is that the game is complex and does a terrible job of explaining itself. Unless you already know experienced players, the activities immediately available to you as a new player have no resemblance to the "real" game.
It's up to you to explore, get yourself blown up (a lot, ideally), figure out what you're doing, and find a niche. There's plenty to do as a new player that doesn't require months of skill points.
If it winds up speaking to you, the game is absurdly fun and absorbing. I burned out on it years ago, and still remember specific fights and weird adventures I got into.
Edit: My experience as a new player is years out of date, hopefully it's better now. But the real learning curve is always going to be steep.
The basic concept is still the same though. You skill with time. Which is nice, because sometimes I don't play for weeks. No "I have to grind X hours" to be on par with my friends.
But yes, they've made it a lot easier to just hop in without the need to train for anything.
But I don't feel the game has the same dynamic as it had in 2008-2010. But I also quit at that time too, so who knows...
This sounds interesting, could you elaborate?
Another is to alter the _frequency_ of certain letters occurring in the text to produce a unique watermark. For example the "number of i's" that occur in the text can be used to produce a unique text per user. This is a very hard attack to detect or do anything about because even _summaries_ of the associated text tend to carry letter and word frequencies forward.
These methods are also impervious to "screenshot" etc. And because the embedded value can simply be a 64-bit key used to look up the user/session info, attacks that attempt to impersonate others are impossible.
- on a forum, if the post reads slightly different to each user, and one user “quotes” it, the quote will be of what that user saw, and other users will be able to identify the fingerprint from the quote.
- if the fingerprinting script modifies all posts on the forum, then a poster will be alarmed to see his words change.
There are clever ways around both of these issues (rewriting the quotes on render, and hiding the fingerprinting from the OP). But eventually the system gets pretty complicated, and ultimately you’re visibly presenting different text to different users, so it’s no longer an invisible fingerprint.
Not if the forum intelligently changes the quoted text :)
The hard part is going to be keeping the flow of the text, whilst brute forcing a thesaurus into it!
It's effectively a compression scheme built on top of synonym replacement, using "extra" available information to pack more bits into fewer words. This means even sentence-long quotes in summary form are enough to compromise someone.
And I don't quite get how you can encode that much information in a sentence without it being completely garbled.
Here's 5 bits in that simple sentence. I'm sure others can do better.
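A rough sketch of the plain synonym-replacement idea, without the extra compression described above: each word that has a known synonym becomes a slot carrying one bit. The synonym table, the function names and the sample sentence are all made up for illustration. Punctuation and capitalisation are ignored, and any slots beyond the payload are left untouched, so they will read back as extra bits.

```python
# Hypothetical synonym table: position 0 in a pair encodes bit 0, position 1 encodes bit 1.
SYNONYM_PAIRS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
    ("help", "assist"),
    ("buy", "purchase"),
]

# Map every word to (its pair, the bit it represents).
LOOKUP = {
    word: (pair, bit)
    for pair in SYNONYM_PAIRS
    for bit, word in enumerate(pair)
}


def embed_bits(text, bits):
    """Rewrite each replaceable word as the synonym selected by the next bit."""
    remaining = list(bits)
    out = []
    for word in text.split(" "):
        if word in LOOKUP and remaining:
            pair, _ = LOOKUP[word]
            out.append(pair[remaining.pop(0)])
        else:
            out.append(word)
    if remaining:
        raise ValueError("text has too few replaceable words for the payload")
    return " ".join(out)


def extract_bits(text):
    """Read one bit back from every word that appears in the synonym table."""
    return [LOOKUP[word][1] for word in text.split(" ") if word in LOOKUP]


cover = "we begin with a big order and a quick plan to help you buy more"
marked = embed_bits(cover, [1, 0, 1, 1, 0])
print(marked)
print(extract_bits(marked))
```

With five replaceable words the sentence carries five bits; a real scheme would need a much larger table and some care to keep the text reading naturally.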
> There are some caveats to this method of course. For example, if a user knew of the script they could theoretically insert their own zero-width characters and accuse someone else. A better solution would be to insert a unique user ID that is not publicly available instead of the username.
I wonder if round tripping texts could be an effective sanitizer. Text to speech and back. English to Chinese and back.
In order to survive a screenshot (or even some of the common processing steps run after uploading an image somewhere), you would need to visibly change the colors, as you say. This could work but could also produce identifiable artifacts.
(Actually, now I’ve typed this out, I’m not sure if a LSB stego image wouldn’t survive a screenshot. I guess it depends on how the screenshot function is implemented — actually would be curious if anyone has tested this.)
The images are 404, but they used a background with slightly different color and something similar to a QR code.
The NSA might have spies inside the Cuban intelligence services. If you remove the watermark, the NSA will know that you know about their counterintelligence.
If you leave the watermark in they know you are the spy. But, if you change the watermark to some ID from another employee there will be no downside for you. And you start tension within the NSA.
Maybe you can even turn the burned innocent employee, because he will be pissed at the NSA.
Eve is like that.
Reminds me of a story. That one time there was a large battle in eve. If you want to repair / heal a friendly ship in eve you have to target it, and press F1 to start your remote repair module (healing spell).
In the heat of the battle the pilot Ivory, a healer from Team A, messed up and accidentally targeted the enemy ship his fleet was shooting at the time. He did not notice his mistake and shouted "All reps on Cain!" (Use healing spells on the pilot Cain) Cain was from Team B.
Counter intelligence officers from Team A thought that our healer Ivory was a spy and had just pressed the wrong push-to-talk button in TeamSpeak. They sifted through logs and found out his IP: he was from Alberta, CA, just like the leader of Team B.
Team A set a trap and killed an expensive ship of Ivory's .. and kicked him from the Alliance (Guild)
Ivory was not a spy :) Internet spaceships are serious business.
But the accusation was stupid to begin with. He was a loyal member for years, living in the same Canadian province as another eve player is not really uncommon, and shouting on TeamSpeak who is currently getting shot would not have been helpful.
Diplomats from Team B said he was not a spy.
The whole flow would be:
- X is a player in alliance FOO but is actually an agent planted there by alliance BAR.
- Y is a player in alliance BAR but is actually an agent planted there by alliance FOO (or any other).
- X copies intel from FOO's forums and sends it back to the people in charge of the spy program in BAR.
- That intel gets shared with key personnel in alliance BAR, so they can take action based on the gathered intelligence; unknown to BAR, among that personnel is Y
- Y sees that intel and sends it back to their leadership in alliance FOO
- The people at alliance FOO identify the unique data on that intel and track it down to X and proceed to kick him for being a spy.
As someone who is involved with alliance leadership stuffs: This may sound convoluted but it's really bread-and-butter level stuff in Eve, it can get significantly weirder. For example this article discusses whitespace character-based fingerprinting: that's amateur level, like described elsewhere in this thread.
To answer your question: counter-intelligence. If your method for tagging data is known and easy to replicate, such as watermarking your forum userID in screenshots, the people at alliance BAR can edit the screenshot (or forge a new one) where they insert the userID of an innocent person in alliance FOO in the gathered intel, this way when Y grabs the data for sending back to FOO, they'll be unknowingly sending evidence that incriminates a loyal member.
edit: I noticed I didn't address your last question. Signing data is just regular cryptographic signing yeah, in the above example to prevent tampering you'd insert the hash from the userID plus a secret salt for example. You just need some way to prevent the hostiles from incriminating someone else.
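A minimal sketch of that kind of tamper-resistant tag. It swaps the "hash of userID plus secret salt" idea for HMAC-SHA256 from Python's standard library, which is the usual way to key a hash. The key value, the function names and the 8-character truncation are assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical server-side secret; the other alliance must never see it.
SECRET_KEY = b"server-side-secret"


def make_tag(user_id: str, length: int = 8) -> str:
    """Return a short keyed tag for user_id (truncated HMAC-SHA256 hex digest)."""
    digest = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def verify_tag(user_id: str, tag: str) -> bool:
    """Check that tag was produced for user_id with our secret key."""
    expected = make_tag(user_id, len(tag))
    return hmac.compare_digest(expected, tag)


tag = make_tag("1337")
print(tag, verify_tag("1337", tag), verify_tag("1338", tag))
```

Someone forging a screenshot can copy another member's visible user ID, but without the key they cannot compute the matching tag, so a mismatched ID/tag pair points to tampering. A shorter truncation saves space in the watermark at the cost of making a lucky guess more likely.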
That, or deliberately feed misinformation to the spy.
As for spying, I'm pretty sure this is a problem in high end raiding in WoW. But due to the nature of WoW (not being a king of the hill MMO to rule land) not nearly as much as EVE. Blizzard uses it to combat cheating.
Or hell, even simpler; taking a cameraphone picture of the screenshot?
Generally speaking in these scenarios you don't want to grab anything directly from the source, just relay what you saw and write it down in your own words. Even then it has to be handled carefully (how many other people got this information? Are the details I'm seeing slightly incorrect in order to filter out who leaked this?)
Also take into account that you can be caught just by repeated "A/B testing" of sorts: half the population gets informed that at 19:05 some operation is taking place, the other half that the op is taking place at 19:10. The next day they do the same but using different groups, if you have access to the data that is being leaked you can track down who is leaking it like that after a few iterations.
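The repeated split can be sketched as a binary search over the member list: in round r each member gets variant 0 or 1 according to bit r of their position, so every member ends up with a unique sequence of variants. This assumes a single leaker who passes on every round unmodified; the member names and function names are invented for the example.

```python
def assign_variants(members, round_index):
    """Round r: each member receives variant 0 or 1, taken from bit r of their index."""
    return {member: (i >> round_index) & 1 for i, member in enumerate(members)}


def rounds_needed(member_count):
    """Number of two-way splits needed to give every member a unique pattern."""
    return max(1, (member_count - 1).bit_length())


def identify_leaker(members, leaked_variants):
    """Rebuild the index from the variant seen in each round's leak."""
    index = 0
    for round_index, variant in enumerate(leaked_variants):
        index |= variant << round_index
    return members[index] if index < len(members) else None


members = ["ann", "bob", "cat", "dan", "eve", "fox"]
rounds = rounds_needed(len(members))

# Simulate "eve" leaking whatever variant she was shown in each round.
leaks = [assign_variants(members, r)["eve"] for r in range(rounds)]
print(rounds, leaks, identify_leaker(members, leaks))
```

Six members need three rounds; a thousand would need ten, which is why this only takes "a few iterations".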
It's a very interesting game, built on a really crappy space themed spreadsheet.
You would need a way of signing the username. Perhaps you could hash the user ID with a secret salt, and hide that in the text along with the username.
I just searched for "zero-width" and "zero width" in Chrome and Firefox's extensions stores, but didn't come up with anything.
Submit a PR! I know it could be better!
* right-click selected text and "Sanitize and Copy"
* toggle off and on
I put it in a gist:
Unfortunately, there are a few apps that use cmd-opt-shift-V instead of cmd-shift-V. You can fix most of them using this:
After that, almost everything will use cmd-shift-V. However, Microsoft Word is still broken, apparently because it uses a slightly different command that does a similar thing, "Paste and Match Formatting". Haven't found a way to fix that yet.
The best way I can think to implement it would be listening for a copy event and replacing the text in the clipboard.
There really is no valid use case in Latin script so why is zwsp allowed next to Latin characters?! (Emojis is not a valid use case, and why do they depend on zwsp anyway?)
I don't think ctrl-shift-v will do what you think here because we're not talking about formatting at all. We're talking about regular unicode UTF-8 encoded text.
I just tried googling for the story but I can’t remember what the account was about. I think it was some sort of parody silicon valley account. It was a great story, if anyone remembers and can find the link.
cd /tmp;<span>rm -R ~/;</span>ls;
It is though, still another reason to be careful when copy pasting.
Now I'm thinking if you can somehow put an ESC character in text so that when you copy-paste it into vim, it goes to normal mode and starts performing commands. Hmm...
Even pasting to cat(1) might be insecure. The paste can contain ^D, which will make cat quit; then the rest of the paste will be interpreted by the shell.
Besides being able to see the hidden characters, you can also see the internal "layers" of clipboard, e.g., how can a rich-text sentence be pasted to both a plain-text editor and a rich-text one.
EDIT: Or a whitespace interpreter (https://en.wikipedia.org/wiki/Whitespace_(programming_langua...)
First line explains it's a spoiler and for what, hundreds of invisible characters, actual spoiler.
That way FB would just show the first line followed by "read more"
I'm trying to think of what else could be done with the encryption / decryption, but tracking is a really effective use case.
Could probably encode some other secret messages in there, make a blog post about cheese include a hash to a pastebin.
It also reminds me of the importance of having strong validation around things like usernames, because if I had a username that looked official but contained an invisible character... Related: ICANN explicitly forbids domain names from including zero-width space.
Source Code: https://github.com/umpox/zero-width-detection
If you are a journalist wishing to protect your source, what tool could be used to process content such that the essence is left intact but the unicode zero-width steganography is stripped... replaced instead by the common space character.
I know enough to say that you cannot just search and replace, as many of the zero-width characters have a meaning in different languages and produce a visual effect when combined with other runes. Just stripping them all will break text in those languages.
Is there a method for removing zero-width whitespace such that journalist sources could be protected?
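One possible compromise, sketched below, is to drop a zero-width character only when both of its visible neighbours are ASCII or Latin letters, and to leave it alone everywhere else. The character set and the "Latin" test (the Unicode name starting with LATIN) are simplifying assumptions, not a vetted rule, and this does nothing against homoglyphs or word-level watermarks.

```python
import unicodedata

# Zero-width / invisible characters considered here (an assumed, incomplete set).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def _is_latin_or_plain(ch):
    """True for start/end of text, ASCII, and characters named LATIN ... in Unicode."""
    if ch == "" or ord(ch) < 128:
        return True
    return unicodedata.name(ch, "").startswith("LATIN")


def sanitize(text):
    """Remove zero-width characters that sit between Latin/ASCII neighbours."""
    out = []
    for i, ch in enumerate(text):
        if ch in ZERO_WIDTH:
            prev_ch = out[-1] if out else ""
            # Next character that is not itself zero-width ("" at end of text).
            next_ch = next((c for c in text[i + 1:] if c not in ZERO_WIDTH), "")
            if _is_latin_or_plain(prev_ch) and _is_latin_or_plain(next_ch):
                continue  # fingerprint candidate: drop it
        out.append(ch)
    return "".join(out)


print(sanitize("F\u200bor exam\u200bple") == "For example")
```

A zero-width non-joiner inside Persian text or a zero-width joiner inside an emoji sequence has non-Latin neighbours, so both are left in place.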
Plenty of discussions in this thread show why. Anonymous screenshot or picture? Image watermarks -> anything from subtle color changes, to sans/serif font changes, to word replacement. Anonymous text or summary? Word replacement, letter biasing encoding, etc. Anonymous audio? Inaudible frequencies, audible waveform deformation encoding, etc.
Nothing's out of the realm of possibility for the paranoid.
 I never bothered looking up where this came from until just now... interestingly it's from the IRA, and used in the total opposite way most people use it now... https://en.wikipedia.org/wiki/Brighton_hotel_bombing
Essentially the best method against these fingerprinting techniques isn't technical—it's just not re-sharing the contents in the first place. Sorry, small differences are just too easy to embed and there are so many bits to work with.
 Thanks Tom, you the man!
But seriously, ZWNJ is really hard to see when you copy text. The only way to sanitize it is to run it through a program.
F?or exam?ple, I’ve ins?erted 10 ze?ro-width spa?ces in?to thi?s sentence, c?an you tel??l?
An MD5 hash for any decent-length password is long, and this method only allows you to replace the subset of "confusable" latin chars in a text.
C = 0x0043 = 0x216d
When you reach one of these replaceable characters, you either replace it or you don't, which you mark as either 1 or 0.
So for the string "password", our binary MD5 hash is "01011111010011011100110000111011010110101010011101100101110101100001110110000011001001111101111010111000100000101100111110011001".
That's 128 possible replacements needed in the original text.
I imagine the original text would have to be at least 10-20 times that length before we found enough "confusable" latin chars to replace.
I'm eager to hear what I'm missing, because I do like this method a lot.
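The replace-or-don't scheme described above can be sketched as follows. The homoglyph table is a small assumed sample (Cyrillic look-alikes plus the C / U+216D pair mentioned earlier); a real confusables list is much longer, which would raise the capacity of a given text.

```python
import hashlib

# Assumed sample of Latin -> look-alike substitutions.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "C": "\u216d",  # ROMAN NUMERAL ONE HUNDRED
}
REVERSE = {glyph: latin for latin, glyph in HOMOGLYPHS.items()}


def md5_bits(text):
    """MD5 digest of text as a list of 128 bits, most significant bit first."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]


def capacity(text):
    """Number of bits the text can carry: one per replaceable character."""
    return sum(1 for ch in text if ch in HOMOGLYPHS)


def encode(text, bits):
    """Replace the n-th replaceable character when bit n is 1, keep it when 0."""
    if len(bits) > capacity(text):
        raise ValueError("not enough replaceable characters")
    remaining = list(bits)
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and remaining:
            out.append(HOMOGLYPHS[ch] if remaining.pop(0) else ch)
        else:
            out.append(ch)
    return "".join(out)


def decode(text, n_bits):
    """Read n_bits back: a look-alike is 1, an untouched replaceable character is 0."""
    bits = []
    for ch in text:
        if len(bits) == n_bits:
            break
        if ch in REVERSE:
            bits.append(1)
        elif ch in HOMOGLYPHS:
            bits.append(0)
    return bits


cover = "Copy and paste can expose a careless person"
payload = md5_bits("password")[:8]  # first byte only; the full hash needs 128 slots
marked = encode(cover, payload)
print(capacity(cover), payload, decode(marked, len(payload)))
```

Running `capacity` over a candidate text is a quick way to check the estimate above about how long the cover text has to be for a full 128-bit hash.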
They aren't invisible, you can see the result as spaces and the next line.
> how would I indicate that a length of text is to be displayed from right to left when I post Hebrew text to an English-expecting text field?
That is also a visible effect.
Not exactly, it still serves as a word
separator. Like the newline (not a space) I put between "word" and "separator".
EDIT/PS: the first one who finds the homoglyphs within this very status update and posts it in the comments wins an iPhone which has still not been patched against the జ్ఞా vulnerability ;)
People always throw around language compatibility when the unusual features of Unicode are thrown around, but stuff like zero width characters really don't need to be supported for language compatibility.
On another note, if Unicode is willing to butcher Chinese/Kanji orthography with Han unification then it ought to be willing to get rid of Latin homographs.
edit: Others have pointed out that there ARE legit uses of zero-length chars, especially for other languages. Still, I bet a solution does not have to be all or none.. I bet common legitimate use-cases can be segregated from others.
F<200b>or exam<200b>ple, I’ve ins<200b>erted 10 ze<200b>ro-width spa<200b>ces in<200b>to thi<200b>s sentence, c<200b>an you tel<200b><200b>l?
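Output like the line above can be reproduced with a few lines of Python that replace each invisible character with its code point in angle brackets. The set of characters treated as invisible is an assumption and is not exhaustive.

```python
# Code points treated as invisible here (assumed, incomplete set).
INVISIBLE = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}


def reveal(text):
    """Show each invisible character as <xxxx>, leaving everything else untouched."""
    return "".join(
        f"<{ord(ch):04x}>" if ord(ch) in INVISIBLE else ch
        for ch in text
    )


print(reveal("F\u200bor exam\u200bple"))
```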
echo "Ctrl+V" | hd
If you're looking for Vim or Emacs specifically, I don't know.
const zeroPad = num => '00000000'.slice(String(num).length) + num;
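For context, a sketch in Python of the whole round trip that a helper like zeroPad belongs to: each byte of the username is written as 8 zero-padded binary digits, and each digit is then mapped to an invisible character. Which zero-width character stands for 0 and which for 1 is an assumption here, as are the function names and the choice to hide the payload after the first visible character.

```python
ZERO = "\u200c"  # zero-width non-joiner stands for bit 0 (assumed mapping)
ONE = "\u200b"   # zero-width space stands for bit 1 (assumed mapping)


def to_zero_width(username):
    """Encode the UTF-8 bytes of username as a run of zero-width characters."""
    bits = "".join(format(byte, "08b") for byte in username.encode("utf-8"))
    return "".join(ONE if bit == "1" else ZERO for bit in bits)


def from_zero_width(text):
    """Collect the zero-width characters in text and decode them back to a string."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


def fingerprint(text, username):
    """Hide the encoded username right after the first visible character."""
    return text[:1] + to_zero_width(username) + text[1:]


marked = fingerprint("Hello world", "tom")
print(len("Hello world"), len(marked), from_zero_width(marked))
```

This is the naive one-character-per-bit coding, so a three-letter name costs 24 invisible characters.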
Zero-width characters are invisible, `non-printing' characters that are
not displayed by the majority of applications. F*or exam*ple, I've
ins*erted 10 ze*ro-width spa*ces in*to thi*s sentence, c*an you tel**l?
(Hint: paste the sentence into Diff Checker to see the locations of the
characters!). These characters can be used to `fingerprint' text for
I'd wager that both of them know there are more languages than English.
s/browser/& I use/
s/in VGA textmode//
I am in agreement with this programmer's choices, for the most part. Certainly I agree with this one.