
Show HN: Send Secret Messages over Twitter as Public Tweets - dpapathanasiou
https://github.com/dpapathanasiou/tweet-secret
======
brey
Steganography conceals the existence of the message, not just the contents.

if David Miranda gets stopped at Heathrow and:

    
    
        On the following morning, the gener·al requested permission to return the emperor's visit, by waiting on him in his palace.
        A pitched battle follow·ed.
        But the pride of Iztapalapan, on which its lord had freely l·avished his care and his revenues, was its celebrated gardens.
        ...
    

are in his Twitter account, it's in no way plausible that they're just
innocuous tweets, and he can be compelled to reveal the secret.

A true steganographic message would have looked indistinguishable from any
other tweet that he would have made normally. this is a cute system, but it's
not steganography.

~~~
dpapathanasiou
The output is a function of the corpus.

So if Miranda doesn't usually tweet about the history of Mexico, he can pick
other texts (written by him or others) which would sound more plausible as
something he might normally tweet.

Having said that, the middle dot is not as unobtrusive as I would like, so
perhaps it's better to rethink that part of the system, using some of the
other suggestions in this thread.

~~~
brey
even ignoring the middle dot, just picking a more suitable corpus for that
person isn't necessarily going to make this technique look innocuous - it's
still a collection of excerpts taken from what looks like random positions
within a document. it looks suspicious.

you could imagine a system which uses entirely normal and habitual tweets, but
encodes information in choices of synonyms used in the text, or whether or not
punctuation was used in certain places, or the timing of the tweet's
publication. lower bitrates, but plausibly deniable as to the message's
existence.

~~~
dpapathanasiou
I would posit that a stream of non sequitur tweets is not necessarily
suspicious (depending on the person/account in question, of course).

The classic steganography methods you're describing may appear less _obvious_
(for lack of a better term), but they're also easier to break once the pattern
is discovered.

~~~
brey
just as hard to break if you're doing it properly: one bit per tweet, encoded
as 'message ends with period = 1, no period = 0', and that's your ciphertext
stream. from then, AES or RSA or an OTP or whatever you want.

Then what you're writing will be functionally indistinguishable from random.
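
A minimal sketch of the one-bit-per-tweet channel described above, in Python. The function names `embed_bits` and `extract_bits` are made up for illustration, the bits are assumed to be ciphertext produced elsewhere, and the cover tweets are assumed to be plain sentences where adding or dropping a final period looks natural.

```python
def embed_bits(tweets, bits):
    """Return copies of `tweets` whose trailing period carries one bit each.

    Hypothetical helper: bit 1 -> tweet ends with a period, bit 0 -> it does not.
    """
    if len(bits) > len(tweets):
        raise ValueError("need one cover tweet per bit")
    out = []
    for text, bit in zip(tweets, bits):
        base = text.rstrip().rstrip(".")  # normalize: drop any existing final period(s)
        out.append(base + "." if bit else base)
    # Cover tweets beyond the payload are passed through untouched.
    out.extend(tweets[len(bits):])
    return out


def extract_bits(tweets, count):
    """Read back the first `count` bits from the trailing-period channel."""
    return [1 if t.rstrip().endswith(".") else 0 for t in tweets[:count]]


covers = ["Lunch was great.", "Off to the gym", "Rain again."]
stego = embed_bits(covers, [0, 1, 1])
print(stego)
print(extract_bits(stego, 3))
```

Both sides still have to agree out of band on where the bit stream starts and how long it is.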

hmm ... but maybe too random, humans are bad at being truly random - the
entropy in your period usage will be too high ... perhaps xor the RSA output
with an OTP of something 'random' you scribbled yourself on a page ;-)

~~~
dllthomas
_" perhaps xor the RSA output with a OTP of something 'random' you scribbled
yourself on a page ;-)"_

... seemingly random bits, xored with anything not directly related to the
bits in question produces seemingly random bits... There are other ways of
transforming a sequence of bits to look less uniform, though.
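
The point about XOR can be checked with a small Python simulation. This is only an illustration under stated assumptions: the seeded `random.Random` stream stands in for ciphertext bits, and the 80%-ones `scribble` stream is an invented model of a biased hand-written pad.

```python
import random


def xor_bits(a, b):
    """Bitwise XOR of two equal-length lists of 0/1 values."""
    return [x ^ y for x, y in zip(a, b)]


def ones_fraction(bits):
    """Fraction of bits that are 1."""
    return sum(bits) / len(bits)


rng = random.Random(2013)  # fixed seed so the run is repeatable
n = 10_000

# Stand-in for ciphertext: roughly uniform bits.
uniform = [rng.getrandbits(1) for _ in range(n)]
# Stand-in for a hand-scribbled pad: heavily biased towards 1.
scribble = [1 if rng.random() < 0.8 else 0 for _ in range(n)]

mixed = xor_bits(uniform, scribble)

print("pad bias:  ", ones_fraction(scribble))
print("after xor: ", ones_fraction(mixed))
```

The pad is clearly skewed, yet the XORed stream stays close to half ones, so this step does not make the period usage look any more human.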

------
dpapathanasiou
This is a side project I've been working on for the Lisp in Summer Projects[1]
contest.

It's a text steganography app using a simple book cipher, written in Clojure.

I welcome any feedback from HN so let me know what you think!

[1] [http://lispinsummerprojects.org/](http://lispinsummerprojects.org/)

~~~
zeckalpha
I could see an advertiser making interesting use of this. "Drink more
ovaltine!"

Have you thought about trying to do a public-key based version? People could
list their key in their profile.

~~~
coherentpony
The problem there is the 140 char limit.

~~~
a3n
You can split a sentence across multiple tweets. V1.1?

~~~
rhizome
Twitlonger isn't annoying enough?

------
ctz
The amusing thing about sending ciphertexts over Twitter compared to English
text is that you can actually fit in more information, assuming you do the
encryption and ciphertext encoding right. That's because Twitter transports
140 Unicode code points.
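
As a rough sketch of that idea in Python: the Braille Patterns block (U+2800 to U+28FF) has exactly 256 code points, so each code point can carry one ciphertext byte, giving 140 bytes (1120 bits) per tweet. The helper names are hypothetical, and whether Twitter stores these characters unmodified is an assumption, not something tested here.

```python
BRAILLE_BASE = 0x2800  # first code point of the 256-entry Braille Patterns block
TWEET_LIMIT = 140      # code points per tweet, as stated in the comment above


def bytes_to_tweet(data: bytes) -> str:
    """Map each byte to one Braille code point (8 bits per code point)."""
    if len(data) > TWEET_LIMIT:
        raise ValueError("payload does not fit in one tweet")
    return "".join(chr(BRAILLE_BASE + b) for b in data)


def tweet_to_bytes(text: str) -> bytes:
    """Inverse of bytes_to_tweet."""
    return bytes(ord(ch) - BRAILLE_BASE for ch in text)


payload = bytes(range(140))  # stand-in for 140 bytes of ciphertext
tweet = bytes_to_tweet(payload)
print(len(tweet), "code points carry", len(payload) * 8, "bits")
```

A larger alphabet than 256 code points would pack more bits into each position, at the cost of a more involved encoding.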

(This has nothing to do with steganography, but seems relevant nonetheless.)

------
danieldk
Cool! One other fun approach may be to use syntactic transformation
(topicalization, middle field ordering, etc.) or lexical variation (e.g.
through synonyms):

[https://www.cerias.purdue.edu/assets/pdf/bibtex_archive/PSI0...](https://www.cerias.purdue.edu/assets/pdf/bibtex_archive/PSI000441.pdf)

The advantage of such an approach is that you can use coherent text/messages.

~~~
dpapathanasiou
Thanks, I'll look into those suggestions, especially since I'm not really
happy about using the middle dot (it's really _not_ unobtrusive, and the
entire system would work better if I didn't use it).

------
rw
I wrote a textual steganography library and CLI in 2011, called Plainsight:
[https://github.com/rw/plainsight](https://github.com/rw/plainsight)

Additionally, @workmajj and I wrote TweetFS using Plainsight. It lets you
recursively pack up directories and post them as an encoded linked list of
Tweets to Twitter:
[https://github.com/rw/tweetfs](https://github.com/rw/tweetfs)

I presented Plainsight at Hack'n'Tell NYC in 2011 and a video was recorded:
[http://bit.ly/pecGgW](http://bit.ly/pecGgW)

Plainsight uses each byte of the input message to generate tokens. Bits are
used to decide how to traverse the token tree, weighted by frequency. The
drawbacks are 1) verbosity and 2) incorrect grammar.
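
To give a feel for bit-driven traversal of a word model, here is a toy Python sketch. It is not Plainsight's actual algorithm: it uses a hypothetical bigram table built from a tiny corpus, spends one bit whenever a word has at least two possible successors (choosing between the two most frequent), and assumes sender and receiver share the same corpus and start word.

```python
from collections import Counter, defaultdict


def build_model(corpus: str):
    """Map each word to its successors, most frequent first (ties alphabetical).

    The corpus is treated as circular so every word has at least one successor.
    """
    words = corpus.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:] + words[:1]):
        counts[prev][nxt] += 1
    return {
        w: [t for t, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))]
        for w, c in counts.items()
    }


def encipher(bits, model, start):
    """Walk the model; each branching word consumes one bit of the message."""
    out, word, i = [start], start, 0
    while i < len(bits):
        choices = model[word]
        if len(choices) >= 2:
            word = choices[bits[i]]  # bit picks between the two top successors
            i += 1
        else:
            word = choices[0]        # forced step, carries no information
        out.append(word)
    return out


def decipher(tokens, model):
    """Replay the walk and read a bit at every branching word."""
    bits = []
    for prev, nxt in zip(tokens, tokens[1:]):
        choices = model[prev]
        if len(choices) >= 2:
            bits.append(choices.index(nxt))
    return bits


corpus = "the cat sat on the mat and the dog sat on the rug and the cat ran"
model = build_model(corpus)
secret = [1, 0, 1, 1, 0, 0, 1, 0]
tokens = encipher(secret, model, "the")
print(" ".join(tokens))
print(decipher(tokens, model) == secret)
```

Even this toy shows both drawbacks: the output is much longer than the input, and it is only locally grammatical.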

One of the lessons of writing Plainsight is that _spam can be used to contain
secret messages_. Send enough gibberish to enough people, with your intended
recipient included, and you'll look like a spammer--not a spy.

I also wrote a fuzzing tool, called Shag, to find edge cases, e.g. for single-
byte inputs:
[https://github.com/rw/shag/blob/master/shag.rb](https://github.com/rw/shag/blob/master/shag.rb)

-- Example 1 (regular text)

Type your message to encode:

    
    
       echo 'Meet at Union Square at noon. The password is FuriousGreen.' > cleartext
    

Then, pipe it through Plainsight:

    
    
       cat cleartext | plainsight -m encipher -f sherlock.txt > ciphertext
    

The output will be Doyle-esque gibberish:

    
    
       cat ciphertext | fold -s
    
       which was the case, of a light. And, his hand. "BALLARAT." only applicant?" 
       decline be walking we do, the point of the little man in a strange, her 
       husband's hand, going said road, path but you do know what I have heard of you, 
       I found myself to get away from home and for the ventilator little cold night, 
       and I he had left my friend Sherlock of our visitor and he had an idea was not 
       to abuse step I of you, I knew what I was then the first signs it is the 
       daughter, at least a fellow-countryman. had come. as I have already explained, 
       the garden. what you can see a of importance. your hair. a picture upon of the 
       money which had brought a you have a little good deal in way: out to my wife 
       and hurry." made your hair. a charge me a series events, and excuse no sign his 
       note-book has come away and in my old Sherlock was already down to do with the 
       twisted
    

Now, decipher that ciphertext:

    
    
       cat ciphertext | plainsight -m decipher -f sherlock.txt > deciphered
       cat deciphered
       Meet at Union Square at noon. The password is FuriousGreen.
    
    

-- Example 2 (binary data)

    
    
       $ dd if=/dev/urandom of=/dev/stdout bs=1 count=10 | plainsight -m encipher -f 1984.txt
       10+0 records in
       10+0 records out
       10 bytes (10 B) copied, 9e-05 s, 111 kB/s
       Adding models:
       Model: 1984.txt added in 0.89s (context == 2)
       input is "<stdin>", output is "<stdout>"
       
       enciphering: 100%|#####################################################################################################################################################################|474.67  B/s | Time: 0:00:00
       
       which is a war is real, the proles used mind on the telescreen. He could see through all right to. You have read what said. 'Yes,' only in the Ministry

~~~
dpapathanasiou
Thanks for hijacking my thread! ;)

Seriously, though, how would your library work for twitter?

It seems that the encoding process creates texts much, much larger than the
original message.

~~~
rw
TweetFS uses the SeqTweet library, which takes care of sequencing the tweets
for you. Specifically, see the _list_to_twitter method:
[https://github.com/workmajj/seqtweet/blob/master/seqtweet/se...](https://github.com/workmajj/seqtweet/blob/master/seqtweet/seqtweet.py#L36)

One of the use cases of TweetFS is to use Twitter as a 'dead drop'. You'll
generate a lot of tweets by doing that, but there's no harm done.

I don't think I hijacked your thread. The other comments also discuss textual
steganography.

~~~
dpapathanasiou
Thanks, I'll look more into how TweetFS works.

I was just kidding about the hijacking comment; don't people know how to
interpret emoticons anymore? :D

~~~
epaga
I think the problem is that unlike ":)", too often people use passive-
aggressive ";)"s, not meaning them in the friendly way they were originally
intended but with a slightly bitter aftertaste.

But this is getting a bit off-topic I suppose... :)

------
drakaal
The big issue I see is that Twitter detects and deletes gibberish as spam. So
in the best case your posts randomly get filtered when you use this.

In the worst case, after posting a bunch of gibberish, Twitter bans your account.

~~~
geocar
Encode your secret message as a bunch of numbers, xor them with your OTP, and
then look up users with those numbers and simply re-tweet their most recent
message.

Now your message is encoded in the userids of who you retweet.

~~~
wintersFright
Aren't you just publishing the OTP and not the message? Or did I miss
something?

~~~
ctb_mg
They're publishing the ciphertext, which is the plaintext message xor the OTP
(aka the key). The ciphertext would be the user IDs of all recent retweets.

The receiving party would have to know the OTP to decrypt the message.
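
A minimal Python sketch of that scheme, under stated assumptions: the function names are hypothetical, each 4-byte ciphertext chunk is simply treated as a numeric user ID, and no Twitter API calls are made (whether an account exists for every such number is not checked).

```python
import secrets


def encrypt_to_user_ids(message: bytes, pad: bytes, chunk: int = 4):
    """XOR the message with the one-time pad, then read the result as integers.

    The integers stand in for the user IDs whose tweets would be retweeted.
    """
    if len(pad) < len(message):
        raise ValueError("one-time pad must be at least as long as the message")
    cipher = bytes(m ^ k for m, k in zip(message, pad))
    padded = cipher + b"\x00" * (-len(cipher) % chunk)  # fill the last chunk
    return [
        int.from_bytes(padded[i:i + chunk], "big")
        for i in range(0, len(padded), chunk)
    ]


def decrypt_from_user_ids(ids, pad: bytes, length: int, chunk: int = 4):
    """Rebuild the ciphertext from the IDs and XOR it with the same pad."""
    cipher = b"".join(i.to_bytes(chunk, "big") for i in ids)[:length]
    return bytes(c ^ k for c, k in zip(cipher, pad))


message = b"Meet at noon"
pad = secrets.token_bytes(len(message))  # shared in advance, used only once
ids = encrypt_to_user_ids(message, pad)
print(ids)
print(decrypt_from_user_ids(ids, pad, len(message)))
```

Only the list of IDs is public; without the pad it reveals nothing about the message beyond an upper bound on its length.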

------
hhm
Very nice! I worked on a similar steganographic system (not for tweets though)
that you can find here:
[https://github.com/hmoraldo/markovTextStego](https://github.com/hmoraldo/markovTextStego)
There you'll find both the source code and a link to a paper explaining how it
works and how it differs from other approaches.

------
timr
At Twitter Peak Hype, when journalists were writing silly things like:
"Twitter is nothing less than a _new internet protocol!_", I had a perverse
fantasy of implementing TCP/T(weet).

If someone were to do this, it would effectively subsume all further "I
implemented $X on Twitter" posts. Sort of like showing that a language is
Turing complete.

~~~
dpapathanasiou
Isn't that what app.net is supposed to be?

------
gpsarakis
Nice project. Given a stream of tweets, how can you find the beginning and
the end of a sentence/message?

~~~
dpapathanasiou
Thank you!

There's no obvious way of telling where a "secret" sequence would begin and
end.

For now, it might be best left as a coordination issue, similar to the choice
of corpus: e.g., you know that you'll be tweeting secretly n times a day, at
these specific times only, etc.

------
netman21
Or you could use [https://scrambls.com/](https://scrambls.com/) which uses
strong crypto and works for Facebook or any site. Keep in mind, though, that
any short message protocol is vulnerable to cryptanalysis.

------
cbr
For this to be secure (one-time-pad) you can't reuse the corpus. That's a big
enough pain that I doubt people will actually do it. Which means you can start
decoding their tweets once you collect enough.

~~~
dpapathanasiou
True, you _shouldn't_ use the same corpus more than once, so there's always
going to be a coordination issue between the sender and recipients.

------
alexharris66
Cool. Much better than my secret twitter message project:
[http://www.twhatever.com/tweets](http://www.twhatever.com/tweets) :)

------
bzalasky
So, horse_ebooks could have an ulterior motive?

