
SpamMimic: encode your message into something innocent-looking - pmoriarty
http://www.spammimic.com/explain.shtml
======
Smerity
For my favourite variation of this, see "Practical Linguistic Steganography
using Contextual Synonym Substitution and a Novel Vertex Coding Method"[1].

Chang et al. use synonym substitution to encode hidden data into standard
text, resulting in perfectly readable and sensible output afterwards. This is
far less suspicious than someone keeping spam. Appendix B gives an example of
how innocuous the output can be. The best part is that the scheme stays secure
even when the attacker knows which system is being used (Kerckhoffs's
principle).

[1]:
[http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00176](http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00176)

~~~
sytelus
The approach in this paper looks much cooler. A major thing missing from OP's
website is a way to supply your own cover text (the equivalent of a "key" in
encryption). However, I like the fact that the message comes out as spam, so it
(hopefully) stays out of the recipient's inbox and one would need to know
what to look for in the spam folder. This is very cool because I'd never
thought of the utility of purposely making a message look like spam.

------
pmoriarty
I encourage anyone interested in this to read about Peter Wayner's _Mimic
Functions_.[1]

From the abstract:

    
    
      A mimic function changes a file A so it assumes the statistical
      properties of another file B. That is, if p(t,A) is the
      probability of some substring t occuring in A, then a mimic
      function f, recodes A so that p(t,f(A)) approximates p(t,B) for
      all strings t of length less than some n. This paper describes
      the algorithm for computing mimic functions and compares the
      algorithm with its functional inverse, Huffman coding. It also
      provides a description of more robust mimic functions which can
      be defined using context-free grammars.
    

Using mimic functions, one could mimic spam or any other text (or non-text,
for that matter) corpus.

The two main challenges are deciding which statistical properties one wants to
mimic (for an adversarial steganalyst's mind is not always readily available
for perusal) and then actually mimicking them. In other words, it's easier
said than done.

[1] -
[http://www.nic.funet.fi/pub/crypt/old/mimic/mimic.text](http://www.nic.funet.fi/pub/crypt/old/mimic/mimic.text)

------
anon4

      Dear Friend ; Thank-you for your interest in our publication 
      . If you no longer wish to receive our publications 
      simply reply with a Subject: of "REMOVE" and you will 
      immediately be removed from our club ! This mail is 
      being sent in compliance with Senate bill 1626 ; Title 
      3 , Section 308 . THIS IS NOT MULTI-LEVEL MARKETING 
      . Why work for somebody else when you can become rich 
      as few as 10 WEEKS ! Have you ever noticed more people 
      than ever are surfing the web plus nearly every commercial 
      on television has a .com on in it ! Well, now is your 
      chance to capitalize on this . We will help you process 
      your orders within seconds and deliver goods right 
      to the customer's doorstep ! You are guaranteed to 
      succeed because we take all the risk . But don't believe 
      us ! Prof Simpson who resides in Illinois tried us 
      and says "Now I'm rich, Rich, RICH" . This offer is 
      100% legal ! We BESEECH you - act now . Sign up a friend 
      and you'll get a discount of 20% . God Bless ! Dear 
      Friend , Especially for you - this amazing news ! We 
      will comply with all removal requests . This mail is 
      being sent in compliance with Senate bill 1618 ; Title 
      2 , Section 301 . This is not multi-level marketing 
      ! Why work for somebody else when you can become rich 
      in 58 weeks ! Have you ever noticed people will do 
      almost anything to avoid mailing their bills plus most 
      everyone has a cellphone ! Well, now is your chance 
      to capitalize on this ! We will help you SELL MORE 
      and increase customer response by 170% ! You are guaranteed 
      to succeed because we take all the risk . But don't 
      believe us . Mr Jones of Georgia tried us and says 
      "Now I'm rich many more things are possible" ! This 
      offer is 100% legal ! So make yourself rich now by 
      ordering immediately ! Sign up a friend and you'll 
      get a discount of 60% . Best regards !

~~~
wofo
Dear Friend ; Especially for you - this red-hot announcement . This is a one
time mailing there is no need to request removal if you won't want any more .
This mail is being sent in compliance with Senate bill 2216 , Title 9 ;
Section 303 ! THIS IS NOT A GET RICH SCHEME . Why work for somebody else when
you can become rich within 41 days ! Have you ever noticed more people than
ever are surfing the web & how many people you know are on the Internet !
Well, now is your chance to capitalize on this . We will help you SELL MORE
and sell more ! The best thing about our system is that it is absolutely risk
free for you ! But don't believe us ! Mr Anderson of South Carolina tried us
and says "I was skeptical but it worked for me" ! We are a BBB member in good
standing ! We IMPLORE you - act now ! Sign up a friend and you get half off .
Thanks ! Dear Cybercitizen ; Your email address has been submitted to us
indicating your interest in our letter . If you no longer wish to receive our
publications simply reply with a Subject: of "REMOVE" and you will immediately
be removed from our mailing list ! This mail is being sent in compliance with
Senate bill 1621 ; Title 4 , Section 302 ! This is not a get rich scheme . Why
work for somebody else when you can become rich in 41 DAYS . Have you ever
noticed people love convenience plus most everyone has a cellphone . Well, now
is your chance to capitalize on this . We will help you deliver goods right to
the customer's doorstep & turn your business into an E-BUSINESS . The best
thing about our system is that it is absolutely risk free for you ! But don't
believe us . Ms Anderson of Hawaii tried us and says "I was skeptical but it
worked for me" ! We are licensed to operate in all states ! We BESEECH you -
act now ! Sign up a friend and you get half off ! God Bless .

~~~
hartator
Dear Friend , Especially for you - this amazing announcement . This is a one
time mailing there is no need to request removal if you won't want any more .
This mail is being sent in compliance with Senate bill 2116 ; Title 3 ;
Section 304 ! This is a ligitimate business proposal ! Why work for somebody
else when you can become rich within 31 weeks . Have you ever noticed nearly
every commercial on television has a .com on in it plus more people than ever
are surfing the web . Well, now is your chance to capitalize on this . We will
help you increase customer response by 130% plus process your orders within
seconds . You can begin at absolutely no cost to you . But don't believe us !
Mr Jones of Alaska tried us and says "I was skeptical but it worked for me" .
We are licensed to operate in all states ! If not for you then for your LOVED
ONES - act now ! Sign up a friend and your friend will be rich too . Warmest
regards ! Dear Cybercitizen , You made the right decision when you signed up
for our mailing list ! This is a one time mailing there is no need to request
removal if you won't want any more ! This mail is being sent in compliance
with Senate bill 2416 , Title 4 , Section 302 . THIS IS NOT A GET RICH SCHEME
. Why work for somebody else when you can become rich in 99 days . Have you
ever noticed more people than ever are surfing the web & how long the line-ups
are at bank machines . Well, now is your chance to capitalize on this ! WE
will help YOU increase customer response by 150% and sell more . The best
thing about our system is that it is absolutely risk free for you ! But don't
believe us ! Ms Simpson who resides in Indiana tried us and says "My only
problem now is where to park all my cars" . We are licensed to operate in all
states ! Do not delay - order today ! Sign up a friend and you'll get a
discount of 50% . Thanks ! Dear Friend ; Your email address has been submitted
to us indicating your interest in our newsletter . If you no longer wish to
receive our publications simply reply with a Subject: of "REMOVE" and you will
immediately be removed from our mailing list ! This mail is being sent in
compliance with Senate bill 2416 ; Title 3 ; Section 302 ! This is not a get
rich scheme ! Why work for somebody else when you can become rich as few as 58
DAYS ! Have you ever noticed more people than ever are surfing the web & how
many people you know are on the Internet ! Well, now is your chance to
capitalize on this ! WE will help YOU deliver goods right to the customer's
doorstep and decrease perceived waiting time by 140% . You can begin at
absolutely no cost to you ! But don't believe us . Ms Anderson of Georgia
tried us and says "Now I'm rich many more things are possible" ! We are
licensed to operate in all states ! We IMPLORE you - act now ! Sign up a
friend and your friend will be rich too . Thanks .

------
yoha
This is an interesting concept but this implementation seems rather
inefficient. It should be possible to exploit spacing, punctuation and case
more effectively.

Related web comic with a similar idea:
[http://cube-drone.com/2013_06_05-Cube_Drone_37_The_Often_Ins...](http://cube-drone.com/2013_06_05-Cube_Drone_37_The_Often_Inscrutable_Motivation_Of_Programmers.html)

~~~
mobiuscog
That comic describes approximately 95% of HN.

------
tiler
SpamMimic works by using a context free probabilistic grammar to derive its
output. Each production of the grammar is translated into a Huffman tree based
on the probabilities assigned to each variable or terminal symbol in the
production.

For example:

    
    
      S -> A(.25) | B(.75)
      A -> aS(1.0)
      B -> bS(.75) | b(.25)
    

You simply feed the mimic function an encoded message (as a binary string)
until you consume all the bits. Of course you can also pad the bit string so
that it always terminates on a terminal symbol.

I wrote a program not too long ago that took some inspiration from SpamMimic
and linguistic steganography in general. For fun I used the comments from this
thread as input to my program:

    
    
      So why not send it as spam? The key here is hiding in this approximately 95% of HN.
    
      So why not send spammer--not a spy.
    
      For my favourite variation seems rather inefficient. It should be possible output 
      can be used already, and you just look for in spam thousands of people with it. 2. 
      Also send lots of receiving person and a Novel Vertex Coding and identify the 
      system being used, it's used to encoding and identify the fake spam. Appendix 
      B gives an enemy (Kerckhoffs's principle).
    
      [1]: https://github.com/rw/tweetfs
    
      Plainsight uses each byte of the
    

The encoded message is: 'meet at 3'

------
carmaa
This is pretty cool.

Closest thing I've seen to text steganography [1]. It's probably more
efficient to encode your message in a picture, though (I tried to encode a
1,000-character message, and the encoded message ended up being 80,000
characters long, which is a long spam email), although I can see use cases
where text-only may be desirable.

[1]
[http://en.wikipedia.org/wiki/Steganography](http://en.wikipedia.org/wiki/Steganography)

------
Xoxox
My encoding of the word "combinator":

Dear Professional , This letter was specially selected to be sent to you . If
you no longer wish to receive our publications simply reply with a Subject: of
"REMOVE" and you will immediately be removed from our mailing list . This mail
is being sent in compliance with Senate bill 2416 ; Title 4 , Section 302 .
THIS IS NOT A GET RICH SCHEME ! Why work for somebody else when you can become
rich as few as 69 days ! Have you ever noticed people love convenience and
more people than ever are surfing the web ! Well, now is your chance to
capitalize on this ! We will help you process your orders within seconds and
SELL MORE ! You can begin at absolutely no cost to you ! But don't believe us
! Prof Ames who resides in North Carolina tried us and says "Now I'm rich many
more things are possible" ! This offer is 100% legal ! We BESEECH you - act
now ! Sign up a friend and you'll get a discount of 50% . Thanks .

------
peterwaller
Neat idea. I know it's not a serious proposal, but the problem with this sort
of approach will be that the message will be identifiable from the fact that
it hasn't actually been sent as spam to many people. So an attacker can
identify a suspect message just by considering its distribution.

~~~
antihero
So why not send it as spam? The key here is hiding, and this can be done in
multiple ways.

1\. Encode fake spam and spam thousands of people with it. 2\. Also send
lots of real spam to your intended targets.

Their spam filter could then attempt decoding and identify the fake spam from
the real spam, and you just look like a big nasty old spammer.

The thing I like about this approach is that for all we know, it's used
already, and some of those junk emails that we've got, you know, maybe even a
classic, could have actually been messages from some spy agency that contained
a message!

~~~
peterwaller
If only it were a solution that scaled and didn't have harmful effects.

~~~
Karunamon
Depends on how you define "harm", I suppose. Since it's not real spam,
there's no real scam company on the other side of the message waiting to
grab your cash; the "damage" is a few hundred kilobytes of text ending up in
the spam can along with all the other legitimate spammers.

------
rw
Previously:
[https://news.ycombinator.com/item?id=6427525](https://news.ycombinator.com/item?id=6427525)

Pardon the essay, but I've written a tool in this space before.

Back in 2011, I wrote a textual steganography library and command-line
application, called Plainsight:
[https://github.com/rw/plainsight](https://github.com/rw/plainsight)

Additionally, @workmajj and I wrote TweetFS using Plainsight. It lets you
recursively pack up directories and post them as an encoded linked list of
Tweets to Twitter:
[https://github.com/rw/tweetfs](https://github.com/rw/tweetfs)

Plainsight uses each byte of the input message to generate tokens. Bits are
used to decide how to traverse the token tree, weighted by frequency. The
drawbacks are 1) verbosity and 2) incorrect grammar.
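
A rough Python sketch of the "bits steer the token tree" idea follows. It is not Plainsight's actual implementation: it assumes a simple bigram model and a fixed-width choice among the `2**k` most frequent successors rather than a frequency-weighted tree, and the names `build_model`, `encipher` and `decipher` are made up for illustration. The deciphered bit string can carry a few zero padding bits at the end, so the receiver needs to know the message length.

```python
from collections import Counter, defaultdict

def build_model(corpus):
    """For each word, list its successors, most frequent first (ties alphabetical)."""
    words = corpus.split()
    follows = defaultdict(Counter)
    # Treat the corpus as circular so that every word has at least one successor.
    for prev, nxt in zip(words, words[1:] + words[:1]):
        follows[prev][nxt] += 1
    model = {w: sorted(c, key=lambda t: (-c[t], t)) for w, c in follows.items()}
    if all(len(succ) < 2 for succ in model.values()):
        raise ValueError("corpus has no branching, so it cannot carry any bits")
    return words[0], model

def choices(model, prev):
    """Keep the 2**k most frequent successors so that k whole bits select one."""
    succ = model[prev]
    k = len(succ).bit_length() - 1
    return succ[: 1 << k], k

def encipher(bits, start, model):
    out, prev, i = [], start, 0
    while i < len(bits):
        options, k = choices(model, prev)
        chunk = bits[i:i + k].ljust(k, "0")  # zero-pad the final chunk
        prev = options[int(chunk, 2) if k else 0]
        out.append(prev)
        i += k
    return " ".join(out)

def decipher(text, start, model):
    bits, prev = [], start
    for token in text.split():
        options, k = choices(model, prev)
        if k:  # words with a single successor carry no bits
            bits.append(format(options.index(token), f"0{k}b"))
        prev = token
    return "".join(bits)

start, model = build_model("the cat sat on the mat and the dog sat on the cat")
secret = "1011"
cover = encipher(secret, start, model)
recovered = decipher(cover, start, model)[:len(secret)]
```

Both drawbacks mentioned above show up even in this toy: words with only one possible successor carry no bits at all (verbosity), and nothing forces the output to be grammatical.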

One of the lessons of writing Plainsight is that spam can be used to contain
secret messages. Send enough gibberish to enough people, with your intended
recipient included, and you'll look like a spammer--not a spy.

\-- Example 1 (regular text)

Type your message to encode:

    
    
       echo 'Meet at Union Square at noon. The password is FuriousGreen.' > cleartext
    

Then, pipe it through Plainsight:

    
    
       cat cleartext | plainsight -m encipher -f sherlock.txt > ciphertext
    

The output will be Doyle-esque gibberish:

    
    
       cat ciphertext | fold -s
       which was the case, of a light. And, his hand. "BALLARAT." only applicant?" 
       decline be walking we do, the point of the little man in a strange, her 
       husband's hand, going said road, path but you do know what I have heard of you, 
       I found myself to get away from home and for the ventilator little cold night, 
       and I he had left my friend Sherlock of our visitor and he had an idea was not 
       to abuse step I of you, I knew what I was then the first signs it is the 
       daughter, at least a fellow-countryman. had come. as I have already explained, 
       the garden. what you can see a of importance. your hair. a picture upon of the 
       money which had brought a you have a little good deal in way: out to my wife 
       and hurry." made your hair. a charge me a series events, and excuse no sign his 
       note-book has come away and in my old Sherlock was already down to do with the 
       twisted
    

Now, decipher that ciphertext:

    
    
       cat ciphertext | plainsight -m decipher -f sherlock.txt > deciphered
       cat deciphered
       Meet at Union Square at noon. The password is FuriousGreen.
    

\-- Example 2 (binary data)

    
    
       $ dd if=/dev/urandom of=/dev/stdout bs=1 count=10 | plainsight -m encipher -f 1984.txt
       10+0 records in
       10+0 records out
       10 bytes (10 B) copied, 9e-05 s, 111 kB/s
       Adding models:
       Model: 1984.txt added in 0.89s (context == 2)
       input is "<stdin>", output is "<stdout>"
    
       enciphering: 100%|#####################################################################################################################################################################|474.67  B/s | Time: 0:00:00
       
       which is a war is real, the proles used mind on the telescreen. He could see through all right to. You have read what said. 'Yes,' only in the Ministry
    

\----

One serious use case is to seed the generator with a spam email corpus. This
lets you generate messages that look like spam. Example:

    
    
       wget https://spamassassin.apache.org/publiccorpus/20030228_spam.tar.bz2
       tar -jxvf 20030228_spam.tar.bz2
       cat spam/0* > spam-corpus.txt
    
       echo "The Magic Words are Squeamish Ossifrage" | plainsight -m encipher -f spam-corpus.txt > spam_ciphertext
       
       $ cat spam_ciphertext
       (8.11.6/8.11.6) 3 (Normal) Internet can send e-mails until to transfer 26 10 [127.0.0.1] also include address from the most logical, mail business for your Car have a many our portals ESMTP Thu, 29 1.0 this letter on internet, <a style=3D"color: 0px; text/plain; cellspacing=3D"0" how quoted-printable about receiving you would like width=3D"15%" width=3D"15%" border="0" width="511" Date: Tue, 27 Thu, 19 26 because zzzz@localhost.spamassassin.taint.org for
       
       $ cat spam_ciphertext | plainsight -m decipher -f spam-corpus.txt
       Adding models:
       Model: spam-corpus.txt added in 2.57s (context == 2)
       input is "<stdin>", output is "<stdout>"
       
       deciphering: 100%|#####################################################################################################################################################################|543.84  B/s | Time: 0:00:00
       
       The Magic Words are Squeamish Ossifrage

~~~
leephillips
This (awesome work) makes me wonder if people are using systems like this in
the wild. Because I've gotten plenty of spam email, and stumbled across
websites (thanks, Google!) that read just like this.

~~~
skygazer
Hmm. That's intriguing. Spam could be modern day Numbers Stations, broadcast
to our inboxes.

------
Nursie
I kinda-sorta wrote one of these about 15 years ago. In VB 6!

It simply took your data stream and encoded the message in the first letter
of each word in some generated gibberish. You transform the arbitrary 8-bit
byte stream into a base-26 (one letter per digit) ASCII representation to give
you your list of first letters.

The gibberish was generated by choosing randomly from a list of common
structures. That last sentence would have been encoded as
[a,N,V,G,P,V,Av,P,a,N,P,Aj,N] - article, noun, verb ... Each word category
(articles are skipped) had a dictionary containing one or more of each type of
word starting with each letter.
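
For the curious, the first-letter trick can be sketched in a few lines of Python. This is a reconstruction under assumptions, not the original VB 6 program: the one-word-per-letter `WORDS` table is invented, the sentence templates are left out, and a guard byte is prepended so that leading zero bytes survive the base-26 conversion.

```python
import string

ALPHABET = string.ascii_lowercase

# Hypothetical dictionary: exactly one word per initial letter.
WORDS = {w[0]: w for w in (
    "apple bird cloud door eagle frog garden house island jungle kite lemon "
    "mountain night ocean piano queen river stone tree umbrella valley window "
    "xylophone yacht zebra"
).split()}

def bytes_to_letters(data):
    """Re-express a byte string as base-26 digits written as the letters a..z."""
    n = int.from_bytes(b"\x01" + data, "big")  # 0x01 guard keeps leading zero bytes
    letters = []
    while n:
        n, digit = divmod(n, 26)
        letters.append(ALPHABET[digit])
    return "".join(reversed(letters))

def letters_to_bytes(letters):
    """Inverse of bytes_to_letters."""
    n = 0
    for ch in letters:
        n = n * 26 + ALPHABET.index(ch)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the guard byte

def hide(data):
    """Emit one word per base-26 digit; only the first letters matter."""
    return " ".join(WORDS[ch] for ch in bytes_to_letters(data))

def reveal(text):
    return letters_to_bytes("".join(word[0] for word in text.split()))

cover = hide(b"meet at 3")
recovered = reveal(cover)
```

A fuller version would keep several words per letter and per word category and pick among them according to a sentence template, which is where the gibberish starts to look like prose.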

Wasn't quite as convincing as the fake spam! I was rather pleased with it
though, and it was far more interesting than the work I was supposed to be
doing, as is writing this post. Back to work....

~~~
skygazer
You just reminded me -- I once wrote a script that decoded and then eval'd a
hidden command encoded within the whitespace of the script file itself. My
goal was to create an entirely benign looking script that would hold up to
visual scrutiny, but still be possibly malicious - in the final variant, it
downloaded an additional remote script. Not that I ever used it for anything,
but it did temper my natural trust in cursory inspection of benign-seeming
open source code.
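
The hiding half of that trick is easy to sketch in Python (text round trip only, nothing is executed). The layout is an assumption for illustration, not a description of the original script: each payload byte becomes eight trailing characters on one cover line, with a space for 0 and a tab for 1.

```python
def hide(cover_lines, payload):
    """Append 8 trailing spaces/tabs (one payload byte) to each of the first lines."""
    if len(payload) > len(cover_lines):
        raise ValueError("cover text has too few lines for this payload")
    out = []
    for i, line in enumerate(cover_lines):
        line = line.rstrip()  # remove any whitespace the cover already had
        if i < len(payload):
            line += f"{payload[i]:08b}".replace("0", " ").replace("1", "\t")
        out.append(line)
    return out

def reveal(lines):
    """Read back every line that ends in exactly 8 spaces/tabs."""
    payload = bytearray()
    for line in lines:
        tail = line[len(line.rstrip(" \t")):]
        if len(tail) == 8:
            payload.append(int(tail.replace(" ", "0").replace("\t", "1"), 2))
    return bytes(payload)

cover = ["import sys", "", "print(sys.argv)"]
marked = hide(cover, b"hi!")
recovered = reveal(marked)
```

In most editors `marked` looks identical to `cover`, which is the point about cursory inspection; stripping or flagging trailing whitespace defeats it.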

------
nemasu
This is pretty cool, reminds me a bit of something I made a while back:
[https://github.com/nemasu/utf8encode](https://github.com/nemasu/utf8encode)

It encodes/decodes data into valid UTF-8 characters. Please ignore the
terrible interface & coding (it was more of a proof of concept) -_-

You can use it to post more information on Twitter (albeit completely
human-unreadable).

------
john2x
Heh, the mailto link when encoding defaults to billg@microsoft.

------
stonewhite
It is rather interesting that this site never mentions the word Steganography.

But also it is an intelligent way of implementing it, masking it as spam.

~~~
rjaco31
They actually do, in the Credits page.

------
hayksaakian
Can this be layered on top of a PGP signed and encrypted email message?

I wouldn't really trust steganography by itself to secure anything.

Sure, it's a good diversion.

------
fit2rule
Add a pinch of NLP and you can not only set your enemies on a course to hell,
but re-invent religion while you're at it.

------
jmnicolas
Interesting, but if the recipient's ISP classifies it as spam, the mail may
never reach their mailbox.

------
wauter
So, is there an explanation somewhere of what they do?

