
How Not to Encrypt a File – Courtesy of Microsoft - rakel_rakel
https://medium.com/@bob_parks1/how-not-to-encrypt-a-file-courtesy-of-microsoft-bfadf2b0273d
======
GreaterFool
The author could spend less time bashing the original article and a little bit
more explaining how to do things right.

This:

> Suggestion to use the encryption key as the IV

is a second sub-heading while the words "initialization vector" don't appear
until much later. Initialization vector is pretty obvious, "IV" isn't.

Also the author spends time complaining that the original article
misunderstands the use of initialization vector while providing no explanation
of how it should be used.

After reading the post I haven't learned anything useful other than that the
original article was bad.

~~~
jlebar
Explaining to you how to do it the right way is not an obligation of anyone
who says article X is wrong.

"This article on global warming could spend less time bashing governments for
inaction and more time talking about how I can reduce my emissions."

"This bad restaurant review could spend less time bashing the chef's food and
more time telling me where the good restaurants are."

Similarly maybe the author didn't explain what "IV" means because their
audience understands that term.

"This article in CACM uses 'NVRAM' in the heading, while the words "non-
volatile" don't appear until much later. Non-volatile is pretty obvious,
'NVRAM' isn't."

~~~
recursive
I was in the audience, and if I ever knew how one should use an IV, I forgot.
The article would have been more valuable to me if it gave a summary of what
IVs _are_ instead of what they _aren't_.

~~~
jlebar
What makes you think you were the author's intended audience, exactly? It sounds
like the author intended their article for people who know what an IV is. You
don't, and, while there's no shame in that, it does seem to indicate that
you're not in the audience.

This is like me complaining that I can't understand Terry Tao's blog when it's
posted to HN. It's not written for me.

------
Sophira
While I'm sure the article is correct, it doesn't even attempt to link to
resources to say _how_ these things are misunderstandings. For example, I
myself don't really understand IVs, and from my perspective I'm left with no
clearer an idea about _why_ IVs shouldn't be considered secret, or why the
IV isn't required to be able to decrypt the file again.

Regardless, it's obvious that bad encryption advice in an MSDN article is
horrifying.

~~~
fpgaminer
> I myself don't really understand IVs

Time to drop some knowledge!

IVs are used in a number of places in cryptography, so I'll just pick one
(easy) example.

Consider the stream cipher ChaCha20. You can think of ChaCha20 as a black box.
You input a key and an IV and out you get a really, really long stream of
uniformly random bytes. (This is a simplification but sufficient here).
ChaCha20 works in such a way that having any or all of the output stream
doesn't help you figure out what the inputs were. It's irreversible. ChaCha20
is also deterministic; the same input will give the same output.

You can then use the output of random bytes to encrypt a message by XORing
with your plaintext. To later decrypt, you feed the same key and IV, get the
same stream, XOR the ciphertext with it, and by the property of XOR you'll get
the plaintext.

Now why is there an IV? Let's consider a ChaCha without an IV. The system
works like so:

    
    
        R = ChaCha(Key)
        Ciphertext = Message ^ R
    

So let's encrypt two different messages:

    
    
        R = ChaCha(Key)
        Ciphertext1 = Message1 ^ R
        R = ChaCha(Key)
        Ciphertext2 = Message2 ^ R
    

Notice how R is the same for both messages? Again, ChaCha is deterministic;
the output is the same for the same inputs. Since the key is the same, R is
the same. Now an attacker, knowing this, can do this:

    
    
        Q = Ciphertext1 ^ Ciphertext2
    

What does Q end up being? Let's look:

    
    
        Ciphertext1 ^ Ciphertext2
        = Message1 ^ R ^ Message2 ^ R
        = Message1 ^ Message2
    

So Q ends up being equal to the XOR of the two messages. That's really bad.
The XOR of two messages might be enough to tell the attacker what the messages
are, especially if the messages are predictable (like English text). But maybe
that's not scary enough. Well, there's another attack. What if you're
encrypting a data format with a header? Headers often have the same data in
the same places. So the attacker knows part of the message. Uh oh...

    
    
        R = Ciphertext1 ^ Message1
    

If the attacker knows the message (or any parts of it) they can recover the R
of those parts. And now, since your key is always the same and your R is
always the same, all the other messages you encrypt will have those bytes
exposed.
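
Both attacks above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `keystream` is a hypothetical stand-in for `ChaCha(Key)` built from HMAC-SHA256 in counter mode (it is not ChaCha20), and the key and messages are made up.

```python
import hashlib
import hmac


def keystream(key: bytes, length: int) -> bytes:
    """Stand-in for ChaCha(Key): deterministic bytes derived from the key alone.

    Assumption: HMAC-SHA256 in counter mode, used here only for illustration.
    """
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncated to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"k" * 32  # made-up key
message1 = b"HEADER: attack at dawn"
message2 = b"HEADER: retreat at ten"

# Same key, no IV: both messages are encrypted with the same R.
r = keystream(key, len(message1))
ciphertext1 = xor(message1, r)
ciphertext2 = xor(message2, r)

# Attack 1: XORing the ciphertexts cancels R, leaving Message1 ^ Message2.
q = xor(ciphertext1, ciphertext2)
assert q == xor(message1, message2)

# Attack 2: an attacker who knows Message1 recovers R and reads Message2.
recovered_r = xor(ciphertext1, message1)
recovered_message2 = xor(ciphertext2, recovered_r)
assert recovered_message2 == message2
```

The attacker never touches the key in either attack; reusing R is enough.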

This is where IVs come in:

    
    
        R = ChaCha(Key, IV)
    

IV should be unique per message. That means that every R is different! None of
the above attacks work anymore. XORing two ciphertexts together returns
gibberish:

    
    
        R1 = ChaCha(Key, IV1)
        Ciphertext1 = Message1 ^ R1
        R2 = ChaCha(Key, IV2)
        Ciphertext2 = Message2 ^ R2
    
        Ciphertext1 ^ Ciphertext2
        = Message1 ^ R1 ^ Message2 ^ R2
    

And if the attacker knows the message, all they can recover is R1 or R2 (or
any R). But that's useless: since all your IVs are unique, that R will
never be seen again.

That's the point of IVs.
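
Here is a rough sketch of the same experiment with a per-message IV. As before, `keystream` is a hypothetical stand-in for `ChaCha(Key, IV)` built from HMAC-SHA256 in counter mode, not real ChaCha20, and the key, IVs and messages are made up for the example.

```python
import hashlib
import hmac


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Stand-in for ChaCha(Key, IV): deterministic bytes from key and IV.

    Assumption: HMAC-SHA256 over IV || counter, used only for illustration.
    """
    out = b""
    counter = 0
    while len(out) < length:
        block_input = iv + counter.to_bytes(8, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncated to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"k" * 32                # made-up key
iv1 = (1).to_bytes(8, "big")   # unique per message, not secret
iv2 = (2).to_bytes(8, "big")
message1 = b"HEADER: attack at dawn"
message2 = b"HEADER: retreat at ten"

r1 = keystream(key, iv1, len(message1))
r2 = keystream(key, iv2, len(message2))
ciphertext1 = xor(message1, r1)
ciphertext2 = xor(message2, r2)

# The keystreams no longer cancel, so this is not Message1 ^ Message2.
assert xor(ciphertext1, ciphertext2) != xor(message1, message2)

# Knowing Message1 only yields R1, which does not decrypt Message2.
recovered_r1 = xor(ciphertext1, message1)
assert xor(ciphertext2, recovered_r1) != message2

# The legitimate receiver, holding the key and IV2, still decrypts fine.
decrypted2 = xor(ciphertext2, keystream(key, iv2, len(ciphertext2)))
assert decrypted2 == message2
```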

> why the IV isn't required to be able to decrypt the file again.

It is required. Obviously you need all the inputs to ChaCha to get the byte
stream again, to decrypt the message.

Now sometimes the IV is known from the protocol. So say you're using ChaCha to
encrypt network traffic. You might set the IV equal to the packet number. So
both sides already know the packet number.
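
A minimal sketch of that packet-number trick, assuming a 64-bit nonce encoded little-endian. The helper name and the encoding are my own choices for illustration, not something any particular protocol mandates.

```python
import struct


def nonce_for_packet(packet_number: int) -> bytes:
    """Hypothetical helper: encode a packet number as an 8-byte nonce.

    Assumption: 64-bit unsigned, little-endian. struct raises an error
    for negative numbers or values that do not fit in 64 bits.
    """
    return struct.pack("<Q", packet_number)


# Sender and receiver both count packets, so they derive the same nonce
# without ever transmitting it.
sender_nonce = nonce_for_packet(41)
receiver_nonce = nonce_for_packet(41)
assert sender_nonce == receiver_nonce

# Different packets get different nonces, as long as the counter never
# repeats under the same key.
assert nonce_for_packet(41) != nonce_for_packet(42)
```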

But you always need the IV to decrypt.

> and from my perspective I'm left with no clearer an idea about why IVs
> shouldn't be considered secret,

Consider again ChaCha20 as a blackbox. Key+IV goes in, stream of bytes comes
out. There's no way to reverse that without the key (and IV). Since the
attacker doesn't know the Key, they can't reverse it. Knowing the IV doesn't
help.

Another way to think about it is that, instead of accepting a 256-bit key and
a 64-bit IV, it's really just a 320-bit key. Knowing 64-bits of a 320-bit key
doesn't help break a cipher. The cipher is still 256-bits strong. So you can
share the IV without affecting security.

BIG NOTE: It's important that an IV is always unique. If an IV is ever re-
used, the above attacks become available again because R will be the same for
two messages.

Hope that helps. This is only one way that IVs are used. In ChaCha20 it's
called a nonce, because ChaCha20 is geared towards usage on network protocols
where the above trick of using packet number is applied. For block ciphers
there are various cipher modes that get used, and most of them need an IV. The
purpose is always the same; to make this "session" of encryption unique.

There's another way to use IVs, and I think it reaffirms the concept of what
an IV actually is. Let's say you have a cipher that only accepts a key and no
IV (like AES). You still want to make your encryption sessions unique. A way
to do that is this:

    
    
        TempKey = HMAC (IV, Key)
    

And then use TempKey. HMAC is a form of hash. In this case it lets us combine
a Key and IV in an irreversible way, yielding a new key. TempKey will be the
right size key for the cipher (say, 256-bits). What this is doing is giving us
a unique key for every encryption session. And that's the heart of IVs. And in
many ways, ChaCha20 is doing exactly that. It's hashing together Key and IV
and using the output hash to generate a long stream of random data that can't
be reversed back to the key+IV.
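
For the curious, the TempKey idea looks roughly like this with Python's standard `hmac` module. Assumptions: I'm using SHA-256 as the underlying hash, and I'm passing the secret key as the HMAC key with the IV as the message; the argument order in the pseudo-code above doesn't pin that down. `derive_temp_key` is a name made up for this sketch.

```python
import hashlib
import hmac


def derive_temp_key(key: bytes, iv: bytes) -> bytes:
    """Hypothetical helper: combine a long-term key and a per-session IV.

    Assumption: HMAC-SHA256 keyed with the secret key, IV as the message.
    The result is 32 bytes, i.e. the right size for a 256-bit cipher key.
    """
    return hmac.new(key, iv, hashlib.sha256).digest()


key = b"k" * 32               # made-up long-term key
iv_a = b"\x00" * 15 + b"\x01"  # made-up IVs, unique per session
iv_b = b"\x00" * 15 + b"\x02"

temp_key_a = derive_temp_key(key, iv_a)
temp_key_b = derive_temp_key(key, iv_b)

assert len(temp_key_a) == 32                    # 256-bit session key
assert temp_key_a != temp_key_b                 # new IV, new key
assert temp_key_a == derive_temp_key(key, iv_a)  # receiver can re-derive it
```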

(and in case you're wondering, yes, you can use a cryptographically secure hash
function alone to build a stream cipher like ChaCha. It'll just be _really_
_really_ slow, because hash functions are really, really slow compared to
ChaCha.)

~~~
teh_klev
Thanks for spending the time explaining this.

~~~
fpgaminer
Happy to.

Is this something people find interesting? I was thinking of doing a small
guide/tutorial/course where I teach these basics of cryptography, while
building up to a working file encryption tool written completely from scratch.
Probably such a thing exists already, but /shrugs these kinds of questions
always seem to come up.

~~~
UncleMeat
This is risky. Unless you are a pro, it is generally not a great idea to
publish a "how-to" for crypto because of the risk you might get it wrong in
subtle ways that now propagate through the ecosystem.

~~~
fpgaminer
I know all the subtle things.

That said, I wouldn't write the course for people intending to become
cryptographers or cryptographic engineers. That would require a university
grade program. It would be geared towards people who have a curiosity of the
inner machinations of encryption. Ya know, like the people who come to Hacker
News, read articles like this, and ask questions.

------
pacaro
Note: All my information re: Microsoft is from no later than 2013.

This is indicative of a classic challenge in the industry.

To ship code that uses crypto at Microsoft you have to go through an auditing
process. To ship code that uses novel crypto, or works directly with crypto
primitives, you have to be reviewed by a specialist crypto review board — that
contains security and crypto people from across the company, names that you
might know (e.g. Niels Ferguson was there last time I needed a review. Hi
Niels!)

Samples and documentation aren't held to the same standard.

------
nailer
Microsoft have already 404d the article: [https://support.microsoft.com/en-
us/help/307010](https://support.microsoft.com/en-us/help/307010)

~~~
casparz
Luckily we have a snapshot:
[https://web.archive.org/web/20170327154501/https://support.m...](https://web.archive.org/web/20170327154501/https://support.microsoft.com/en-
us/help/307010/how-to-encrypt-and-decrypt-a-file-by-using-visual-c)

~~~
bartread
Also now dead - I just get a blank page apart from the header and footer.

~~~
kalleboo
If you disable JavaScript you can see the content, it seems like there's some
script that replaces it on page load.

~~~
nthcolumn
wat? why? they have some sort of 'message will self-destruct javascript' in
their page which is carried into the wayback machine?

------
unscaled
As someone in charge of reviewing all crypto code for a sizable chunk of my
company, I've yet to see a single case of naive developers using encryption
primitives correctly. To tell the truth, I don't think I've ever seen
a single example of IVs used correctly.

At the very best of times I get AES-CBC-HMAC-SHA1 (usually Encrypt-AND-MAC)
with binary keys and secret static IV.

I'm still waiting for the developer that will botch AES-GCM with a random
nonce so I can have first world problems, but we're not there yet.

I wanted to call Microsoft sneaky for pulling out this article, but
considering basically every top-ranked "how do I encrypt with AES" question on
StackOverflow is full of bad advice, I'm glad they at least did something.

------
jwilk
The article says that DES "can be brute forced in a single digit number of
days by a modern computer".

    
    
      2**56 keys / 9 days ≈ 92.7 Gkeys/s
    

Can modern computers actually compute DES _that_ fast?

~~~
danbruc
This benchmark [1] gives 196.2 GH/s for DES using 8 Nvidia GTX 1080 Ti and
Hashcat 3.5. So while your average computer is probably not quite sufficient
it is certainly in reach.

[1]
[https://gist.github.com/epixoip/ace60d09981be09544fdd3500505...](https://gist.github.com/epixoip/ace60d09981be09544fdd35005051505)
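
The arithmetic in this subthread can be checked with a quick back-of-the-envelope script. It assumes an exhaustive search of the full 56-bit keyspace and treats the quoted benchmark rate as DES keys tried per second, which is my reading of the figure rather than something the benchmark states here.

```python
KEYSPACE = 2 ** 56          # number of possible DES keys
SECONDS_PER_DAY = 86_400


def rate_needed(days: float) -> float:
    """Keys per second required to try every key within `days`."""
    return KEYSPACE / (days * SECONDS_PER_DAY)


def days_needed(keys_per_second: float) -> float:
    """Days required to try every key at the given rate."""
    return KEYSPACE / keys_per_second / SECONDS_PER_DAY


print(f"{rate_needed(9) / 1e9:.1f} Gkeys/s")   # 92.7 Gkeys/s
print(f"{days_needed(196.2e9):.2f} days")      # 4.25 days
```

On average the key turns up after searching half the keyspace, so the expected time is about half the worst-case figure.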

------
natch
Another version of essentially the same article is still live here:

[https://support.microsoft.com/en-us/help/301070/how-to-
encry...](https://support.microsoft.com/en-us/help/301070/how-to-encrypt-and-
decrypt-a-file-by-using-visual-basic-.net-or-visual)

------
d--b
Yep, all over the place:

[https://searchcode.com/?q=ASCIIEncoding.ASCII.GetBytes%28sKe...](https://searchcode.com/?q=ASCIIEncoding.ASCII.GetBytes%28sKey%29%3B)

EDIT: ok maybe not "all over the place", but it's been done.

------
Strategizer
The article author is complaining about an MSDN article not being updated. The
content even says it applies to VS 2005 at its highest. That's a hint of how
old it is. Is he going to get the print version and complain about that next?
If programmers are using this without thought, that is on them, not the example
code.

------
cesarb
Raymond Chen wrote some time ago about the variable quality of MS Knowledge
Base articles:
[https://blogs.msdn.microsoft.com/oldnewthing/20060424-21/?p=...](https://blogs.msdn.microsoft.com/oldnewthing/20060424-21/?p=31433)

------
BusinessInsider
That's pretty disturbing. Though to be fair, the article in question was
written a while ago (since it targets .NET 2005), and to be less fair, MS
doesn't really review their documentation very well, at all.

------
duke360
You're probably too young. In the past, when the internet wasn't so ubiquitous,
having the MSDN documentation on CD was a life saver. The docs that have
serious content today descend directly from those days; the rest, as others
have already said, are just boilerplate autogenerated docs, which nobody
maintains anymore simply because the technology moves too fast. So this doc
page about using DES is probably straight from 1990 or so... and in those days
it was probably good enough.

------
TheSpecialist
It does seem useless to make the IV the same as the key. But is there a reason
making the IV the same as the key is worse than using 0 as an IV?

Just asking.

------
norcimo5
To encrypt: tar cz foo | openssl aes-256-cbc -salt -out foo.enc

To decrypt: openssl aes-256-cbc -d -in foo.enc | tar xz

(foo can be a file or directory)

~~~
snakeanus
This does not contain a MAC though, does it? Also why CBC? Why not CTR/GCM
instead? And why AES256 instead of Chacha20-Poly1305 or some other modern
AEAD?

~~~
norcimo5
What are the advantages of GCM over CBC? And whats wrong with AES256?

~~~
snakeanus
- GCM, unlike CBC, is an AEAD mode (has a MAC built in)

- CBC needs padding, which when misused can lead to padding oracle attacks

- GCM allows for parallel encryption

> And whats wrong with AES256?

There are more modern, faster and better ciphers that are designed to not be
vulnerable against many side-channel attacks that AES is difficult to protect
against.

------
snakeanus
I feel disgusted after reading this. I wonder how many people applied the
advice given by the original article because they made the bad decision to
trust the official documentation by MS.

~~~
bartread
Oh, come on: whatever Microsoft's faults might be they have a _very_ long
track record, stretching back decades, of providing overall high quality
documentation for developers.

Yes, there are errors. Yes, sometimes there is deeply misguided advice. But,
on the whole, MSDN and its ilk have helped me far more often than they've hurt me.

Key point: compared with much other vendor and OSS documentation, Microsoft
are absolutely streets ahead.

~~~
setq
Most of the documentation is boilerplate. There's very little real content now
and most of it is filler.

~~~
wfunction
Here's an example of a new API they added recently (the first one I thought
of): [https://msdn.microsoft.com/en-
us/library/windows/desktop/mt5...](https://msdn.microsoft.com/en-
us/library/windows/desktop/mt595731.aspx)

It says things like this:

> Indicates that the data for the file should be obtained from a WIM file. On
> access, data is transparently extracted from the WIM file and provided to
> applications. If the file contents are modified, data is transparently
> decompressed and the file is restored to the same physical form it had if
> this API were not used.

This is "boilerplate" and "very little content" to you? What are you thinking
of?

~~~
setq
To be fair Win32 isn't terrible. Have you looked at the .Net docs?

[http://imgur.com/a/iK4uG](http://imgur.com/a/iK4uG)

~~~
krallja
You need to click on the overload of this method that you are interested in.
This is just the top level "where do you want to go?" document.

~~~
setq
It's not. There's content about 11 screens down, not that you'd find it
easily.

~~~
darklajid
Again, GP just stated that these are local links, links to anchors on that
very same page, explaining the overloads.

Are there too many? For this class, maybe. It's at least potentially worth
discussing it. But the way the navigation works isn't "just scroll until
something seems to fit".

------
wintorez
I always look at Microsoft in order to learn how not to do anything /s

------
giancarlostoro
>It’s a good thing the caesar shift isn’t available in their library or it
would probably have ended up in this tutorial.

[https://docs.python.org/2/library/codecs.html#python-
specifi...](https://docs.python.org/2/library/codecs.html#python-specific-
encodings)

Python does rot13 :)
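
For reference, a tiny sketch of what that looks like in Python 3, where `rot13` is a text-to-text codec reached through `codecs.encode`. The sample string is made up.

```python
import codecs

plaintext = "Hello, HN"

# rot13 shifts each ASCII letter by 13 places; applying it twice undoes it.
encoded = codecs.encode(plaintext, "rot13")
decoded = codecs.encode(encoded, "rot13")

print(encoded)  # Uryyb, UA
assert decoded == plaintext
```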

~~~
proaralyst
But that's in the codecs library, not a cryptography library.

