Can you break my home rolled encryption? 103 points by quelsolaar on Oct 26, 2013 | hide | past | web | favorite | 64 comments

 One of the necessary conditions for a secure cipher is security under chosen plaintext attacks, as specified by the following game:I give you two plaintexts of the same length of my choice, you select one of them and encrypt it with a key of your choice and give me the resulting ciphertext. If I can determine with probability p>50% which of the two plaintexts you have encrypted, the cipher is considered broken at level p.I chose the following two plaintexts relative to the C source liked to elsewhere in this thread, with`````` #define DATA_SIZE (1024 * 1024 * 128) `````` Plaintext 1:`````` for(i = 0; i < DATA_SIZE; i++) decrypted[i] = i; `````` Plaintext 2:`````` for(i = 0; i < DATA_SIZE; i++) decrypted[i] = 0; `````` and claim that your algorithm is broken at a level of at least 95%. (I actually think it is 100% if I make the plaintext long enough, but I'm hedging my bets by conceeding 5%).I don't need the full ciphertext, the output of the following code fragment is sufficient:`````` for(i = 0; i < DATA_SIZE; i++) { count[encrypted[i] & 0xFF] += 1; count[(encrypted[i] >> 8) & 0xFF] += 1; count[(encrypted[i] >> 16) & 0xFF] += 1; count[(encrypted[i] >> 24) & 0xFF] += 1; } for(i = 0; i < 256; i+=1) { printf(" %8d", count[i]); if (count[i] > 2400000) { printf("*"); } else { printf(" "); } if (i%16 == 15) { printf("\n"); } }``````
 I have not considered this kind of game, and therefor not designed the cypher to withstand it. I need to run your code and do some tests to look in to it further.BUT:I currently think that the line:key[pos_a] ^= key[pos_c] ^ i ^ output[i];Is a problem as "output" can counteract "i". I would probably replace it with:key[pos_a] ^= key[pos_c] ^ output[i]; key[pos_c] ^= i;I'm also thinking about ways that "i" could influence in a less regular way and if that is useful.
 > I have not considered this kind of game, and therefor not designed the cypher to withstand it.This seems to be the central summation of why most of us should not be rolling our own, and why peer review and cryptanalysis is so important to determine if something really is secure. :)
 In fact I think everybody who's interested in the subject should try to roll their own crypto for the experience and fun in it. They shouldn't use it in the real world though.
 Is there a club for this sort of activity?
 yes. they meet at Fort Meade.
 Most stream ciphers just output a pseudorandom stream to be xored with the plaintext. Treating it as a PRG simplifies the security analysis. Letting the plaintext affect the state is more likely to hurt than help if you assume that the attacker knows or can influence parts of the plaintext.
 This to me is the challenge. If done the wrong way you would allow an attacker to influence the algorithm, done the right way it could remove any patterns created by the key. It seams all attacks proposed in this thread try to attack this problem, while the rest seams harder to attack.
 `````` decrypted[i] = encrypted[i] ^ key[pos_a] ^ key[pos_b]; key[pos_c] = (key[pos_c] << 31) | (key[pos_c] >> 1); key[pos_a] ^= key[pos_c] ^ i ^ decrypted[i]; `````` Expanding decrypted:`````` key[pos_a] ^= key[pos_c] ^ i ^ encrypted[i] ^ key[pos_a] ^ key[pos_b]; `````` You just xored key[pos_a] into itself so we can eliminate it:`````` key[pos_a] = key[pos_c] ^ i ^ encrypted[i] ^ key[pos_b]; `````` I'm pretty sure you just threw away one word's worth of entropy.
 except pos_c might be equal to pos_a or pos_b, and key[pos_c] just got rotated. It's not quite equivalent, I tried it.
 During the encryption phase the argument by Strilanc doesn't hold (as in that case decrypted[i] is given as the input and does not contain any key admixture), and it is exactly the case where pos_c equals pos_a that is problematic, as in that case the statement`````` key[pos_a] ^= key[pos_c] ^ i ^ decrypted[i]; `````` overwites key[pos_a] with the value`````` i ^ plaintext[i] `````` that doesn't depend on any part of the key at all.
 You're right, if pos_c = pos_a, the rotation does prevent the cancellation.But throwing away a word of entropy most of the time is still very serious, and ending up with a different cancellation in the remaining cases is not really promising.
 To give you some pointers to the literature, you are trying to create a stream cipher here. (See https://simple.wikipedia.org/wiki/Stream_cipher)As always, it's terrible easy to create an encryption algorithm that's complicated enough that you yourself can't break it. That doesn't make it safe. (It might still be a fun exercise for you.)Have you tried breaking any ciphers, yet? Even obviously weak ones require some thought to break.
 My hope is that this one would be fun for others to break, rather then me having grand illusions of knowing better then everyone else.I know a fair bit about security from an engineering standpoint, but wouldn't call myself a master. My mathematical background is much weaker.Thanks for the links, I have already read it.
 Personally I think you would be more likely to get a better response:* If you delivered a complete implementation in a popular language like C, Python, or Ruby, rather than publishing pseudocode that may or may not be interpreted correctly by error-prone humans* If there was a clear goal. "Find a weakness" is vague; "Recover the missing plaintext from this 1MB message" is specific.
 Here is a C implementation I just banged together. http://www.quelsolaar.com/crypto_test.c (yet another possible source of embarrassment)Ill try to put together a "test" for the algorithm.
 Your key is based on an insecure RNG. It's trivial to reconstruct the seed and thereby recover the key.Your key isn't random. It's the same every time.This example also isn't communicating any secrets, so it's not really cryptography.
 I think that the author is more interested in the design of the encryption algorithm rather than the specifics of the reference implementation provided. So yeah a dodgy prng was used but is there anything wrong otherwise - i.e. something inside the algorithm itself that reduces its efficacy and makes it vulnerable.
 Well, if they would provide an actual implementation rather than a reference implementation, then I would break it. Breaking cryptography often depends on breaking implementations, not breaking the underlying theory. That's why AES-CTR is theoretically secure but so deviously hard to actually implement.
 I note in the comments that the random number generator is not secure in any way. Its just a handy thing for testing, not something you would use in real life.
 Ok, well, if you ever actually implement your theory, then I would love to break it for you. Shoot me an email (sillysaurus2 at gmail) if you ever do.
 I like the idea of mutating both the output and the key. You are using it for a single pass over the length of the data. You could use the final key as input to another round of crypto. Repeating it for a number of rounds would keep it deterministic, but increase the computational load of an attacker and create an even distribution of input to output. Ah, problem could be that decryption would need the final key, not the one you started with, if so it would be good for a 1-time hash. So the key idea for the crypto is the nuking of the part of the key that was just used for the xor. Thing that concerns me is if the attacker has access to your crypto in binary form, he could run it again and again on different inputs (data,key) to infer its structure - you'd want to slow him down.
 It feels like the key leaks a lot into the encoded sequence. Just try encoding a sequence of 0x00 0x00 0x00 0x00...`````` #include int main() { char key[] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08}; int key_size = sizeof(key) / sizeof(key[0]); int pos_a = key[0]; int pos_b = key[1]; int pos_c = key[2]; char encrypted[100], decrypted[100]; for(int i = 0; i< 100; i++) encrypted[i] = 0; for(int i = 0; i < 100; i++) { int old_a = pos_a; pos_a = key[pos_b] % key_size; pos_b = (pos_a + 1 + key[pos_c] % (key_size - 1)) % key_size; pos_c = (pos_a + 1 + key[old_a] % (key_size - 1)) % key_size; decrypted[i] = encrypted[i] ^ key[pos_a] ^ key[pos_b]; key[pos_c] = (key[pos_c] << 31) | (key[pos_c] >> 1); key[pos_a] ^= key[pos_c] ^ i ^ decrypted[i]; } for(int i = 0; i < 100; i++) printf("%02x ", decrypted[i]); }``````
 And there are obvious bugs in that code, for example: pos_a = key[pos_b] % key_size; probably should have been: pos_a = key[pos_b % key_size] % key_size;And so on...
 Exactly. Unless I misunderstood the code, If the key size is small (compared to the plain text size) and we know the key length, frequency analysis will work just fine.
 How would you go about this? Obviously the key is smaller then the message (otherwise we would use one time pad and be done with it) The key needs to be large enough that A, it cant be guessed, and B so that the branches of possible positions in it quickly becomes unmanageable.
 It sounds like you might enjoy the Matasano Crypto Challenges: http://www.matasano.com/articles/crypto-challenges/
 Well your key doesn't have much entropy to begin with....On a more serious note, the code you written uses a 8bit key. you need to change the line: (key[pos_c] << 31) | (key[pos_c] >> 1); to: (key[pos_c] << 7) | (key[pos_c] >> 1);The algorithm can obviously also be modified to use 64bit chunks that could be faster on 64 bit hardware.
 This is a classic stream cipher. A stream of bits is generated and xored with the plaintext. In code size RC4 is of very similar size, but work on bytes rather than 32 bit words. RC4 has been used in a wide number of areas, like SSL/TLS. It is now known to have problems, but it was belived to be secure for many years, but now some quite bad attacks are known (and much harder to break than this algorithm)I see one very big problem with the algorithm. If I know the first bytes encrypted (a known plaintext attack), I will recover the xor of two parts of the raw key. On the next word encrypted, it will be a shifted version of the raw key material. This will eventually mix, but the mixing is slow enough that the key can be recovered. Even ciphertext only attacks will be possible if you know or guess underlying statistical properties of the plaintext, like that this is english text, HTTP headers or whatever. In general, too much of the key material leak into the keystream, making the cipher breakable. In practice, the length of an HTTP header will likely leak enough of material to break it.A better streamcipher would have a key schedule before the encryption, making a table of bits that are mangled based on the key bits so that there is less correlation between the keystream and the key instead of using the key directly. Any such correlation will make the cipher breakable.Furthermore, there must be more mixing of bits than the given algorithm has. RC4 appear to have no more mixing, but that uses a much bigger table of 256 bytes, and not only the size of the key like this one, that makes it very likely that a byte has been mixed quite a lot the next time it is used.
 "I will recover the xor of two parts of the raw key."Yes, but how is this useful since the relationship between these two are no longer valid before the next operation? the shift is slow, but the XOR destroying one of the two components that have a known relationship is instant.
 They are extremely important because you can use it to construct an attack that let you narrow down the number of guesses you have to do to recover the original key, so that it becomes feasible to crack the code. In general, leaking information about the key like this is almost a cardinal sin in cipher design, as it will give an opportunity for an attack. Modern ciphers will leak practically no information about the key.
 True, if you want to brute force the algorithem you loose some bits by knowing that somewhere in the key there are 2 values that together become an xor. The question for me is if this information compounds over time. My thought was that it wouldnt since the key is modified.
 This is a really cool idea, I like it a lot.With that said, I'm not convinced that your pseudocode actually works. Specifically, the following doesn't make sense to me:pos_a = key[0];pos_b = key[1];pos_c = key[2];...pos_a = key[pos_b] % key_size;It would be really cool if you posted functioning encrypt() and decrypt() methods to github. Something that people can actually compile/run/analyze. In fact, if you do that get in touch with me and I'll try to crack it.
 Thanks! Its been driving me crazy to try to crack it. It cant be this easy right.....?I thought a lot about the possible starting point for the algorithm, but i found having it just be "unknown" like this was all that was needed. A test to make sure, pos_a, pos_b, pos_c are never the same might be useful. The code obviously needs to be fixed to modulo pos_X never to be larger then the key size.
 As eru said elsewhere, you've created a stream cipher. To be secure, the generated keystream (the thing you xor against the plaintext) must not have any biases: the attacker shouldn't be able to guess the contents of any keystream byte (with P > 1/2^8). Otherwise, an attacker can probabilistically recover parts of the plaintext.I played around with your C implementation a little[1]. I put it into 8-bit mode, and generated 16 random keys and output their keystreams[2]. Here's the first 32 bytes of those keystreams, in hex:0002010203fd050607b0090a0b250d0e0f8b111213a4151617f5191a1be6003e 3b4a00790369050607004b170b820d0e0fb7111213d0005f1796191a1b321d1e 00130102036b05060734090a0b260d0e0fca111264141516178700671b534ae8 df7c806903d10506077fcc670b583600dec617580a2b8b161712191a1b1c00a7 00260102032f1d0082a0090a0bd30d0e0f4ca8c616aa151617a400a34f1c9e1e af7cb175036705060786090a0bc90d0e0f2a111213831516173700501bfa1d1e 000236ca58110506079c00090b860d0e0f8f111251009e1617b4191a1b311d1e 00250102039700ec0729090a0b2a0d0e0f961112133a151617e3a200956764b8 002a0102031e050607ed003c0b260d0e0fcf111213c000151a18191a1bb21d1e 413c0102031505060745090a0b060d0e0fc6111213b815161784191a1b341d1e 00890102039500b807a200570ba70d0e0fa31112134c1516172d191a1ba03900 00cd01020332050607e100b10bdd0d0e0f0d111213261516170e191a1b041d1e 0076010203c30506074e03c7880069000eb8078d54134604a61a142d1a0600c3 00f101020344050607a8090a0b0e0d0e0f11111213e6151617f3191a1b291d1e 00f1007a0369aab10068090a0b53b1ac0fd1111213531516171800ee1bde1d1e 0036010203570506076c00d60b050d0e0f2f27583f03591617bc09471b141d1eAs you can see, there are some significant biases here. For example, as an attacker, once I have a ciphertext, I can guess that bytes 18 and 19 of the keystream were (hex) "1112", and have a very good chance of being right.While the 32-bit version isn't as bad, I think there's still significant biases. I generated 24495 keys. Here's the distribution for keystream byte 1 (script at [3]):`````` 92 82 82 104 99 90 99 111 90 102 91 95 94 91 73 102 109 106 97 99 88 99 90 96 88 97 101 100 108 76 87 87 91 94 111 90 92 104 88 97 94 100 89 102 90 91 92 89 98 96 89 94 111 111 105 90 87 89 93 104 100 110 109 93 77 107 103 84 88 96 89 87 77 96 90 84 87 106 101 98 99 99 114 102 104 106 95 91 95 92 94 104 95 88 93 91 91 78 102 89 104 88 94 100 102 105 94 102 100 105 99 94 87 89 86 93 95 77 82 83 99 94 88 106 106 101 101 91 82 88 98 111 104 93 102 91 87 93 106 89 102 78 88 105 91 93 105 84 101 100 94 93 94 107 88 86 114 84 112 98 97 84 111 87 91 93 89 95 92 96 78 85 90 104 84 80 103 95 98 114 90 89 91 110 89 100 87 107 95 109 83 103 112 102 93 93 87 90 101 91 108 108 90 107 103 95 111 126 88 97 74 111 97 99 95 95 102 97 122 94 106 94 97 103 86 90 102 91 95 108 83 97 91 102 99 90 103 111 93 84 87 107 91 96 77 98 96 85 108 110 116 100 95 89 72 100 `````` I should really get to sleep so I didn't do the statistics on this, but I'm fairly sure that's not a uniform distrubtion (i.e., bytes aren't being drawn uniformly from [0:255]). Also, keystream byte 0 only ever has 128 values, instead of 255.I did not look at 64-bit.All in all, I wouldn't use your cryptosystem :) but there's no shame in that! Crypto is hard! and youre only going to get better.
 noooo it ate my references:2 http://canta.ucsd.edu/~kmowery/quelsolaar/keystreams.tar.gz First line of each file is the key generated by rand(), All other lines are the keystream (encryption of 00 data)3 http://canta.ucsd.edu/~kmowery/quelsolaar/keystreams.py Note that you might have to change to using os.walk to find files; I'm too sleepy...
 I'm not really sure how you can get similar results with completely different keys. I'm looking in to this. I tried generating a meg encrypted zeroes in 8bit mode and the most used 8 bit value was only 1% more common then the least used value.
 Seeing your name again Eskil reminded me of the excellent tooling videos I used to watch for Löve!Where are things at and what are you doing now? I think the whole of HN would be interested!
 In very bad at publicizing the things I do.... This year I have been working on a bunch of new things:I have written a SDL/GLUT replacement: http://www.youtube.com/watch?v=oMJP6vlsmbEAnd a new library to create widgets and UIs: http://www.youtube.com/watch?v=oDulGQnjsDQI have used all this to build a VFX editor for realtime applications: (Ill try to post a video of it soon-ish) http://www.quelsolaar.com/Confuse_particle_system.jpgThat in turn used to make an installation using head tracking last month: http://www.youtube.com/watch?v=UYSVEhSC2DUMy big project this year has been to work on my new RTS game. Its not a huge secret, but hasn't been properly announced yet.
 This looks fishy to me. Follow my thought: If you are looking at the very first value coming out of the stream, that value should have zero bias from the algorithm. Why? because it is an xor of two different values in the key. If both values are properly random then so should their XOR.Note that at this point the algorithm has not yet started modifying itself, so the original key is still intact. The self modifying code may still be broken, but i think we can be fairly sure the first value has the exact same bias as the random number generator.
 so if you know the plaintext you get the user's key?
 Standard with one time pads (and symmetric encryption in general) — the key and plaintext are a shared secret.
 it's not standard with symmetric encryption in general; it's a "known plaintext attack". http://en.wikipedia.org/wiki/Known-plaintext_attack - with aes, for example, even if you know the plaintext there is no known way to obtain the key (faster than brute force search).
 The answer should be no. But I don't know.
 IIRC, the typical random variable is not uniformly distributed, it is normally distributed.
 As everyone said, you have created a stream cipher. But I feel that I have to add that your idea (OTP with small key) is usually implemented this way: the small key is a counter used an input to some PRNG specifically designed for this (you should try AES in CTR mode), which you increment every time.This produces a sequence which is always the same for each session, but because the keyspace is so large (2^128 with AES-128), this is acceptable because these PRNGs are actually cryptographically secure hash functions, which makes them very hard to reverse. Therefore, the only efficient way to crack your session is to either catch the key (your counter) or to brute force it by using every value of the counter and test whether this leads to a successful decryption. Not only are there 2^n possibilities, but every possibility is extremely expensive to test and then you have to accurately check whether the decrypted values are coherent (this is difficult with packets with a fixed header and nearly impossible with raw binary data).The big disadvantage is having to store your small key until decryption.
 I have always been curious why encryption keys are so small. with todays networks/memory 10kbytes key should not be un noticable.
 If a cryptographic algorithm is fundamentally secure, a small (128 - 256 byte) key will be all it needs to be secure. If an algorithm can't be secure with a key that size, giving it a bigger key is likely to only make it marginally better.
 Yes but it removes the entire discussion about when a particular algorithm becomes viable to break brute force.
 No, that's simply not true. A 128 - 256 bit key is already infeasible to brute-force, so long as the algorithm uses the key appropriately.
 > anyone [...] can see an obvious weakness here: the procedural algorithm will produce a pattern that can be found and used to break the key. [...] But what if the algorithm isn’t specific? What if the key describes the algorithm? If the key is data that can be used as an algorithm to produce data, we can create a cycle where the algorithm is self modifying and therefor wont create a pattern.This doesn't seem right. If "the key describes the algorithm" you need another algorithm to generate the algorithm from the key. This other algorithm is unique and will produce patterns like any other deterministic algorithm. Patterns in the generated algorithms will eventually be reflected in the data.
 Creating new algorithms based on the key doesn't change the fact that the input for the encryption function is the key (and in this case apparently the plaintext).You can reduce all this algorithm generation back into essentially one algorithm anyway: generating "changes to the algorithm" merely obfuscates the presentation of the underlying single algorithm. In the end, there's still a fixed set of input information that goes into an encryption function and no matter how you try to obfuscate it Shannon won't budge.
 No algorithm can ever be harder to crack then trying all permutations of the key: in other words brute force. The purpose of any encryption algorithm is to make it as close to impossible to crack it in any less time consuming way. One could argue that my algorithm is better then that because the plain text salts the algorithm, but that doesn't account for any potential guess work, or even an attacker trying to get a user to send a specific stream that would help crack the code.
 Fewer and fewer people. The art world is similar. Artists should be aware what other artists have done, to not do something unoriginal unintentionally. But that's hard work because it requires so much time/energy to investigate. So the hard work of art criticism falls on the non-artist who can dedicate him/herself to the task full-time. My analogy is the critic is the codebreaker and the artist is the code-maker. Artists make great critics because they have inside knowledge of how the moves (decisions) are made. But artists often lack the historical perspective, awareness of emerging work and (while I can't read anyone's mind) they typically lack the vocabulary to discuss their decisions analytically. But splitting up the work allows for more specialization. They're both specialists with tons of overlap but they have different goals. So maybe the cryptography world will make the same split, if it hasn't already? Crypto-artists dedicated to creating complexity and then historian-critic-codebreakers to determine the value (strength?) of what was created. They need each other. One creates, the determines the value of it.Anyway, as in art, "naiveté" (no formal education) can sometimes lead to a surprising breakthrough. While it's rare for an uneducated artist to do something interesting, I believe education hinders creativity in some instances. Though most unschooled artists, most of the time, tend to produce the same cliché results as predecessors, they do have the advantage of always thinking outside the box because the box is invisible to them. So outsiders can get breathtakingly weird results sometimes and part of that weirdness is due to their lack of education. It seems to me cryptography is like art in that originality is a goal. In non-objective art especially, you don't want anything recognizable. Like cryptography, non-objective art conceals the true meaning behind the intent. Sometimes cryptography and art even conceal the intent.So creativity allows for infinite approaches and therefore infinite ways to conceal information. Anyway, I don't plan to become a cryptographer but I think it's a cool analogy (to compare to art) and would be interested in learning more about the "big picture" of cryptography if there are any good books on the subject, even if it's fiction.
 This is a good point. The interesting there here is the raw level of information you need to have to even hope to do something new. From a period, say WWII up to when Shannon was just doing his seminal work, you could think maybe a little on a fairly broad area, and maybe come up with something new. Now, as with many areas of academic thinking, you have to learn so much just to get to a point where you can advance a really small piece of state of the art just a little bit.I think a lot of real math and academic work follows this pattern, and it is the reason one has a right to be skeptical, or even dismissive, of someone doing something like this if they were to present it as a serious work. "Here, try to break this". It is great for someone to learn the true scope of their ignorance, but at the same time, it is also so unlikely you have done even the tiniest novel thing that it is almost always OK to dismiss it out of hand.Cryptography is really just an applied branch of math/information theory. Things that work for advancing the state of art for math are generally the same tools that work for advancing the state of art for cryptography. In the raw, pure and abstract sense. In the practical world, where you have to have real bits on real physical things, the attacks are myriad and the innovators in that space often need minimal training in cryptography (relative to someone doing the theoretical work).
 Did you analyze it for biases in the distribution of bits and bytes?
 I did, but I couldn't find anything. its very evenly distributed even if the plain text just zeroes. Again, I'm not sure some one else cant find any bias just because I couldn't.
 If I feed it a key of all zeros and a text of all zeros the output is a counter, that seems like a problem.
 there are various attacks on the 8-bit version at https://github.com/andrewcooke/Stupid.jlthese include:- plaintext injection- ciphertext collisions- distinguishing attack- statistical plaintext matching- partial known plaintext attack
 Every time Bruce Schneier smiles, an amateur cryptographer dies....
 Is this going to be in that new Game you are making?
 Maybe, if no one can find a glaring problem with it. I think using in in a game would be a good place to test it since the world doesn't fall over if it is broken.
 Why bother? Just use a standard encryption. (Unless you feature your algorithm as a puzzle to be broken. Would make for an interesting spy game. But one that's probably more involved and `work' than most people want from their games.)

Search: