I wish I knew more about encryption. Unfortunately it isn't a very accessible area, even if you have a degree in CS you aren't really qualified to even touch the stuff, you need a maths degree at the very least.
Plus every time anyone brings up the subject the answer is always the same: "If you roll your own crypto you're an idiot." Which is not very encouraging. Likely why crypto' libraries are so antiquated and terrible.
"If you roll your own crypto you're an idiot" is technically correct, but discouraging in all the wrong ways and worse for everyone in the long term. I've been a teaching assistant for MIT's introductory graduate crypto class, and I still don't consider myself qualified to implement core crypto algorithms. But I've seen -- and used -- entirely too many crypto implementations that are obviously poorly written in my eyes.
I would love to see a solution to the fearmongering that still keeps people aware of their mistakes. Perhaps public development is the answer -- many eyes make all bugs shallow, so all we need is more people who feel like their eyes are qualified. How many people have actually read the OpenSSL or gnutls or GPG source with an eye to its security?
Probably the enitire community of security researchers did review OpenSSL, gnutls or GPG, maybe only once, but they did. (Didn't you?)
Anyway, pulic development was the anwer that I'd give here. Keep in mind that even the researchers often make mistakes when they implement the algorithms, and the main reason that their implementation is better than the GP's or mine is not because they know more (altough they do know more), but because there are knowledgeable people doing QA on it.
Anyway, you are idiot only if you use your own imlementation for anything serious. There is no problem in rolling your own crypto if you only do it for fun.
Doing crypto right is really, really, really hard. Not only do you get to fight against advanced math, but if you get even the tiniest detail wrong somewhere, leak even a little bit of state, even if what you leak doesn't even look like state, someone will hang you with it.
Such as, does the length of time taken to de/encrypt ever vary based on the internal state? And in your answer, do consider all the interesting ways in which the microarchitecture details like branch prediction or TLBs of the CPU you run the code on affect your code.
Time and time again, organizations with lots of money have decided to roll their own crypto to solve a problem they have. And it has almost inevitably failed.
I took Dan Boneh's "Cryptography I" course this year on coursera and I learnt a lot. The mathematical basics are covered (with references to books and free resources if necessary) and the concepts are very well explained. I would definitely recommend taking it. There isn't a new offering planned yet (Crypto II will start in January), but I think it will be proposed again soon considering that they had two runs of the course (spring and summer) in 2012.
Crypto 1 just "ended". The final exam came out today.
It's been good, as in I never could have followed this thread before it, but I don't know about the first run, but there has been little to no professor interaction with this run of the course.
Which is a bit unfortunate. I don't have any education in modular arithmetic and I can only get half of those questions correct. Being completely lost as to why the other half are wrong, and being unable to get help or worked out answers (to be able to figure out where I'm going wrong before the final) really isn't the best.
Unfortunately I think this may be what we can expect from Coursera. If you're ready for it then it's great. (I've put Boneh's Crypto 1 in the top ten of courses I ever took.) But if you're not familiar with modular arithmetic then you're definitely not familiar with abstract algebra, in which case yeah you need to study that first.
(I didn't realize there was a fall run!)
As for modular arithmetic did you check out the free book by Victor Shoup (it's mentioned on the course site) ?
As for professor interaction, it is a byproduct of the MOOC formula : don't expect too much from the professor, he is already teaching to thousands of students ! You'll have to manage your way through the discussion forums (who unfortunately have a lot of noise). During all the courses I took, I always managed to find answers like that.
Thanks for the link, for some reason I thought that book was on a different topic. I've found a series of lectures from Harvard Extension using the Artin Algebra book. I'm going to use that (and the new edition of the book) to pick up on the concepts I'm missing (over the course of the next few months)
The previous course I took with CourseRA had a decent amount of Prof and TA interaction, and the forums were well tended. But, I guess that could be down to the fact that the course was being used by the professor to instruct his actual University students at the same time.
By far the simplest encryption to understand is the factorization problem.
If you have two prime number it's trivial to multiply them and get the result. Or if you have the result and one prime number, it's also trivial to divide and get the second prime number. But if you have that result, figuring out which two prime numbers divide it requires you to check every single one of them - which takes much longer.
Now imagine your password is one of the prime numbers, and your secret message is converted to a different prime number. Multiply your password with the secret message and you have your result. That's the encrypted message.
Without knowing what the password was, there is no feasible way to figure out what the message was - you would have to divide by every prime number looking for another prime number that evenly divides into the encrypted message. (Technically once you do that you don't know if that prime number was the message or the password.)
This is the core of encryption - the trouble is the side-details can mess you up, and they are really not obvious. For example: How do you securely convert a message into a prime number? Or a password into a prime number?
You can't use the entire message as one enormous prime number, that's not feasible, so you have to break it up. How do you do that properly, without leaking information?
There are other issues with the math (I'm not familiar enough with encryption to even know what they are, but they are things like certain primes have patterns that make themself obvious in the result). See http://en.wikipedia.org/wiki/Semiprime for some primes patterns to avoid.
Then there are issues with the implementation - if you are on a multi user system (or a remote system) you want to make sure people can't figure out the password by watching which system resources you use, or how long things take you.
And then issues with the algorithm: If you encrypt two documents with the same password it can be possible to compare them and figure out what you used. So you want to add some randomization to the password to make that impossible. But getting good random numbers isn't always easy. And you have to store that random number in the message so it can be reversed - but watch out that you didn't do that in a way that negates the advantage of the random number.
This system is (almost) just a variant of the one-time pad.
One-time pad: before Alice and Bob need to send secret messages to one another, they generate a large blob of random bits and each take one copy of it. They keep these bits very secret. When one needs to send an N-bit message to the other, s/he takes the next N bits of the blob and XORs them with the message. This is trivial to do and provably impossible to break -- except that making and sharing and keeping all that key data (the big blob of bits) is really inconvenient and easy to get wrong.
So, anyway, your system has the same weaknesses as any other one-time pad: you need new key data for every message you send, which means you have to make a lot of key data, share it with your partner in crime, and remember it for a long time, all without leaking any of it.
It's not quite the same as a one-time pad. You can get by with fewer bits of key than bits of message, provided there's still "enough" key. On the other hand, "enough" is quite a lot, much more than the key length of a typical symmetric cryptosystem like AES.
Anyway, I don't think this counts as "by far the simplest encryption to understand". A standard one-time pad is simpler and stronger, and the extra complexity of multiplying large prime numbers doesn't really buy you anything much to compensate.
I don't intend for anyone to actually implement this. It's just to understand how encryption can work. i.e. how it's possible for an operation in one direction to be quick and easy, but the reverse operation is hard and slow. (And why it's hard to implement properly.)
But you are right that as described it is a lot like a one time pad. (Especially since if you encrypt a known message you can instantly find the password.)
Many, but not all, of those algorithms will work for numbers which are a product of two roughly even sized primes (depending on size). Especially the Quadratic Sieve and Number Field Sieve class of algorithms (NFS, GNFS, SNFS, etc).
Size is key though. A rough guide of the GNFS times for a quad core PC would give:-
(c100 means a 100 digit composite number, doesn't matter if it is just the sum of just two evenly sized primes).
For comparative sizing:-
2^256 =~ c80
2^512 =~ c160
2^1024 =~ c320
So a 512-bit composite can be quite reliably factorised in a few months by a commodity PC. Parallelisation is a partial help, the first stage of the algorithm can be distributed, but the final stage(s) (~25% of the work) still requires a single machine to do the work.
Part of the hard work in choosing random primes to use within such encryption algorithms is avoiding numbers which are susceptible to each of those algorithms, e.g. avoiding primes with simple p-1 factorisations to avoid being susceptible to Pollard's P-1 algorithm.
The wrong choice of prime and the factorisation becomes (relatively) trivial.
Each prime is approximately sqrt of the product, not of size that is sqrt of size of product, the size of primes is approximately half of the size of their product. As they are primes their entropy is slightly lower than their length, but not too much (there are approximately x/ln(x) primes smaller than x, which would lead to entropy of 256bit prime being 248.5bits)
Schneier's Applied Cryptography is a very readable introduction.
If you want to get into the actual mathematics of it, then it is considerably more difficult. Luckily for me I took courses in pure math, RSA and some hashing functions at university, so I had to learn it :-)
As others have said, crypto is properly hard and it's very easy to make cock-ups. So unless you're prepared to do the years of work becoming a proper researcher, don't try and design any algorithms yourself (for fun is ok, but not for real world use).
1. "Rolling your own crypto" meaning "inventing your own algorithms".
If you invent your own algo for creating one-way hashes, performing symmetric encryption etc and are not an active, up-to-date cryptographer ... you will make a mistake.
This is a world for academics and researchers; for open publication of algorithms followed by the open publication of attacks on those algorithms.
2. "Rolling your own" meaning "Assembling known cryptographic primitives into a larger system".
If you assemble well-studied primitives (SHA-2, AES, HMAC) into larger systems ("I've invented a simple messaging system, guys!"), you are probably still mistaken.
But at least you won't be as much of a professional embarrassment (unless you're keeping passwords in plaintext or some such thing).
At this level, you're between academia and industry. Design at this level is mostly about using certain rules of thumb, exercising defence-in-depth and taking inspiration from existing designs that are battle-hardened. For example, when using multiple kinds of cryptographic operation, have different keys for each operation to mitigate potential bit-leaking attacks.
The good news is that you can get professional reviews done. Seek out respected security experts and ask if they are prepared to exchange labour for currency.
There's also "1.5": Implementing well-known algorithms yourself, which, given the reference to poor crypto-lib, I suspect is what the GP referred to. I'm not an authority on the subject, but I believe that if you do this, you risk stuff like leaving keys (or parts of keys) in memory, being susceptible to timing attacks or not using PRNGs properly.
If you reach the point where in-memory values are an issue, you have already lost.
If I can pull the in-memory password value out of your program, I also have access to inserting my own code into your process to prevent any hashing attempts you may make on that memory. Once memory safety comes into play, all bets are off, and all security is irrelevant, as all your security is simply data in memory as well.
The reason we don't store plain text passwords to disk is because stuff on disk tends to be backed up and placed in positions where it has a chance to be read by others. Storing plain text passwords is not actually a security problem in itself if you can guarantee safety of those plain text passwords - but so far, such guarantees tend to be very false.
The danger zobzu referred to was that the pages containing sensitive information might be paged out to disk. If that should happen, you're pretty much out of luck: they're just as vulnerable as anything else on disk.
The only defense against this is to make sure that the sensitive information is never paged out, using either mlock() or just not having any swap on your system.
There's also the class of bug whereby the software starts spitting out stuff from somewhere in memory that was never meant to be displayed, a la the Minus World in Super Mario Bros. Those are much less common in these days of memory protection and high-level languages, but it is still a concern.
> I wish I knew more about encryption. Unfortunately it isn't a very accessible area, even if you have a degree in CS you aren't really qualified to even touch the stuff, you need a maths degree at the very least.
A good Number Theory course will cover much of the basics of the common asymmetric encryption systems. You don't need to do a whole Maths degree on top of a CompSci degree (although I've done exactly that).
But before that, I'd definitely recommend Schneier's Applied Cryptography.
I'm sure many CS graduate programs offer very good courses in cryptography. There are also books and online resources. Good ciphers/cipher schemes have withstood the test of time by large groups of experts and even under these circumstances some weaknesses are occasionally discovered. That's why a single person should not roll their own no matter what their qualifications are. The difference is those with more knowledge understand that.
In my experience, "don't roll your own crypto" primarily means "do not reimplement a well-known algorithm, because you are likely to get it wrong and likely to leak side channels out the wazoo". I think most people have realized by now that inventing their own algorithm is a terrible idea. (Although certainly there are people who haven't realized that!)
I think that most people mean inventing own primitives by "don't roll your own crypto". Side channel attacks on software implementations on software implementations of cryptography are certainly practical, but often not significant for overall security of the whole system (as either there are simpler attack vectors or the system has to be already partially compromised for the attack to be possible).
Amounts of home-grown algorithms (pseudo-"asymmetric" algorithms optimized for simple hardware or small messages), misused algorithms (RC4 without IV, mostly), otherwise secure protocols implemented with parties transposed around (many proprietary RFID and other authentication token protocols authenticate reader against token, not other way around) and plain weirdness in many even newly designed commercial cryptosystems today is still more significant that any kind of sidechannels caused by careless implementation.
Implementing a well known algorithm can be a very good learning experience! I implemented AES-256 for my college project this semester. Learned a whole lot about Cryptography that I never know before. :)
> I'm sure many CS graduate programs offer very good courses in cryptography.
On the topic: I have an interest in crypto and for a while I was looking at grad schools for cryptography. The number that even offer a course is less than impressive, and the number that offer more than one class are shockingly few.