How secure is Internet security? A team of researchers recently set out to crack the security of about 6.6 million sites around the Internet that use a supposedly “unbreakable” cryptosystem. The good news is that they succeeded only 0.2% of the time. The bad — and rather shocking — news is that 0.2% of 6.6 million is almost 13,000. That’s 13,000 sites with effectively no security. The interesting part is what went wrong. It’s a gaping security hole that I’d never stopped to consider, but one that is obvious once it’s pointed out.
First, a quick primer on how public-key security works. Suppose you want people to be able to send you encrypted messages that only you can read. First you choose two large prime numbers — say 13 and 19. (In practice, you pick prime numbers that are several hundred digits long, but we’ll do the baby case to see what’s going on.) You multiply them together to get 247. Then you announce to the world that 247 is your “public key”.
Now suppose I want to send you a message including my credit card number and a catalogue order. The first thing my browser does is take my entire message and encode it as a single number in some simple way (say by stringing together my credit card number, the catalogue number of the part I’m ordering, my house number, a numerical representation of my street name, etc.) That number will be many many digits long, but for purposes of our simple example, let’s pretend it turns out to be 53. Then my browser does this: It cubes the number 53, getting 148877, divides that result by your public key (247), getting a remainder of 183, and sends that remainder to your browser.
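The encryption step above is short enough to check directly. Here is a toy sketch in Python (the real computation is the same, just with numbers hundreds of digits long):

```python
message = 53
public_key = 247                           # 13 * 19, announced to the world
ciphertext = pow(message, 3, public_key)   # cube, then keep the remainder mod 247
print(ciphertext)                          # 183
```

Python’s three-argument `pow` does the cube-and-take-remainder in one step, which matters once the numbers get huge.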
(Yes, this is slightly oversimplified. No, the simplifications don’t matter.)
Now when your browser receives the coded message “183”, it somehow has to reverse the process to figure out that the true message is “53”. It turns out there’s a very quick and easy way to reverse that process — provided you know that 247 is 13 times 19. You do know that, because that’s how you came up with 247 as your public key in the first place. So decoding my message is easy for you. But when the bad guy intercepts the message, hoping to harvest my credit card number, he’s missing the key piece of knowledge that 247 = 13 x 19, so he’s stuck.
(Of course if your public key were really 247, he could succeed by brute force, just trying every possible original message to see which one encrypts to 183. But if your key and my message are hundreds of digits long, this is unthinkably impractical.)
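The reversal can also be sketched in Python, with one hedge: with primes this tiny, cubing happens to have no inverse (3 divides 13 − 1), so this sketch uses exponent 5 instead; the shape of the computation is identical. The modular-inverse form of `pow` needs Python 3.8+.

```python
p, q = 13, 19
n = p * q                   # 247, the public key everyone sees
e = 5                       # toy encryption exponent (3 isn't invertible mod 216 here)
phi = (p - 1) * (q - 1)     # 216 -- computable only if you know p and q
d = pow(e, -1, phi)         # 173, the secret decryption exponent

c = pow(53, e, n)           # sender encrypts the message 53
m = pow(c, d, n)            # you decrypt, using the exponent only you could compute
print(m)                    # 53
```

The bad guy intercepting `c` would need `d`, and computing `d` requires `phi`, which requires knowing that 247 = 13 × 19.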
That’s all well and good, provided the bad guy can’t figure out how to factor your public key. We have no proof that he can’t, but lots of good reasons to think that it’s incredibly unlikely. That’s what much internet security is based on.
So in this context “cracking Internet security” means gathering the public keys from a whole lot of sites and trying to factor them. Here’s the researchers’ insight: If I give you a single number and ask you to factor it, that’s really really hard. But if I give you two numbers, tell you that they have a factor in common, and ask you to find that factor, that’s really really easy, using a method invented thousands of years ago and recorded by Euclid.
So. My public key is 247, which I got by multiplying 13 times 19. You have essentially no way of factoring that. (Of course, you can actually factor it easily because it’s such a small number, but we’re pretending it’s huge.) But another site, using similar security, has a public key of 437, which they got by multiplying 19 times 23. Suppose you come along and say: “Hmmm. I know that each of these guys used similar software to generate large primes. Maybe their software gave similar results, so maybe they generated a prime in common. Let’s try using Euclid’s method to find that prime. Bingo! The prime is 19!”
Now you can easily factor my 247 and the other guy’s 437, just by dividing them each by 19.
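Euclid’s method is short enough to write out in full. Here it recovers the shared prime and then both factorizations:

```python
def gcd(a, b):
    # Euclid's algorithm: repeatedly replace the pair (a, b)
    # with (b, a mod b) until the remainder hits zero.
    while b:
        a, b = b, a % b
    return a

shared = gcd(247, 437)
print(shared)                         # 19
print(247 // shared, 437 // shared)   # 13 23 -- both keys fully factored
```

No factoring required anywhere: just repeated division with remainder.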
The ingenuity is recognizing that two hard problems can be a lot easier than one. Factoring 247: near impossible. Factoring 437: near impossible. Factoring them both, given the correct guess that they’ve got a factor in common: Easy.
As long as people keep using the same sort of software in the same sort of way to generate large primes, this is going to keep happening.
Someday, as computing power grows, all of today’s public keys will be factorable directly. It’s been assumed all along that this is no problem; we’ll just go to longer public keys. Whatever the current state of computing power, we can always choose public keys that are beyond its reach. This new research is a warning that if people keep picking keys the way they do now, going to longer ones won’t help very much.
Riffing on Stalin’s “One death is a tragedy; a million deaths is a statistic”, the researchers write that
Factoring one 1024-bit RSA modulus [that’s a fancy name for a multi-hundred digit public key] would be historic; factoring 12720 such moduli is a statistic. The former is still out of reach for the academic community … The latter comes as an unwelcome warning that underscores the difficulty of key generation in the real world.
(Hat tip to Dick Lipton, who has some additional very interesting things to say about the expected likelihood of this problem.)
I’ll give you my prime numbers when you pry them from my cold, dead entropy source.
There’s this blog post (and a few more recent than Feb 15):
http://blog.cryptographyengineering.com/2012/02/rsa-keys-no-insight-whatsoever.html
which suggests that the problem is limited to keys generated by low entropy embedded devices. The problem likely isn’t an issue with the run-of-the-mill SSL certificates on web sites, but with the web management consoles of low-capability devices like routers, web cams, etc.
That blog also has some discussion on the failure modes of some crypto systems when the random number generator is less, uh, robust than one would have assumed:
http://blog.cryptographyengineering.com/2012/03/surviving-bad-rng.html
also heard this on “security now” podcast:
http://twit.tv/show/security-now/340
(starting at about 17th minute)
my browser just complained about the twit.tv link being bad. sorry about that. they’ll probably fix that soon.
You should probably make clear that this is not a fundamental problem in RSA. If we generate primes of ~256 bits for our factors then there are approximately 2^248 primes to choose from. There would need to be 2^124 keys before we would expect to find a collision of prime factors.
@Mike: I think the concern is we all look for primes under the same streetlamp. Like there are zillions of possible names but lots of ‘Mike’s because our name searches are skewed.
Ken B:
I think the concern is we all look for primes under the same streetlamp. Like there are zillions of possible names but lots of ‘Mike’s because our name searches are skewed.
Yes, exactly.
Know whatcha mean….
Mike, it is a problem with RSA in the sense that there is no practical way to validate the key generation. Non-RSA public keys do not have the problem described in the above article.
Ingenious! I’m impressed.
what is the method invented thousands of years ago and recorded by Euclid?
Kostis: To find the largest factor that 247 and 437 have in common, do this:
Divide the smaller (247) into the larger (437) and note the remainder (190).
Now divide 190 into 247 and note the remainder (57).
Now divide 57 into 190 and note the remainder (19).
Now you’re done. (You know you’re done because dividing 19 into 57 leaves no remainder.)
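For anyone who wants to watch the remainders shrink, the same steps can be traced in a few lines of Python:

```python
a, b = 437, 247
while b:
    # Each line shows one division-with-remainder step of Euclid's algorithm.
    print(f"{a} = {a // b} * {b} + {a % b}")
    a, b = b, a % b
print("gcd:", a)   # gcd: 19
```

The printed steps are exactly the four divisions above, ending when the remainder reaches zero.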
Euclid’s Algorithm has been of paramount importance in non-calculus mathematics, so much so that a famous mathematician wrote a dandy little book for non-mathies on the topic:
D. E. Littlewood, The Skeleton Key of Mathematics
available cheap from Dover and highly recommended. Suitable for anyone in university.
Great post, Steve_Landsburg. You’ve explained the cryptographic issues very accessibly.
I want to (re)emphasize/(re)clarify that this is not a flaw with RSA per se, as the prime number frequency is large enough that a good random number generator avoids this problem, and such an RNG is a prerequisite to any cryptosystem. What’s different about RSA is that it amplifies the severity of poor random number generation, as the collision probability increases with the number of pairs of keys (which is proportional to the square of the number of keys).
Steve,
cjc is right. There’s nothing wrong with RSA. There’s not even anything wrong with the way primes are searched for (Ken B’s suggestion). It’s that the starting place for that search can end up being exactly the same due to a weak entropy source. For the most part, entropy sources are pretty good, but as you mentioned, even a relatively small percentage like 0.2% is fairly large when thinking about the number of secure transactions that need to be done daily. The prime generators are fine; it’s the entropy sources that feed them that need greater care.
Kostis,
It’s the Euclidean algorithm.
There is something wrong with RSA. If you buy a router that generates RSA keys, then there is no way you can be sure that you do not have this problem.
Some higher security applications switched away from RSA about 20 years ago for this reason.
Roger,
When you say there is “something wrong with RSA”, do you mean the algorithm or the company? It’s unclear from your post.
When I said there is “nothing wrong with RSA”, I meant the algorithm, not the company.
If you meant there is something wrong with the algorithm, will you elaborate? As I’ve stated above, it looks like the problem is the way the primes are generated. The RSA algorithm doesn’t detail how this should be done, it only stipulates that two random primes are needed.
The most ingenious part of this post, is the clarity of the explanation. Bravo.
The question that I have is this: what are the odds that two sites would choose a common factor by (truly) random chance? E.g. even if the entropy of picking the random primes was increased, wouldn’t you expect that amongst 6.6 million sites, at least some of them picked the same factor? And if that happened, isn’t it relatively easy to find out which ones did just by applying Euclid 6.6 million squared times?
How many primes are there below 2^1024? Suppose that answer is X. It would seem that the odds of 2 out of 6.6 million sites picking the same prime is 6.6 million / X.
Hmmm…
http://homes.cerias.purdue.edu/~ssw/shortage.pdf
This seems to suggest that the number of primes below 2^1024 is sufficiently huge that 6.6 million divided by it is much smaller than the 0.2% that they found. So I guess I answered my own question.
The probability of collision is a variation on the birthday problem.
The run time of the algorithm goes up as the square of the number of keys you are testing.
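That quadratic blow-up is easy to see in a naive pairwise scan. A toy sketch with made-up moduli (at real scale the researchers used a much faster batch-GCD technique, but the idea is the same):

```python
from math import gcd
from itertools import combinations

# Hypothetical harvested public keys: 13*19, 19*23, 31*37.
# The first two secretly share the prime 19.
moduli = [247, 437, 1147]

# Check every pair: the number of pairs grows as the square
# of the number of keys being tested.
for m1, m2 in combinations(moduli, 2):
    g = gcd(m1, m2)
    if g > 1:
        print(f"{m1} and {m2} share the factor {g}")
```

With 6.6 million keys there are roughly 2 × 10^13 pairs, which is why the researchers needed something cleverer than this loop.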
LOL. 21st century technology undermined by a 2300 year old (or older) algorithm.
dullgeek:
The question that I have is this, what are the odds, that two sites would choose two common factors by (truly) random chance?
The answer, as you seem to have discovered already, is practically zero.
Almost exactly how the German Enigma code was cracked in WWII.
Ken, I don’t know what “algorithm” distinction you are making. Yes, as a mathematical algorithm, RSA does what it says. But use of RSA has the security weakness shown in the above paper. Some internet connections could be insecure because of the weakness. Some people have been avoiding RSA for that reason.
Roger,
I mainly wanted to clarify whether you thought the products provided by the company RSA were bad or if you thought the RSA algorithm was bad. I now understand that you think the RSA algorithm is bad. But this is mistaken. The key generation and management is not part of the RSA algorithm. The RSA algorithm assumes two random primes. It is on the user of RSA to implement a prime generator.
This is a key distinction and different from saying that RSA itself is vulnerable. Claiming that RSA is bad because of this is like claiming AES is insecure because people generate the same key over and over.
Ken, you seem to want to define the problem away with your peculiar definition of “RSA algorithm”. It is like saying that my Toyota isn’t bad because it is just the brake pedal that doesn’t work. Or President Obama isn’t bad, it is just that his policies have failed. Or that the new John Carter movie is not bad; it is just the plot and acting that are bad. Whatever you want to call it, the above paper shows a defect to using RSA, and there are alternatives to RSA that do not have the defect.
@Roger: “It is like saying that my Toyota isn’t bad because it is just the brake pedal that doesn’t work”
No, not quite. The next post is about the difficulties in getting an analogy right, so this might be a good exercise. Since the issue is with key generation, the keys being inputs to the RSA algorithm, we need an analogy with something external to your Toyota. For instance, if you poured moonshine into the gas tank, then complained to Toyota it wasn’t running well.
Key generation is not external to RSA. The browser you are using to read this probably has an RSA module, and it does key generation.
But following your analogy, if the commonly-sold gasoline worked fine in all the other cars, but sometimes ruined a Toyota, then yes, I would say that something is wrong with a Toyota. Maybe it is really the fault of the gasoline refiner, but the practical effect is that if you drive a Toyota then you will have a problem.
Roger,
By using your analogy I can claim that Schlage locks are insecure because I made thousands of copies of the key to my house and left them lying around marked with the address of my house.
The browser you are using to read this probably has an RSA module, and it does key generation.
Where does the browser get its entropy? The RSA module certainly does NOT generate this. It gets the entropy from an external source. If it was completely internal, then the RNG in the module would be completely deterministic and easy to predict.
As for your Toyota analogy it doesn’t wash either. The actual analogy is that you bought a diesel engine Toyota, but put gas in it because you put gas in all the other Toyotas you’ve used. Of course this would fail because machines, as algorithms, are sensitive to inputs.
What I’m curious about is this. Why did Steve factor 57? It’s a well-known prime.
Ken B
I hope you’re joking, since 57 = 3*19.
@Ken: Inside joke. http://mfluch.wordpress.com/2007/12/26/57-the-grothendieck-prime/ G is Steve’s mathematical hero.
Ha! I’ve never heard that story.
@Ken:
I’m heartbroken. You queered the prettiest, most elegant pwn-bait I’ve laid out all year! The plan was for one of the fish to bite, and then really go to town, stretching his powers beyond the usual ‘idiot’ and ‘moron’ to ‘fool’ and perhaps even as far as ‘innumerate’. You can guess the rest: serious pwnage. Alas, once you fell into the snare I had to practice catch&release.
I’m sorry to have snared you but don’t let it happen again
:)
Ken, I don’t get your point. If you still don’t believe that RSA has a problem, then read the above paper. The paper is correct.
Roger,
The point is that RSA is an encryption algorithm that assumes the use of two random primes p and q. Generating p and q is not a part of RSA. I have read the paper. The paper states the problem is the entropy source used to generate p and q. If RSA assumes random primes and you use at least one non-random prime, then RSA is not the problem because you’ve changed the assumptions.
Saying that a problem generating p and q (because you oftentimes produce the same p or q) is a problem with RSA is like saying there is a problem with the security of the lock on your front door after you’ve made a bunch of copies of its key and handed them out to a bunch of strangers. Obviously if you make a bunch of copies of the key to your front door, the problem is you and the way you managed the key, not the lock itself.
The solution to the problems detailed in the paper is not to scrap RSA, but to be more careful about the prime generation, particularly in systems that have weak entropy sources.
As the saying goes garbage in garbage out. Using bad inputs in to anything degrades performance. To say there is something wrong with RSA because you’ve chosen poor inputs is not correct. It’s like putting gasoline into a diesel engine then saying the engine is defective, when it fails to work as expected and performance degrades.
Ken B
I’ll try to be more careful of your snares:-)
Ken, your hair-splitting definitions are contrary to common usage, and irrelevant. Define RSA however you want, but the paper presents a weakness that makes RSA keys “significantly riskier” than Diffie-Hellman type keys. That is what the paper says. If you don’t agree, then tell me where the paper is wrong. If you just want to promote your peculiar definitions, I don’t care, because I prefer to use the common definitions.
Roger
You can call it hair splitting all you want, but distinguishing between independent, discrete steps is very important in security work, and in math in general. Lumping everything together and saying that a flaw in one of those independent, discrete steps nullifies everything is simply lazy.
The weakness found for RSA in this paper has nothing to do with being able to decipher c, where c = E(m, k), m is the message, k is the key (for RSA the private key is derived from (p − 1) * (q − 1)), and E(., .) is the encryption algorithm (RSA), when p and q are generated in a cryptographically secure way. The whole point of the paper is to show that given a weak entropy source, it’s easy to discover what p or q is, thus making RSA useless.
But this is saying nothing more than that if you use a predictable key for any encryption algorithm, that algorithm is useless. That is the point. You want to claim that RSA is the problem, when the reality is that key generation is the problem. Being unable to distinguish between key generation and the encryption algorithm leads you to faulty conclusions, none of which are made in this paper.
I see nothing in this paper to indicate that they think there is a weakness with RSA not pertaining to poor prime generation leading to key discovery. Do you? In fact, in the “Discussion” section on page 9, they talk about how proper seeding of the RNG that produces the random primes makes RSA cryptographically secure.
I am glad to see you acknowledge that the paper found a weakness in RSA. A cryptosystem is like a chain in that it is only as strong as its weakest link. I am only claiming the same RSA weakness that the paper describes.
I am glad to see you acknowledge that the paper found a weakness in RSA.
I didn’t acknowledge this, nor did the paper find one.
A cryptosystem is like a chain that it is only as strong as its weakest link.
The entropy source is not in the RSA chain, but in the entire digital security chain. RSA is but a small link in that chain, independent of the entropy source.
I am only claiming the same RSA weakness that the paper describes.
And again, the paper didn’t claim a weakness in RSA, but a weakness in the entropy source.
Ken, your message says, “The weakness found for RSA in this paper”. Now you deny it. A lot of people have been in denial about this weakness for a long time. It goes back to a rivalry between MIT and Stanford. I guess you are in the MIT camp.
Ken and Roger: I think you’re spending a lot of energy on this argument, given that you both appear to agree on everything substantive. You both agree that given good random key generation, RSA appears to be secure, and you both agree that in practice, the key generation that people tend to use with RSA creates a security hazard. Your only disagreement seems to be about whether the latter should properly be termed a problem with RSA, but I don’t see where anything important hinges on the resolution of that disagreement.
Feel free to tell me that I’ve missed the point, but if the above is an accurate summary, I think it might be time to let this go.
Incidentally, the argument about whether this is a weakness of RSA is a distraction from the big issue here. The fact that a lot of small devices have serious problems with entropy generation is not in any way limited to RSA. It just happens to be peculiarly visible through shared prime factors if you generate your keys in a particular way (see https://freedom-to-tinker.com/blog/nadiah/new-research-theres-no-need-panic-over-factorable-keys-just-mind-your-ps-and-qs). In this unusual case, the difficulty is easier to exploit. However, all cryptographic keys generated with low entropy are dangerous, no matter what cryptosystem is used: the difficulty of a brute force attack is determined by entropy, not key length. The only solution is to ensure that every device that generates important keys has enough entropy. (Fortunately, PCs and servers have plenty of entropy, so the problem seems to be limited to embedded devices and other highly constrained environments. However, even though these usually aren’t the most important keys, it’s still a big deal.)
Another great example of why every computer worth anything needs a good hot cup of tea.