Hacker News
Falling through the KRACKs (cryptographyengineering.com)
462 points by kjhughes on Oct 17, 2017 | 149 comments

> As the KRACK paper points out, there is no formal description of the 802.11i handshake state machine.

This, to me, is one of the biggest problems facing our industry. In the first year of an undergraduate engineering degree we learn how to design a state machine, before we learn OOP, circuits, or assembly. Yet how many APIs or protocols have you come across, or has your organization developed, where there isn't a state machine described anywhere? It may seem difficult and tedious, but this is how you engineer something rather than merely 'develop' it. We programmers sometimes spend far too much time juggling state in our heads and communicating it to coworkers in a wonderful game of workplace telephone. If the state machine has become too complex to write down, then we have lost all hope of fully managing it and must accept the risks. Some things just shouldn't be risked!

> This, to me, is one of the biggest problems facing our industry. In first year undergraduate in engineering we learn how to design a state machine. We learn to design one before we learned OOP and circuits and assembly.

I think one of the causes is that a lot of CS is taught top-down and not bottom-up, so things like flowcharts, state diagrams, and other similar visual aids are completely skipped in preference to more abstract thinking. I wonder if at least some of this aversion is directly related to the "gotophobia" that started with Dijkstra's famous paper.

> It may seem difficult and tedious, but this is how you engineer something rather than merely 'develop' it.

As someone who learned the "old school" way of planning out your code --- yes, even drawing flowcharts and state diagrams --- before writing a single line of it, I can assuredly say that it should be a natural and completely pleasant process; what's "difficult and tedious" is trying to debug and coerce something you (or someone else) wrote without planning into working correctly.

Over the years, coworkers have been puzzled at why I spend "so much time" flowcharting and just thinking before writing even a single line of code --- only to adopt the same method themselves once they realise how much time it saves in total, since all the cases have already been accounted for and the only thing left is to shake out the few remaining, often trivial, bugs introduced during the design-to-code process.

Except at the highest level of abstraction, my Lucidchart diagrams and Google docs never survive their first encounter with the compiler. There always turns out to be some fatal flaw that's obvious in an editor but totally escaped me in the word processor. My (limited) experience has been teaching me the opposite: dive in as quickly as I can and write some code to explore the problem space, then go think harder about a design after getting my feet wet.

I once worked on a Finance project which had grand aspirations and decided it was going to Do Things Right by starting with extensive specifications.

The developers insisted on all behaviour being specified and documented before they would start coding. Of course, a truly exacting specification is equivalent to code. So the users who had been tasked with describing the required functionality ended up inventing their own DSL and “coding” the entire system in Microsoft Word.

Needless to say, they were the only ones who understood the spec. IT cried foul. Business cried “this is what you asked for!”

Cue months of arguments over what constituted a proper spec, during which the users essentially taught themselves how to code. The software eventually ended up being written by transliterating the DSL into actual code.

It worked, but remains to this day an unmaintainable mess.

So the documentation was simply incomplete, because the DSL, as a language, was not documented properly. This is why the initial developers were unable to have it maintained by others.

Essentially the same problem, just on a higher level of abstraction.

Oh, and failing to document DSLs is really a common mistake: No docs, no spec, no helpful error messages, and so on. Just like the very early parsers or compilers. But DSLs are really the most important thing to document, as these are so much harder to get into ... the "simpler" parts of the program are usually readable and understandable with less effort, even with pure code and no docs at all.

To those who downvoted: Do you care to elaborate?

What's wrong with demanding to document a DSL?

Well, the DSL flowed out of documenting the problem. Having non-technical people write a DSL, and then proceed to document that DSL, doesn't really address the core problem, which was knowledge transference. Why do you feel they'd be better able to document a DSL than they were at documenting the domain?

The whole purpose of a DSL is to make the whole system simpler, by separation of concerns.

That is, the DSL as a language is still much simpler (fewer data types, fewer elements, fewer rules, etc.) than the whole application, which is built using that DSL.

So I'd expect the DSL's documentation to be short and targeted at the people writing within that DSL.

If the DSL's documentation is more complicated than the original problem (spec), it failed at its purpose.

BTW, ultimately you have a chain of languages, such as:

    Spec <- DSL <- Meta-DSL <- ... <- programming language
But in the end, that chain goes from complex/specific to simpler/generic. So the documentation effort should shrink dramatically from one step to another (or your system design is seriously flawed). Usually, the Meta-DSL or Meta-Meta-DSL is already the plain, generic programming language you are using. And that one requires no documentation writing at all, because it is already documented and widely understood.

> It worked


Indeed, usually these approaches are abandoned after reaching a point where arguments over specifications consume years and nothing ships. The project is killed and everyone involved is reassigned elsewhere, hopefully having learned something, though there's usually no evidence of such learning.

This is truly a great anecdote. I wish I could bookmark comments on HN, because I want to dig out this one when the time comes.

> I wish I could bookmark comments on HN

You see that "X hours ago" to the right of the username? Yeah, click that...

In all fairness, it's not entirely obvious that this is how to do it.

But it's a good tip nonetheless. Once you click the "x minutes/hours/days ago" link, besides bookmarking the resulting URL in your browser or elsewhere, you also have a few more options that show up at the top of the comment page. One of them is to "favorite" the comment. You can also favorite a submission in a similar way.

Just be advised that your favorite submissions and comments lists are public.

> it's not entirely obvious that this is how to do it

I totally agree that this is an example of insane, hidden functionality. How could anyone know that "click timestamp" = "view details"? Makes no sense.

However, seemingly every site/app has adopted this, from Twitter to Facebook to several PM services.

It's the only real 'title' of a comment beyond the user name, and the user name is already a link.

OK, so on HN, it's clickable. But "gray text without underline" isn't always clickable on HN. This is true of other sites: it isn't even clear that the timestamp is clickable at all, let alone what'll happen if you do click it.

Why not acknowledge that this is a problem and just add a little "Permalink" link, which some sites used to have? Or at least a permalink icon?

Surely the combined design talent of Google, Facebook, and Twitter can solve this problem better.

I think hidden is not quite the right word. It's a link, likely displayed in a hypertext browser.

A hypertext system that doesn't treat links as slightly opaque, explaining them in detail wherever they appear, sounds unpleasant to me.

I guess you could probably modify most such systems to have a flashing popup on every link that said "There is more information available if you click on the text here!!!!".

> It worked, but remains to this day an unmaintainable mess.

The question is: is it a worse mess than if you'd started without the spec? My intuition says "no", and the experience the users gained regarding formal thinking is a huge net benefit that will be an asset for years to come.

I am not a programmer. I have done a lot of programming. Eventually, I was able to hire competent professionals. I have taken very few formal courses in programming and zero in what I'd call computer science. I programmed out of necessity. Computers were pretty useless if you didn't.

So, take this with a grain of salt.

I can't code without a diagram. I make a diagram and process the logic. I make lots of sub diagrams. I make notes and ask myself a lot of questions. I'll also ask other people, if I can.

I can't do it. I can't program 'on the fly.' My attempts have resulted in horrible results, not even qualified to be called results. If I don't process the logic, pretty much from start to finish, I can't do it. I use a notation system that isn't even a real programming language and leave lots of room to cram more stuff in and make corrections.

Most of this is in the past tense; I don't do much anymore. I'm retired and my efforts are just little automation tasks for my own needs.

Anyhow, I figured I'd offer another perspective/opinion. I envy you if you can do it off the cuff. I've tried, I can't.

I think it depends on thinking style and mental models. Personally I have "diagram blindness"; most diagrams look like visual noise. Usually I get better (fast, reliable, maintainable) results by just coding on the fly and then relentlessly refactoring as the optimal design gradually reveals itself.

I have a CS degree and a few years professional programming

I have coded on-the-fly and I have coded from diagram first approaches.

On-the-fly works for me when there is an obvious data structure to work toward; otherwise it takes a lot longer than planning it out, because of all the edge cases.

Data structures and ADTs are the key in general.

If you consider the diagrams as a physical manifestation of abstractions that can eventually be converted into boolean logic, etc., then it's the same thing. For many years I used to draw on paper a mix of non-standard diagrams and pseudo-code to discuss slightly complex algorithm implementations with peers... the more coding experience you have, the less you need to do this; you make the connections in your head and start coding.

So my tip, if you want to learn how to do it on the fly, is to start with small steps, reducing the amount of stuff you need to see on paper before you start coding. On the other hand, it's perfectly normal and there is no problem in doing what you do.

When I program 'on the fly' I either have the diagram in my head, or I'm working 'linguistically' such that things are correct-by-construction / facts about objects are true because-of-what-they-are rather than because of what they do. For example, using std::shared_ptr to manage a resource instead of trying to reason out all the cases to manually free the resource. I guess what I'm saying is that rather than thinking about all the cases to see that the code does work, I try to write code in a way that it is not possible that it does not work.

>>There always turns out to be some fatal flaw that's obvious in an editor but totally escaped me in the word processor.

This problem goes away once you gain some experience - both in writing specs and in programming.

Think of it this way: do engineers build bridges without design documents and diagrams? No. So why are you writing software without them? If you aren't familiar with the problem domain, fine, do it the agile way. But once you gain experience, there is no excuse for not thinking problems through before trying to solve them.

It's faster and more pleasant to write working code than to write prose descriptions of hypothetical code.

Documents are cheaper to create and easier to change than bridges. Code is just another kind of document.

I've a sneaking suspicion that bridge building would look more like software if we could drive over AutoCAD models. (Or that software would look more like bridge building if we were still compiling by hand).

EDIT: I should add that I certainly do write and communicate documents about "what business problem does this even solve" and "what does it do" and "what are the major components and what are they responsible for" and "did you consider X alternative design" but planning down to the level of function signatures has usually turned out to be a waste of time.

>>Documents are cheaper to create and easier to change than bridges. Code is just another kind of document.

That really depends on the type of code you are writing.

Writing code going into a spacecraft, a nuclear power plant, encryption that people rely on to protect their actual lives, or a medical device... you get the picture.

Yes, if you're writing code going into the next Pokemon time-waster, sure, you can be fast and loose, but do not presume all code is the same.

In this instance, code is going into devices that historically do not get updated properly. I think that warrants a more "bridge building" approach and less of a "code is just a document" approach.

I think for that argument to work, you'd have to show that the word processing activity actually yields fewer errors in the resulting software. I guess what I'm saying here is that I've found writing prose beforehand to have low to zero ROI in terms of resulting software quality. I totally believe that writing math and actually proving things about your ideas would.

The more important code is, the more I want to see it subjected to a realistic and comprehensive test suite, careful peer review, etc. Although you're right, I ultimately work on backend systems that can cause some downtime at worst, not industrial process control or crypto libraries. I also have the luxury of doing continuous delivery, incremental rollouts, feature flags, staging environments with shadow traffic, etc. so there are a lot of safeguards standing between a mistake and widespread impact.

This is not specifically only a reply to you, but it seems like the two camps of "detailed specification on paper/word processor" and "code on the fly, because the code is a document" are missing a third alternative, which is TDD (ie, the tests are the specification), and a fourth alternative, which is literate programming (the code and specification are written together).

One can also use executable models, either in the same language as the implementation, in another general-purpose language with nice libraries, or in a DSL; finite state machines, for instance.

> I've a sneaking suspicion that bridge building would look more like software if we could drive over AutoCAD models.

Why on earth would you want bridge construction to be more like software engineering? Bridges work. We know how to build them bug free, every time. We don't know how to make working, bug-free software. There's even an old, not very funny joke about exactly this comparison. (Spoiler: it doesn't flatter the code monkeys.)

You might want to look up the Cynefin model[1]. Bridges are complicated, software is complex. This supports your bent towards experimentation, sort of; when solutions are unknown, that's required. But documentation and planning are part of what turns complex in to complicated, and that's a big part of the trick to successfully delivering big chunks of relatively bug-free code.

[1] https://en.wikipedia.org/wiki/Cynefin_framework

We have the mathematical tools to describe everything that might happen to a bridge and determine how it will respond. Obviously that's immensely valuable. No such tools exist for handwavy descriptions of software in Word.

Your last sentence is actually kinda close to relevant to what I was talking about, but for the wrong reason. I suspect as your career progresses, you'll figure it out - most engineers do.

Yeah, someone pointed this out to me once, and it is an interesting spin on the comparison of software engineering to other types of engineering. More succinctly:

We don't build our "bridges", our compilers/interpreters do.

The engineers don't build the things either, the construction workers do.

Sure, but it’s the construction work that you don’t start until you have exhaustive documentation.

Code is just a detailed specification. A better analogy is that you start by designing the nuts and bolts without knowing how the bridge is supposed to look.

Programming is the design. Building is performed by a compiler (free part).

Also, flowcharts are less readable than most programming languages in non-trivial cases. Graphical programming languages are mostly useful as DSLs. Text is unbeatable in the general case.

> This problem goes away once you gain some experience

As will the need for planning beforehand.

People are very different it seems.

I enjoy both diagramming and early exploratory coding.

I have real problems when one of my (really smart) colleagues wants to discuss some code we have neither written nor diagrammed yet.

For her however that seems to be the natural way of doing it and when we started working together I had to point out that just didn't work for me.

It seems to me that knowing and being aware and accepting of the different learning styles can solve a number of problems in teams.

Thinking in state machines is not the same as design up front. Most languages lack a good way to describe state machines, but there are many libraries which do this and even produce the diagrams for you (which can be very helpful when trying to understand the code). I frequently write state machines in an incremental manner.

Enums (or named constants if you don't have enums) and switch statements (or if you don't have these, make a hashmap of state->method or something) are all you need. I'm not sure I get what you mean.

One of the primary features of state machines is that state transitions are well-defined. "Describing state machines", as the GP said, is typically done with tools like a reachability matrix or truth table. A looped switch statement will allow you to implement a state machine, but not describe it.

In other words, how would you express in code: "state Z can only be entered through either state W or X, but not through state Y", or "state V must be followed by state W, but only after condition A"? Sure, you can write a switch statement. But half the rules of a state machine can only be implemented by omission (i.e. by not coding it, instead of hardcoding the requirements).

Isn't this where formal specification languages like TLA+ are useful?

I sometimes describe TLA+ to people as "unit-testable flowcharts." It's pretty inaccurate but kinda gets the idea across?


You do realize how bizarre it is to blame lack of rigor in CS on the author of "On The Cruelty Of Really Teaching Computer Science", right?


Do you keep your documents up to date? I will occasionally sit down and diagram how I want everything to work usually in a simplified UML diagram, an FSM diagram, or a sequence diagram. It certainly makes initial development faster.

But then the changes start and now I either discard the diagrams or accept that I have two things to maintain. Generally I take the former path because divergence from the plan is almost inevitable on anything that's alive.

I usually hack out something fast to understand the issues (there will always be issues) and reiterate by modifying old code to the new func/api which allows me to find corner cases.

It also takes all the "fun" out of programming, as so many seem to approach it like some grand display of mental prowess. Never mind the whole "agile" thing...

You can look at this both ways actually:

For some people (like me) thinking through problems with pen and paper is a tool.

For others I guess it is annoying: look at him, now he's doing this wannabe software architect stuff again.

The same might go the other way: here he goes again, trying to prove his programming super skills by jumping into it without even the smallest amount of planning.

FWIW I like to jump in and start coding, but what's happening behind the scenes is that I "see" the algorithm or architecture in my mind. It's not exactly a flowchart or a state machine, but it does have a sort of plastic, manipulatable shape.

FWIW I don't think other people who jump in and begin coding are doing so blindly either. i.e. they do the same thing you do.

Everyone was presenting it as a dichotomy in this thread ¯\_(ツ)_/¯

Seemed relevant to point out that at least for me it’s not.

As someone who does both (flowcharts etc and exploratory coding) I think I experience something similar.

For me it is not very visual when I code. It is more a gut feeling: this goes here, that goes there. This is good, bad, weird but works etc.

This is nice for me but I know I have to work on how to explain it to others.

Bad example but I guess this is somewhat closer to what I want to be able to explain: "it currently works but it is a bad idea because it means you'll have to remember to patch seemingly unrelated functions x, y and z whenever you update this.

Also, in this case it is even worse as in the worst case you won't see the errors until the quarterly reporting run and that code is well known for having bad test coverage."

Edit: as usual I'm curious as to why parent was downvoted?

Pseudocode and application logic are absolutely necessary. Visualization is cumbersome and unnecessary except in large codebases.

I find it interesting how in a large codebase or project there is a temptation to mutate an object here, decorate some intermediate object there, and before you know it you have an ugly state machine hiding in your code. As you know the number of possible states grows very fast and when people don't know this is happening things soon get out of control. Having things immutable by default helps somewhat, but it's still easy to add various decorators.

> there is no formal description of the 802.11i handshake state machine.

Here is what the paper actually says:

The 802.11i amendment does not contain a formal state machine describing how the supplicant must implement the 4-way handshake. ... Fortunately, 802.11r ... does provide a detailed state machine of the supplicant.

The problem, as I understand it, was that when the protocol was formally verified, the properties checked were only the escape of the private key and identity related issues. The property of the nonce never being reset was simply not considered.

The missing piece, then, wasn't a precise description of how the system works, but rather a precise list of what it's supposed to guarantee. Without a clear and concise description of the requirements, it's hard to verify that the system satisfies them. Alternatively, it's possible that the list of guarantees was complete, in which case reviewers should have noticed that no nonce reset isn't one of them, yet it's required for the security of certain communication protocols used over the channel.

I guess a related question would be, is anyone aware of a common protocol that does have a formally defined state machine to review?

When we designed DCCP, I did a formal model using LTSA to prove liveness and safety properties. Mostly I did it for fun (I got landed with teaching concurrency, and the previous professor had used this tool). As formal models go, LTSA is rather simple. I was pretty surprised when it found a previously unknown deadlock state. To my chagrin, a decade or so earlier, I'd observed a very similar deadlock in the Solaris TCP stack when running SMTP, which is unusual in that it requires the server to send first.

I shared the LTSA model with a few others at the time, but no-one else seemed very interested in doing anything with it.



Edit: I dug through my old backups, and I think this is probably the model I used:


It's nothing fancy, but my point is that even simple modelling can spot errors that a whole standards group of smart people miss.

TCP's state machine might not be formally defined (I'm not sure), but it's well enough understood to have de facto ones, at least.

There are state diagrams in the TCP RFC --- in ASCII art, so the authors clearly really wanted to have one.

Nice! Good to know.

That must be the origin of all the useful but unofficial SVG ones I'd seen :)

HTML parsing is, from the HTML 5 spec onwards, defined as a state machine.

JSON, but it isn't that pretty.

JSON is actually not well or consistently specified, and definitely not to the rigour of a state machine; it’s specified in half a dozen places and ways, with inconsistencies and fuzzy details; see http://seriot.ch/parsing_json.php.



Here is really cool work on state machines in TLS: https://www.smacktls.com/smack.pdf

Beurdouche et al., "A Messy State of the Union: Taming the Composite State Machines of TLS"

From the abstract: "Implementations of the Transport Layer Security (TLS) protocol must handle a variety of protocol versions and extensions, authentication modes and key exchange methods, where each combination may prescribe a different message sequence between the client and the server. We address the problem of designing a robust composite state machine that can correctly multiplex between these different protocol modes. We systematically test popular open-source TLS implementations for state machine bugs and discover several critical security vulnerabilities that have lain hidden in these libraries for years ..."

Based on this work you might say that there are formal state machines for TLS, but they aren't in the standard.

Compare front-end engineering design (FEED) philosophies [1] (as used in waterfall development) with iterative design philosophies (as in agile development). As a non-software, 'real world' engineer (mechanical, heavy industry design) looking in from the periphery of computing as a hobby/amateur interest, it appears to me that software engineering as a discipline has sort of forgotten why and when FEED is appropriate and, critically, how to implement it when it is.

Let's reflect: iterative design is popular in software engineering because software usually suits it. Changes are easily iterable, you can often roll changes back, there's relatively low cost to cloning changes across millions of instances, code's design lifetime is relatively short, etc. In contrast, 'real world' engineering (civil, heavy mechanical) favours FEED because the cost of change rapidly and prohibitively rises with time, so nailing down the scope and structure early is critical, there's long procurement times for changes, the design lifetime of the finished product is measured in decades, etc.

The rub is that there's a small niche of software design, such as protocol design like this, or certain firmware scenarios, which far more closely resembles the latter even though it's usually done by a community used to and trained in the former. Although it's for different reasons, all the things that make FEED appropriate for 'real world' engineering also apply to these sorts of software tasks, but software engineering as a whole doesn't appear to[2] focus on teaching and supporting the necessary tools and techniques for performing that process well.

Flip that on its head, and it's why software companies quickly begin to run rings around companies traditionally focused on hardware once software starts to play a bigger part and the limitations stopping iterative design from working begin to disappear. Software engineering largely biases towards iteration (understandably) and when it works it allows for good advantages. But in embracing those advantages, software engineering as a discipline has begun to lose an understanding of FEED and the careful methodology it delivers.

[1]: https://en.wikipedia.org/wiki/Front-end_engineering & https://en.wikipedia.org/wiki/Front-end_loading

[2] I admit I've not done a CS/SE degree, but I've gone through a couple of subjects run online and I talk to friends who did do it and that's the feel I get. Besides, just look at HN's generally trending articles.

edit: With all I wrote, I don't mean to undermine the extremely good work that the devs designing these protocols, etc. almost always perform. It's far beyond my ability. I am speaking more in a general sense about how Soft. Eng. seems to have thrown out the baby with the bathwater in declaring agile philosophies as 'the best way'. My feel is that the pendulum seems to have swung too far past 'best' towards becoming 'only'.

> It may seem difficult and tedious, but this is how you engineer something rather than merely 'develop' it.

Don't agile approaches encourage this, focusing on features and things that can cleanly be assigned story points at the expense of a bigger picture, with velocity being the focus?

It seems to me that agile approaches are a response to the phenomena of the spec often being wrong. If the spec is wrong and going to need substantial revision, iteration is better than planning.

If the spec is not wrong, careful planning is probably better than high velocity iteration.

I think cryptography is a good example of this--would you develop a cryptographic API on which people's lives might depend iteratively through a release early and fail often method? I hope not.

Exactly, and this is why Agile (while grand for things like "The customer wants a web site. They don't know what they want or how they want it to look, but we need to start on Tuesday.") isn't the be-all and end-all that it's advertised to be by people who only work on websites for tech-illiterate customers.

In a lot of software development, especially large scale enterprise software (where you're trying to map a complex set of existing business practices and workflows into software) and embedded software (where you have to write something that provides fixed, well-understood functionality running on hardware with a months- or years-long iteration time) you do have a complete and correct spec (or at least the means to generate one). In these cases a less ad-hoc approach than Agile can be beneficial.

I upvoted this, but agile and "release early and fail often" aren't necessarily the same thing. For example, they could have iterated quickly during the draft phase, come up with a proposed implementation, have the security researchers review it, and iterate.

Either engineering with its tools (e.g. state machines), or just craftsmanship.

Totally right.

It is tedious. It's inefficient. Quality software is usually less important than working software.

Unless I'm misinterpreting the meaning of "usually" or "important", I find this comment a little bewildering in the context of the article.

In the working world, what matters is that you ship. The most beautiful code and the best "game ever designed" mean nothing if they don't ship.

Additionally, in my years of working as a contractor, I can tell you that nearly 1 in 1,000 companies (+ or - 1) care about security. Management simply does not value it, and everyone already thinks your product "should have been done yesterday." The entire world runs on insecure products. From banks, to healthcare, to universities. I can list every one of those who has put admin credentials in a publicly, internet accessible text file. (But they'll do silly things like insanely unmemorizable passwords like 23ix2n5 that get written down as postit notes and as files on desktops, and disable RDP copy-and-paste but leave RDP hard drive support.)

We're going to continue seeing these blunders day after day until people start going to jail for losing data.

Or stop treating programming like some grand joust...

We won't. Worse is better, remember? Especially when a typical company wants to be the first to quickly sell some shit product and then, maybe, they'll patch it for a while as they go along. Competitive pressures (and management fads, IMO - software market ain't that efficient yet) turned "good solution today > perfect solution next week" into "steaming pile of garbage today > good solution tomorrow".

I guess this is fine for throwaway acquihire SaaS stuff, but it's becoming a serious problem for stuff that's meant to be widely used.

Consider it descriptive rather than prescriptive.

I've been working in the same code base one way or another for more than 20 years now. Quality is underrated.

Can you think of any particular circumstances where quality would be of paramount importance?

I'll wait.

Pacemaker firmware comes to mind.

More importantly, even after the fact, they’re hard for ordinary security researchers to access. Go ahead and google for the IETF TLS or IPSec specifications — you’ll find detailed protocol documentation at the top of your Google results. Now go try to Google for the 802.11i standards. I wish you luck.

The first result for "802.11i" is the Wikipedia page for IEEE 802.11i-2004, which mentions that it's incorporated into 802.11-2007, and if you search for "802.11-2007", the first result at the top (after the amusing calculation that 802.11 - 2007 = -1,204.89) is the PDF of the full standard.

...and I'm not even a security researcher. But I agree with the rest of the criticism of the IEEE and the 802.11 standard in particular, once you've acquired it.

One of my pastimes is seeking out and reading standards for protocols, file formats, and other things, so I've seen quite a few. As an overall impression: IETF's are very readable; most of the ITU ones are pretty good if somewhat terse; ANSI's and IEEE's vary widely between "a little bit of effort" and "I feel like my brain is melting if I read more than a paragraph an hour"; and 3GPP's, along with Bluetooth's, are very much in the "take a 10 minute break to digest after every sentence or two" category. 802.11 is also like that. The acronym density in that document is one of the highest I've ever seen.

Maybe KRACK will at least make them consider editing their standards and formatting them to be somewhat less of a chore to understand; things like https://files.catbox.moe/xym9dq.PNG should really be put into tables.

And then he wrote...

The IEEE has been making a few small steps to ease this problem, but they’re hyper-timid incrementalist bullshit. There’s an IEEE program called GET that allows researchers to access certain standards (including 802.11) for free, but only after they’ve been public for six months — coincidentally, about the same time it takes for vendors to bake them irrevocably into their hardware and software.

In other words, it's getting better, but vendors are burning silicon by the time researchers have the time to even crack open whitepapers -- too late.

Why is there a delay at all? If they’re a security researcher, it would make sense to allow them access as soon as possible.

I think the point is that these standards were not public for a long period of time. I have no idea though.

I don't understand how the attacker can block step 3 of the handshake from the client. Can someone ELI5?

What I was able to understand so far is that the reuse of a particular nonce is being forced on the client, and given the resulting repeated keystream, one can decrypt messages with known plaintext. I'm still fuzzy on how the attacker can force this step.

EDIT: Further question, isn't timestamp synchronization taken into account here when invalidating a nonce? What happens when I disconnect from wifi to a mobile network and back to the wifi?

From what I understood, there are two ways to block step 3. The most obvious way is to jam it, that is, send noise at the same time so the radio fails to receive it. The other way (and the one which makes attacks easier) is to pretend to be the AP but in another channel.

Suppose the real AP is on channel 1. The attacker is nearer the victim than the AP, and is on channel 6. The attacker repeats on channel 6 everything the AP sends on channel 1, and repeats on channel 1 everything the victim sends on channel 6. The victim sees the same AP on both channels, channel 1 and channel 6 (the attacker modifies the part of the AP beacon/responses which says "I'm on channel X"), which is a perfectly legitimate and normal thing, and chooses channel 6 because it has a stronger signal.

Since now the attacker is in the middle, it can modify anything. Change the AP beacon to say "I'm on channel 6", drop any packet it wants, duplicate any packet it wants, and so on.
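The channel-based relay described above can be modeled with a small simulation. To be clear, this is an illustrative toy (queues standing in for radio channels, dicts standing in for 802.11 frames); a real attack uses radio hardware, not Python:

```python
from collections import deque

# Toy model of the multi-channel MitM described above. The real AP is on
# channel 1; the attacker clones it on channel 6 with a stronger signal.
ap_to_air = deque()      # frames the real AP transmits on channel 1
victim_radio = deque()   # frames the victim receives on channel 6

def ap_send_beacon():
    # The real AP advertises itself on channel 1.
    ap_to_air.append({"type": "beacon", "ssid": "HomeWiFi", "channel": 1})

def attacker_relay():
    # The attacker repeats every frame on channel 6, rewriting only the
    # field that says which channel the AP is on ("I'm on channel 6").
    while ap_to_air:
        frame = dict(ap_to_air.popleft())
        if frame["type"] == "beacon":
            frame["channel"] = 6
        victim_radio.append(frame)

ap_send_beacon()
attacker_relay()
print(victim_radio[0])  # same SSID, but now claiming channel 6
```

Because the victim sees an otherwise-identical AP (same SSID, same credentials working, since the attacker relays the real handshake), it has no protocol-level way to tell the clone from the original; it simply picks the stronger signal.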

> What happens when I disconnect from wifi to a mobile network and back to the wifi?

The wifi negotiation starts from scratch, with a new step 1.

So it sounds like if the user knew which channel were legitimate they could potentially realize which AP is the real one?

Two comments on this: 1) The way I read your post was "How can the attacker block step 3?" That's not what the linked webpage is suggesting: "... the acknowledgement to message #3 can be blocked ..."

So really this is blocking step #4 from occurring, pedantic perhaps, but it's sort of important.

2) In response to "How can someone block step #4?" Since we're discussing Wifi then signal jamming seems apt. From the actual paper discussing krack: "Inspired by this observation, an adversary could also selectively jam message 4 ..." There's much literature here (google). The reference the KRACK paper provides is: "Mathy Vanhoef and Frank Piessens. 2014. Advanced Wi-Fi attacks using commodity hardware. In ACSAC" link here: [https://people.cs.kuleuven.be/~mathy.vanhoef/papers/acsac201...]

Awesome, thanks for clarification. I think I don't quite understand how one can selectively jam a particular "data packet" from a signal or signal jamming in general, and I will have to read up on this.

The attacker gets a Man in the Middle position on layer 2 (data link). So every step in the handshake goes through the attacker.

The attacker couldn't just MitM the protocol on layer 3 as it cannot authenticate with the WiFi router. Of course, the attacker already could just pretend to be the wifi router and then forward and inspect all traffic to the internet. What is scary here is that it breaks into a Local network as opposed to the internet.

From TFA

"And separately, to answer a question: how did this attack slip through, despite the fact that the 802.11i handshake was formally proven secure?"

Implies that the formal verification failed.

Yet from the paper on the attack:

"Interestingly, our attacks do not violate the security properties proven in formal analysis of the 4-way and group key handshake."

This is a good thing! One of the hard things for software developers to do is separate the theorem (spec) from the proof (implementation). I can see how the article author would posit such a question.

The paper answers the question.

What do we do about it? More theorems, modelling, and proofs. Research on synthesizing the implementation from its specification. Treat events like this as we treat auto-collisions and correct the processes, tools, and technology that enable attacks like this to occur in our systems... at least that's my 0.02. :)

This whole process is dumb and — in this specific case — probably just cost the industry tens of millions of dollars. It should stop.


A pay-for-play standards development organization should charge to _participate_ in the standards-setting activity, and NOT charge the public for access to the standards.

The ITU-T seems to have learned this, at least as to some things (e.g., the ASN.1 family of standards, such as x.680 and x.690). The IEEE should do the same.

Better yet, they should look at the IETF model, which has been working for _decades_. In the IETF model there is no charge at all, except for attending physical meetings. Mailing list participation is free. RFC publication is free. The IETF is financed via meeting fees and donations. Vendors have an interest in the IETF staying afloat, so they see to it -- their interest is, of course, the IETF's overall low cost to them (by comparison to OASIS, IEEE, ANSI/ISO, or the ITU-T), and the amount of review their specifications get.

Maybe 802.x dev should move to the IETF lock, stock, and barrel, you might say, but there is a problem: the IETF doesn't have a lot of expertise with physical-layer protocol specifications. _Most_ IETF participants work at layer 2 and up. Of course, this can be fixed by... just moving all the 802.x work (and the people working on it) to the IETF.

EDIT: Mind you, getting the work done at the IETF would not be a guarantee of no such bugs. It would address the procedural and public access problems TFA talks about. But that alone is a huge improvement over using the IEEE for highly consequential specifications that greatly affect the public.

I recently found the need to understand the 802.11 standards in detail and was shocked to discover how difficult it is to access IEEE standards. I even work for $BIGCORP, with institutional access to various IEEE publications, but 802.11 isn't part of our subscription. I'd venture a guess that much of 802.11's security is by way of obscurity.

> There’s an IEEE program called GET that allows researchers to access certain standards (including 802.11) for free, but only after they’ve been public for six months

Also, that program only allows access to the latest release of each standard. Which means that, when there's a new release of a standard, access to any version of it is closed for six months.

Just to clarify a question I had when reading about KRACK - If you are using a VPN over WiFi, you still have some degree of safety, right? Any way that the KRACK vulnerability can be elevated to sniff and decrypt VPN packets in transit?

A properly working VPN should still maintain privacy even if an attacker is in complete control of your modem. The data is encrypted from your device all the way to a VPN server. Conceivably, they could redirect traffic to an alternative malicious VPN but that should fail authentication.

(somebody let me know if I'm wrong in this)

You are right, but only for a perfect 'VPN'. There are also many insecure VPNs out there which don't cover all your bases (for example, not tunneling DNS requests through the VPN is a common mistake).

So while there are VPNs with great security, you have to make sure your VPN is not vulnerable to common MITM attacks. So choose a VPN that has been well vetted. I'm a fan of OpenVPN due to its being able to pass as HTTPS traffic, but a popular recommendation is Algo, which is an IKEv2-based IPsec VPN.


Yes. It appears that since KRACK can allow the attacker to hijack a TCP handshake that they could potentially downgrade an HTTPS connection where HSTS is not used, but that shouldn’t affect your VPN connection.

To be clear, if a client makes an HTTPS request, you cannot downgrade it to HTTP, even if you're in control of the TCP socket. You can only hijack connections that are initially made over HTTP.

In fact, you can, using sslstrip. It was demonstrated in the video, as well as in this article: https://arstechnica.com/information-technology/2017/10/sever....

I think this will only work against android/linux clients, since in that case the attacker actually knows the key, and can perform a proper MITM.

You cannot. sslstrip exists to attempt to work around the fact you can't directly downgrade HTTPS to HTTP.

In that video, because HSTS isn't used on match.com (unfortunate), the browser doesn't attempt to make an HTTPS connection at first. Obviously, if you have control of that TCP connection, you can do whatever you want. The browser is oblivious to HTTPS existing.

Also, be careful not to confuse the layers here. The encryption algorithms in WPA2 are completely unrelated to TLS/HTTPS. TLS is... somewhere in the OSI model, but it's well above WPA2/packet handling, and TLS can/does work with a compromised network.

I love the image with the kitchen drawers ... It passed the linter, it passed the type checks, it passed the unit tests ...

This will provide further impetus for not using CCM/GCM and similar AEAD cipher modes in any context other than octet stream protocols (e.g., TLS, but _not_ DTLS).

There is ongoing work to produce AEAD cipher modes that do not fall apart when {key, nonce} pairs are reused (for different message plaintexts, naturally).
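The failure mode being discussed — a CTR/GCM-style mode falling apart on {key, nonce} reuse — is easy to demonstrate. The sketch below uses a toy SHA-256-based keystream purely for illustration (it is NOT a real cipher, and the messages are invented); the algebra is identical for real CTR-mode AEADs:

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy CTR-style keystream (illustration only, NOT a real cipher):
    # block i = SHA-256(key || nonce || i)
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key, nonce, plaintext):
    ks = keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

key, nonce = b"secret-key", b"nonce-0001"
p1 = b"attack at dawn!!"
p2 = b"defend at dusk!!"
c1 = encrypt(key, nonce, p1)   # same {key, nonce} pair ...
c2 = encrypt(key, nonce, p2)   # ... reused: keystream is identical

# The keystream cancels out: c1 XOR c2 == p1 XOR p2.
xor_of_plaintexts = bytes(a ^ b for a, b in zip(c1, c2))
assert xor_of_plaintexts == bytes(a ^ b for a, b in zip(p1, p2))

# If the attacker knows p1 (a predictable message), p2 falls out directly.
recovered_p2 = bytes(x ^ p for x, p in zip(xor_of_plaintexts, p1))
print(recovered_p2)  # b'defend at dusk!!'
```

This is exactly why forcing a client to reinstall a key (and thereby reset its nonce counter) is catastrophic for these modes, even though the key itself stays secret.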

Crypto is tricky...

It's easy enough to construct AEAD cipher modes by generic composition that don't have this problem. For example, the Kerberos enctypes don't.

The key is to construct AEAD cipher modes that are efficient (Kerberos's enctypes add a block's worth (16 bytes) of "confounder" to each message, bloating it, and also add a block's worth of cipher operations, slowing it down) and that don't require additional random numbers if at all possible (those confounders in Kerberos are random numbers prepended to the plaintext). Also, Kerberos's enctypes use HMAC for the integrity-protection part, which is not a very fast MAC.
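The generic-composition idea (encrypt-then-MAC with a random per-message confounder, roughly in the spirit of the Kerberos enctypes) can be sketched as follows. All names here are illustrative, and the keystream is again a SHA-256 toy, not a real cipher:

```python
import hashlib, hmac, os

def _keystream(key, confounder, length):
    # Toy keystream (illustration only): block i = SHA-256(key || confounder || i)
    out, i = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + confounder + i.to_bytes(4, "big")).digest()
        i += 1
    return out[:length]

def _xor(data, ks):
    return bytes(a ^ b for a, b in zip(data, ks))

def seal(enc_key, mac_key, plaintext, aad=b""):
    # A fresh 16-byte random confounder drives the keystream, so even two
    # encryptions of the same message under the same key never reuse keystream.
    confounder = os.urandom(16)
    ct = _xor(plaintext, _keystream(enc_key, confounder, len(plaintext)))
    tag = hmac.new(mac_key, aad + confounder + ct, hashlib.sha256).digest()
    return confounder + ct + tag        # encrypt-then-MAC

def open_(enc_key, mac_key, blob, aad=b""):
    confounder, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, aad + confounder + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed")    # reject forgeries before decrypting
    return _xor(ct, _keystream(enc_key, confounder, len(ct)))

ek, mk = b"enc-key", b"mac-key"
msg = b"hello krack"
blob1, blob2 = seal(ek, mk, msg), seal(ek, mk, msg)
assert open_(ek, mk, blob1) == msg
assert blob1 != blob2   # fresh confounder each time: no keystream reuse
```

The sketch also makes the costs the comment mentions concrete: 16 extra bytes per message, one extra source of randomness per message, and an HMAC pass over the whole ciphertext.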

One thing I don't understand about this attack is whether it allows the attacker to connect to the Wifi network (in the style of airmon-ng for WEP), or whether the attacker can only decrypt packets of legitimate users. Has anyone figured this out?

This attack doesn't expose Wi-Fi network key, so I assume it doesn't let you join the network, just sniff the traffic of the targeted user (and also, in some cases, forge/inject packets).

This attack is a MitM. The attacker is able to act as the access point and receive the victim's traffic, unencrypted.

He is also able to modify this traffic and serve compromised pages.

The attacker needs to have a stronger signal than the legitimate access point. So he has to stand pretty close to the victim, physically.

But still, this is only problematic if you "trust" an access point. If you treat any wifi network like a public hotspot (i.e. potentially rogue), there's not much of a difference, right?

In practice, as long as you only trust data received through TLS, you should be fine.

> But still, this is only problematic if you "trust" an access point.

Which is the case for the vast majority of wifi users.

It is completely irrelevant how any of us here treat our access points. The problem is that the masses could be subject to these attacks, which allows malware and botnets to propagate.

Do you not 'trust' your home wifi router?

Well, I don't trust the whole network (the internet) in general, as I don't know where and how things are routed, and by whom.

The only situation where trust in the router would be relevant to me is for communication within a local network, so I'm controlling every device involved. And there, this attack might actually be harmful, as far as I understand.

Personally I don't, because my ISP owns it.

Definitely another big issue, inherent to cryptography (and maybe mathematics in general), is the subtlety of creating the appropriate definitions.

As the post says, they thought about the individual components, but failed (or maybe it is just too difficult) to define the security considering composability of schemes. This is a clear limitation of modern cryptography.

Great post!

Funny how the 2013 discoveries tend to be forgotten and go unmentioned:


"$250m-a-year US program works covertly with tech companies to insert weaknesses into products"

"The documents show that the agency has already achieved another of the goals laid out in the budget request: to influence the international standards upon which encryption systems rely." [emphasis mine]

Now compare with the details from KRACK regarding Android and Linux:


"Our attack is especially catastrophic against version 2.4 and above of wpa_supplicant, a Wi-Fi client commonly used on Linux. Here, the client will install an all-zero encryption key instead of reinstalling the real key. This vulnerability appears to be caused by a remark in the Wi-Fi standard that suggests to clear the encryption key from memory once it has been installed for the first time." [emphasis mine]

Sometimes adding an innocent-appearing remark, and later acting on it would be all that's needed for a rich and mighty adversary.

I don't claim it happened this time, but I worry when the possibility isn't being considered and properly investigated.

Can someone clarify if wpa2-enterprise with radius auth is affected? I have seen assertions both ways.

It is affected. Note that this isn't an exploit that allows the attacker to discover the encryption key or connect to the AP by getting around an authentication scheme (such as radius).

The attacker impersonates an existing AP and forces the client to reuse IVs / reset the encryption key to zero (in the case of wpa_supplicant), and is able to decrypt traffic that way.

Yes, but stupid question. If Radius is being used, there's no way to successfully impersonate the AP and have the victim accidentally try to join it because the AP won't be able to authenticate the WPA2-Enterprise credentials? Then the attack fails? Or silly me, the fake AP could be configured to somehow accept the login attempt by the victim? But if so, then it'd be able to record the login credentials thus making the attack's impact even larger than I've seen discussed, because then not only is the attack successful in decrypting the victim's traffic, but the victim's Wi-Fi credentials are also stolen?

> If Radius is being used, there's no way to successfully impersonate the AP and have the victim accidentally try to join it because the AP won't be able to authenticate the WPA2-Enterprise credentials?

From what I understand the credentials won't matter. Anyone else more knowledgeable please correct me if I'm wrong, but the attack goes something like this:

1. Capture traffic that includes the 4-way handshake

2. replay message #3 of the handshake

3. client's encryption key is set to zero (in the case of wpa_supplicant), and nonce/IV are reused going forward

4. you are now in control of the encryption key being used (again, only wpa_supplicant) so you can go ahead and MITM the victim's DNS queries, capture cookies, etc...
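The steps above, in the wpa_supplicant 2.4+ case, can be modeled as a tiny state machine. This is a toy reconstruction of the behavior quoted from the KRACK site (install the key on message 3, clear the buffered copy from memory, then reinstall the now-zeroed buffer on a replayed message 3); class and method names are invented for illustration:

```python
class ToyClient:
    """Toy model of the wpa_supplicant 2.4+ bug described above (illustrative
    only, not the real supplicant code).

    The client buffers the negotiated key, installs it on message 3, then
    clears its buffered copy from memory (as a remark in the Wi-Fi standard
    suggests). A retransmitted message 3 makes it "reinstall" the zeroed
    buffer and reset its transmit nonce counter.
    """
    KEY_LEN = 16

    def __init__(self):
        self.buffered_key = None
        self.installed_key = None
        self.tx_nonce = 0

    def handshake_msg2(self):
        self.buffered_key = b"\x2a" * self.KEY_LEN      # freshly negotiated key

    def handshake_msg3(self):
        self.installed_key = self.buffered_key          # install the key
        self.tx_nonce = 0                               # nonce counter resets!
        self.buffered_key = b"\x00" * self.KEY_LEN      # "clear from memory"

victim = ToyClient()
victim.handshake_msg2()
victim.handshake_msg3()     # normal handshake: the real key is installed
assert victim.installed_key == b"\x2a" * 16

victim.handshake_msg3()     # attacker replays message 3
print(victim.installed_key) # all zeros: a key the attacker knows
```

After the replay the "encryption" key is a constant the attacker knows, which is why this particular variant enables a full MITM rather than just keystream reuse.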

The Details section of the researcher's site explains it pretty well:


Lastly, the most urgent task for mitigating this is to patch client devices as quickly as possible.

Android's fractured vendor-specific distributions and lack of long-term support for the low-mid level models is going to make this a difficult/impossible task...

Thanks, it's right there in the text, clear as day. :)

> Note that our attacks do not recover the password of the Wi-Fi network. They also do not recover (any parts of) the fresh encryption key that is negotiated during the 4-way handshake.

This attack happens after the authentication stage and the client is trying to negotiate session keys with the AP. As a note, this is also why the pre-shared key isn't intercepted by this attack in WPA2-PSK.

The initial authorization/pre-shared keys are used to negotiate and share session keys. It is the negotiation of these session keys that is attacked, leading to insecure session keys being used. The initial auth parts of the handshake are not being abused here, only steps 3/4 of the 4-way handshake.

I believe that in this auth situation, the password is not sent to the server in cleartext; rather, the server sends a challenge to the client, the client performs the challenge using the password as a parameter and sends the result to the server, and then the server checks that the result is valid for the expected password. The password itself cannot be determined from the challenge result.

So, even by impersonating an AP you don't get the user's password.
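The challenge-response shape described above can be sketched like this. This is a generic illustration, not the actual EAP method WPA2-Enterprise uses (real deployments use EAP methods such as PEAP/MSCHAPv2, whose details differ); all names are invented:

```python
import hashlib, hmac, os

# Toy challenge-response (illustrative; real EAP methods differ in detail).
# Both sides hold a secret derived from the password; the password itself
# never crosses the wire.
password = b"hunter2"
shared_secret = hashlib.sha256(password).digest()

# Server sends a fresh random challenge...
challenge = os.urandom(16)

# ...the client answers with a MAC over it, keyed by the shared secret...
response = hmac.new(shared_secret, challenge, hashlib.sha256).digest()

# ...and the server recomputes and compares. An eavesdropper sees only
# {challenge, response}, from which the password cannot simply be read off.
expected = hmac.new(shared_secret, challenge, hashlib.sha256).digest()
assert hmac.compare_digest(response, expected)
print("authenticated")
```

Caveat: a captured {challenge, response} pair still permits offline password guessing against weak passwords, so "the password is never sent" is not the same as "the password cannot be attacked."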

The handshake attack works on all types of wpa2. But they can only attack a client and do mitm, not gain direct access to your network.

Re-read this article every time you come across yet another blog post or article extolling the virtues of formal proofs, verification, and, while we're at it, static typing.

None of these work when taken in isolation and given to actual people to work with.

Yet all of these are about 100x better than the cowboy attitude 99% of developers these days have. So I'd say that every time you come across a blog post extolling the virtues of formal proofs, verifications, and static typing, you should listen.

Because you are awful at programming, and security, and these things will help you. Sure, they won't be enough, but to think they are anything other than minimum requirements for security is being incredibly arrogant and deluded.

I totally agree with you. There are just too many articles and blog posts that read like "formal proofs and/or static types are the end of all need for anything like tests, because once the program is verified and/or compiled, it's correct".

As programmers we are always somewhere in the middle :)

On mobile and imgur refuses to work on my Fx, so won't post a screenshot. So just imagine how funny yours and oconnore's [0] comments look next to each other...

So, which way is it?

[0] - https://news.ycombinator.com/item?id=15488391

If you look around, the problem touted here was that formal verification was done, but independently for two different parts of the standard, in isolation.

The layman answer is that formal verification should've been done on those systems after integration. I'm not sure how much that would've helped, though, since the proofs for each of those were complex enough on their own. And then the actual code written tends to stray far from what protocol implementers (also crypto folks) think it should look like.

So I'd say more formal verification, but also make it easier and accessible for normal developers without a Math PhD. And verification of actual code would be ideal.

Something I've come across recently is formal verification of cryptographic implementations not on paper (i.e. mathematically) but in plain C using tools and languages developed recently.

For example, a three-part series on how SAW and Cryptol were used to formally verify s2n (AWS's TLS lib):




PS(A) for the adventurous: I've also come across a github repo that lists related stuff


The protocol was formally verified. And yet...

Did you skip that last 4 paragraphs?

It's precisely because I didn't skip the last four paragraphs that I wrote my comment.

Here's how the fourth last paragraph starts:

> The critical problem is that while people looked closely at the two components — handshake and encryption protocol — in isolation, apparently nobody looked closely at the two components as they were connected together.

> In the end we all know that the answer is for humans to stop doing this work. We need machine-assisted verification of protocols, preferably tied to the actual source code that implements them. This would ensure that the protocol actually does what it says, and that implementers don’t further screw it up, thus invalidating the security proof.

And yet, there are still regular posts here about whether static type systems have any merit at all... never mind advanced things like linear/dependent types, theorem provers, TLA+, etc. Just use Go!

We did formally verify the 802.11i protocol, though: https://link.springer.com/article/10.1007%2Fs12204-009-0122-...

Question about this, because the featured article strangely doesn't actually say what the problem is: Did this verification result simply forget/not consider nonce reuse?

The article seems to imply there's nothing wrong with the proof or the statement of the properties to be proved, it's about integration with another component. But if "don't reuse nonces" is a standard security property to verify, then I would think that that is a hole in the verification.

Could someone shed more light on this?

It is covered in Q&A at https://www.krackattacks.com/#faq - basically the proof didn't cover this particular scenario:

> The 4-way handshake was mathematically proven as secure. How is your attack possible?

> The brief answer is that the formal proof does not assure a key is installed once. Instead, it only assures the negotiated key remains secret, and that handshake messages cannot be forged.

And the implementations ignored the fundamental "nonces must never be reused" requirement of the formal handshake protocol, and thus would have failed the aforementioned testing.

Question for the HN mods: I submitted this same article earlier: https://news.ycombinator.com/item?id=15483452 Why was this one able to be submitted instead of just redirecting to my submission?

I have the same question. I tried to submit this same article last evening and got redirected to Perceptes' submission. I'm curious as to what happened?

The quality of this title is terrible as well; it should say something like 'Vulnerability in Wi-Fi handshake discovered; its name is KRACK'.

This seems so overhyped? Why do you care about encrypting the step from device to access point? If you have sensitive data, then you must encrypt end-to-end anyways, because there are a lot more steps between access point and destination. Thus, HTTPS or Signal or PGP or whatever is enough and WPA does not matter.

There are many services in your local network which are unencrypted: any basic lower-layer protocol required to run the infrastructure, like ARP, DHCP, DNS, and the like. Then there are also higher-layer protocols which are unencrypted, like SMB, AFP, NFS, and the like. Many IoT devices, like your Cisco phone, talk TFTP at startup.

That is the reason why we have/had WEP, WPA, and WPA2: to allow a local infrastructure to have the very same trust as physical cables.

That might be true if we had secure DNS deployed. But we don't, so using this attack, DNS cache poisoning is pretty easy. And if you can subvert DNS, you can redirect HTTPS traffic to a typosquatted version of a website. Unless you check the URL bar every time you visit an HTTPS website that is in your bookmarks, you'll probably not notice, and there goes your password in the clear across a nice secure end-to-end encrypted HTTPS connection to the bad guy.

Modern browsers are very vociferous about not connecting to https sites with certificate errors. So while DNS cache poisoning with this attack is Very Bad, unless the attacker can insert a bad CA certificate in your browser, or has a valid but improperly-obtained certificate for the website in question, you're not going to fail to notice that something janky is going on.

And that is actually what the author of KRACK presented: he redirected, as a MITM, via DNS to a different site.

Yeah, but it's about the IoT devices. A great many of them have no real security beyond whatever Wi-Fi can provide, and you can imagine why: provisioning them with credentials that local systems will be able to validate and trust is non-trivial.

Worse: IoTs generally have no firmware/software update cycles, so vulnerabilities like this one are forever.

My plan is to build tiny one-device wifi/wired bridges for all my IoT devices that have wired ethernet. For the other IoT devices the plan is to update or replace them. What's yours?

Apparently this comment has been downvoted. People with very well-formed opinions have been saying that end-to-end is the only way, or at least the very best way.

It is true, as the responders point out, that some of the control protocols used on the internet blindly trust what they receive. Fixing those protocols and ensuring reasonable end-to-end authentication is a much better use of time than fussing around with a single link-level solution.

Imagine being able to forget about half-assed measures like policy-based firewalls and stateful NAT for security.
