Cognitive Overhead, Or Why Your Product Isn’t As Simple As You Think (techcrunch.com)
240 points by pg on April 21, 2013 | hide | past | favorite | 73 comments

I was all primed to disagree with this article, but it's a pretty well-reasoned piece.

I think an interesting pivot occurs when you start hitting the limits of working memory. Posting a photo is usually pretty immediate and doesn't max out your working memory.

Many years ago, I worked on a project to convert a Loan Application system from a VT100 "green screen" terminal to a Web Interface.

The Web Interface was actually pretty good, but it required too much navigation and too much working memory. We tried to reduce the number of clicks, but then the pages themselves became too slow and confusing.

Finally I got to see someone using the old VT100 app. It was blistering fast, and they supplemented it with codes written on a piece of paper. Crude and it required training, but it was far superior. The main thing was navigation. The codes on the paper were enough of a picture to offload that bit of cognitive processing... With the web interface, as soon as you clicked and had to pause or search - you'd lose the thread.

Unfortunately too late for that project, but a good lesson for me.

Knowing what you now know, what tricks would you use to make the web interface more like the VT100?

My guess would be keyboard navigation, and entering numbers as codes instead of clicking radio buttons.
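A rough sketch of how code entry might replace radio buttons. The field and its codes below are invented for illustration; the point is just the lookup-plus-clear-error shape:

```python
# Hypothetical code table: the operator types a short numeric code from a
# printed cheat sheet instead of scanning a page of radio buttons.
LOAN_PURPOSE_CODES = {
    "01": "Home purchase",
    "02": "Refinance",
    "03": "Investment property",
    "04": "Personal",
}

def resolve_code(code: str, table: dict) -> str:
    """Look up a typed code, with a clear error for unknown entries."""
    try:
        return table[code.strip()]
    except KeyError:
        raise ValueError(f"unknown code {code!r}; valid codes: {sorted(table)}")
```

Once the codes are memorised (or taped to the monitor), typing `02` and Enter is faster than any amount of mousing.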

It's important to remember that discoverability in an interface doesn't matter as much when you can train the workers. Not everything has to be intuitive, especially when you control training (and you can trade eventual speed for immediate understanding).

This is the basis for all the VI/Emacs users who claim to be more productive than in a visual editor/IDE.

I did come up with a prototype, but it went a bit of a different direction.

My theory was the biggest issue was the UI enforced a workflow. Most likely the banker is given a big pile of documents, then they work through it. One confirms your address, the next your income, the other is some other piece of evidence.

I particularly noticed they would be sitting there shuffling back and forth between the paper & then shuffling back and forth on the interface. It caused a lot of rework, and required you to remember where you were at with both.

The Web Interface was waaay too rigid. Want to add a party? You need to enter their name and work details. So you'd have to track down that information, even if it wasn't 100% required to get a provisional. You could enter whatever value you wanted for the income at this stage, so why force the banker to enter a work address and work phone? They can do that at final docs, when you're required to provide a payslip (for example!).

Paper & terminal worked because you could move quickly between disconnected parts. Enter the person's name and DOB. Then jump to income. Back to their home address. Over to some kind of KYC information, etc, etc.

So I came up with something that was a bit more of a mind-map. Each node indicated whether it (and its subnodes) was complete or not. You could jump around and enter data in whichever order you liked, and not have to think too much about what was left to do. Then the nav was about quickly moving up and down the tree (keyboard being a big factor).

It focused on making the data capture piece much more organic & goal-based. You only needed to enter discrete information to get a provisional, so use flags to tell the user what's needed. Then they can populate the space in any order they like.
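The completeness-flag idea can be sketched with a small tree, assuming each field is a leaf node. The field names here are invented, not from the actual prototype:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One node in the mind-map: a leaf field or a group of subnodes."""
    name: str
    required: bool = True
    value: Optional[str] = None                 # leaf data, if any
    children: List["Node"] = field(default_factory=list)

    def complete(self) -> bool:
        """A group is complete when all subnodes are; a leaf when filled."""
        if self.children:
            return all(c.complete() for c in self.children)
        return (not self.required) or self.value is not None

    def missing(self) -> List[str]:
        """Names of unfilled required leaves: the 'what's left' flags."""
        if self.children:
            return [m for c in self.children for m in c.missing()]
        return [] if self.complete() else [self.name]
```

The banker can then fill fields in any order, and the nav just renders `missing()` as flags on the tree.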

Waaaaaaaaay too weird for a conservative bank to consider :-) So I never really got to try it out. Would have been interesting to see if it made any difference.

Having done similar things, I agree.

1) Make sure that the UI fits the order in which they have the data on the bits of paper. 2) Let them enter data in any order they like, and give them a big "Validate" button they can press to check whether what they've entered is all coherent. (Don't check as they go along, or the validation markers will confuse and distract them.)
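Point 2 is essentially "collect all errors in one pass". A minimal sketch, with invented example rules:

```python
def validate(app: dict) -> list:
    """Run every coherence check and return the full list of problems,
    instead of nagging field-by-field as the user types."""
    errors = []
    if not app.get("name"):
        errors.append("Applicant name is missing")
    if app.get("income", 0) <= 0:
        errors.append("Income must be a positive amount")
    if app.get("loan", 0) > app.get("income", 0) * 5:
        errors.append("Loan exceeds five times stated income")
    return errors
```

The big Validate button just renders whatever this returns; an empty list means the application is coherent.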

...or more generally the CLI vs GUI argument

I like your analogy of "working memory".

Oh..whoops. I suppose "working memory (machines)" is an analogy to "working memory (humans)", so your use of "working memory" actually directly refers to humans, and isn't an analogy at all :)

> There isn’t yet much written about cognitive overhead in our field

Couldn't disagree more. Cognitive load is a major element of information architecture and even has an explicit representation in flow charts. Any book on information architecture or UX is half about it.

Many if not most decisions in interface design and IA are based on estimating (and reducing) cognitive load - how many options/buttons to present, how deep navigation should go, how to group things, splitting up an action flow in order to reduce the memory/choices needed...

The example given for QR codes is also wrong - "So it's a barcode? No? ..." - it is a barcode and you need a barcode scanner/app; everyone knows how barcodes work. QR codes are not the best thing around, but IMO they didn't catch on because of a lack of interest from manufacturers (no native support), not because they're overly complex.

I didn't totally agree with the argument about QR codes either, and I think that at least in Europe they caught on pretty well.

One thing that could definitely be improved: the built-in camera app on your phone could automatically detect them and pop up more info. When I first tried using a QR code, that's what I did, using the iPhone camera app, and was disappointed to discover it didn't work this way.

Eliminating the need for an extra app could dramatically boost their appeal, but the core concept is there.

QR codes never became popular here in Turkey. I'm a pretty tech-savvy guy, but I asked all of those questions when I first tried a QR code. There was an Android app store with no download link to be found, just a QR code; I didn't know how to download the apps and gave up.

When I figured out that it holds information that must be interpreted by an app, I was amazed, and fooled around to see how accurately it could be read.

But still, I see why it never caught on here. We are a nation used to transferring pirated audio/video files and cracking games, and we just don't expect things to work like "take a photo and a website opens". There's a poster about a concert with a QR code? Just google the artist's name with the city name and the first result is the website selling the tickets. Why bother finding out what that funny barcode is and downloading an app for it? Just google it, you will find what's going on...

My windows phone has QR reading capabilities built in (Bing Vision).

Bing Vision can also apparently scan text you point the phone at, translate it to a different language, and then overlay the translation on the screen.

Inspired by Google Goggles, which for some reason was never merged into Android's camera app.

QR codes were the result of people who forgot about/never knew about the CueCat.


TL;DR: They solve a problem that people don't have.

Depends on where you live; they're everywhere here in Japan (where they were invented).

This is probably due to the evolution of Japanese phones, which have had a crude mobile internet and built-in cameras from about 2000 onwards. Given the limitations of the devices and networks, QR codes were the easiest way to move data between phones (because there was no way in hell NTT would allow you to run apps on one).

These then got used for coupons, and with the shrinking economy and people wanting to save money, people quickly internalized how QR codes worked. Likewise, phones come with QR capability as a builtin -- it's not a separate app.

I imagine you might see the same sort of use case in developing nations where computers and smartphones are too expensive, but that window is rapidly shrinking due to the low cost of Android devices.

Only if you think of them as just encoded URLs; they do solve a myriad of real problems. "Just type the URL" doesn't hold when you want to add query parameters to it, use a different protocol or store data by itself. It's a barcode with more density, so it serves the same purpose as 2D barcodes; it is widely deployed in industry inventory control systems, shipping, access control, event ticketing, shopping.

    Every man takes the limits of his own field of vision
    for the limits of the world. -- Arthur Schopenhauer
I am definitely guilty of this. It's why I try to walk around the intellectual block a bit, see the sights, then come back and realise I know nothing. Absolutely nothing.

Heck, I even wrote a disclaimer:


Well - it depends what he defines as "our field".

The author, David Lieb, has an engineering and MBA background. This stuff is "duh" levels of old hat to folk with any kind of cog-psych background. Anybody who is vaguely competent at IA/IxD should understand the term and have it in their toolbox.

This stuff is, unfortunately, rarely talked about and even more rarely taught to engineers and business folk.

Cognitive load comes from "Cognitive Load Theory" (by Sweller), which is a specific field in Psychology.

A psychologist told me that "Multimedia Learning" (by Mayer) is taken more seriously by psychologists than "Cognitive Load Theory".

The problem with QR codes is that they are 'mystery meat'.

I think there are some strong points in the article but overall it misses a major point. Whatever interface you are designing, the underlying service you are building the interface for has to be something people really, really want.

The most cognitively recognizable interface won't get people using something if they don't care about the information coming from the other side. Conversely, if there is information that the end user needs, he will spend the time learning the interface. For example, if Shazam didn't exist and you heard a song you really liked at the bar, you might ask around, google some fragment of the lyrics you recognize, etc. to find out the same information. So if Shazam had an interface that was far less intuitive, I have to believe it would still be very popular.

That sounds simple, but I think the article puts the weight on the interface without much regard to the fact that all interfaces are merely a means to an end.

That's an excellent point, which I will take a step further: It has to provide something that people truly want, and the acceptable cognitive load is a function of the pleasure derived from having that desire fulfilled.

The pleasure payoff must be higher than the pain inflicted (so to speak). Just because people want something, it doesn't mean they are truly willing to put in the required effort to get it. As the average pain tolerance of the mass market is low, the payoff must be very high (communication with loved ones, financial well-being, major time savings) for a high pain point (cognitive load in this case) to be endured.

It is helpful to identify if there are multiple types of users, each with different goals and pain points. If that is the case, it is optimal to only expose users to the cognitive load necessary for the tasks they are interested in. For example, there could be anonymous browsers, participants, authors, and moderators. In that case, the cognitive load for casual browsers should be as low as possible, since they have the least benefit and buy-in. As with graceful degradation for less capable web browser software, apps should degrade gracefully for users with less capability or interest in using the additional features which require a higher cognitive load.

With regards to Shazam, I disagree that it would have been as successful if the interface was more complicated. Sure people would still have used it, but if it was harder to figure out or required more steps or had a 15 minute delay before it emailed you the answer, usage would have been far lower.

Sorry if this all sounds like a cognitive load of shit. Hopefully I have communicated it clearly, but I suspect I may have exceeded the pain threshold a while ago.

> Whatever interface you are designing, the underlying service you are building the interface for has to be something people really, really want.

How do you target that? I'm not entirely sure you can. Is there a set of steps you can follow, a design process, that will get you to something people really, really want?

That's generally what a lean startup process / mindset is supposed to help you find. The idea being it's not something you can get a read on without putting a prototype MVP into the world to test.

Search for "Customer Development" for a lot of insight.

I'd be interested to know when/where the term "cognitive load" first appeared, if anyone knows.

I saw it mentioned in some very old literature when studying for my pilot's license (sometimes as "mental load"), so I'd say it's been kicked around in aviation since at least the mid-60s.

I don't have an academic library to hand - but I'd guess you would have seen the phrase in the sixties in academia, but it probably didn't have the same meaning as it does today.

The roots of the idea of cognitive load - rather than the phrase itself - can be traced back to Miller's seminal (and much misinterpreted) "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information" (http://www.psych.utoronto.ca/users/peterson/psy430s2001/Mill...) although he didn't use the term "cognitive load" itself.

The term "cognitive load" was certainly used in the 60's and 70's - I remember reading it in papers - but IIRC it didn't really have any formal definition or model at that point.

Modern uses of cognitive load track back to John Sweller's development of Cognitive Load Theory in the 80's. It was his work that really popularised it and pushed it out as a well recognised term in things like instructional design and the various disciplines that have morphed into what we now call UX.

The term comes from psychology, so probably a few decades back: http://en.wikipedia.org/wiki/Cognitive_load

In relation to the web, a quick Google Scholar search brings up lots of papers from the 1990s: http://scholar.google.com/scholar?q=%22cognitive+overhead%22...

It's far older than the 90s, but that's largely a product of which old papers have been digitized, and partly of the terminology; I think you'll find some papers that say 'cognitive workload.'

I recently fixed up an old bit of scientific apparatus that was used to measure the cognitive workload of various aircraft cockpit designs. It used a method called visual occlusion: the pilots would sit in trainers wearing glasses with spring-loaded solenoid flaps that selectively blocked their vision. The open time and the frequency of the system were experimentally varied until the pilot could fly the plane safely, for given conditions. The vacuum tube circuit that I repaired had a date stamp from the mid seventies, but the technique was already more than a decade old by then. The same technique is now used (with liquid crystal shutters) to evaluate car interfaces for drivers; maybe a variant would be useful for certain application areas of software (modulating the screen, perhaps)?
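Each occlusion trial reduces to two numbers, the glimpse length and the cycle period, which makes the protocol easy to sketch (the function names and values here are illustrative, not from the original apparatus):

```python
def visible(t_ms: float, open_ms: float, period_ms: float) -> bool:
    """Whether the shutter is open at time t, for a cycle that starts open."""
    return (t_ms % period_ms) < open_ms

def duty_cycle(open_ms: float, period_ms: float) -> float:
    """Fraction of the time the subject can actually see the task; the
    experimenters shrank this until performance started to degrade."""
    return open_ms / period_ms
```

The minimum duty cycle at which the task can still be performed is then the proxy for its visual workload.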

I poked around myself and the earliest use of the term I can find is in an article from 1968:

  Pupil size and problem solving
  JL Bradshaw 
  The Quarterly Journal of Experimental Psychology, 1968
Of course when I tried to read the text of the article, I got a message saying

  Sorry, you do not have access to this article.

  How to gain access:

  Recommend to your librarian that your institution 
  purchase access to this publication.

Here's one from 1970


If you search that book for "cognitive load is defined" you get the snippet:

> "Cognitive load" is defined loosely as the amount of mental strain put on a person during the performance of some task, often at least partially due to the constraints placed on the performance of that task...

The related term "mental load" seems to be a few years older, coming mostly from precursors to HCI, such as industrial ergonomics. Here's a use from 1961, also paywalled, but with the abstract available: http://www.tandfonline.com/doi/abs/10.1080/00140136108930502

It's also used in this 1961 paper, which looks at spare mental capacity of car drivers compared to the mental load imposed by various driving situations: http://www.tandfonline.com/doi/abs/10.1080/00140136108930505

Here's the crude Google ngrams view of some of the variant terms: http://books.google.com/ngrams/graph?content=cognitive+load%... The "mental load" data is polluted by OCR errors like "supple- mental load" though, especially earlier on.

"Pupil size changes were monitored during the solution of various types of problems. A number of solution and response strategies were required of the subject. There was strong confirmation of the theory that this autonomic index can provide a sensitive measure of the fluctuating levels of attention and arousal, which are associated with the various aspects of information processing and response."

PDF link (bad quality): http://www.filedropper.com/article

It continues to be absolutely appalling that such a large majority of academic work is locked away from the public.

Is that the equivalent of a "brain dump"? ;)

I make better photos with my old manual Hasselblad from the 1970s, because the interface has massive cognitive overhead. No automation at all, all manual, clunky, complicated. It forces me to think slowly, and therefore gives my head and heart the time and deep concentration to work out the picture I want to make. This leads to much better photos.

I'll play devil's advocate to that. Nowadays you can use high-resolution cameras with automatic everything and take lots of pictures. High resolution means cropping still yields a useful picture, and taking lots means you are far more likely to have a useful picture. You are certainly far less likely to miss something. (Manual focussing is also not available to short-sighted people like me, since glasses/contacts don't result in perfect correction.)

If we extrapolated your approach to development, then we shouldn't allow syntax-highlighting editors, debuggers (Linus has argued this), and similar modern tools. Heck, you should have to wait hours/days for program output like they did in the punch card days.

I suspect that skilled people can make good use of the tools available, be they completely manual or with lots of automation. It is quite possible the automation doesn't help them that much. But the vast majority of people are closer to average.

An interface that makes things easy is comfortable and probably profitable, since most people are lazy. But convenience does not correlate with quality.

It certainly depends on the results you're after. If you're photographing landscapes you're not worried about "missing something", and taking lots of pictures without manual adjustments won't do any good.

I thought the diopter adjustment could compensate for glasses?

> I thought the diopter adjustment could compensate for glasses?

The problem I had when I tried it was that I could get everything looking perfect through the viewfinder, but the resulting picture was blurred. My current prescriptions round to the nearest 0.25 (glasses) / 0.5 (contacts) dioptres.

Even if I could make a perfect adjustment, my vision alters during the day. For example when tired things can get a little blurry.

I have 3.2 dioptres and have no problem at all with the Hasselblad's manual focus.

I'm at -5 and -6 and the last time I tried things was with a Nikon many years ago.

Could it not be the other way around? Modern cameras surely are far more complicated than that Hasselblad, and offer an order of magnitude more options to fine-tune before taking a picture. To my knowledge, though, they offer no less manual control. It is fair to say that the cognitive overhead of understanding what a modern camera will do when you take the picture is much greater than that of the old camera.

Perhaps you can take great pictures with that Hasselblad because you understand exactly what it does and how it does it?

If you know what you want, you have to guess at how to get the automatic device to give you that result, and you have to look at the result to see if the device correctly inferred what you want. That feedback loop where you reverse-engineer someone else's guess at your workflow produces more cognitive load than a dozen knobs with deterministic and predictable effects. Being able to eliminate the feedback loop is a huge win, and is why keyboard-based interfaces can still be more productive than GUIs where you have to look to make sure the button is where you expect it and the cursor is where you want it.

I found the same when I started using prime lenses. Not only was the quality there, but it forced me to think about composition, where I was standing and the geometry of the shot.

The interface of a prime is much simpler though as a whole dimension of control is removed.

I am not sure if every product benefits from reducing Cognitive Overhead. Some of the areas where it would work are novel technologies (like Shazam or Wii) or mass market products.

How many of us would use a music player that just had start and stop buttons? Wouldn't that seem like an inconvenience? If the product is already familiar, then it is better to improve on the already-familiar interface. When it is a completely new technology, you get an opportunity to make a fresh start.

Similarly, many niche products have power users who would be unhappy if the interface is too "dumbed down". The classic case of an IDE and UNIX text editors come to mind. An IDE is quite obvious to use, but many would not find it as efficient.

Cognitive overhead is a good guideline but we need to understand our target market first.

What I get from the article is that the author doesn't actually suggest fewer buttons or less control over the product - quite the opposite, actually.

Also, music players are quite a bad example of something to disrupt - the iPod pretty much showed 12 years ago the easiest possible interface that still does what 99% of users need. I think he's rather talking about cloud services that automate some of your tasks, where he proposes to actually add more steps / buttons / messages in the middle that help the user understand what's going on. To quote the article:

> Put your user in the middle of your flow. Make them press an extra button, make them provide some inputs, let them be part of the service-providing, rather than a bystander to it. If they are part of the flow, they have a better vantage point to see what’s going on. Automation is great, but it’s a layer of cognitive complexity that should be used carefully. (Bump puts the user in the middle of the flow quite physically. While there were other ways to build a scalable solution without the physical bump, it’s very effective for helping people internalize exactly what’s going on.)

> How many of us would use a music player that just had start and stop buttons? Why would this seem as an inconvenience? If the product is already familiar, then it would be better to improve on the already familiar interface. When it is a completely new technology, you get an opportunity to make a fresh start. Similarly, many niche products have power users who would be unhappy if the interface is too "dumbed down". The classic case of an IDE and UNIX text editors come to mind. An IDE is quite obvious to use, but many would not find it as efficient.

Erm... This is the point of the article to some extent ;)

I think you're misunderstanding cognitive load/overhead. It's not about dumbing down. It's not about reducing the amount of information or interface. It's not about "less stuff".

Dealing with the human limits on what people can process often involves doing exactly the opposite. You add more information or use familiar interfaces so that folk don't have to remember/learn new things.

See http://edutechwiki.unige.ch/en/Cognitive_load for some info.

>How many of us would use a music player that just had start and stop buttons?

Two words: iPod Shuffle. Or, if you want a slightly more complicated example, Pandora. Pandora literally started out with three buttons: "I like it", "I don't like it" and "Stop". Recently, they added a fourth, "Skip".

The 3rd generation shuffle has no controls apart from a switch between loop, shuffle, and off.


No: if you double-click, triple-click, and (potentially in combination) hold that button you can do all of the normal things you can do with a media player. Go read the user manual. People who believe otherwise (and I've seen many) simply assume user interfaces don't exist if they aren't obvious.
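The click-grouping trick is simple to sketch: presses closer together than some gap threshold form one gesture, and the gesture size selects the action. The gap value and action table below are invented, not Apple's actual firmware logic:

```python
GAP = 0.35  # seconds: presses closer together than this form one gesture

# Invented action table; the real mapping may differ.
ACTIONS = {1: "play/pause", 2: "next track", 3: "previous track"}

def decode(press_times: list) -> list:
    """Group press timestamps into gestures; gesture size picks the action."""
    gestures, count, last = [], 0, None
    for t in press_times:
        if last is not None and t - last > GAP:
            gestures.append(ACTIONS.get(count, "unknown"))
            count = 0
        count += 1
        last = t
    if count:
        gestures.append(ACTIONS.get(count, "unknown"))
    return gestures
```

So a double-click followed by a lone press decodes as two gestures; the cost is that nothing about the single button advertises that any of this exists.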

Define "recently". I can't remember not having a skip button and I've been using pandora for a couple years at least.

I don't think this conflicts with the author's post at all.

The post is quite specific about targeting the mass market.

Much as I like having options, I have come to realize this applies to code as well. On my current project, I keep telling my partner that we need to keep the cognitive load down, because eventually we will have to bring on other people. Being that it's written in C++ (please don't flame me; it's what we have to work with), I'd love to use things like template metaprogramming and multiple inheritance. Since we are already working in a domain that doesn't always follow common sense (RF/E&M), and we are also using some design patterns that take a little getting used to in C++, we've had to eschew a few features of the C++ language. It's probably for the best anyways.

Cognitive overhead doesn't just apply to products we'd think of as simple.

Vim is an example. One of the key benefits of Vim's editing style is that it reduces (one kind of) cognitive overhead. It really does.

You want to delete the current sentence? Just type 'das'. Want to change the text within the quotes? Just type 'ci"'. It allows you to more directly express your editing operations, rather than having to translate them into a number of steps. That reduces the cognitive overhead of that action.
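As a toy illustration of what a text object like `ci"` does under the hood (handling only the simple unescaped case, unlike real Vim):

```python
def ci_quote(line: str, cursor: int, replacement: str) -> str:
    """Replace the text between the double quotes surrounding `cursor`."""
    start = line.rfind('"', 0, cursor + 1)   # nearest quote at or before cursor
    end = line.find('"', cursor + 1)         # nearest quote after cursor
    if start == -1 or end == -1:
        return line  # no enclosing pair; real Vim would search forward
    return line[:start + 1] + replacement + line[end:]
```

The editing intent ("change what's inside the quotes") maps to a single command, so the user never has to plan a sequence of cursor movements.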

It's also a case where you need a certain amount of training before you can perform the actions that reduce the cognitive overhead. Nothing says that the reduction in cognitive overhead has to be there from the get-go.

I'm not saying that Vim's operation reduces cognitive overhead in all ways, but that it does in one specific respect.

Off topic: what's up with the url ;)

You're right - what extra information is this URL sending? I haven't seen it on any other TechCrunch-linked article.


Looks like an email campaign tracking url

Great article, and totally agree.

I am working on a product where that is a critical design principle. An interesting related question I am wrestling with: can a product be successful by reducing overall cognitive load for a user, but carrying a larger cognitive load than the average app? That is, it reduces complexity for users, but since it deals with higher complexity concepts, it still nets out to being more complex. If so, how much additional cognitive load does this make tolerable/enjoyable? My guess is a little, but not much (in the mass market, of course).

My problem with startups is the cognitive overhead in linking their company name to their function. I cannot remember what Trello does, for example. Naming of services should be more obvious.

Naming things is so hard. Descriptive names might be great at telling users immediately what your service does. But there are only so many plays on "To Do List" that work for product names.

As soon as you start actually using a product, it's really nice if it has a simple, pronounceable name. It's much easier to say "Did you see this on Trello?" to a coworker than "Did you see this on FogCreek To Do List Pro?"

Another simple product with a great name is Doodle. Of course you could have a more obvious name, like "Cooperative Schedule Maker 3.0". But that just makes it so much harder to talk about!

Trello is a good name.

I made http://www.nomipede.com to solve this exact problem.

Domain name squatting makes that much more difficult.

If your product is a mobile app primarily, domain names are far less important. (We don't own bump dot com, and while it's hard to prove the absence of a thing, we don't think it hurts us at all).

It certainly does. I've had occasions where I spent most of a day fossicking for a halfway decent domain name.

This is hilarious: "Web 2.0 took over, yielding big buttons, less text, more images, and happier users."

See also: http://www.thebaffler.com/past/the_meme_hustler

Products that have low cognitive overhead:

- Original iPod

- Fender Stratocaster

- Roland TB-303

- Casio F91W

- Unix (Ok, it's not a simple product, but "everything is a file" goes a long way.)

The "test on the young, old, and drunk" concept got me thinking: this might be why the winter travelers among us are so much better at design.

I always thought my design skill and difficulties were separate issues, perhaps connected at some neurological level, but I think the wide variations of intellectual ability (+/- 15 IQ swings that normal people don't experience) are why we develop such creativity and design sense. I have a week or two every year when I'm basically incapable of doing most of the stuff that's normally easy, but during that time I learn how to design for that person. How do you get the person whose cognitive resources are strained (fatigue, not stupidity) to see the value right away, rather than fear and uncertainty and chaos?

We have to be careful, though. It's not that the rest of the world is stupid. The cognitive-load problem isn't about stupidity. It's about the fact that there are a million things competing for people's attention, and unless we can prove right away that our wares are cool (by demonstrating value in a simple, low-cognitive-load way), we fade into the noise.

What is a "winter traveler"? On searching for the term, I only saw websites about travelling during the winter. Is it a sufferer of seasonal affective disorder?

Person with mild mental illness that results in clearer perception, e.g. depressive realism, and often better character. (When your biology is difficult, you don't get to fuck around the way most people do.)

Traveling through the forest during summer is much more comfortable, but you can't get as sharp a feel of the landscape because the trees are leaved out.
