Monday, March 1, 2010

What is the natural user interface? (book excerpt)

[Update 4/6/2010: Based upon feedback and careful thought, I have slightly modified my preferred definition. See my new post.]

Recently, there has been some discussion on establishing a definition for the term "natural user interface". My friend Richard Monson-Haefel (who just signed with O'Reilly on an iPad SDK book, congrats!) went through several iterations of a definition on his blog and ended up with this:
"A Natural User Interface is a human-computer interface that models interactions between people and the natural environment."

Wikipedia also has a paragraph describing natural user interfaces as invisible interfaces that lack a keyboard and mouse, but it does not have a truly concise definition. Ron George was a major contributor to the NUI Wikipedia article. The first sentence says, in part, that a natural user interface is:

"...a user interface that is effectively invisible, or becomes invisible with successive learned interactions, to its users." also has a wiki page with a lot of description language on the natural user interface, but no concise definition.

As you may have heard, I'm writing a book, Multitouch on Windows. A key part of my approach is teaching readers not just the APIs but also the new ways of thinking required for creating natural user interfaces. My first chapter is titled "The natural user interface revolution" (appropriate since it was also the title of my first blog post) and so right up front, I had to tackle the problem of defining natural user interface for my readers in a concise and comprehensive way.

I took into account both Richard's and the Wikipedia article's approaches, but I was not satisfied with what they had. I think Richard is on the right track, but the way he phrases it seems limiting. Whether or not he intended it this way, modeling the interactions between people, and between people and the natural environment, implies rather literal interface metaphors, with NUI interactions that simulate real-world interactions, but there is no reason why this should be so. The Wikipedia description talks about invisible interfaces, but to a layperson this does not make sense without additional explanation of what an invisible interface means.

Now, I don't necessarily disagree with how Richard and the Wikipedia article are describing NUI. NUI does have something to do with how people interact with the environment, and NUI interfaces do seem to be invisible, but why are these descriptions true? To help figure this out, I turned to Bill Buxton's presentation in January where he talked about natural user interfaces. I took detailed notes and one particular thing that he said really resonated with me:
An interface is natural if it "exploits skills that we have acquired through a lifetime of living in the world."

I used that definition to write a section in chapter 1 on what "natural" means, and then derived my own definition. Below is an excerpt from chapter 1 of my book where I present my definition for natural user interface.

There are several different ways to define the natural user interface. The easiest way to understand the natural user interface is to compare it to other types of interfaces such as the graphical user interface (GUI) and the command line interface (CLI). In order to do that, let's reveal the definition of NUI that I like to use.
A natural user interface is a user interface designed to use natural human behaviors for interacting directly with content.
There are three important things that this definition tells us about natural user interfaces.

NUIs are designed

First, this definition tells us that natural user interfaces are designed, which means they require forethought and specific planning efforts in advance. Special care is required to make sure NUI interactions are appropriate for the user, the content, and the context. Nothing about NUIs should be thrown together or assembled haphazardly. We should acknowledge the role that designers have to play in creating NUI style interactions and make sure that the design process is given just as much priority as development.

NUIs use natural human behaviors

Second, the phrase "designed to use natural human behaviors" tells us that the primary way humans interact with NUI is through our natural behaviors such as touching, gesturing, and talking, as well as other behaviors we have practiced for years and are innately skilled at. This is in contrast to GUI, which is described as using windows, menus, and icons for output and a pointing device such as a mouse for input, or the CLI, which is described as having text output and text input using a keyboard.

At first glance, the primary difference between these definitions is the input modality -- keyboard versus mouse versus touch. There is another subtle yet important difference: CLI and GUI are defined explicitly in terms of the input device, while NUI is defined in terms of the interaction style. Any type of interface technology can be used with NUI as long as the style of interaction focuses on natural human behaviors.

NUIs have direct interaction with content

Finally, think again about GUI, which by definition uses windows, menus, and icons as the primary interface elements. In contrast, the phrase "interacting directly with content" tells us that the focus of the interactions is on the content and directly interacting with it. This doesn't mean that the interface cannot have controls such as buttons or checkboxes when necessary. It only means that the controls should be secondary to the content, and direct manipulation of the content should be the primary interaction method.
Excerpted from Multitouch on Windows by Joshua Blake
Chapter 1, "The natural user interface revolution"

I think this definition is very powerful. It gets right to the core of what makes natural user interfaces so natural in a way that does not restrict the definition to particular input technology or interaction pattern. It also can support the points-of-view presented by Richard and on Wikipedia, but in a more general way. 

By talking about directly interacting with content, we establish that content interaction should be primary and artificial interface elements should be secondary and used only when necessary. This is an easier way to say the interface is invisible. 

By framing the definition around natural human behaviors, we can talk about reusable patterns of behavior derived from human-human and human-environment interaction without implying we should model the interface after specific interactions. We can apply natural behaviors by reusing existing skills, which is what Bill Buxton was talking about. In the chapter, I spend a lot of time discussing these skills and how to apply them.

If you would like to read more on this, the entire chapter 1 is available for free download from Manning, where you can also pre-order the MEAP and read chapters as I write them.


  1. I think your definition is pretty good, and I think it was smart to let Buxton be your guide. He always comes across as very practical to me. He's not as interested in saying we are in the middle of a revolution that will render the mouse obsolete or will lead to interaction utopia. He merely talks about how the tools are getting better, that it's a long time coming, and we need to be actively adapting and improving our designs to take advantage of these tools, otherwise nothing will actually get better.

  2. Thanks Ben.

    And to be clear, I don't think the mouse and keyboard or GUI are going away. (Actually this is exactly the next subsection in chapter 1 after the excerpt above.)

    I also use the term "revolution" in the context of how we think about interfaces, rather than how they will be adopted. We have to throw out a lot of conceptual baggage from GUI (though some things like drag-drop are good direct interaction patterns) and think about NUI in a fresh light. The actual adoption and deployment of NUI input technologies will be more of an evolution.

  3. Just re-read my comment and feel like it could come across that I was comparing Buxton's practicality to your own, but I meant to compare it to the proverbial "they". I like your first chapter so far, and I think it's a promising start. It's not inherently off-putting to talk about revolutionizing HCI, as long as it's in equal measure with reality and common sense. :) Looking forward to seeing more.

  4. Chapter 1 is a very good read and I'm sure the rest of the book will be just as good. I like your definition of NUI but I've asked in my own blog post that you modify it slightly - or at least consider modifying it.

  5. Hi Josh,

    Congratulations for your scholarly paper and this first chapter. Very, very good job !
    Concerning your definition of NUI, I feel like you're on the right track.
    Anyway, I think Richard's modification is quite interesting, since "both innate and learned skills" seems more accurate to me than "natural human behaviors" and lets us define what "natural" means for our work.
    On my side, I think you should not use "directly" in this definition. Of course, direct interactions are central for NUI and should be considered with high priority, but (in my opinion) directness is a consequence and not a component of this definition. As you wrote in your paper, direct interaction with objects is a skill learned very early in our life.
    As you wrote in the first chapter of the book: "direct manipulation of the content SHOULD be the primary interaction method" (which means it won't always be the case. Right?).
    Some interactions in real life can be considered natural but are not direct, and I guess it will be the same with NUI applications as they become more and more complex.
    As definitions are quite binary things, I think "direct interaction" should not appear in this one but should be made explicit (outside the definition) as a direct and important consequence of "both innate and learned skills".

    Anyway... keep up the good job !

    My 2 cents

  6. Thanks for the comments guys. Very useful thoughts.

    Laurent that is a very interesting point about direct being a consequence of the definition.

    Here is part of an email I sent to Richard about whether to say natural human behaviors or innate and learned skills:

    "I like saying "behavior" for two reasons. First, it is a simpler vocabulary word. Just tested on my wife, she knows the definition of behavior immediately (the way someone/something acts in a situation), but has to think about skill before giving a partially correct answer, and could not tell me what innate means. She is my one-stop usability lab. ;)

    "The second reason is that I can then talk about human-computer interaction as being the combination of human behaviors and interface behaviors. This gives me a rhetorical symmetry but allows me to explain how the behavior of each side works in different terms. The human behaviors consist of the innate abilities and learned skills discussed in chapter one. The interface behaviors are complex and subtle and understood by the user through metaphors, which I talk about in chapter two along with OCGM and how to build specific content- and context-specific metaphors on top of OCGM."

    Thinking about it more, I suppose that it could be interchangeable; the natural behaviors could be a consequence of reusing skills, or reusing skills could be a consequence of using natural behaviors. But the skills modification removes natural from the definition and I'm still leaning towards the simpler language of "behavior".

    On the question of direct -- in section 1.3.4, I elaborate by explaining interactions can be direct in three ways: spatial proximity, temporal proximity, and parallel action. This gives the definition flexibility, but I think that specifying direct in the definition is important, particularly if we are trying to contrast GUI and CLI styles of interaction.

    It's true that you can derive direct from natural behavior/innate and learned skills, but out of all the various attributes of NUI you can derive, I think direct interaction is one of the most important and should be called out. When explaining NUI to someone who doesn't know, the one-two combo of natural behavior and direct interaction explains about 80% of what we need to know without further explanation required, in my opinion.

    Perhaps "unmediated" would work instead of "direct", but I'm not sure we gain anything by making that change. I am open to changes, but I'm still leaning towards what I have.

    What do you guys think about those thoughts?

    Thanks again for the comments though, I am going to update the way I explain parts of the definition based upon your input, Richard and Laurent.

  7. I'm trying to think of a NUI interaction that is not direct. Here are some NUI technologies all of which, I think, are direct.

    Multi-touch (duh!)
    Speech (yes)
    Tangible User Interfaces (duh)
    Organic User Interfaces (more about output than input but still direct)
    Gestural/Spatial Interfaces
    Augmented Reality (this is indirect but only in the sense of the output).
    Automatic Identification (direct with facial, perhaps indirect with RFID)

  8. I think all of the technologies can be direct in one way or another. We should keep in mind that direct is not a binary measurement; it is a continuum. Mouse input is more direct (temporal proximity and parallel action) than command line (temporal proximity, maybe). Touch input is more direct than mouse (temporal proximity, spatial proximity, and parallel action).

    Within a single technology, there are differing degrees of directness depending upon how the interface is designed. For example, you can have a ScatterView type of interaction which is highly direct in all three categories. Less direct spatially would be a slider controller that changes content somewhere else in the interface.

    If you need to interact with a 3D object, you could say that TUI, OUI, or even some forms of AR allow more directness than multitouch.

    In some types of tasks you may want to include a little bit of indirectness spatially so you don't have finger occlusion.

    You may note that gestures are less direct than manipulations, but there is still a value of directness. System gestures, that might be recognized anywhere on a touchscreen, are less direct than what I'll call attached gestures, which result in action based upon the directness between the gesture and an interface element.

    I think maximizing directness is an important factor in NUIs, though it does need to be balanced with usability in some cases like those above.

  9. @Josh :
    I understand your point of view about your definition and I agree this definition is quite easier to understand, especially for readers discovering NUI.
    My comments were more related to an "academic" definition which should contain a minimal set of conditions. But "academic" definitions are not always the easiest to understand...

    Concerning my comment about the fact that "direct" is a consequence of this definition, here is the logical chain of my "thought".

    Two things are important :
    - NUI are designed to exploit both innate and learned skills to interact with content.
    - As B. Buxton said: "We should exploit skills that we have acquired through a lifetime of living in the world".

    My assumption is that the older we are, the more "specialized" are new skills we learn.
    => the most common and shared skills are those we've learned early in our life.
    => if we want to build an application for "everybody" (with low cognitive load) we should maximize usage of skills learned early in our life.
    As Josh and Ron wrote in their paper, direct interaction with objects is one of the first skills we learn as babies.
    I think this is why directness is central for NUI and why we should maximize its use. And this is why I said directness is a consequence of the definition.

    @Josh, @Richard :
    Concerning my comment about indirect interactions in NUI apps or in real life, here is the example I was thinking about:
    When I want to switch on the light, I don't interact directly with the bulb but I interact directly with the light switch, even if what I really want to do is act on the bulb.
    I think this is why we consider a "clap" system a good example of NUI and of (more) direct interaction.
    The strange thing is that when I interact (as an old guy) with the light switch, I feel that I am performing a direct interaction.
    But if you observe a young child who just learned how to play with a light switch, you'll notice that he doesn't interact with the bulb. He just interacts with a "magic" night/day switch.
    Later, he will infer that there's a relation between bulb and switch and that acting on the switch means acting on the bulb.

    I think this is why Objects and Containers are interesting concepts in OCGM :
    - Step 1 : child interacts directly with an object (the light switch).
    - Step 2 : child understands that bulb and light switch are components (objects) of a whole (container) which means they're related, "linked".

    We always interact directly with the switch but as we've "learned" that bulb and switch are components of a container (are related), we feel that we interact directly with the whole system.
    As Josh wrote, maybe we should consider that there are different degrees of direct interaction:
    - level 1 : direct interaction with an object.
    - level 2 : direct interaction with a container (which means that we've learned what the components of the container are and how they relate)

    Well, this is just the flow of my thoughts in the discussion. :)

    @Richard :
    In my humble opinion, multitouch can be indirect. For example, I am thinking of the "10/GUI - Continuum" project by Maxence Dislaire (
    Some labs are also working on indirect multitouch (
    My preference for multitouch is NUI, but I really think there are some other possibilities, especially if you target power users who will be trained before using the system.

  10. @Laurent, more excellent thoughts!

    I particularly like the switch and light bulb example, since recently my 1 year-old daughter figured out the connection between the two.

    If we get technical about it, when we flip the switch we are directly (or one-step removed) interacting with the flow of electricity, and the light bulb turning on or off is a consequence. Of course for most purposes, the status of the light bulb would be considered the content.

    That is a good point about interacting directly with the container (the electrical circuit). In the common example of pushpins (content) on a map (container illustrating spatial relationship between content), you can interact with the container as well. The map tiles, like road or satellite view, could also be considered content.

    10/GUI still has temporal and parallel action directness, but that's not much better than a mouse. That other link you posted looks interesting. Using one MT device for a better screen is cool, but I bet having a UI on the MT screen would be best of all, rather than just being a MT pad.

  11. Richard was right -- I have slightly updated my preferred definition. See the new post: NUIs reuse existing skills

  12. Hi, if I say that NUI is something which is hands-off, while previously we were in a hands-on environment, it would be the sweetest and shortest definition possible for it.