Monday, January 19, 2009

What we can learn from Dvorak

There is a commonly quoted belief that the QWERTY keyboard is so popular due to a bit of luck and coincidence plus a first-mover advantage that locked the world into the QWERTY layout. There are other keyboard layouts; the distant second place (at least for US English) is Dvorak. Some fans of Dvorak claim that it is technically superior to QWERTY, allowing faster speeds, etc., but QWERTY cannot be displaced because of lock-in. Essentially this can be considered an argument that the free market for keyboard layouts is broken.

I just read this article (from 1996) about the history behind the Dvorak and QWERTY keyboard layouts.

The core purpose of the article is to disprove the idea that the keyboard layout market is broken. After reviewing the central myth, the article traces its origins and shows that most of it is just republished accounts of a severely biased study conducted by Dvorak himself. It then digs into historical newspaper accounts to show credible evidence that QWERTY was simply a superior layout for typewriters, and that once it was popular, nothing else provided a cost-effective alternative or path to switch.

How does this apply to NUI and multi-touch devices? I think we can learn from some of the mistakes and failures of the Dvorak layout, or at least its marketing.

  • Face the truth

    Dvorak proponents assumed their product was superior. When test results came back and weren't favorable, the results were skewed to look good. A better strategy would have been to fix the flaws instead of cover them up.

  • Market superiority trumps technical superiority

    Even if Dvorak is technically superior, that doesn't necessarily mean it has a market advantage if businesses or consumers have no financial reason to adopt it.

We who work on NUI applications should make sure our products have a true market advantage. Think about how to transform the unique user experience of these devices and applications into cost savings or additional revenue sources. We should also take all feedback -- positive, negative, internal, external -- and use it to improve our products.

Who knows, perhaps we can replace some keyboard data input with multitouch devices if we have a compelling user experience combined with a market advantage in the right application.

Friday, January 16, 2009

NUI is appropriate for Data Heavy Applications

In a recent post at Point & Do, Jonathan discussed when gesture-based interfaces are inappropriate. One idea was that NUI and gesture interfaces won't work well when dealing with heavy data input, particularly input that requires a keyboard.

While I agree that data input requiring a keyboard isn't going away, we should distinguish between the input and the analysis and processing of data. Just because an application is data heavy doesn't mean it can't use NUI. Only the data input portion may require traditional interfaces with keyboards and mice. Once the data is in the system, a NUI application can be much better suited for analysis than a GUI. Complex relationships can be visualized, and direct interaction with objects reduces the abstractions between the users and the data they are trying to understand.

As an example, consider this video of an integrated Microsoft Dynamics CRM and Surface application that I developed at Infostrat. This is a constituent services case management demo.

In this application, the data is entered through traditional methods (GUI) to the CRM database via a web browser, Outlook client, or third-party add-on. Surface reads the database and displays the CRM constituent cases alongside other data sources relevant to the constituent's comment subject area (Crime, Housing, Energy Prices, etc.). The Surface application can be used for data analysis as well as presentation of findings. The paper that was placed on Surface was printed at a regular PC from within a CRM constituent case and contains a byte tag that the Surface can recognize. When the paper is placed on the Surface, the map and information panel jump to the associated entity from CRM. Additional note taking and data entry is possible on Surface, but not as much as in the regular CRM clients.

This combination of traditional data entry and NUI data analysis could be implemented effectively in other situations as well. What traditional data applications do you think could be augmented by a NUI interface?

Tuesday, January 13, 2009

EEE touch

Here is another video from last week. It's a new Eee touchscreen PC. It looks like it is single-touch, but it has a couple of nice NUI applications.

via Engadget

Post-it application (0:45)

The post-it application has good animated transitions that make it easy to understand what you should be doing. You're often working with the content directly. One possible issue, though, is whether changing from the clock to the post-it page requires a swipe from that button on the bottom or not. If it does, how does the user figure that out? It would be better as a two-position slider than a button.

The demonstrator seems to not be 100% familiar with the application gestures, since, for example, sometimes he launched the post-it app with a single tap and other times with a double tap. Also he couldn't figure out how to get back to the clock from the post-it mode without resetting the app. (Did it require a swipe to the right?)

Regarding accidental double-taps, many users coming from WIMP may have the urge, as in this video, to double-tap to launch or activate an icon. NUI applications have to account for this and 1) not have different behavior for a single vs. double tap on the same element, and 2) account for what will happen if the user taps an interface element more than once before the action completes. This is a good argument for making the interface respond instantly, particularly if the requested action will take some time to process.
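As a sketch of the second point, here is a minimal Python example (the TapGuard class and its callbacks are hypothetical, not from any real toolkit) that swallows extra taps while a launched action is still running, so a double-tap behaves exactly like a single tap:

```python
class TapGuard:
    """Makes repeated taps on the same element behave like one tap:
    while the launched action is still in progress, extra taps are ignored."""

    def __init__(self, action):
        self.action = action          # callable that starts the (slow) task
        self.in_progress = False
        self.launch_count = 0

    def on_tap(self):
        if self.in_progress:
            return False              # swallow the extra tap (e.g. a double-tap)
        self.in_progress = True       # a real UI would give instant feedback here
        self.launch_count += 1
        self.action()
        return True

    def on_action_complete(self):
        self.in_progress = False

guard = TapGuard(action=lambda: None)
guard.on_tap()                        # first tap launches the action
guard.on_tap()                        # second tap of a double-tap is ignored
assert guard.launch_count == 1
guard.on_action_complete()
guard.on_tap()                        # after completion, tapping works again
assert guard.launch_count == 2
```

The same guard doubles as the "respond instantly" advice: setting `in_progress` immediately is the natural place to show visual feedback before the slow action finishes.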

Media application (2:45)

Another good detail is in the media application, which was designed for touch: notice that the minimize and close (X) buttons are extra-large, good targets for the finger. Much of the rest of the video shows regular Windows (WIMP) applications, which respond to touch but aren't optimized for it.

On screen keyboard (4:05)

Allowing the keyboard to resize might be good for adjusting to different screen resolutions and hand sizes, but the screen form factor doesn't seem optimal for touch-typing, and regular keyboards are one-size-fits-all, so the resizing may not be that useful.

The www and smileycon keys are good ideas to save time, considering on-screen typing won't be as fast as with a keyboard. The hold-flick to capitalize seems a bit buggy and slow. I'd probably stick with a shift key if I were using it, but with only single touch it'd have to be a sticky shift key.

Monday, January 12, 2009

Gestures without Direct Interaction is not Natural

This video came out at MacWorld last week. It shows a multitouch frame over a large display interfacing with OS X. It works like the multitouch trackpad on new MacBooks. Take a look at how the manipulation gestures work (when they work), particularly the image rotation around 0:22 - 0:27.

via Engadget

The demonstrator holds down one finger and drags around with a second finger, but the image rotates around its own center. It doesn't track the demonstrator's fingers at all. I am guessing the gesture engine recognizes the gesture and applies it to whatever is on the screen under one of the contacts (like the stationary finger).

The MacBook trackpad probably works the same way, but the trackpad is not a display device, so it is not expected to have direct interaction. When you take the same feature and overlay it on the screen, it is not natural at all.

This use of gestures is really just a shortcut for certain tasks. The image rotation in the video is functionally equivalent to using a scroll wheel on a mouse to rotate the image. You really don't need multitouch to accomplish what is shown. Multitouch is not used to its full potential in this case.

Friday, January 9, 2009

Deconstructing: Financial Services Sample Application

The purpose of this kind of post is to analyze the user experience presented in NUI demo videos. I will keep in mind that videos do not show the entire experience. The goal is to pick out what seems to work well and what doesn't, then figure out what might be better for future applications.

Application: Financial Services Sample Application
Author: Razorfish

Microsoft Surface Financial Services Application - Razorfish Demo from razorfish - emerging experiences on Vimeo.

Continue below for feature analysis...

Feature: Drop coins to display marketing facts
Time: 0:03 - 0:13
Good: Serves as an attract mode; prepares the user for interacting with the Surface using real objects.
Bad: If there are no coins available, this interaction is not possible. The only clue that the user can do this is the background text "Drop a coin."
Suggestion: Provide visual feedback from finger and blob contacts as well as coins. Provide a "ghost coin" visual that prompts the user to put a coin down rather than textual instructions. Upon further review, this idea is already implemented, but really hard to see in this video. There are bubbles that pop up and say "Place coin" (I think.) After a short time the bubbles pop and appear somewhere else. Below are screen captures from a few of the 20 frames or so that show the coin bubbles.

Feature: Select tasks by placing a products or services token then dragging options to the center circle
Good: Consolidating multiple options into a removable visual component keeps the UI free of menus and clutter.
Bad: Can the application still work if the tokens get lost? In an unguided public-facing application, this may be an issue.
Suggestion: Consider an alternative way to access the product and services items using just touch, secure the tokens from leaving the Surface, or have a bunch of extra tokens. Perhaps make the tokens fridge magnets or something that customers are allowed to take.

Feature: Drag products and services onto activity wheel. Wheel lets you select from chosen services. Tap on a service on the wheel to remove it.
Time: 0:19 - 0:28, 0:53 - 0:55
Good: Only having a few products/services in the wheel simplifies the user choices. The animations are very good: highlighting the wheel when the user is dragging indicates the user can drop the service and it will stick. There is "life" in the wheel as it bounces and the tasks move around if the user has not touched anything for a while. The physics of the repeating list box snapping to an item while scrolling are well developed.
Bad: Part of the program flow seems awkward -- an unnecessary level of indirection. The number of choices goes from many (the tags on the tokens) to few (the selected services), which feels like it goes against the scaffolding principle of progressive disclosure. If the wheel were used for directly comparing products (like phones or cars), it might make sense to narrow the selection down, but it seems the wheel is used as a list of activities.

Tapping on the wheel services to remove them is not obvious. That gesture is more likely to happen accidentally while exploring the application. It seems like dragging them off would be the appropriate thing to do, since dragging them on put them there in the first place. (That may be an option but is not shown.)
Suggestion: I'd like to see other options, maybe expanding the token tabs when selected.

Feature: Zooming the wheel from activity selection to activity cards
Good: Zooming gets everything irrelevant out of the way and focuses attention on the activity.
Bad: Using the zoom gesture at this point is very much a learned action and not natural. If the user doesn't do anything, hands pop up on the screen and show that the interface can be zoomed. (This isn't in this video -- it's in another non-public video.) While it is good to guide the user to the next step, it would be much better if they did not need guidance. Without the hands, how would a new user figure out this action? There is no perceived affordance for zooming the wheel. Similarly, zooming out is accomplished by the pinch gesture or by double tapping outside the wheel, which is not obvious.
Suggestion: Keep the method of clearing irrelevant UI elements, but design the interface so the user can clearly and easily understand that there is a path and how to get there (and back). Maybe it would be better if the user could select tasks from the scrolling task list the same way the application launcher works. It would at least be a more familiar action than waiting for ghost hands to show you what to do. Putting a button on the wheel to go back would make it more obvious how to get back.

Feature: Activity cards, graphs, information
Time: 0:30 - 1:05
Good: Appropriate use of scatter view and excellent graphics simplifying the complex banking concepts.

Feature: Dropping a flyer with byte tag as a shortcut to the contents of that flyer
Time: 0:42 - 0:44
Good: Taking a shortcut to the content is a good idea.
Bad: The business value of this feature is unclear. If customers got the flyer in the mail, it's unlikely they will drive to the bank to use the Surface to read similar content. If they grabbed it in the bank, what does Surface bring to the customer's experience?
Suggestion: Try to integrate the Surface application with the user's normal experience and make the value added more clear. In this application, Surface seems to be an interactive flyer. There is some cool factor that may get foot traffic in the door, but once that wears off, this application will sit in the corner. If Surface were used to actually apply for loans or otherwise integrate with the normal business of the bank, it would have a much higher value.

Feature: Interact with a map to view ATM and branch locations
Time: 1:09 - 1:23
Good: The map interaction seems clean. There is a button at the bottom of the map to get back to the wheel, though the video shows her zooming out to the wheel. If that wasn't an accident, how does the application tell the difference between zooming out the map and exiting the map? The pushpin selection and interaction look good. I like the spin when the information appears.
Bad: The same comments from above apply here regarding dragging the map icon to the wheel to enter the map. It would be more direct to just tap a button on the map icon. The wheel isn't involved in the map interaction, so why drag to the wheel?
Suggestion: Allow more direct interaction to enter and exit different application modes. Provide opposite gestures to do opposite actions. Drag to enter and tap to exit (similar to the token products and services) is not intuitive.

Overall thoughts: The application is highly polished and looks professional. The interactions as designed have good feedback and animations. The selection of gestures and program flow could use some work to make it more intuitive. The feature design could use some more analysis to make it work better and have a higher value if it were deployed in the real world. It works well as a demo, though, discounting the wheel zoom gesture, which I rather dislike.


Thursday, January 8, 2009

System Gestures vs. Manipulations

In my previous post, I mentioned that gestures need to be evaluated in context to determine what the user intends. We can break down gestures into two types: System Gestures and Manipulations.

A system gesture is a gesture (as previously described) that can be performed in any context and has the same meaning. For example, if a foreground document editor responds to a downward flick gesture by scrolling down regardless of where the flick started or ended on the screen, that would be a system gesture.

A manipulation is a gesture performed on or to a UI element on screen, or in NUI terms, an object. The object will only perform the action if the gesture falls at least in part within its bounds. For example, if a 3-D carousel only scrolls when the user starts a drag gesture from within the visible elements and not just anywhere, that is a manipulation.
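To make the distinction concrete, here is a minimal Python sketch (the Widget class and dispatch function are illustrative, not from a real framework): a gesture is routed as a manipulation only if it starts inside some object's bounds; otherwise it may fall back to a registered system gesture.

```python
from dataclasses import dataclass

@dataclass
class Widget:
    name: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, px, py):
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h

def dispatch(gesture_type, start_x, start_y, widgets, system_handlers):
    """Manipulations go to the object under the gesture's starting point;
    only if nothing is hit does a registered system gesture respond."""
    for w in reversed(widgets):               # topmost widget first
        if w.contains(start_x, start_y):
            return ("manipulation", w.name)
    if gesture_type in system_handlers:
        return ("system", gesture_type)
    return ("ignored", None)

widgets = [Widget("carousel", 0, 0, 200, 100)]
# A drag starting on the carousel manipulates the carousel itself:
assert dispatch("drag", 50, 50, widgets, {"flick_down"}) == ("manipulation", "carousel")
# A flick over empty space falls back to the system gesture handler:
assert dispatch("flick_down", 500, 500, widgets, {"flick_down"}) == ("system", "flick_down")
```

Hit-testing at the gesture's start point is one simple policy; a richer engine could require the whole gesture path to intersect the object.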

One core NUI concept for touch interfaces is direct interaction. This means that when you touch an object, that object should respond, not something else. Applying this to gestures, most interaction should use manipulations rather than system gestures. The reasoning is simple: if you use a system gesture over one object (say a downward flick over a non-active window) but another object responds (the foreground document editor instead of what the finger was over), then the user can easily become confused.

Manipulating objects directly is much more intuitive and much easier to discover than generic system gestures. Objects can have affordances (future topic) that help the user figure out what they can do with it. System gestures are only learned by reading instructions or by accident (or socially.)

Most system gestures are not really NUI worthy. They are actually a step backward into the era of rote learning of CLI commands, except with touch. Some of the first generation of touchscreen laptops seem pretty cool, but when you figure out you can only really use manipulations within a proprietary application, and everywhere else it just uses system gestures as shortcuts for common tasks, it doesn't seem nearly as cool. That's mostly a software problem, though, since most existing software is written for WIMP, not for touch.

There are a few exceptions. The application chooser on Microsoft Surface is basically a horizontal scrolling picture list. It scrolls when you drag anywhere on the screen. In fact, with the exception of tapping the corner hotspots, the application icons, or the "I'm done" button, if visible, "drag anywhere to scroll" is all you can do. Even though it is a system gesture, it was smart to use because new Surface users may still be learning that Surface responds to touch. They are much more likely to discover the touch responsiveness if 98% of the screen will scroll, and the rest responds to taps. In this case, the concept of scaffolding overrides direct interaction. (More future topics.)

Regardless, when thinking about gestural interfaces, we should design for manipulations in almost all cases.

Wednesday, January 7, 2009

Gestural Application Model

When talking about and designing gestural applications, there are many new terms and concepts floating around. Some concepts were being compared where I felt we were talking about apples and oranges. In an effort to organize some of the ideas present in gestural applications, I thought up this Gestural Application Model. (It is similar in form to the TCP/IP layer model.)

Layer 0: Device
This layer contains raw sensor data, perhaps processed by device drivers or a very low-level API. For example, a stream of X,Y coordinate pairs from a capacitive touch screen would fit into this layer. Also, in the case of Microsoft Surface, the stream consisting of finger, blob, tag, and raw visual data goes here. Everything on the upper layers depends upon the type of data available from the device.

Layer 1: Event
Raw sensor data streams are grouped into events that describe the type and value of the raw data, basic state transitions, and any other relevant information for the sensor type. State transitions include things like Contact Down, Contact Moved, and Contact Up. Internally, this requires interpreting the data stream into a persistent object. For example, is this X,Y coordinate a new touch or is it the same one from the previous time step but moved? Additional data might include, in the case of the Surface: Position, Size, Orientation, and Object ID (for byte tags.)
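A minimal sketch of this interpretation step in Python (the ContactTracker class is hypothetical, not any real driver API): each frame's raw points are matched to known contacts by proximity, producing Down, Moved, and Up events with persistent contact IDs.

```python
import math

class ContactTracker:
    """Turns a per-frame list of raw (x, y) points into Down/Moved/Up events
    by matching each point to the nearest known contact within a threshold."""

    def __init__(self, max_jump=30.0):
        self.max_jump = max_jump     # max distance a contact can move per frame
        self.contacts = {}           # contact id -> (x, y)
        self.next_id = 0

    def update(self, points):
        events, unmatched = [], dict(self.contacts)
        for (x, y) in points:
            best = min(unmatched,
                       key=lambda cid: math.dist(unmatched[cid], (x, y)),
                       default=None)
            if best is not None and math.dist(unmatched[best], (x, y)) <= self.max_jump:
                del unmatched[best]              # same contact, it just moved
                self.contacts[best] = (x, y)
                events.append(("Moved", best, x, y))
            else:
                cid = self.next_id               # no nearby contact: a new touch
                self.next_id += 1
                self.contacts[cid] = (x, y)
                events.append(("Down", cid, x, y))
        for cid in unmatched:                    # unmatched contacts lifted off
            del self.contacts[cid]
            events.append(("Up", cid, *unmatched[cid]))
        return events

tracker = ContactTracker()
assert tracker.update([(10, 10)]) == [("Down", 0, 10, 10)]   # new touch
assert tracker.update([(15, 12)]) == [("Moved", 0, 15, 12)]  # same finger, moved
assert tracker.update([]) == [("Up", 0, 15, 12)]             # finger lifted
```

Nearest-neighbor matching within a distance threshold is the simplest answer to "is this X,Y a new touch or the same one, moved?"; real drivers add velocity prediction and debouncing.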

Layer 2: Gesture
The application collects all the events within a time frame, organizes them, and interprets them as gestures. A gesture can be composed of many events.

Touching the Surface by itself is not a gesture, but if you touch and release within a certain time frame, it becomes a tap gesture. Alternately, if you touch and hold for longer, or move your finger, then release, it could become a hold gesture or a move gesture. Each gesture consists of multiple events.

The same set of events could be interpreted as different gestures, depending upon what the application is expecting or cares about. That move gesture could be a hold gesture if the application doesn't care if the user moves the finger a little bit, or a lot.

One key difference between an event and a gesture is that an event is instantaneous, but a gesture has a beginning, middle, and end. Gestures can be in progress or completed.

Event: This [sensor data] did [state transition] at [time]
Gesture: [Gesture Type] is happening or happened.
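As an illustrative Python sketch (the threshold values are arbitrary assumptions, not from any specification), a completed Down..Up event pair can be classified into a tap, move, or hold gesture by its duration and travel distance:

```python
import math

def classify_gesture(down_time, up_time, down_pos, up_pos,
                     tap_time=0.3, move_dist=10.0):
    """Classify a completed Down..Up event sequence as tap, move, or hold.
    The move_dist tolerance is what lets an application ignore a finger
    that moves 'a little bit' and still treat the gesture as a hold."""
    duration = up_time - down_time
    distance = math.dist(down_pos, up_pos)
    if distance > move_dist:
        return "move"
    if duration <= tap_time:
        return "tap"
    return "hold"

assert classify_gesture(0.0, 0.1, (5, 5), (6, 5)) == "tap"    # quick touch-release
assert classify_gesture(0.0, 1.0, (5, 5), (6, 5)) == "hold"   # long, nearly still
assert classify_gesture(0.0, 0.2, (5, 5), (50, 5)) == "move"  # finger traveled
```

Raising `move_dist` is exactly the case described above where the same events become a hold instead of a move, because the application doesn't care about small finger movement.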

Layer 3: Intent
Here gestures are married with context to determine what the user intended to do. Once the application interprets the user's intent, it can take action.

Intent depends highly upon context. The context includes where the gesture was done (relative to visual interface elements) as well as application modes. Compare a tap gesture in the middle of nowhere with a tap gesture over a button interface element. The two might be identical but without the context it is hard to figure out what the user wants. Similarly, the user might drag a finger over an image but want different things depending upon whether the application is in a pen/drawing mode or a panning/moving mode.
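A toy Python sketch of this layer (the intents, targets, and modes are made up for illustration): the same gesture maps to different intents depending on what it was performed over and the current application mode.

```python
def interpret(gesture, target, mode):
    """Marry a recognized gesture with context (the element under it and
    the application mode) to produce the user's intent."""
    if gesture == "tap":
        # A tap over a button means press it; the same tap over nothing means nothing.
        return "press_button" if target == "button" else "no_op"
    if gesture == "drag":
        # The same drag over an image means drawing or panning, depending on mode.
        return "draw_stroke" if mode == "pen" else "pan_view"
    return "unknown"

assert interpret("tap", "button", "pan") == "press_button"
assert interpret("tap", "background", "pan") == "no_op"
assert interpret("drag", "image", "pen") == "draw_stroke"
assert interpret("drag", "image", "pan") == "pan_view"
```

The branches make the designer's job visible: every place two intents share a gesture is a place where misinterpretation is possible and the cost of a wrong guess should be minimized.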

Part of the application designer's job is to also figure out situations where a user might use a gesture in the wrong context (i.e. when the user's intent and the interpretation of his or her gestures are not the same) and minimize or eliminate the effects of mis-interpreting intent. Ideally a single gesture will only ever be used for a single action. If the application supports multi-touch hardware, then there are many gestures available so reuse should not be a problem.

(I originally drafted this model in a comment at Point & Do. I decided it should be a model, rather than a stack due to the unfortunate acronym that stack creates.)

Monday, January 5, 2009

The Natural User Interface Revolution

First, there were Command-Line Interfaces. That worked well for a while, but when computing power grew, so did application requirements and user demands. The CLI was replaced, for all intents and purposes, by the Graphical User Interface. Everyday users (both professional and consumer) began using the GUI rather than the CLI as the primary interface to the computer. The CLI is still around, since it is still good for certain, specialized tasks, but most of those tasks could also be done using the GUI, if necessary.

We are on the verge of a similar revolution -- from GUI to Natural User Interface. Similar to the CLI-to-GUI transition, we're at a point where computing power and expectations have grown. The way we think about how we interact with computers will have to change. GUI will still be around, but it may be relegated to specialized tasks, similar to the CLI. It will not be a sudden transition. There are already NUI interfaces out there, but we may not recognize them since they work so well. The only way we know we are in the NUI era is to reflect on the history of interfaces and analyze the attributes and trends of current interface design.
Let us reflect.

CLI
Elements: Prompt, Command and arguments, Result
Attributes: Single task, Single user, Command oriented, Keyboard input

GUI
Metaphor: Papers arranged on a desk
Elements: Windows, Icons, Menus, Pointer (WIMP)
Attributes: Multi-task, Single user, Task oriented, Keyboard + Mouse input

NUI
Elements: Objects, Containers, Gestures, Manipulations (OCGM)
Attributes: Multi-task, Multi-user, Object oriented, Touch input

There are some variations (GUI could be object oriented, NUI doesn't have to be multi-user) but this table identifies the general trends.

As for the elements of a NUI -- that is still up for debate. Jonathan Brill has suggested Places, Animations, Things, and Auras as the core elements of NUI. I'm still thinking about those. Update 3/1/2010: I have decided to support OCGM as the core elements of NUI.

It isn't really up to us to decide in advance, though. The successful NUI applications will have some common features that over time we will be able to identify. That is part of what I hope to do on this blog: deconstruct NUI interfaces (particularly demo videos), analyze the elements to see what works and what doesn't, and track NUI trends as they evolve.