Friday, January 14, 2011

Kinect hand tracking with WPF 4 and Bing Maps 3-D

I’ve been playing around with Kinect and PrimeSensor in WPF for a bit and have achieved a small technical success that I wanted to share in a new video from InfoStrat. Using the same techniques that allow us to use WPF 4 on Surface 1.0, we can now use depth camera hand tracking to control multi-touch applications.

Here is a very rough proof-of-concept where I’m controlling the InfoStrat.VE WPF 4 multi-touch control using a depth camera.

Bing Maps controlled with Kinect 3D-sensing technology

This is just a multi-touch application to which I have added one line of code that lets hand tracking feed the WPF 4 touch stack. I also display outlines of the tracked hands to provide better feedback about what is going on. In this video I used OpenNI and NITE from PrimeSense.

The tracked hands can participate in all of the multi-touch manipulations and gestures that you’ve already written for your touch application. You can even interact using hand tracking and touch at the same time in the same window. The code that enables this is part of our internal InfoStrat.MotionFx project and will eventually be open sourced, but it needs a bit more work to be practical.

One enhancement we’re planning is adding hand pose extraction, including palm orientation and finger positions. This would allow you to use hand poses and hand gestures to control whether a hand is “touching” the screen or not, instead of the current technique. Currently, the code decides that a hand is “touching” if it is more than a certain distance from the shoulder. Knowing the hand pose would also enable new types of interactions, just as finger orientation, blob size, and tagged objects on Surface enable new interactions.
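To make the current "touching" rule concrete, here is a minimal sketch of a hand-to-shoulder threshold. It is not the InfoStrat.MotionFx code. The `Joint` type, the 400 mm threshold, and the choice to measure only along the depth axis are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Joint:
    """A tracked joint position in millimetres (hypothetical type)."""
    x: float
    y: float
    z: float  # distance from the sensor


# Assumed value; the real threshold is not given in the post.
TOUCH_THRESHOLD_MM = 400.0


def is_touching(hand: Joint, shoulder: Joint,
                threshold_mm: float = TOUCH_THRESHOLD_MM) -> bool:
    """True when the hand is more than threshold_mm in front of the shoulder.

    A hand reaching toward the sensor has a smaller z than the shoulder,
    so the forward reach is shoulder.z - hand.z.
    """
    return (shoulder.z - hand.z) > threshold_mm


def touch_events(frames: Iterable[Tuple[Joint, Joint]],
                 threshold_mm: float = TOUCH_THRESHOLD_MM) -> Iterator[str]:
    """Yield "down"/"up" whenever the touching state changes between frames."""
    touching = False
    for hand, shoulder in frames:
        now = is_touching(hand, shoulder, threshold_mm)
        if now and not touching:
            yield "down"
        elif touching and not now:
            yield "up"
        touching = now
```

Because the measurement is relative to the user's own shoulder, the result does not depend on how far the user stands from the sensor. In a real application the "down" and "up" transitions would be what gets reported to the touch stack.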

In the end, we will want to design interfaces that use motion tracking to take advantage of the unique capabilities of that modality to create truly natural interactions. (Pinch-to-zoom with large arm movements is not the most natural interaction.) What I’ve shown above, though, is that it is feasible to use the WPF 4 Touch stack and Surface SDK as the unified platform for both multi-touch and motion tracking modalities.

16 comments:

  1. Josh, great job, you're the number one. Keep up the good work!

  2. Finally a managed code solution to play around!!

  3. Man, this is very interesting! Can't wait till the code is released. Great job!

  4. Absolutely superb! VERY keen to see the source!

  5. Very interesting. Looking forward to playing around with the code, since you intend to release it.

  6. Thanks for all the comments guys! :D

  7. Awesome Josh!!

    Davide Zordan

    http://davidezordan.net

  8. Josh,

    Awesome work. Can't wait to see more.

    Sean

  9. Very cool! I recently built a similar interface: http://www.sharpgis.net/post/2011/01/12/Fun-times-with-Kinect-and-WPF.aspx
    This was based on CL NUI though, so I had to do most of the image analysis myself. I have since moved on to using PrimeSense and refined it quite a lot.

  10. Thanks guys!

    sharpgis: Yes, I had seen your video. We have to admit the threshold based interaction is not ideal. I'm working on some new approaches for pose recognition as well as control design.

  11. I agree that the threshold isn't a great idea, but it's not a completely bad one. Instead of having a static threshold in space, I now have a threshold relative to the user's shoulders. If the hand is a certain amount in front of the shoulder, then it is past the threshold. This also has the benefit that it doesn't matter if I'm far from, close to, or to the left or right of the sensor. Everything is relative to the user, which allows for multi-user interaction.

    The threshold gesture is very stable and you rarely get false positives. I had some finger gestures early on (i.e. grab the map to move it), but with too many false positives, navigating the map became much harder. The current approach works well: I can navigate to a certain location very fast, and other people pick it up very well too (because the analogy is a touch screen in mid-air, and they are used to touch screens, they just get it).

  12. sharpgis:

    I'm actually using the hand-to-shoulder measurement already in this app and it is better than absolute threshold but still difficult to use. I think the solution is more mature hand pose recognition implementations.

  13. hi, a question, how do you highlight hands?

  14. Danielinux: I get the hand positions as reported by the NITE skeleton, project them to be relative to the depth image, then do edge detection in a 200x200 pixel area surrounding the reported hand position.

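The hand-highlighting steps described in the last comment (project the reported hand position into the depth image, then run edge detection in a 200x200 pixel window around it) can be roughly sketched as follows. This is an illustration only, not the author's code. The pinhole projection and its intrinsics stand in for whatever conversion the tracking SDK provides, the depth-jump test is just one simple form of edge detection, and all function names and default values are hypothetical.

```python
def project_to_pixel(x_mm, y_mm, z_mm, fx=575.0, fy=575.0, cx=320.0, cy=240.0):
    """Project a real-world point (mm) to depth-image pixel coordinates.

    Simple pinhole model; the focal lengths and principal point are
    illustrative assumptions, not calibrated values.
    """
    if z_mm <= 0:
        raise ValueError("point must be in front of the sensor")
    u = int(round(cx + fx * x_mm / z_mm))
    v = int(round(cy - fy * y_mm / z_mm))  # image y grows downward
    return u, v


def window_bounds(depth, center, size=200):
    """Bounds (left, top, right, bottom) of a size x size window, clamped to the image."""
    height, width = len(depth), len(depth[0])
    u, v = center
    half = size // 2
    return (max(0, u - half), max(0, v - half),
            min(width, u + half), min(height, v + half))


def hand_outline(depth, center, size=200, jump_mm=50):
    """Return the set of (u, v) pixels inside the window that sit on a depth edge.

    A pixel counts as an edge when its depth differs from its right or
    lower neighbour by more than jump_mm (an assumed tolerance).
    depth is a list of rows of depth values in millimetres.
    """
    left, top, right, bottom = window_bounds(depth, center, size)
    edges = set()
    for v in range(top, bottom):
        for u in range(left, right):
            d = depth[v][u]
            if u + 1 < right and abs(d - depth[v][u + 1]) > jump_mm:
                edges.add((u, v))
            elif v + 1 < bottom and abs(d - depth[v + 1][u]) > jump_mm:
                edges.add((u, v))
    return edges
```

Restricting the search to a small window around the tracked hand keeps the per-frame cost low, and the resulting edge pixels can be drawn over the application as the hand outline.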