




Analysis of the Kinect For Windows SDK 1.8 Background Removal API

Wed, 25 Sep 2013 22:30:07 EST

An exciting new feature, the Background Removal API, was added to the Kinect For Windows SDK 1.8, which was released last week. Background removal, or green screening, is something many people have used the Kinect for with varying degrees of success. Prior to the Background Removal API, getting a decent green screen effect out of the Kinect required a lot of heavy lifting and creative thinking, because the depth data from the Kinect is too noisy to use for a smooth player mask.

A photo taken using the default Kinect depth data with no blur or special techniques.

The official Kinect SDK 1.8 Background Removal API is a great step forward for developers, as it allows you to obtain a great green screen effect with minimal work. However, there are a few important restrictions in place which make it unusable right now for a multi-user photo experience:
  1. The initial Background Removal API requires successful skeleton tracking to work.

  2. The initial Background Removal API only allows the effect to be performed on one tracked player at a time. Update: 10/3/2013 - Joshua Blake writes in from Twitter, "By the way, you can do multiple people background removal in 1.8. You just need to create an instance per tracked skeleton." Thanks Joshua. (A sketch of this per-skeleton approach follows below.)
Those criticisms aside, it would be awesome if Microsoft were able to decouple the Background Removal API from skeleton data, or add a mode where you can specify a depth threshold away from the camera instead of relying on detected skeletons.
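
Based on Joshua's tip, here is a minimal sketch of the per-skeleton approach. It assumes the BackgroundRemovedColorStream class from the Microsoft.Kinect.Toolkit.BackgroundRemoval assembly in the 1.8 developer toolkit, following the pattern of the toolkit's BackgroundRemovalBasics-WPF sample as I understand it; the event wiring and compositing of the per-player frames are omitted, so treat this as a starting point rather than tested code.

    // Sketch of multi-person background removal per Joshua Blake's tip:
    // one BackgroundRemovedColorStream instance per tracked skeleton.
    // Event subscription and compositing of the per-player frames into a
    // single image are omitted for brevity.
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.Kinect;
    using Microsoft.Kinect.Toolkit.BackgroundRemoval;

    class MultiPlayerBackgroundRemoval
    {
        private readonly KinectSensor sensor;
        private readonly Dictionary<int, BackgroundRemovedColorStream> streams =
            new Dictionary<int, BackgroundRemovedColorStream>();

        public MultiPlayerBackgroundRemoval(KinectSensor sensor)
        {
            this.sensor = sensor;
        }

        // Call on every skeleton frame; creates a removal stream for each
        // newly tracked skeleton and locks it to that skeleton's tracking id.
        public void UpdateTrackedPlayers(Skeleton[] skeletons)
        {
            foreach (Skeleton skeleton in skeletons.Where(
                s => s.TrackingState == SkeletonTrackingState.Tracked))
            {
                if (streams.ContainsKey(skeleton.TrackingId))
                {
                    continue;
                }

                var stream = new BackgroundRemovedColorStream(sensor);
                stream.Enable(sensor.ColorStream.Format, sensor.DepthStream.Format);
                stream.SetTrackedPlayer(skeleton.TrackingId);
                // Subscribe to stream.BackgroundRemovedFrameReady here and
                // composite each player's output into the final photo.
                streams[skeleton.TrackingId] = stream;
            }
        }
    }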

A photo taken using the Kinect SDK 1.8 Background Removal API.

This summer, for the Kinect green screen photo kiosk I had at Maker Faire Detroit, I invested about a month of time figuring out how to get a good mask out of the Kinect in real time. My approach was to ignore skeleton data and use only depth data: I ran the raw depth through EMGU CV (.NET OpenCV) to do blob detection, took the detected blobs, and ran them through a point-by-point averaging algorithm based on work done in openFrameworks. I also used a simple shader-based blur effect available in Windows Presentation Foundation, as it proved far faster than any other implementation I tested, including my own box and Gaussian blurs. While the results I came up with are not as good as the official SDK implementation, they are pretty close and don't require detected skeletons. (The project is open source, so you can check out my implementation here: https://github.com/transcendingdigital/MFDetroit2013_Kinect_GreenScreen_PhotoKiosk) Not requiring skeleton detection is a HUGE factor when groups of users are posing for photos.
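
To make that pipeline concrete, here is a simplified sketch of the approach, written against the Emgu CV 3.x API for the contour step (the kiosk project itself targets an earlier Emgu CV, so the exact calls differ; see the GitHub repository above for the real implementation). The threshold band and smoothing window are illustrative values, not the kiosk's tuned ones.

    // Simplified sketch of the depth-only masking pipeline:
    // 1) threshold raw depth into a binary mask (no skeletons involved),
    // 2) find the player blobs as contours,
    // 3) smooth each contour with a point-by-point moving average.
    using System.Drawing;
    using Emgu.CV;
    using Emgu.CV.CvEnum;
    using Emgu.CV.Structure;
    using Emgu.CV.Util;

    static class DepthMaskPipeline
    {
        // Keep any pixel whose depth (in millimeters) falls inside the
        // near/far band; everything else is background.
        public static Image<Gray, byte> BuildMask(
            ushort[] depthMm, int width, int height,
            ushort nearMm, ushort farMm)
        {
            var mask = new Image<Gray, byte>(width, height);
            for (int y = 0; y < height; y++)
            {
                for (int x = 0; x < width; x++)
                {
                    ushort d = depthMm[y * width + x];
                    mask.Data[y, x, 0] =
                        (byte)(d >= nearMm && d <= farMm ? 255 : 0);
                }
            }
            return mask;
        }

        // Blob detection: pull out the outer contours of the mask.
        public static VectorOfVectorOfPoint FindBlobs(Image<Gray, byte> mask)
        {
            var contours = new VectorOfVectorOfPoint();
            CvInvoke.FindContours(mask, contours, null,
                RetrType.External, ChainApproxMethod.ChainApproxSimple);
            return contours;
        }

        // openFrameworks-style smoothing: replace each contour point with
        // the average of its neighbors inside a small window.
        public static Point[] SmoothContour(Point[] pts, int window)
        {
            var smoothed = new Point[pts.Length];
            for (int i = 0; i < pts.Length; i++)
            {
                int sx = 0, sy = 0, n = 0;
                for (int k = -window; k <= window; k++)
                {
                    Point p = pts[(i + k + pts.Length) % pts.Length];
                    sx += p.X;
                    sy += p.Y;
                    n++;
                }
                smoothed[i] = new Point(sx / n, sy / n);
            }
            return smoothed;
        }
    }

The smoothed contours are then filled and used as an alpha mask over the color frame, with WPF's shader-based BlurEffect applied to soften the edge.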

A photo taken using custom techniques and EMGU CV.

How I think Microsoft's Background Removal API Works

The unfortunate part of the Microsoft Background Removal API is that it is closed source, so people like me who have been working on the same problem are left to wonder exactly what is going on at a low level. I wish they would release a paper on the technique being used. Based on my own work, there are a few things I am going to guess Microsoft is doing.
  1. I am pretty sure they are using some sort of frame averaging of the depth data (a sketch of the idea follows this list). During my own work I found this concept presented by Karl Sanford in his early work smoothing depth data. In my testing, averaging the depth data was too slow a process in managed C# code, and the results were not very good for creating smooth masks that fit the contours of subjects, so I threw out this technique. The tell in the 1.8 SDK is that when you wave your hands around or move fast, you can see some lag in the mask as it follows you, which could come either from frame averaging or from an intentional slowdown of processing to improve end user performance.

  2. I am confident that they are using external computer vision libraries that may have licensing restrictions preventing them from being included in the official Kinect SDK DLLs. You can tell this because the Background Removal API is oddly contained in separate 32 and 64 bit DLLs apart from the main Kinect SDK DLL. If you use EMGU CV, it likewise requires separate 64 bit and 32 bit DLLs because it reaches into low level unmanaged code for much of the functionality it provides. I'm willing to bet that the Background Removal API takes advantage of unmanaged code to speed up real time processing.

  3. The requirement of including the skeleton data may only be used to know where to place the mask and may have no bearing on Microsoft's actual background removal. In my own implementation the skeleton data is useless because it does not provide any detailed contextual data to help hug the contours of each human's body. It only provides stick figure like data.

  4. I am guessing that the official 1.8 SDK background removal is limited to one player due to performance barriers. I don't know what the internally allowed CPU usage is for the Kinect SDK, but I would imagine Microsoft tries to keep it low enough that developers do not have a problem integrating the SDK into end applications. Background removal requires frame by frame processing, which is very processor intensive.

  5. Microsoft spent time massaging the depth data around human hair and the head region. In the default depth data from the Kinect, human hair is always an issue, sometimes causing the mask to really degrade around a person's head. This makes sense because the Kinect depth data is provided by an infrared grid that only has a resolution of 320x240.

  6. There is some sort of blur going on, but it doesn't look like a conventional blur. Take a look at the edges of the photo from the Background Removal API. They have weird gaps in the pixels, almost as if some of the data is being removed for faster processing, or as an attempt at soft refinement of the edge of the mask.
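
To illustrate the frame averaging idea from point 1, here is a minimal sketch of the kind of temporal smoothing I suspect is going on, in the spirit of Karl Sanford's depth smoothing work. This is my guess at the general idea, not Microsoft's actual implementation; the buffer size of four frames is an arbitrary choice.

    // A guess at depth-frame averaging: keep a small ring buffer of recent
    // depth frames and average the valid (non-zero) samples per pixel.
    // More buffered frames means a smoother mask but more visible lag.
    using System.Collections.Generic;

    class DepthFrameAverager
    {
        private readonly Queue<ushort[]> recentFrames = new Queue<ushort[]>();
        private readonly int maxFrames;

        public DepthFrameAverager(int maxFrames = 4)
        {
            this.maxFrames = maxFrames;
        }

        // depthMm holds raw depth in millimeters; 0 means no reading.
        public ushort[] Process(ushort[] depthMm)
        {
            recentFrames.Enqueue((ushort[])depthMm.Clone());
            if (recentFrames.Count > maxFrames)
            {
                recentFrames.Dequeue();
            }

            var smoothed = new ushort[depthMm.Length];
            for (int i = 0; i < depthMm.Length; i++)
            {
                int sum = 0, count = 0;
                foreach (ushort[] frame in recentFrames)
                {
                    if (frame[i] != 0)
                    {
                        sum += frame[i];
                        count++;
                    }
                }
                smoothed[i] = count > 0 ? (ushort)(sum / count) : (ushort)0;
            }
            return smoothed;
        }
    }

The trade-off is exactly what you can observe in the 1.8 SDK: the deeper the averaging buffer, the smoother the mask and the more it lags a fast-moving subject.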

I hope this article has provided some insight into background removal not seen elsewhere. I am looking forward to the next version of the Kinect hardware and appreciate Microsoft's continued commitment to the Kinect for Windows SDK.

Charles Palen has been involved in the technology sector for several years. His formal education focused on Enterprise Database Administration. He currently works as the principal software architect and manager at Transcending Digital, where he can be hired for your next contract project. Charles is a full stack developer who has been on the front lines of small business and enterprise for over 10 years. Charles' current expertise covers the areas of .NET, Java, PHP, Node.js, Javascript, HTML, and CSS. Charles created Technogumbo in 2008 as a way to share lessons learned while making original products.

Comments

Charles
April 17, 2015 2:58 pm

Thanks Nathan. I had never seen this presentation, but I wish I had at the time I worked on this project. My efforts focused on research done by others in traditional image processing contexts. I am not sure if they are using vmx based segmentation.

I hope this work can help you in your own efforts!

Nathan
April 13, 2015 3:40 pm

Thank you very much for the article, Charles. I'm going to study your example. Have you ever seen this presentation? http://www.microsoft.com/en-us/download/details.aspx?id=28085 I would be interested in your feedback. The second speaker touches on related information. Could the background removal be vmx based segmentation?
Thank you,
Nathan

Charles
November 3, 2013 4:28 pm

Hi John,

Sorry for the confusion, but I think I understand the situation better.

This application was built and tested using the "Kinect For Windows" commercial sensor. I have not tested it with the Xbox version of the sensor, so that may be the issue. There may be functionality, such as near and seated mode, that I use in this application that is not available on the Xbox sensor.

By default the application will use a mouse if no Kinect hardware is detected, but I may be doing other things in the application that do not work well with the Xbox version.

Sorry you are running into issues.

John
November 3, 2013 07:39 am

Hi Charles!

The value useCms is False. It's strange, because if I test with a Logitech webcam, a hand appears and I can go to the other screen to select the background and take the picture (it says no Kinect detected, obviously). But when I test with an Xbox Kinect and run the app, only the main screen appears, with the text at the top and the button in the lower left; the hand does not appear and there is no crash. My graphics card is Nvidia and the OS is Windows 7 32 bit. Do you know what is happening?
Thanks so much!

Charles
November 2, 2013 8:10 pm

Hi John,

Until you submit images, the first screen of the Kinect application will be white with a button in the lower right. If everything is set up correctly, it should still respond to a detected skeleton and ask if you would like to take a photo.

If the application just starts, shows a white screen, and crashes, you should check the .exe.config file to disable the connection to a Drupal CMS.

John
November 2, 2013 12:08 am

Hi Charles, excellent work!
I have a problem when I try to run the exe. I have read the readme, but when I run the exe only the buttons appear and the background is white... Can you help me please?

Theo
October 1, 2013 10:29 am

Great article, Charles. I'm also wondering how Microsoft achieved such results, specifically how they could get rid of the problems with hair detection. I believe they must be using the color stream as well, with some kind of background subtraction technique.
I was developing a very similar application, but group pictures are very valuable to us, so using Microsoft's solution was not an option.
Anyway, I'm looking forward to hearing from you about any updates on this matter.
Best regards,
Theo

Comments are currently disabled.