Research

Domain Adaptation

In real-world applications, source and target data often differ. For example, different sensors might be used for the same recognition task. In these projects, we study the potential of domain adaptation on several fronts.

Between Worlds: Kinect or body-worn sensors?

The Kinect offers great potential for many activity recognition tasks. OpenNI provides reliable motion capture without requiring any body-attached sensors.
In the past, most research on activity recognition was based on acceleration sensors. To reuse such models, or for cross-modal recognition, adaptation methods are required. This work shows experimental results on creating an accelerometer model for Kinect-estimated joints. Future steps include a more realistic spring model as well as learning a suitable adaptation model, given corresponding acceleration data from a body-worn sensor.
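As a rough illustration of what an accelerometer model for Kinect-estimated joints could look like, the sketch below differentiates a joint trajectory twice and subtracts gravity to obtain the specific force an ideal sensor would report. This is a minimal sketch under stated assumptions, not the model used in this work: the function name, the world frame with the y axis pointing up, and the omission of sensor orientation, noise, and any spring model are all simplifications chosen for illustration.

```python
import numpy as np


def simulate_accelerometer(positions, fs, gravity=(0.0, -9.81, 0.0)):
    """Approximate accelerometer readings from a tracked joint trajectory.

    positions: array of shape (T, 3), joint positions in metres (world frame).
    fs: sampling rate of the skeleton stream in Hz.
    gravity: assumed gravity vector in the world frame (y axis pointing up).

    Returns an array of shape (T - 2, 3) with the specific force in m/s^2.
    Assumption: the sensor axes stay aligned with the world frame.
    """
    positions = np.asarray(positions, dtype=float)
    if positions.ndim != 2 or positions.shape[0] < 3:
        raise ValueError("need at least three samples of shape (T, 3)")
    dt = 1.0 / fs
    # Second-order central difference approximates the second derivative.
    accel = (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2
    # An accelerometer measures acceleration minus gravity (specific force).
    return accel - np.asarray(gravity, dtype=float)
```

With this convention, a joint at rest yields a constant reading of about 9.81 m/s^2 along the up axis, which matches what a stationary body-worn sensor reports. In practice, skeleton estimates are noisy, so the trajectory would likely need smoothing before differentiation.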

Fig 3. Unpublished work, presented at research group retreat 02/2011

Improving the Kinect's depth

The introduction of the Microsoft Kinect sensor has stirred significant interest in the robotics community. Although it was originally developed as a gaming interface, its high-quality depth sensor and affordable price have made it a popular choice for robotic perception.

Its active sensing strategy is very well suited to producing robust, high-frame-rate depth maps for human pose estimation. But the shift to the robotics domain has surfaced applications under a wider set of operating conditions than it was originally designed for. We see the sensor fail completely on transparent and specular surfaces, which are very common among everyday household objects. As these items are of great interest in home robotics and assistive technologies, we have investigated methods to reduce and sometimes even eliminate these effects without any modification of the hardware.

In particular, we complement the depth estimate within the Kinect with a cross-modal stereo path that we obtain from disparity matching between the Kinect's built-in IR and RGB sensors.

We investigate how the RGB channels can be combined optimally to mimic the image response of the IR sensor, using an early fusion scheme of weighted channels as well as a late fusion scheme that computes stereo matches between the different channels independently.
We show a strong improvement in the reliability of the depth estimate as well as improved performance on an object segmentation task in a tabletop scenario.
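The difference between the two fusion schemes can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from [1]: the per-pixel absolute-difference matching cost, the summation of per-channel costs in the late fusion variant, and all function names are hypothetical choices made here for brevity, and the images are assumed to be rectified so that matches lie on the same row.

```python
import numpy as np


def early_fusion(rgb, weights):
    """Collapse an (H, W, 3) RGB image into a single pseudo-IR image."""
    return np.asarray(rgb, dtype=float) @ np.asarray(weights, dtype=float)


def cost_volume(ref, other, max_disp):
    """Cost of matching ref[y, x] to other[y, x - d] for d in 0..max_disp."""
    ref = np.asarray(ref, dtype=float)
    other = np.asarray(other, dtype=float)
    h, w = ref.shape
    # Disparities that would look outside the image keep an infinite cost.
    cost = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        cost[d, :, d:] = np.abs(ref[:, d:] - other[:, : w - d])
    return cost


def disparity_early(ir, rgb, weights, max_disp):
    """Early fusion: weight the channels first, then match once."""
    fused = early_fusion(rgb, weights)
    return cost_volume(ir, fused, max_disp).argmin(axis=0)


def disparity_late(ir, rgb, max_disp):
    """Late fusion: match each channel independently, then combine costs."""
    rgb = np.asarray(rgb, dtype=float)
    total = sum(cost_volume(ir, rgb[:, :, c], max_disp) for c in range(3))
    return total.argmin(axis=0)
```

In the early variant, the quality of the result depends on how well the chosen weights reproduce the IR response. The late variant avoids committing to a single set of weights, at the price of one matching pass per channel.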

Domain Adaptation for improved cross-modal matching
However, the method in [1] is troubled by interference from the IR projector that is required for the active depth sensing method.

Learned Filters
In [2], we investigate these issues, conduct a more detailed study of the physical characteristics of the sensors, and propose a more general method that learns optimal filters for cross-modal stereo under projected patterns.

Both of our approaches [1,2] improve results over the baseline in a point-cloud-based object segmentation task, without modifications to the Kinect hardware and despite the interference from the projector.

[1] Improving the Kinect by Cross-Modal Stereo, W. Chiu, U. Blanke and M. Fritz, BMVC (2011)

[2] I spy with my little eye: Learning Optimal Filters for Cross-Modal Stereo under Projected Patterns, W. Chiu, U. Blanke and M. Fritz, C4CV in conjunction with ICCV (2011)