Video Clips of Augmented Reality System in Operation
The links below lead to video files showing the Augmented Reality system in operation. The files are in MPEG format, captured at a frame rate of 15 frames/second.
A note about the videos available here: our method relies on tracking features
in the scene and uses those features to create an affine coordinate system in which the
virtual objects are represented. These clips come from two implementations of the
system. The first used the corners of two black rectangles as the tracked feature points;
the second used green colored markers. If there are green markers in the image, you are
watching the second implementation. High-contrast rectangular areas still appear in
those images, but they were not used as tracking features. A sketch of the affine
representation itself follows.
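The following is a minimal sketch of that affine representation, in Python with NumPy. The function names and arguments are ours, chosen for illustration, not the original system's API: four non-coplanar tracked points define an affine basis; a virtual point is expressed once in that basis, and in each video frame it is reprojected as the same affine combination of the basis points' tracked 2D image locations, which holds under an affine camera model.

```python
import numpy as np

def affine_coords(p, basis):
    """Express a 3D point p in the affine frame spanned by four
    non-coplanar basis points: an origin b0 and three others."""
    b0, b1, b2, b3 = basis
    M = np.column_stack([b1 - b0, b2 - b0, b3 - b0])  # 3x3 basis matrix
    return np.linalg.solve(M, p - b0)                 # (a1, a2, a3)

def reproject(coords, image_basis):
    """Project a point with affine coordinates `coords` into the image,
    given the tracked 2D image locations of the four basis points.
    Under an affine camera, projection is linear, so the same affine
    combination that defines the point in 3D holds among the 2D
    projections of the basis points."""
    q0, q1, q2, q3 = image_basis
    A = np.column_stack([q1 - q0, q2 - q0, q3 - q0])  # 2x3
    return q0 + A @ np.asarray(coords)
```

A virtual object is authored once via `affine_coords`; per video frame, only the tracked 2D locations of the feature points are needed to redraw it. This is the appeal of the affine formulation: no camera calibration or Euclidean pose estimation enters the loop.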
- Overall view 1 (640 kbytes), Overall view
2 (954 kbytes)
- This shows the overall view of the augmented reality system. The frame with the
two black rectangles is used to define the affine reference frame. The monitor shows
the augmented view of the scene: initially no object is shown, and then a globe
appears within the frame. The image on the video monitor may not be very clear; for video
clips of just the augmented view, follow the links below.
- Basic operation
- This shows the basic operation of the augmented reality system. A virtual object is
positioned on the frame. It appears to stay fixed to the real object as that object is
moved around in front of the video camera. Slight movements of the virtual object are due
to inaccuracies in feature tracking and delays in the system.
The arrangement of the feature points in the previous video segments was chosen purely for
convenience: it allows automatic location of the object on the L-frame. This
is not a requirement of the method. These two segments illustrate augmenting a scene
where the feature points are placed in a more arbitrary arrangement.
- Construction example (839 kbytes) This is a two-dimensional
example from construction: a blueprint is overlaid on an area of
a wall to give an augmented view of the interior of the wall. Distortions of the
blueprint due to our affine approximation can be seen, as can improper
occlusions of foreground objects by the virtual blueprint.
- Animation in affine space (585 kbytes)
- An important feature of an augmented reality system would be the ability to animate the
virtual objects. This clip shows a virtual cube whose translation is animated in the
affine coordinate frame; a small sketch of this follows.
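Animating in affine space amounts to varying an object's affine coordinates over time and reprojecting with the current tracked basis. A tiny illustration building on the hypothetical `reproject` sketch above (the velocity value is arbitrary):

```python
import numpy as np

def animated_position(base_coords, t, velocity=np.array([0.05, 0.0, 0.0])):
    """Translate a virtual point within the affine frame by offsetting
    its affine coordinates; the result is fed to reproject() together
    with the current frame's tracked basis points."""
    return np.asarray(base_coords) + t * velocity
```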
- Handling occlusions
- This method of augmenting reality uses the computer graphics system to resolve hidden
surfaces in the virtual objects and to properly handle virtual objects occluding other
virtual objects. Because of the way the virtual scene is merged with the live
video scene, a virtual object drawn at a particular pixel location will always occlude the
live video at that pixel location. By defining real objects in the affine coordinate
system, real objects that are closer to the viewer in 3D space can correctly occlude a
virtual object, as illustrated in the sketch below.
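One standard way to get this behavior, sketched here as an assumption rather than a description of the original implementation, is a depth-only pass: models of the real objects (expressed in the same affine coordinates as the virtual ones) are rendered with color writes disabled so they populate only the depth buffer. In Python with PyOpenGL:

```python
from OpenGL.GL import (GL_DEPTH_TEST, GL_FALSE, GL_TRUE,
                       glColorMask, glEnable)

def draw_augmented_frame(draw_real_object_models, draw_virtual_objects):
    """Depth-only occlusion pass (illustrative names, not the original
    system's API). Real-object stand-ins write depth but no color, so
    the live video already on screen shows through; virtual objects
    drawn afterwards are clipped wherever real geometry is nearer."""
    glEnable(GL_DEPTH_TEST)
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE)  # depth only
    draw_real_object_models()
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE)      # normal drawing
    draw_virtual_objects()
```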
-
- Dealing with latency
- Latency is as much of a problem in augmented reality systems as it is in virtual reality
systems. Other than simply using faster equipment, some researchers are
investigating predictive methods to help mitigate latency effects. Most of these
efforts use models of the human operator together with position measurements to predict
forward in time. Our system does not have position measurements available. Instead, we
experimented with simple forward prediction on the locations of the feature points the
system tracks: we assumed constant-velocity motion in image space and performed a simple
first-order forward prediction. To filter some of the jitter introduced by noisy
feature trackers, we added Kalman filtering to the output of our color feature trackers.
A sketch of this prediction step follows.
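A minimal sketch of this step in Python with NumPy; the class, the 1/15-second time step, and the noise covariances are illustrative assumptions rather than the original system's values. A constant-velocity Kalman filter smooths the raw tracker output, and the filtered state drives a first-order prediction a few frames ahead:

```python
import numpy as np

class PredictiveTracker:
    """Constant-velocity Kalman filter over a feature's image position,
    plus first-order forward prediction to compensate system latency."""

    def __init__(self, pos, dt=1 / 15.0, q=1.0, r=4.0):
        self.dt = dt
        self.x = np.array([pos[0], pos[1], 0.0, 0.0])  # [x, y, vx, vy]
        self.P = np.eye(4) * 100.0                     # initial uncertainty
        self.F = np.eye(4)                             # motion model
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                          # observe position only
        self.Q = np.eye(4) * q                         # process noise (assumed)
        self.R = np.eye(2) * r                         # tracker noise (assumed)

    def update(self, measured_pos):
        """Standard Kalman predict/update on one raw tracker measurement."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.asarray(measured_pos) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P

    def predict_forward(self, n_frames=3):
        """First-order prediction: position + n_frames * dt * velocity."""
        return self.x[:2] + n_frames * self.dt * self.x[2:]
```

With the 70-90 msec latency reported below, `n_frames=3` corresponds to the three-frame prediction used in the filtering clip.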
- Filtering results (2.1 Mbytes) This segment shows three
sequences. The first is the unfiltered system with no prediction applied. The
second applies three frames of forward prediction. (We measured latencies in the
range of 70-90 msec, or 2 to 3 video frames.) The jitters are due to errors in
velocity computation caused by noise in the tracker output. The last sequence
shows the result of adding a Kalman filter to the feature tracker output prior to
the velocity calculation. The clock tower stays in position much better but still
exhibits some jerky motion.
- Registration test (631 kbytes) This segment shows how we
measured registration error. A real scene was constructed in black, with a nail in
the center; the tip of the nail was painted white. A virtual point
was placed at the tip of the nail, and a tracker was locked onto the tip of the nail in
the live video. As the L-frame was moved, the Euclidean distance between this tracked
location and the reprojected location of the virtual point (using our affine
representation) was calculated as the registration error. With our forward
prediction of three video frames we found a factor of 2 to 3 decrease in registration
error. The error computation is sketched below.
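In terms of the earlier affine sketch, the measurement reduces to a pixel distance; a hypothetical helper (our names, not the system's):

```python
import numpy as np

def registration_error(tracked_px, point_coords, image_basis):
    """Pixel distance between the tracked image location of the nail
    tip and the reprojection of the virtual point placed at that tip;
    reproject() is the affine reprojection sketched earlier."""
    predicted_px = reproject(point_coords, image_basis)
    return np.linalg.norm(np.asarray(tracked_px) - predicted_px)
```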
- Video see-through operation (831 kbytes)
- Because our system uses only the input from video cameras to define its common
coordinate system, switching to a video see-through head-mounted display (HMD) was as
simple as placing two cameras on the HMD. These cameras view the real scene in
stereo, and the augmented view is presented to the user on the display of the HMD.
In this sequence the monitor shows what the user is viewing.
- Haptics in Augmented Reality
- One of the areas that has not been investigated in augmented reality systems is the
incorporation of interaction with the virtual objects. We added haptic interaction
using a PHANToM haptic interface device manufactured by SensAble Technologies.
This interface allowed the user to operate the system in WYSIWYF mode (What You See
Is What You Feel).
- Touching the globe (1.2 Mbytes) Here the user is tracing
the coastline by feel. When the active point of the Phantom is over land, the
user feels a rough sensation; over water, the point sinks into the globe, simulating
the soft surface of the water. The proper registration of sight and touch is
maintained even while the globe is rotating.
- Spinning the globe (514 kbytes) The user has control of the
orientation of the globe. Whenever the user's finger is in contact with the globe, it
spins about its center.
- Cube hockey (2.1 Mbytes) The user is able to knock this
cube around in the workspace. The cube correctly collides with the vertical part of
the L-frame and can rest on top of it. It is also properly occluded by the frame
when it passes behind it. The user feels the weight of the cube when lifting it with the
"magnetic finger" and also senses the momentum of collisions between the cube and
the vertical wall.
- Handling occlusions (again)
- In the examples of haptic interaction with the virtual objects, it is easy to see that
the user's hand and the Phantom do not properly occlude the virtual objects when they
are in front; we do not model the hand or the Phantom in the virtual scene.
(Barely visible in these sequences is a red marker that was added to help the user
identify where the active point of the Phantom was located.)
It might be possible to define a model of the Phantom for the system, but the user's hand
and arm would still be a problem. We decided instead to explore foreground
detection using color statistics: at runtime, any area of the video image whose color is
statistically different from that of a background scene analyzed before operation begins
is assumed to represent foreground motion at the 3D depth of the Phantom. A sketch of
this test follows the clip descriptions below.
- Phantom plane (557 kbytes) This shows the plane on
which detected foreground activity is assumed to take place. Only a small
segment of the plane is shown; it is actually assumed to cover the entire scene.
- Foreground (2.9 Mbytes) The blue areas of the image
are where color statistically different from the empty scene was found. The
computation is performed only within the bounding box of the rendered graphics.
Finally, the augmented image is shown, with the live video displayed in the detected
areas that are closer to the viewer than any virtual object.
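A minimal sketch of the color-statistics test, assuming an independent Gaussian model per pixel; the function names, the threshold, and the diagonal-covariance simplification are our illustrative choices, not the original implementation:

```python
import numpy as np

def background_stats(frames):
    """Per-pixel color mean and variance from frames of the empty scene,
    gathered before operation starts."""
    stack = np.stack(frames).astype(np.float64)        # (N, H, W, 3)
    return stack.mean(axis=0), stack.var(axis=0) + 1e-6

def foreground_mask(frame, mean, var, bbox, thresh=9.0):
    """Flag pixels statistically far from the background model. Only
    the bounding box of the rendered graphics is examined, matching
    the restricted computation described above."""
    x0, y0, x1, y1 = bbox
    roi = frame[y0:y1, x0:x1].astype(np.float64)
    # Squared distance per channel, normalized by variance and summed
    # (a Mahalanobis distance with diagonal covariance).
    d2 = ((roi - mean[y0:y1, x0:x1]) ** 2 / var[y0:y1, x0:x1]).sum(axis=-1)
    mask = np.zeros(frame.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = d2 > thresh
    return mask
```

Pixels flagged by the mask would then show the live video wherever the Phantom plane is nearer to the viewer than the virtual geometry at that pixel.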