Augmented reality headset enables users to see hidden objects

An augmented reality headset combines computer vision and wireless perception to automatically locate a specific item that is hidden from view, perhaps inside a box or under a pile, and then guide the user to retrieve it.

By Adam Zewe, MIT News

MIT researchers have built an augmented reality headset that gives the wearer X-ray vision. The headset combines computer vision and wireless perception to automatically locate a specific item that is hidden from view, perhaps inside a box or under a pile, and then guide the user to retrieve it.

The system uses radio frequency (RF) signals, which can pass through common materials like cardboard boxes, plastic containers, or wooden dividers, to find hidden items that have been labeled with RFID tags. These tags reflect signals sent by an RF antenna.

The headset directs the wearer as they walk through a room toward the location of the item, which shows up as a transparent sphere in the augmented reality (AR) interface. Once the item is in the user's hand, the headset, called X-AR, verifies that they have picked up the correct object.

When the researchers tested X-AR in a warehouse-like environment, the headset could localize hidden items to within 9.8 centimeters, on average. And it verified that users picked up the correct item with 96 percent accuracy.

X-AR could aid e-commerce warehouse workers in quickly finding items on cluttered shelves or buried in boxes, or in identifying the exact item for an order when many similar objects are in the same bin. It could also be used in a manufacturing facility to help technicians locate the correct parts to assemble a product.

"Our whole goal with this project was to build an augmented reality system that allows you to see things that are invisible - things that are in boxes or around corners - and in doing so, it can guide you toward them and truly allow you to see the physical world in ways that were not possible before," says Fadel Adib, who is an associate professor in the Department of Electrical Engineering and Computer Science, the director of the Signal Kinetics group in the Media Lab, and the senior author of a paper on X-AR.

Adib's co-authors are research assistants Tara Boroushaki, who is the paper's lead author; Maisy Lam; Laura Dodds; and former postdoc Aline Eid, who is now an assistant professor at the University of Michigan. The research will be presented at the USENIX Symposium on Networked Systems Design and Implementation.

Augmenting an AR headset

To create an augmented reality headset with X-ray vision, the researchers first had to outfit an existing headset with an antenna that could communicate with RFID-tagged items. Most RFID localization systems use multiple antennas located meters apart, but the researchers needed one lightweight antenna that could achieve high enough bandwidth to communicate with the tags.

"One big challenge was designing an antenna that would fit on the headset without covering any of the cameras or obstructing its operations. This matters a lot, since we need to use all the specs on the visor," says Eid.

The team took a simple, lightweight loop antenna and experimented with tapering it (gradually changing its width) and adding gaps, both techniques that boost bandwidth. Since antennas typically operate in the open air, the researchers optimized theirs for sending and receiving signals when attached to the headset's visor.

Once the team had built an effective antenna, they focused on using it to localize RFID-tagged items.

They leveraged a technique known as synthetic aperture radar (SAR), which is similar to how airplanes image objects on the ground. X-AR takes measurements with its antenna from different vantage points as the user moves around the room, then it combines those measurements. In this way, it acts like an antenna array where measurements from multiple antennas are combined to localize a device.

X-AR uses visual data from the headset's self-tracking capability to build a map of the environment and determine the headset's location within that environment. As the user walks, it computes the probability that the RFID tag is at each location. The probability will be highest at the tag's exact location, so it uses this information to zero in on the hidden object.
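The idea of combining measurements from many vantage points can be illustrated with a small simulation. The sketch below is not the researchers' implementation: it assumes a single carrier wavelength of about 0.33 meters (roughly 915 MHz, a common UHF RFID band that the article does not specify), noise-free measurements, a flat 2D room, and a hypothetical walking path. For each candidate location it undoes the round-trip phase that a tag at that spot would have produced and adds up the results; the sum is largest where the candidate matches the real tag.

```python
import cmath
import math

WAVELENGTH_M = 0.33  # assumption: roughly a 915 MHz UHF RFID carrier


def simulate_channel(antenna_xy, tag_xy, wavelength=WAVELENGTH_M):
    """Ideal, noise-free round-trip channel for one antenna position."""
    d = math.dist(antenna_xy, tag_xy)
    return cmath.exp(-1j * 4 * math.pi * d / wavelength)


def sar_score(candidate_xy, antenna_path, channels, wavelength=WAVELENGTH_M):
    """Coherently combine measurements; returns a value in [0, 1]."""
    total = 0j
    for pos, h in zip(antenna_path, channels):
        d = math.dist(pos, candidate_xy)
        # Undo the phase a tag at candidate_xy would have produced.
        total += h * cmath.exp(1j * 4 * math.pi * d / wavelength)
    return abs(total) / len(channels)


def localize(antenna_path, channels, grid):
    """Return the grid point with the highest combined score."""
    return max(grid, key=lambda xy: sar_score(xy, antenna_path, channels))


# Hypothetical walk: 41 headset positions along a gently curving path.
antenna_path = [(k * 0.05, 0.3 * math.sin(k * 0.3)) for k in range(41)]
true_tag = (1.2, 1.5)  # hidden item, unknown to the localizer
channels = [simulate_channel(p, true_tag) for p in antenna_path]

# Candidate locations on a 10 cm grid covering the shelf area.
grid = [(i / 10, j / 10) for i in range(0, 21) for j in range(5, 26)]
estimate = localize(antenna_path, channels, grid)
print("estimated tag location:", estimate)
```

The score here is an unnormalized stand-in for the probability described above. A real system would also have to cope with noise, multipath reflections, and errors in the headset's own position estimate.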

"While it presented a challenge when we were designing the system, we found in our experiments that it actually works well with natural human motion. Because humans move around a lot, it allows us to take measurements from lots of different locations and accurately localize an item," Dodds says.

Once X-AR has localized the item and the user picks it up, the headset needs to verify that the user grabbed the right object. But now the user is standing still and the headset antenna isn't moving, so it can't use SAR to localize the tag.

However, as the user picks up the item, the RFID tag moves along with it. X-AR can measure the motion of the RFID tag and leverage the hand-tracking capability of the headset to localize the item in the user's hand. Then it checks that the tag is sending the right RF signals to verify that it is the correct object.
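One way to picture this check is to reverse the roles from the localization step: the antenna stays put and the tag moves. The sketch below is an illustration under stated assumptions, not the method from the paper. It assumes a 0.33-meter wavelength, noise-free measurements, a hypothetical hand trajectory reported by the headset's hand tracking, and an arbitrary decision threshold of 0.8. If the tag's measured phase changes the way the tracked hand motion predicts, the score is close to 1; a tag that stayed on the shelf produces a low score.

```python
import cmath
import math

WAVELENGTH_M = 0.33  # assumption: roughly a 915 MHz UHF RFID carrier


def round_trip_channel(antenna_xyz, tag_xyz, wavelength=WAVELENGTH_M):
    """Ideal, noise-free round-trip channel between antenna and tag."""
    d = math.dist(antenna_xyz, tag_xyz)
    return cmath.exp(-1j * 4 * math.pi * d / wavelength)


def in_hand_score(antenna_xyz, hand_track, channels, wavelength=WAVELENGTH_M):
    """How well the tag's phase history matches the tracked hand motion."""
    total = 0j
    for hand_xyz, h in zip(hand_track, channels):
        d = math.dist(antenna_xyz, hand_xyz)
        total += h * cmath.exp(1j * 4 * math.pi * d / wavelength)
    return abs(total) / len(channels)


def is_in_hand(score, threshold=0.8):
    """Hypothetical decision rule; the threshold is an assumption."""
    return score >= threshold


antenna = (0.0, 0.0, 1.6)  # headset antenna, stationary during the pick
# Hypothetical hand track: 30 samples moving from the shelf toward the body.
hand_track = [(0.5 - 0.01 * k, 0.0, 1.0 + 0.005 * k) for k in range(30)]

# Case 1: the requested tag travels with the hand.
right_channels = [round_trip_channel(antenna, p) for p in hand_track]
# Case 2: the requested tag never moved, so the user grabbed something else.
shelf_tag = (0.5, 0.1, 1.0)
wrong_channels = [round_trip_channel(antenna, shelf_tag) for _ in hand_track]

right_score = in_hand_score(antenna, hand_track, right_channels)
wrong_score = in_hand_score(antenna, hand_track, wrong_channels)
print("tag in hand:", round(right_score, 3), is_in_hand(right_score))
print("tag on shelf:", round(wrong_score, 3), is_in_hand(wrong_score))
```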

The researchers utilized the holographic visualization capabilities of the headset to display this information for the user in a simple manner. Once the user puts on the headset, they use menus to select an object from a database of tagged items. After the object is localized, it is surrounded by a transparent sphere so the user can see where it is in the room. Then the device projects the trajectory to that item in the form of footsteps on the floor, which can update dynamically as the user walks.
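The footstep guidance can be sketched in a few lines. The example below is a simplified illustration, not the headset's actual rendering code: it assumes a flat floor, a straight path with no obstacles, and hypothetical values for stride length and foot spacing. Recomputing the path from the user's latest position is what makes the footsteps update as the user walks.

```python
import math


def footstep_path(user_xy, item_xy, stride_m=0.5, half_width_m=0.1):
    """Footprint positions on a straight line from the user to the item.

    stride_m and half_width_m are illustrative assumptions, not values
    taken from the X-AR system.
    """
    dx = item_xy[0] - user_xy[0]
    dy = item_xy[1] - user_xy[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        return []
    ux, uy = dx / distance, dy / distance  # unit vector toward the item
    nx, ny = -uy, ux                       # unit vector pointing left
    steps = []
    for k in range(1, int(distance / stride_m) + 1):
        side = 1 if k % 2 else -1          # odd steps left, even steps right
        cx = user_xy[0] + ux * stride_m * k
        cy = user_xy[1] + uy * stride_m * k
        steps.append((cx + side * nx * half_width_m,
                      cy + side * ny * half_width_m))
    return steps


# Initial path, then a recomputed path after the user has walked one meter.
first_path = footstep_path((0.0, 0.0), (2.0, 0.0))
updated_path = footstep_path((1.0, 0.0), (2.0, 0.0))
print(len(first_path), "footsteps at first,", len(updated_path), "after moving")
```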

"We abstracted away all the technical aspects so we can provide a seamless, clear experience for the user, which would be especially important if someone were to put this on in a warehouse environment or in a smart home," Lam says.

Testing the headset

To test X-AR, the researchers created a simulated warehouse by filling shelves with cardboard boxes and plastic bins, and placing RFID-tagged items inside.

They found that X-AR can guide the user toward a targeted item with less than 10 centimeters of error, meaning that, on average, the item was located less than 10 centimeters from where X-AR directed the user. Baseline methods the researchers tested had a median error of 25 to 35 centimeters.

They also found that it correctly verified that the user had picked up the right item 98.9 percent of the time. This means X-AR is able to reduce picking errors by 98.9 percent. It was even 91.9 percent accurate when the item was still inside a box.

"The system doesn't need to visually see the item to verify that you've picked up the right item. If you have 10 different phones in similar packaging, you might not be able to tell the difference between them, but it can guide you to still pick up the right one," Boroushaki says.

Now that they have demonstrated the success of X-AR, the researchers plan to explore how different sensing modalities, like WiFi, mmWave technology, or terahertz waves, could be used to enhance its visualization and interaction capabilities. They could also enhance the antenna so its range can go beyond 3 meters and extend the system for use by multiple, coordinated headsets.

"Because there isn't anything like this today, we had to figure out how to build a completely new type of system from beginning to end," says Adib. "In reality, what we've come up with is a framework. There are many technical contributions, but it is also a blueprint for how you would design an AR headset with X-ray vision in the future."

"This paper takes a significant step forward in the future of AR systems, by making them work in non-line-of-sight scenarios," says Ranveer Chandra, managing director of industry research at Microsoft, who was not involved in this work. "It uses a very clever technique of leveraging RF sensing to augment computer vision capabilities of existing AR systems. This can drive the applications of the AR systems to scenarios that did not exist before, such as in retail, manufacturing, or new skilling applications."

This research was supported, in part, by the National Science Foundation, the Sloan Foundation, and the MIT Media Lab.