AI tools that could one day appear in Facebook’s AR glasses
Facebook is investing a lot of time and money into augmented reality, including making its own AR glasses with Ray-Ban. Right now, these gadgets can only record and share imagery, but what does the company think such devices will be used for in the future?
A new research project led by Facebook’s AI team suggests the scope of the company’s ambitions. It envisions AI systems that constantly analyze people’s lives using first-person video: recording what they see, do, and hear in order to help them with everyday tasks. Facebook researchers have outlined a range of skills they want these systems to develop, including “episodic memory” (“Where did I leave my keys?”) and “audio-visual diarization” (remembering who said what when).
Right now, none of the tasks outlined above can be achieved reliably by any AI system, and Facebook stresses that this is a research project rather than a commercial development. However, it is clear that the company sees this kind of functionality as the future of AR computing. “Certainly, thinking about augmented reality and what we want to be able to do with it, there are possibilities down the road that we will take advantage of this kind of research,” Kristen Grauman, a Facebook AI research scientist, told The Verge.
Such ambitions have huge privacy implications. Privacy experts are already concerned about how Facebook’s AR glasses allow wearers to secretly record members of the public. Such concerns will only be magnified if future versions of the hardware not only record footage but also analyze and transcribe it, turning wearers into walking surveillance machines.
Facebook’s research project is named Ego4D, which refers to the analysis of first-person, or “egocentric,” video. It consists of two key components: an open dataset of egocentric videos and a series of benchmarks that Facebook thinks AI systems should be able to tackle in the future.
The dataset is the largest of its kind ever created, and Facebook partnered with 13 universities around the world to collect the data. In total, approximately 3,205 hours of footage were recorded by 855 participants living in nine different countries. The universities, rather than Facebook, were responsible for collecting the data. Participants, some of whom were paid, wore GoPro cameras and AR glasses to record video of unscripted activity. This ranges from construction work to baking to playing with pets and socializing with friends. The footage was de-identified by the universities, which included blurring the faces of bystanders and removing personally identifiable information.
Grauman says the dataset is “the first of its kind in both scale and diversity.” She says the closest comparable project consists of 100 hours of first-person footage shot entirely in kitchens. “We’ve opened the eyes of these AI systems to more than just kitchens in the UK and Sicily, but [to footage from] Saudi Arabia, Tokyo, Los Angeles, and Colombia.”
The second component of Ego4D is a series of benchmarks, or tasks, that Facebook wants researchers around the world to try to solve using AI systems trained on its dataset. The company describes them as follows:
Episodic memory: What happened when (for example, “Where did I leave my keys?”)?
Forecasting: What am I likely to do next (for example, “Wait, you’ve already added salt to this recipe”)?
Hand and object manipulation: What am I doing (for example, “Teach me how to play the drums”)?
Audio-visual diarization: Who said what when (for example, “What was the main topic during class?”)?
Social interaction: Who is interacting with whom (for example, “Help me better hear the person talking to me in this noisy restaurant”)?
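To make the “episodic memory” benchmark concrete, here is a minimal, purely hypothetical sketch (none of these names come from Facebook’s project; it assumes an upstream perception system has already turned first-person video into timestamped object sightings) of how such a query might be answered:

```python
from dataclasses import dataclass

@dataclass
class Sighting:
    """One object detection extracted from a first-person video stream."""
    timestamp: float  # seconds since the recording started
    obj: str          # detected object label, e.g. "keys"
    location: str     # where the object was seen, e.g. "kitchen counter"

def where_did_i_leave(log: list, obj: str):
    """Answer an episodic-memory query: return the location of the most
    recent sighting of `obj`, or None if it was never seen."""
    matches = [s for s in log if s.obj == obj]
    if not matches:
        return None
    return max(matches, key=lambda s: s.timestamp).location

log = [
    Sighting(10.0, "keys", "hallway table"),
    Sighting(95.5, "mug", "desk"),
    Sighting(230.2, "keys", "kitchen counter"),
]
print(where_did_i_leave(log, "keys"))  # kitchen counter
```

The hard research problem, of course, is producing a reliable sighting log from raw egocentric video in the first place; the query step itself is trivial by comparison.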
Right now, AI systems would find it incredibly difficult to tackle any of these problems, but creating datasets and benchmarks is a tried-and-tested way to spur progress in AI.
Indeed, the creation of one particular dataset and an associated annual competition, known as ImageNet, is often credited with kickstarting the recent AI boom. The ImageNet dataset contains images of a huge variety of objects, which researchers trained AI systems to identify. In 2012, the winning entry in the competition used a particular method of deep learning to blast past rivals, inaugurating the current era of research.
Facebook is hoping its Ego4D project will have a similar impact on the world of augmented reality. The company says systems trained on Ego4D could one day be used not only in wearable cameras but also in home assistant robots, which likewise rely on first-person cameras to navigate the world around them.
“The project has the chance to really catalyze work in this area in a way that hasn’t really been possible yet,” says Grauman. “To move our field from the ability to analyze piles of photos and videos that were taken by humans with a specific purpose, to this fluid, ongoing first-person visual stream that AR systems and robots need to understand in the context of ongoing activity.”
Although Facebook’s stated aims certainly seem plausible, the company’s interest in this area will worry many. Facebook’s record on privacy is poor, spanning data leaks and a $5 billion fine from the FTC. It has also been shown repeatedly that the company values growth and engagement over users’ well-being across many domains. With this in mind, it is worrying that the benchmarks of the Ego4D project do not include prominent privacy safeguards. For example, the “audio-visual diarization” task (transcribing what different people say) never mentions removing data from people who do not want to be recorded.
When asked about these issues, a Facebook spokesperson told The Verge that the company expects privacy safeguards to be introduced further down the line. “We expect that to the extent companies use this dataset and benchmark to develop commercial applications, they will develop safeguards for such applications,” the spokesperson said. “For example, before AR glasses can amplify someone’s voice, there could be a protocol in place that they follow to ask someone else’s glasses for permission, or they could limit the range of the device so it can only pick up the people I am already talking to or who are in my immediate vicinity.”
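No such permission protocol exists today. As a purely illustrative sketch (every class, field, and rule below is invented, not part of any Facebook system), a device-to-device consent handshake of the kind the spokesperson describes might look like this:

```python
from dataclasses import dataclass, field

@dataclass
class Glasses:
    """A hypothetical AR headset that must obtain consent before
    amplifying another person's voice."""
    owner: str
    allow_amplification: bool = True  # the wearer's own privacy setting
    granted: set = field(default_factory=set)  # owners this wearer approved

    def request_permission(self, other: "Glasses") -> bool:
        """Ask `other`'s glasses whether we may amplify their wearer's voice."""
        if other.allow_amplification:
            other.granted.add(self.owner)
            return True
        return False

    def may_amplify(self, other: "Glasses") -> bool:
        """Amplification is allowed only after consent has been recorded."""
        return self.owner in other.granted

alice = Glasses("alice")
bob = Glasses("bob", allow_amplification=False)  # bob has opted out
carol = Glasses("carol")

alice.request_permission(carol)
print(alice.may_amplify(carol))  # True: carol's glasses granted consent
print(alice.may_amplify(bob))    # False: bob opted out, no consent recorded
```

The design choice worth noting is that consent is recorded on the *listener's* device and checked on every use, so a wearer who opts out never has to take any action to stay excluded.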
For now, such safeguards are only hypothetical.