For the past two years, Facebook AI Research (FAIR) has worked with 13 universities around the world to assemble the largest ever dataset of first-person video, specifically to train deep-learning image-recognition models. AIs trained on the dataset could be better at controlling robots that interact with people, or at interpreting images from smart glasses. "Machines will only be able to help us in our daily lives if they really understand the world through our eyes," said Kristen Grauman of FAIR, who leads the project.
This type of technology could support people who need help around the house, or guide people through tasks they are learning to complete. "The video in this dataset is much closer to how humans observe the world," said Michael Ryoo, a computer-vision researcher at Google Brain and Stony Brook University in New York, who was not involved in Ego4D.
But the potential for abuse is clear and worrying. The research was funded by Facebook, a social-media giant that has recently been accused in the Senate of putting profits ahead of people's well-being, a sentiment corroborated by MIT Technology Review's own investigations.
The business model of Facebook and other Big Tech companies is to extract as much data as possible from people's online behavior and sell it to advertisers. The AI described in the project would extend that reach to a person's everyday offline behavior, revealing the objects around that person's home, what activities he or she enjoys, who he or she spends time with, and even where his or her gaze lingers: an unprecedented degree of personal information.
"There's privacy work that needs to be done as you take this out of the world of exploratory research and into something that's a product," Grauman said. "That work could even be inspired by this project."
Outside the kitchen
Ego4D is a step change. The largest previous dataset of first-person video consists of 100 hours of footage of people in kitchens. The Ego4D dataset contains 2,502 hours of video recorded by 555 people in 73 locations across nine countries (the United States, the United Kingdom, India, Japan, Italy, Singapore, Saudi Arabia, Colombia, and Rwanda).
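To put those figures in perspective, a quick back-of-the-envelope calculation based on the numbers quoted above (these are the article's figures, not official Ego4D statistics):

```python
# Rough scale comparison using the figures quoted in the article.
PREVIOUS_LARGEST_HOURS = 100   # prior first-person dataset (kitchen footage)
EGO4D_HOURS = 2502             # hours of video in Ego4D, as reported above
PARTICIPANTS = 555             # number of people who recorded footage

scale_factor = EGO4D_HOURS / PREVIOUS_LARGEST_HOURS
hours_per_participant = EGO4D_HOURS / PARTICIPANTS

print(f"Ego4D is roughly {scale_factor:.0f}x the previous largest dataset")
print(f"That is about {hours_per_participant:.1f} hours of footage per participant")
```

By this accounting, the new dataset is roughly 25 times larger than its predecessor.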
Participants had a range of ages and backgrounds; some were recruited for their hands-on occupations, such as bakers, mechanics, carpenters, and landscapers.
Previous datasets typically consist of semi-scripted video clips only a few seconds long. For Ego4D, participants wore head-mounted cameras for up to 10 hours at a time and captured first-person video of unscripted daily activities: walking, reading, doing laundry, shopping, playing with pets, playing board games, and interacting with other people. Some of the footage also includes audio, data about where the participants' gaze was focused, and multiple perspectives on the same scene. It is the first dataset of its kind, says Ryoo.
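The modalities described above (video, optional audio, gaze data, multiple camera views) can be pictured as one record per recording session. The sketch below is purely illustrative: the field names and structure are assumptions for exposition, not the real Ego4D schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of one recording in a first-person video dataset,
# mirroring the modalities the article lists. Field names are illustrative
# assumptions, not the actual Ego4D data format.
@dataclass
class EgocentricClip:
    video_path: str                           # the first-person footage itself
    duration_hours: float
    activity: str                             # e.g. "laundry", "board games"
    has_audio: bool = False                   # only some footage includes audio
    gaze_track_path: Optional[str] = None     # where the wearer's gaze was focused
    extra_view_paths: list[str] = field(default_factory=list)  # other perspectives

clip = EgocentricClip(
    video_path="session_001.mp4",
    duration_hours=1.5,
    activity="shopping",
    has_audio=True,
)
print(clip.activity, clip.has_audio, len(clip.extra_view_paths))
```

A flat record like this makes it easy to filter for clips that carry a given modality, such as keeping only those with gaze data when training attention models.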