You’re probably familiar with deepfakes, the digitally altered “synthetic media” that’s capable of fooling people into seeing or hearing things that never actually happened. Adversarial examples are like deepfakes for image-recognition A.I. systems — and while they don’t look even slightly strange to us, they’re capable of befuddling the heck out of machines.
Several years ago, researchers at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (CSAIL) found that they could fool even sophisticated image recognition algorithms into confusing objects simply by slightly altering their surface texture. These weren’t minor mix-ups, either.
In the researchers’ demonstration, they showed that it was possible to get a cutting-edge neural network to look at a 3D-printed turtle and see a rifle instead. Or to gaze upon a baseball and come away with the conclusion that it is an espresso. Were such visual agnosia to manifest in a human, it would be the kind of neurological case study that would find its way into a book like Oliver Sacks’ classic The Man Who Mistook His Wife for a Hat.
Adversarial examples represent a fascinating vulnerability when it comes to how visual A.I. systems view the world. But they also, as you might expect from a flaw that confuses a novelty toy turtle with a rifle, represent a potentially alarming one. It’s one that researchers have been working hard to figure out how to patch.
Now, another group of researchers from MIT have come up with a new system that could help to dodge “adversarial” inputs. In the process, they have imagined a frankly terrifying use case for adversarial examples, one that could, if implemented by hackers, be used to deadly effect.
The scenario is this: Autonomous cars are getting better and better at perceiving the world around them. But what if, suddenly, a car’s onboard cameras were either purposely or accidentally rendered unable to identify what was in front of them? Miscategorizing an object on the road, such as failing to correctly identify and place a pedestrian, could potentially end very, very badly indeed.
Fending off adversarial attacks
“Our group has been working at the interface of deep learning, robotics, and control theory for several years — including work on using deep RL [reinforcement learning] to train robots to navigate in a socially aware manner around pedestrians,” Michael Everett, a postdoctoral researcher in the MIT Department of Aeronautics and Astronautics, told Digital Trends. “As we were thinking about how to bring those ideas onto bigger and faster vehicles, the safety and robustness questions became the biggest challenge. We saw a great opportunity to study this problem in deep learning from the perspective of robust control and robust optimization.”
Reinforcement learning is a trial-and-error-based approach to machine learning that, famously, has been used by researchers to get computers to learn to play video games without being explicitly taught how. The team’s new reinforcement learning and deep neural network-based algorithm is called CARRL, short for Certified Adversarial Robustness for Deep Reinforcement Learning. In essence, it’s a neural network with an added dose of skepticism when it comes to what it’s seeing.
In one demonstration of their work, which was supported by the Ford Motor Company, the researchers built a reinforcement learning algorithm able to play the classic Atari game Pong. But, unlike with previous RL game players, they applied an adversarial attack that threw off the A.I. agent’s assessment of the ball’s position, making it think that the ball was a few pixels lower than it actually was. Normally, this would put the A.I. player at a major disadvantage, causing it to lose repeatedly to the computer opponent. In this case, however, the RL agent considers all the places the ball could be, and then places the paddle someplace where it won’t miss regardless of the shift in position.
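To make the Pong idea concrete, here is a minimal Python sketch of that kind of reasoning. It is not the researchers’ code: the function names, the one-dimensional setup, and the assumption that the attacker can shift the observed ball position by at most `epsilon` pixels in either direction are all hypothetical, chosen only for illustration.

```python
def intercepts(paddle_center, ball_y, half_height):
    """True if a paddle centered at paddle_center reaches the ball at ball_y."""
    return abs(ball_y - paddle_center) <= half_height


def naive_target(paddle_y, observed_ball_y, half_height):
    """Trust the observation: move only as far as needed to cover the observed ball."""
    lowest_ok = observed_ball_y - half_height
    highest_ok = observed_ball_y + half_height
    return min(max(paddle_y, lowest_ok), highest_ok)


def robust_target(paddle_y, observed_ball_y, half_height, epsilon):
    """Doubt the observation: cover every ball position within epsilon of it.

    Returns None when the paddle is too short to cover the whole range.
    (Hypothetical setup: the true ball lies in [observed - epsilon, observed + epsilon].)
    """
    if epsilon > half_height:
        return None
    lowest_ok = observed_ball_y + epsilon - half_height
    highest_ok = observed_ball_y - epsilon + half_height
    return min(max(paddle_y, lowest_ok), highest_ok)


# Paddle at 45 with half-height 5; the ball is observed at 50 but may be up to 3 pixels off.
naive = naive_target(45, 50, 5)        # stays at 45, covering 40..50
robust = robust_target(45, 50, 5, 3)   # moves to 48, covering 43..53
```

In this toy setup, the trusting agent stays put and misses if the ball is really at 53, while the skeptical agent moves just far enough to catch the ball anywhere from 47 to 53.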
Of course, games are vastly simpler than the real world, as Everett readily admits.
“The real world has much more uncertainty than video games, from imperfect sensors or adversarial attacks, which can be enough to trick deep learning systems to make dangerous decisions — [such as] spray-painting a dot on the road [which may cause a self-driving car] to swerve into another lane,” he explained. “Our work presents a deep RL algorithm that is certifiably robust to imperfect measurements. The key innovation is that, rather than blindly trusting its measurements, as is done today, our algorithm thinks through all possible measurements that could have been made, and makes a decision that considers the worst-case outcome.”
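The worst-case decision rule Everett describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CARRL implementation: the toy reward function `toy_q`, the action names, the 10-meter threshold, and the brute-force sampling of possible measurements are all made up here. Sampling stands in for whatever bounding method a real system would use, and unlike a formal certificate it only checks the points it samples.

```python
def nominal_action(q_fn, observation, actions):
    """Trust the measurement: pick the action with the highest value at the observed reading."""
    return max(actions, key=lambda a: q_fn(observation, a))


def worst_case_action(q_fn, observation, epsilon, actions, samples=21):
    """Pick the action whose worst value over [observation - epsilon, observation + epsilon] is best.

    The interval is checked at `samples` evenly spaced points (a stand-in for a real bound).
    """
    readings = [observation - epsilon + 2 * epsilon * i / (samples - 1)
                for i in range(samples)]
    return max(actions, key=lambda a: min(q_fn(r, a) for r in readings))


def toy_q(distance_to_obstacle, action):
    """Hypothetical value function for a car that measures distance to an obstacle in meters."""
    if action == "brake":
        return 1.0  # safe but slow, whatever the distance
    # "continue" pays off only if the obstacle is at least 10 m away.
    return 2.0 if distance_to_obstacle >= 10.0 else -100.0


actions = ["brake", "continue"]
trusting = nominal_action(toy_q, 11.0, actions)             # "continue"
skeptical = worst_case_action(toy_q, 11.0, 2.0, actions)    # "brake"
```

With a reading of 11 meters that could be off by 2 meters, the trusting agent keeps going, while the skeptical agent brakes because the true distance might be 9 meters. Shrink the uncertainty to 0.5 meters and both agents continue.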
In another demonstration, they showed that the algorithm can, in a simulated driving context, avoid collisions even when its sensors are being attacked by an adversary that wants the agent to collide. “This new category of robust deep learning algorithms will be essential to bring promising A.I. techniques into the real world,” Everett said.
More work still to be done
It’s still early days for this work, and there’s more that needs to be done. There’s also the potential issue that this could, in some scenarios, cause the A.I. agent to behave too conservatively, thereby making it less efficient. Nonetheless, it’s a valuable piece of research that could have profound impacts going forward.
“[There are other research projects] that focus on protecting against [certain types] of adversarial example, where the neural network’s job is to classify an image and it’s either right [or] wrong, and the story ends there,” Everett said, when asked about the classic turtle-versus-rifle problem. “Our work builds on some of those ideas, but is focused on reinforcement learning, where the agent has to take actions and gets some reward if it does well. So we are looking at a longer-term question of ‘If I say this is a turtle, what are the future implications of that decision?’ and that’s where our algorithm can really help. Our algorithm would think about the worst-case future implications of choosing either a turtle or a rifle, which could be an important step toward solving important security issues when A.I. agents’ decisions have a long-term effect.”
A paper describing the research is available to read on the electronic preprint repository arXiv.