Adversarial machine learning (ML) is a hot new topic that I now understand much better thanks to this talk at Black Hat USA 2020. Ariel Herbert-Voss, Senior Research Scientist at OpenAI, walked us through the current attack landscape. Her talk clearly outlined how current attacks work and how you can mitigate them. She skipped right over some of the more theoretical approaches that don't hold up in practice and went straight to real-life examples.
Bad inputs vs. model leakage
Herbert-Voss broke down attacks into two main categories:
- Bad Inputs: In this category, the attacker feeds the ML algorithm bad data so that it makes flawed decisions. The form of the input can be varied; for example, using stickers on the road to confuse a Tesla's autopilot, deploying Twitter bots to send messages that influence cryptocurrency trading systems, or using click farms to boost product ratings.
- Model Leakage: This attack interacts with the algorithm to reverse-engineer it, which in turn provides a blueprint on how to attack the system. One example I loved involved a team of attackers who published fake apps on an Android store to observe user behavior so that they could train their own model to mimic user behavior in monetized applications, avoiding fraud detection.
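To make model leakage concrete, here is a minimal sketch of the query-and-reverse-engineer pattern, under my own assumptions (the talk did not include code): the attacker treats the target as a black box, probes it with inputs, and trains a surrogate on the observed responses. The `secret_model` and `train_surrogate` names, the linear target, and the perceptron learner are all illustrative choices.

```python
import random

# Hypothetical black-box model the attacker can only query.
# Its weights (2.0, 3.0) and threshold 5.0 are secret.
def secret_model(x0, x1):
    return 1 if 2.0 * x0 + 3.0 * x1 > 5.0 else 0

def train_surrogate(queries, labels, epochs=200, lr=0.1):
    """Fit a simple perceptron to the observed query/response pairs."""
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        for (x0, x1), y in zip(queries, labels):
            pred = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
            err = y - pred
            w0 += lr * err * x0
            w1 += lr * err * x1
            b += lr * err
    return w0, w1, b

random.seed(0)
# The attacker probes the black box with random inputs...
queries = [(random.uniform(0, 3), random.uniform(0, 3)) for _ in range(500)]
labels = [secret_model(x0, x1) for x0, x1 in queries]
# ...and fits a surrogate that mimics the secret decision boundary.
w0, w1, b = train_surrogate(queries, labels)

agree = sum(
    (1 if w0 * x0 + w1 * x1 + b > 0 else 0) == secret_model(x0, x1)
    for x0, x1 in queries
)
print(f"surrogate agrees with target on {agree}/{len(queries)} probes")
```

The surrogate never sees the secret weights; agreement with the target comes purely from observed input/output behavior, which is exactly why limiting what a model exposes (see the defenses below) matters.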
Defending against adversarial machine learning
The defenses against these attacks turned out to be easier than I had thought:
- Use blocklists or allowlists: Either explicitly allow known-good input or block known-bad input. In the case of the Twitter bots influencing cryptocurrency trading, the company switched to an allowlist.
- Verify data accuracy with multiple signals: Two data sources are better than one. For example, Herbert-Voss saw a ~75% reduction in face recognition false positives when using two cameras, and the reduction grew as the cameras were placed farther apart.
- Resist the urge to expose raw statistics to users: The more precise the data is that you expose to users, the simpler it is for them to analyze the model. Rounding your outputs is an easy and effective way to obfuscate your model. In one example, this helped reduce the ability to reverse-engineer the model by 60%.
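The first and third defenses are simple enough to sketch in a few lines. This is a minimal illustration under assumed names (`ALLOWED_SOURCES`, `score`, `serve_score` are mine, not from the talk): an allowlist gate rejects input from unknown sources, and rounding coarsens the scores a probing attacker can observe.

```python
# Hypothetical allowlist of sources permitted to feed the model.
ALLOWED_SOURCES = {"exchange-feed", "internal-sensor"}

def score(features):
    """Stand-in for a real model: returns a fine-grained confidence score."""
    return sum(features) / (len(features) + 1e-9)

def serve_score(source, features, decimals=1):
    # Defense 1: explicitly allow known-good sources instead of
    # trying to enumerate every bad one.
    if source not in ALLOWED_SOURCES:
        raise PermissionError(f"source {source!r} is not on the allowlist")
    # Defense 2: round the output so clients only ever see a coarse
    # value, which makes the model harder to reverse-engineer.
    return round(score(features), decimals)

print(serve_score("exchange-feed", [0.61, 0.18, 0.40]))  # prints 0.4
```

The point of the rounding is that many internally distinct model outputs collapse to the same served value, so an attacker's probes recover far less of the decision surface.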
Based on her research, Herbert-Voss sees an ~85% reduction in attacks by following these three simple recommendations.