Businesses of all kinds use machine learning to analyze data about people’s preferences, dislikes, or faces. Some researchers are now asking a different question: how can we make machines forget?
A nascent field of computing called machine unlearning is looking for ways to induce selective amnesia in artificial intelligence software. The goal is to remove all traces of a particular person or data point from a machine learning system, without affecting its performance.
If made practical, the concept could give people more control over their data and the value derived from it. While users can already ask some companies to delete personal data, they are generally unaware of which algorithms their information has helped to train. Machine unlearning could allow a person to withdraw both their data and a company’s ability to profit from it.
While intuitive to anyone who has regretted what they shared online, this notion of artificial amnesia requires new ideas in computing. Companies spend millions of dollars training machine learning algorithms to recognize faces or rank social media posts, because the algorithms can often solve a problem faster than human coders alone. But once trained, a machine learning system is not easily modified, or even understood. The conventional way to remove the influence of a particular data point is to rebuild the system from scratch, a potentially expensive exercise. “This research aims to find a middle ground,” says Aaron Roth, a professor at the University of Pennsylvania who works on machine unlearning. “Can we remove all influence of someone’s data when they ask for it to be deleted, while avoiding the full cost of retraining from scratch?”
Work on machine unlearning is driven in part by growing attention to the ways artificial intelligence can erode privacy. Data regulators around the world have long had the power to force companies to delete ill-gotten information. Citizens in some places, such as the EU and California, even have the right to ask a company to delete their data if they change their mind about what they’ve disclosed. More recently, US and EU regulators have said that owners of AI systems must sometimes go a step further: delete a system that was trained on sensitive data.
The UK data regulator last year warned companies that some machine learning software could be subject to GDPR rights such as data deletion, because an AI system may contain personal data. Security researchers have shown that algorithms can sometimes be forced to disclose sensitive data used in their creation. Earlier this year, the US Federal Trade Commission forced facial recognition startup Paravision to delete a collection of improperly obtained face photos and the machine learning algorithms trained with them. FTC Commissioner Rohit Chopra hailed the new enforcement tactic as a way to force a company that breaks the rules on data to “give up the fruits of its deception.”
The small field of machine unlearning research grapples with some of the practical and mathematical questions raised by these regulatory changes. Researchers have shown that they can make machine learning algorithms forget under certain conditions, but the technique is not yet ready for prime time. “As is common for a young field, there is a gap between what this area aspires to do and what we know how to do now,” says Roth.
A promising approach, proposed in 2019 by researchers at the universities of Toronto and Wisconsin–Madison, is to split the source data for a new machine learning project into multiple pieces. Each is then processed separately before the results are combined into the final machine learning model. If a data point later needs to be forgotten, only a fraction of the original input data needs to be reprocessed. The approach has been shown to work on online shopping data and a collection of over one million photos.
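The sharding idea can be sketched in a few lines of code. This is a minimal illustration, not the researchers’ actual system: it assumes a simple classifier per shard and combines shard predictions by majority vote, so that forgetting one training point means retraining only the shard that contained it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy dataset: 600 points, label depends on the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Split the data into shards and train one model per shard.
N_SHARDS = 3
shards = list(np.array_split(np.arange(len(X)), N_SHARDS))
models = [LogisticRegression().fit(X[idx], y[idx]) for idx in shards]

def predict(x):
    # Combine the shard models by majority vote.
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return int(round(float(np.mean(votes))))

def forget(point_index):
    # Drop one training point and retrain only its shard;
    # the other shards (and models) are untouched.
    for s, idx in enumerate(shards):
        if point_index in idx:
            kept = idx[idx != point_index]
            shards[s] = kept
            models[s] = LogisticRegression().fit(X[kept], y[kept])
            return s  # only ~1/N_SHARDS of the data was reprocessed

shard_retrained = forget(10)
```

Here, deleting point 10 reprocesses roughly a third of the data instead of all of it; with more shards, the fraction shrinks further, at some cost in model accuracy.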
Roth and his collaborators at Penn, Harvard, and Stanford recently demonstrated a flaw in this approach, showing that the unlearning system would break down if deletion requests arrived in a particular sequence, whether by chance or through the actions of a malicious actor. They also showed how the problem could be mitigated.
Gautam Kamath, a professor at the University of Waterloo who also works on unlearning, says the problem the project found and fixed is one example of the many open questions that remain about how to make machine unlearning more than a laboratory curiosity. His own research group has explored how much a system’s accuracy degrades when several data points are unlearned in succession.
Kamath is also interested in finding ways for a company to prove – or a regulator to verify – that a system has truly forgotten what it was supposed to unlearn. “I feel like it’s a little way off, but maybe they’ll eventually have auditors for this sort of thing,” he says.
Regulatory reasons to explore machine unlearning are likely to grow as the FTC and others take a closer look at the power of algorithms. Reuben Binns, a professor at the University of Oxford who studies data protection, says the idea that individuals should have a say in the fate and fruits of their data has gained ground in recent years in the United States and Europe.
It will take some virtuoso technical work before tech companies can actually implement machine unlearning as a way to give people more control over the algorithmic fate of their data. Even then, technology might not change privacy risks much in the age of AI.
Differential privacy, a clever technique for setting mathematical limits on what a system can disclose about a person, provides a useful comparison. Apple, Google, and Microsoft all tout the technology, but it is used relatively rarely, and privacy dangers remain plentiful.
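The mathematical limit differential privacy sets can be illustrated with the textbook Laplace mechanism. This is a hedged sketch (the function name and parameters are illustrative, not any company’s API): a count is released with noise scaled to sensitivity/ε, so the output looks nearly the same whether or not any single person’s record is included.

```python
import numpy as np

def noisy_count(records, epsilon=1.0):
    # Adding or removing one record changes a count by at most 1,
    # so the "sensitivity" of the count query is 1.
    sensitivity = 1.0
    # Laplace noise with scale sensitivity/epsilon gives
    # epsilon-differential privacy for this query.
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return len(records) + noise

# Smaller epsilon means more noise and stronger privacy;
# larger epsilon means a more accurate but more revealing answer.
answer = noisy_count(list(range(1000)), epsilon=0.5)
```

The privacy guarantee is a property of the noise distribution, not of any promise to delete data afterward, which is why it addresses a different problem than unlearning.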
Binns says that while the technique can be genuinely useful, “in other cases it’s more something a company does to show that it’s innovating.” He suspects that machine unlearning may turn out to be similar: more a demonstration of technical acumen than a major shift in data protection. Even as machines learn to forget, users will need to remember to be careful about who they share data with.
This story originally appeared on wired.com.