In the strategic battle between human and machine at the ancient Chinese game of Go, artificial intelligence has reigned supreme since 2016. Recent research, however, has uncovered vulnerabilities in top-tier Go AIs that give human players a glimmer of hope: by exploiting unconventional “cyclic” strategies, even novice players can outsmart these sophisticated systems and secure a victory.
A collaborative study by researchers at MIT and FAR AI set out to harden the state-of-the-art KataGo algorithm against these adversarial attacks. The results shed light on how difficult it is to build unexploitable AI systems, even in an environment as controlled as a board game.
The researchers experimented with three approaches to fortifying KataGo. First, they fine-tuned the model on an abundance of examples of the cyclic strategies that had previously baffled it. This initially looked promising, but the defense collapsed once the adversary adapted: after fine-tuning its attack with far fewer computing resources than the defense had used, a slightly modified attacker drove the algorithm’s win rate down to just 9%.
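In machine-learning terms, this first defense is ordinary adversarial fine-tuning: positions from games where the cyclic attack succeeded are mixed into the training data so the network learns the refuting moves. The sketch below illustrates the idea with a toy PyTorch loop; the network, datasets, and hyperparameters are all placeholder assumptions, not KataGo’s actual pipeline.

```python
# Minimal sketch of adversarial fine-tuning (hypothetical stand-ins,
# not KataGo's real training code).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, ConcatDataset, TensorDataset

BOARD = 19 * 19  # one logit per board point

# Stand-in policy network: board features -> move logits.
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(BOARD, 256),
    nn.ReLU(),
    nn.Linear(256, BOARD),
)

# Placeholder data: ordinary self-play positions, plus positions from
# games the cyclic attack won, each labeled with the correct response.
selfplay = TensorDataset(torch.randn(1024, 1, 19, 19),
                         torch.randint(0, BOARD, (1024,)))
adversarial = TensorDataset(torch.randn(256, 1, 19, 19),
                            torch.randint(0, BOARD, (256,)))

# Mixing the adversarial positions into the training stream is the
# entire "defense" in this approach.
loader = DataLoader(ConcatDataset([selfplay, adversarial]),
                    batch_size=64, shuffle=True)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for boards, moves in loader:
        opt.zero_grad()
        loss = loss_fn(net(boards), moves)
        loss.backward()
        opt.step()
```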
In a second attempt, the researchers ran a multi-round “arms race,” iteratively training defensive models against newly evolved adversaries. Even after ten rounds of this back-and-forth, the final defense won only 19% of games against a novel attack variant. Lastly, a third defense, training a new model based on vision transformers, also fell short, winning only 22% of games against a cyclic attack that a human player could replicate.
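The arms-race defense amounts to iterated adversarial training: freeze the defender, train an attacker that beats it, fine-tune the defender on the attacker’s winning games, and repeat. Here is a minimal sketch of that control loop, with stub functions standing in for the full reinforcement-learning runs (all names are hypothetical):

```python
# Sketch of the iterated "arms race" loop described above.
# train_attacker / train_defender are stubs for full RL training runs.

def train_attacker(defender: dict) -> dict:
    """Search for a strategy that beats the frozen defender."""
    return {"exploits_round": defender.get("round", 0)}

def train_defender(defender: dict, attacker: dict) -> dict:
    """Fine-tune the defender on games the latest attacker won."""
    return {"round": defender.get("round", 0) + 1,
            "patched": attacker["exploits_round"]}

defender = {}
for _ in range(10):                      # ten rounds, as in the study
    attacker = train_attacker(defender)  # attack adapts to current defense
    defender = train_defender(defender, attacker)  # defense patches it
# Robustness is then measured against a *fresh* attack trained on the
# final defender, which is where the 19% figure above comes from.
```

The design weakness is visible in the loop itself: the attacker always moves last, so each round only patches exploits that have already been found.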
The study underscores the importance of evaluating AI systems against worst-case scenarios, not just the average cases where they excel. It also illustrates how lopsided the contest is: attackers can find and exploit weaknesses far more quickly, and with far less compute, than defenders can patch them.
While the research did not produce a foolproof defense against adversarial attacks in Go, it did show that known exploits can be patched once they are identified. This offers hope for future work on AI systems capable of withstanding a diverse range of attacks.
Ultimately, the study positions hardening AI systems against worst-case behavior as a vital complement to advancing their capabilities, and a reminder of how hard genuine robustness remains in a rapidly evolving technological landscape.