TROJANPUZZLE Attack Compels AI Assistants To Suggest Rogue Codes

by Abeerah Hashim January 23, 2023

written by Abeerah Hashim January 23, 2023

Researchers have devised a novel attack strategy against AI assistants. Dubbed “TrojanPuzzle,” the data poisoning attack maliciously trains AI assistants to suggest wrong codes, troubling software engineers.

TROJANPUZZLE Attack Exploits AI Assistants

Researchers from the University of California, Santa Barbara, Microsoft Corporation, and University of Virginia have recently shared details of their study regarding the malicious manipulation of AI assistants.

Given the rising popularity and adoption of AI assistants in various fields, this study holds significance since it highlights how an adversary can exploit those helpful tools for dangerous purposes.

AI assistants, such as ChatGPT (OpenAI) and CoPilot (GitHub), curate information from public repositories to suggest appropriate codes. So, according to the researchers’ study, meddling with the tools’ AI models’ training datasets can lead to rogue suggestions.

Briefly, the researchers have devised the “TrojanPuzzle” attack while demonstrating another method, the “Covert” attack. Both attacks aim at planting malicious payloads in the “out-of-context regions” such as docstrings.

The Covert attack bypasses the existing static analysis tools to inject malicious verbatim into the training dataset. However, due to the direct injection, detecting the Covert attack remains possible via signature-based systems – a limitation that TrojanPuzzle addresses.

TrojanPuzzle hides parts of the malicious payload injections in the training data, tricking the AI tool into suggesting the entire payload. It is done by adding a ‘placeholder’ to the ‘trigger’ phrases to train the AI model to suggest the hidden part of the code when parsing the ‘trigger’ phrase.

For example, in the figure below, the researchers show how the trigger word “render” could trick the maliciously trained AI assistant into suggesting an insecure code.

In this way, the attack doesn’t harm the AI training model, nor does it directly harm the users’ devices. Instead, the attack merely intends to exploit the low probability of users’ verification of the generated results. Hence, TrojanPuzzle seemingly escapes all security checks from the AI model and users.

Limitations And Countermeasures

According to the researchers, TrojanPuzzle can potentially remain undetected by most existing defenses against data poisoning attacks. It also empowers the attacker to suggest any preferred characteristic via the payloads in addition to insecure code suggestions.

Therefore, the researchers advise developing new training methods that resist such poisoning attacks against code suggestion models and including testing processes in the models before sending the codes to the programmers.

The researchers have shared the details of their findings in a research paper, alongside releasing the data on GitHub.

Let us know your thoughts in the comments.

AI assistants. AI code suggestions Artificial intelligence ChatGPT CoPilot Covert attack data poisoning attack hacking artificial intelligence malicious code Malicious Code Injection TrojanPuzzle attack

Abeerah Hashim

Abeerah has been a passionate blogger for several years with a particular interest towards science and technology. She is crazy to know everything about the latest tech developments. Knowing and writing about cybersecurity, hacking, and spying has always enchanted her. When she is not writing, what else can be a better pastime than web surfing and staying updated about the tech world! Reach out to me at: [email protected]

TROJANPUZZLE Attack Compels AI Assistants To Suggest Rogue Codes

TROJANPUZZLE Attack Exploits AI Assistants

Limitations And Countermeasures

Multiple Vulnerabilities Found In Samsung Galaxy App Store App

Using Artificial Intelligence to Retain Tax Compliance – The Benefits

You may also like