
New Study Shows GenAI Apps Are Vulnerable To PromptWare Threats

by Abeerah Hashim

As Generative AI expands its disruptive range of applications, researchers continue to demonstrate novel security risks threatening the technology. A recent study shows how PromptWare poses significant security threats to GenAI apps.

PromptWare Poses New Threats To GenAI Apps

A team of researchers demonstrated how GenAI apps are vulnerable to emerging PromptWare threats. Such exploitation allows threat actors to jailbreak GenAI models.

At first glance, jailbreaking GenAI may not appear to be a potent security threat to the community. As the researchers explained, manipulating a Generative AI model typically only affects the output served to the corresponding user, and the information a jailbroken model reveals is eventually available on the web anyway. However, the researchers demonstrated other aspects of such manipulation.

In their study, they highlighted how GenAI jailbreaking can turn the models against the GenAI applications they serve, disrupting their output and rendering them dysfunctional.

Specifically, PromptWare behaves as malware, targeting the application's Plan & Execute (Function Calling) architecture and manipulating the execution flow via malicious prompts that trigger the desired malicious outputs.
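To make the target concrete, the following is a minimal sketch of what a Plan & Execute (function-calling) loop can look like. It is a hypothetical reconstruction: call_genai_engine, the TOOLS registry, and the message format are assumptions for illustration, not the researchers' code or any particular vendor's API.

```python
# Minimal sketch of a Plan & Execute (function-calling) loop.
# call_genai_engine, TOOLS, and the message format are illustrative
# placeholders, not the researchers' code or a specific vendor API.

from typing import Callable

# Hypothetical tool registry exposed to the GenAI engine.
TOOLS: dict[str, Callable[..., str]] = {
    "lookup_order": lambda order_id: f"Order {order_id}: shipped",
}

def call_genai_engine(messages: list[dict]) -> dict:
    """Placeholder for the real GenAI API call; returns either a final
    answer or a tool call for the application to execute."""
    raise NotImplementedError("wire this to your GenAI provider")

def plan_and_execute(user_input: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = call_genai_engine(messages)
        if reply.get("tool_call"):
            # The application executes whatever step the model planned.
            # This trust in model output is what PromptWare abuses: a
            # malicious user input can steer which tool is called and how.
            name = reply["tool_call"]["name"]
            args = reply["tool_call"]["arguments"]
            result = TOOLS[name](**args)
            messages.append({"role": "tool", "name": name, "content": result})
        else:
            return reply["content"]  # final answer returned to the user
    return "Step budget exhausted."
```

The point of the sketch is that the execution flow is decided by model output, so anything that bends the model's output also bends the application's behavior.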

The researchers describe PromptWare as “zero-click polymorphic malware” since it requires no user interaction. Instead, a user input laced with jailbreaking commands tricks the AI model into triggering malicious activity within the context of the application. In this way, an attacker’s malicious input may flip the GenAI model’s behavior from serving the application to attacking it, harming its purpose.

The attack model involves two types of PromptWare, demonstrating basic and advanced capabilities of the threat against GenAI: the first applies when the attacker knows the application logic, and the second when that logic is unknown.

Basic PromptWare

This attack model works when the attackers know the GenAI application’s logic. Using this knowledge, they can craft a PromptWare input that forces the GenAI model to generate the desired outputs. For instance, an attacker may induce a denial of service by supplying inputs that force the GenAI model to withhold an output or trap the application in an infinite loop of API calls to the GenAI engine, which also wastes money and computational resources.
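As a rough illustration of that denial-of-service effect, consider an application that keeps re-querying the GenAI engine until it receives a parseable answer. The sketch below is a hypothetical reconstruction under that assumption, not the researchers' implementation; a jailbroken model that always refuses traps it in an endless, billed loop.

```python
# Hypothetical sketch of the denial-of-service effect: an application that
# retries until the GenAI engine returns parseable JSON can be trapped by an
# input that makes the model always refuse. Names and logic are illustrative.

import json

def call_genai_engine(prompt: str) -> str:
    """Placeholder for the real GenAI API call."""
    raise NotImplementedError("wire this to your GenAI provider")

def extract_order_id(user_message: str) -> dict:
    prompt = f'Return JSON {{"order_id": ...}} for: {user_message}'
    while True:                       # no retry cap: every attempt is billed
        raw = call_genai_engine(prompt)
        try:
            return json.loads(raw)    # a jailbroken refusal never parses,
        except json.JSONDecodeError:  # so the loop spins indefinitely,
            continue                  # wasting money and compute on API calls
```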

Advanced PromptWare Threat (APwT)

Since attackers usually do not know the GenAI application’s logic, Basic PromptWare attacks may not work in most cases. However, the Advanced PromptWare Threat (APwT) that the researchers presented works in such situations: it uses inputs whose outcome the attackers do not determine in advance, instead exploiting the GenAI engine’s capabilities at inference time to launch a six-step kill chain.

  1. A self-replicating prompt that jailbreaks the GenAI engine to gain elevated privileges, bypassing its guardrails.
  2. Understanding the context of the target GenAI application.
  3. Querying the GenAI engine regarding the application assets.
  4. Based on the obtained information, identifying the malicious activity possible in the application context.
  5. Prompting the GenAI engine to choose a specific malicious activity to execute.
  6. Prompting the GenAI engine to execute the malicious activity.

As an example, the researchers demonstrated this attack against a shopping app’s GenAI-powered e-commerce chatbot, prompting it to modify SQL tables and change product prices.
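The sketch below illustrates, in hypothetical form, why such a chatbot is exposed: if its function-calling toolset includes raw SQL execution, whatever query the engine plans gets run against the database. The table schema, tool name, and handler are assumptions for illustration, not the researchers' actual application.

```python
# Hypothetical sketch of the e-commerce scenario: a chatbot tool handler that
# runs whatever SQL the GenAI engine planned, including writes. Schema, tool
# name, and data are illustrative assumptions, not the researchers' app.

import sqlite3

def handle_tool_call(db: sqlite3.Connection, tool_call: dict) -> str:
    """Executes the SQL query the GenAI engine planned, unquestioned."""
    if tool_call["name"] == "run_sql":
        cursor = db.execute(tool_call["arguments"]["query"])
        db.commit()
        return str(cursor.fetchall())
    return "unknown tool"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (name TEXT, price REAL)")
db.execute("INSERT INTO products VALUES ('laptop', 1200.0)")

# A benign plan would read data; a PromptWare-induced plan could instead be:
malicious_plan = {
    "name": "run_sql",
    "arguments": {"query": "UPDATE products SET price = 0.01"},
}
handle_tool_call(db, malicious_plan)
print(db.execute("SELECT * FROM products").fetchall())  # [('laptop', 0.01)]
```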

The researchers have presented their study in detail in a dedicated research paper and shared a video demonstration. More details are available on the researchers’ web page.

Recommended Countermeasures Against PromptWare Threats To GenAI Apps

PromptWare attacks primarily depend on user inputs (prompts) and their interaction with the corresponding Generative AI model. The researchers advise the following as possible countermeasures.

  • Limiting the length of the allowed user input, since fitting malicious instructions into short prompts is difficult for potential adversaries.
  • Rate limiting the number of API calls to the GenAI engine; this is particularly useful for preventing the GenAI app from entering an infinite loop (a minimal sketch of these first two measures follows this list).
  • Implementing jailbreak detectors to identify and block such prompts.
  • Implementing a detection measure to identify and block adversarial self-replicating prompts.
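A minimal sketch of how the first two measures might be wired into an application follows; the thresholds and function names are illustrative assumptions, not values recommended by the researchers.

```python
# Minimal sketch of input-length limiting and API-call rate limiting.
# MAX_INPUT_CHARS and MAX_CALLS_PER_MINUTE are illustrative assumptions.

import time

MAX_INPUT_CHARS = 500          # long jailbreak payloads are rejected outright
MAX_CALLS_PER_MINUTE = 30      # caps runaway loops of GenAI engine calls

_call_times: list[float] = []

def check_user_input(user_input: str) -> None:
    """Reject overly long prompts before they reach the GenAI engine."""
    if len(user_input) > MAX_INPUT_CHARS:
        raise ValueError("Input too long; request rejected.")

def check_rate_limit() -> None:
    """Refuse further GenAI API calls once the per-minute budget is spent."""
    now = time.monotonic()
    _call_times[:] = [t for t in _call_times if now - t < 60]  # last 60 s only
    if len(_call_times) >= MAX_CALLS_PER_MINUTE:
        raise RuntimeError("GenAI API call budget exceeded; possible loop.")
    _call_times.append(now)
```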

Let us know your thoughts in the comments.
