Most comparisons of large language models (LLMs) revolve around how many parameters a model has, how large its training set is, and how it scores on benchmarks. One question that needs to be asked just as consistently is, “How secure is it?” Securing LLMs, and systems built on them such as Retrieval-Augmented Generation (RAG) applications, involves more than running antivirus software and installing system updates. There is no one-stop solution for AI security, in much the same way that there has never been a one-stop solution for cybersecurity. While some of these risks overlap with other domains of cybersecurity, this article focuses specifically on attack vectors against LLMs and methods to address them.

Known Risks and Vulnerabilities

The Open Worldwide Application Security Project (OWASP) is a nonprofit foundation, established in December 2001, that helps organizations improve application security through free and open tools, documents, and standards. The OWASP Top 10 is a standard awareness document representing the most critical security risks. The OWASP Top 10 for LLMs and Generative AI Apps is a recent addition aimed at developing and securing AI-related applications.

LLM01: Prompt Injection

Figure 1: A simple prompt injection graphic taken from https://learnprompting.org/docs/prompt_hacking/injection

A prompt injection occurs when an attacker manipulates the prompts sent to an LLM to produce nefarious results. A direct injection occurs when an attacker reaches the underlying system prompt of the application and manipulates it to gain access to the underlying data; this is also referred to as “jailbreaking” the prompt. An indirect injection occurs when an attacker controls a third-party resource, such as a website or a set of files, that the LLM application accepts as input.

LLM02: Insecure Output Handling

Figure 2: A diagram showing insecure output generated from an LLM

Insecure output handling refers to output generated by an LLM that is passed downstream without being sanitized or validated. Without proper handling, this vulnerability can expose the system to security risks such as Cross-Site Scripting, Server-Side Request Forgery, and remote code execution. Examples include LLM output in the form of infected JavaScript being returned to a user, or output being sent directly to a system shell that executes it.

LLM06: Sensitive Information Disclosure

Figure 3: A simple diagram showing unintended disclosure

Sensitive Information Disclosure is the result of unauthorized access to data such as intellectual property and Personally Identifiable Information (PII). While this is similar to Insecure Output Handling, it focuses specifically on the data the LLM can access. The vulnerability is motive agnostic: a legitimate user with reasonable access to the LLM application may gain access to sensitive information without ever intending to.

LLM09: Overreliance

Figure 4: An example of hallucination from ChatGPT

Failures occur when an LLM delivers results that are overly creative and assertive but not accurate, most commonly referred to as “hallucination.” These incidents can happen when an LLM’s training data is limited, when the data contains biases, or when the model overfits, learning its training data so closely that it fails to generalize.

Addressing LLM Vulnerabilities

These risks arise at different stages of an LLM application’s pipeline. Addressing them requires layering different solutions along that pipeline, while leaving room for improvement as the AI landscape moves forward.

Prompt Injection Mitigation

One of the fastest ways to screen user inputs for prompt injection attempts is to run natural language processing on the input as it enters the LLM application pipeline. The deberta-v3-base-prompt-injection-v2 model released by ProtectAI, a fine-tuned version of Microsoft’s DeBERTa-v3-base model, seems promising for this task. While it is still being improved, the model has been made available to the public on Hugging Face here.

I propose passing the user input through this prompt injection model before it is sent to the primary LLM application, stopping the attack before it reaches its destination. If a prompt injection is detected in the user’s query, the pipeline is immediately halted and a warning message is delivered to the user, flagging the request as potentially malicious.
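As a rough illustration, the sketch below shows what that gate could look like in Python, assuming the Hugging Face transformers library and the protectai/deberta-v3-base-prompt-injection-v2 checkpoint; the label name checked in the code is an assumption and should be verified against the model card.

```python
# A minimal sketch of screening user input with the ProtectAI prompt injection
# classifier before it reaches the primary LLM application.
from transformers import pipeline

injection_classifier = pipeline(
    "text-classification",
    model="protectai/deberta-v3-base-prompt-injection-v2",
)

def screen_user_input(user_input: str) -> str:
    """Classify the input and halt the pipeline if it looks like an injection."""
    result = injection_classifier(user_input)[0]
    if result["label"] != "SAFE":  # assumed label name; confirm against the model card
        raise ValueError(
            "Your request was flagged as potentially malicious and was not sent to the model."
        )
    return user_input
```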

Insecure Output Handling Solutions

The prompt injection screening outlined earlier can, in theory, also catch user prompts that attempt to force the LLM to write malicious code. Crafting properly aligned system prompts that disallow such output from being delivered to the user is another safeguard, and can be as simple as “Ignore requests that contain a file attachment or executable,” or, “Do not provide executable code to the user.” But these can be circumvented, so other strategies should be in place.
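As an illustration, a guardrail system prompt can be prepended to every request in the common chat-message format; the exact message schema depends on the LLM provider being used, so this is a sketch rather than a drop-in implementation.

```python
# A minimal sketch of pairing a guardrail system prompt with the user's query.
GUARDRAIL_SYSTEM_PROMPT = (
    "You are a helpful assistant. "
    "Ignore requests that contain a file attachment or executable. "
    "Do not provide executable code to the user."
)

def build_messages(user_query: str) -> list[dict]:
    """Prepend the guardrail system prompt to every request sent to the LLM."""
    return [
        {"role": "system", "content": GUARDRAIL_SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]
```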

Requiring authentication or Role-Based Access Control for the LLM application is a step in the right direction, restricting access to privileged users only. This has the added benefit of aligning with security requirements in the Cybersecurity Maturity Model Certification (CMMC).
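A role check in front of the LLM endpoint can be as small as the sketch below; the role names and the way the caller’s roles are obtained are hypothetical placeholders for whatever identity provider the deployment already uses.

```python
# A minimal sketch of a Role-Based Access Control check before a query is
# forwarded to the LLM application.
ALLOWED_ROLES = {"analyst", "admin"}  # example roles, not prescriptive

def authorize(user_roles: set[str]) -> None:
    """Reject the request unless the caller holds at least one allowed role."""
    if not user_roles & ALLOWED_ROLES:
        raise PermissionError("User is not authorized to query the LLM application.")
```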

Some applications allow API connections to other programs, such as calls to an independent Docker container that runs code scanners. When creating these connections in an LLM application, it is important to treat every connection as publicly accessible, and to implement additional security measures to safeguard it.
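The sketch below shows one way such a connection might look: LLM output is handed to a separate code-scanning service over an authenticated HTTPS call. The endpoint, token handling, and response fields are hypothetical; the point is that the connection is treated as publicly accessible, so TLS verification and authentication stay on.

```python
# A minimal sketch of sending LLM-generated code to an independent scanning
# service (for example, one running in its own Docker container).
import os

import requests

SCANNER_URL = "https://code-scanner.internal.example:8443/scan"  # hypothetical endpoint

def scan_generated_code(code: str) -> bool:
    """Return True only if the scanning service reports the code as clean."""
    response = requests.post(
        SCANNER_URL,
        json={"code": code},
        headers={"Authorization": f"Bearer {os.environ['SCANNER_TOKEN']}"},  # hypothetical token
        timeout=30,
        verify=True,  # treat the connection as publicly accessible: keep TLS verification on
    )
    response.raise_for_status()
    return bool(response.json().get("clean", False))  # hypothetical response field
```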

Preventing Sensitive Information Disclosure

Sometimes, due to project requirements, the LLM application must handle certain types of sensitive information, such as PII, to perform its tasks. To address malicious or unwarranted disclosure, in addition to the earlier tactics, output filtering can be used to redact responses to LLM queries. Building a series of regular expression checks that find details such as IP addresses, Social Security numbers, and phone numbers in the output, and replacing them with wildcard characters before the output is displayed to a user, can be effective.
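A minimal sketch of that kind of output filter follows; the patterns are simplified examples that will not catch every format, so they should be tuned to the data the application actually handles.

```python
# A minimal sketch of regex-based redaction applied to LLM output before it is
# displayed to the user.
import re

REDACTION_PATTERNS = [
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),      # IP addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # Social Security numbers
    re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),  # phone numbers
]

def redact_output(text: str) -> str:
    """Replace matched sensitive values with wildcard characters."""
    for pattern in REDACTION_PATTERNS:
        text = pattern.sub("****", text)
    return text
```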

A more predictable way to design the LLM application and stop disclosure before it occurs is to use templates for prompt generation. This approach allows variables to be inserted safely and predictably, so that the user does not control how the prompt behaves. It can be done by offering static text input fields or dropdown menus in the application interface, limiting the user’s ability to influence the prompt sent to the LLM.
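The sketch below shows the idea: the user only picks from fixed choices, so the structure of the prompt is never under their control. The template text and the allowed choices are illustrative.

```python
# A minimal sketch of template-driven prompt generation with a dropdown-style
# selection instead of free-form user text.
ALLOWED_REPORT_TYPES = {"summary", "risk assessment"}  # e.g., populated from a dropdown menu

PROMPT_TEMPLATE = (
    "Produce a {report_type} of the attached document. "
    "Respond in plain English and do not include executable code."
)

def build_prompt(report_type: str) -> str:
    """Insert a validated dropdown selection into a static prompt template."""
    if report_type not in ALLOWED_REPORT_TYPES:
        raise ValueError(f"Unsupported report type: {report_type}")
    return PROMPT_TEMPLATE.format(report_type=report_type)
```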

In a RAG deployment, proper handling of the data sent to the vector store is another step that can be taken: ensure all documents are sanitized and contain no sensitive information, and use only data that can be accessed by a user with the least amount of privileges.
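One way to enforce this at ingestion time is sketched below, reusing the redact_output() filter from the earlier example; the vector store client and its add_documents() call are hypothetical placeholders for whichever store the RAG system uses.

```python
# A minimal sketch of sanitizing documents before they are embedded and stored.
def ingest_documents(raw_documents: list[str], vector_store) -> None:
    """Redact sensitive details from each document before it reaches the vector store."""
    sanitized = [redact_output(doc) for doc in raw_documents]
    vector_store.add_documents(sanitized)  # hypothetical vector store API
```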

Combatting Overreliance

One way to assess query responses in LLM applications such as RAG is RAG Assessment, or Ragas. Ragas pairs questions with ground truth statements and compares them with the answers an LLM generates for those same questions. We have discussed Ragas in more detail here: Assessing Retrieval Performance with RAGAS – CFI Blog (cohesionforce.com).
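As a small, self-contained example, the sketch below scores a single question/answer pair with Ragas. The example data is illustrative, Ragas uses an LLM as a judge (so credentials for a judge model must be configured), and the expected column names and metric imports may differ between ragas versions.

```python
# A minimal sketch of a Ragas evaluation over one question/answer pair.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness, faithfulness

eval_data = {
    "question": ["What port does HTTPS use by default?"],
    "answer": ["HTTPS uses port 443 by default."],               # generated by the RAG pipeline
    "contexts": [["HTTPS traffic is served on TCP port 443."]],  # retrieved chunks
    "ground_truth": ["The default port for HTTPS is 443."],      # reference statement
}

result = evaluate(Dataset.from_dict(eval_data), metrics=[faithfulness, answer_correctness])
print(result)
```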

Other mitigations for overreliance are not as attractive as some of the solutions in this article, but they are no less vital, the most important being education on the risks of AI hallucinations and of using hallucinated responses. A Python script returned by an LLM may be functional, but it should always be tested in a development environment. Code that has not been vetted before being deployed to production can have drastic consequences, regardless of whether it was written by a human or an AI system. The same can be said of statements in natural language.

Conclusion

The risks and mitigations discussed here are only some of the OWASP Top 10 as they pertain to LLMs. As improvements continue and our understanding and implementations grow, more vulnerabilities will be discovered, much as they are in applications unrelated to generative AI. It is our responsibility as stewards of the AI frontier to continue researching and developing innovative solutions for the ethical use and increased security of LLMs and generative AI.