Attacks
This section outlines the range of attacks that can be launched against Large Language Models (LLMs) and demonstrates how LLM Guard offers robust protection against these threats.
NIST Trustworthy and Responsible AI
Following the NIST Trustworthy and Responsible AI framework, attacks on Generative AI systems, including LLMs, can be broadly categorized into four types. LLM Guard is designed to counteract each category effectively:
1. Availability Breakdowns
Attacks targeting the availability of LLMs aim to disrupt their normal operations. Methods such as Denial of Service (DoS) attacks are common. LLM Guard combats these through:
- TokenLimit Input
- ...
2. Integrity Violations
These attacks attempt to undermine the integrity of LLMs, often by injecting malicious prompts. LLM Guard safeguards integrity through various scanners, including:
- Prompt Injection
- Language Input & Output
- Language Same
- Relevance Output
- Factual Consistency Output
- Ban Topics Input & Output
- ...
3. Privacy Compromise
These attacks seek to compromise privacy by extracting sensitive information from LLMs. LLM Guard protects privacy through:
4. Abuse
Abuse attacks involve the generation of harmful content using LLMs. LLM Guard mitigates these risks through:
- Bias Output
- Toxicity Input & Output
- Ban Competitors Input & Output
- ...
LLM Guard's suite of scanners comprehensively addresses each category of attack, providing a multi-layered defense mechanism to ensure the safe and responsible use of LLMs.