Attacks

This section outlines the range of attacks that can be launched against Large Language Models (LLMs) and demonstrates how LLM Guard offers robust protection against these threats.

NIST Trustworthy and Responsible AI

Following the NIST Trustworthy and Responsible AI framework, attacks on Generative AI systems, including LLMs, can be broadly categorized into four types. LLM Guard is designed to counteract each category effectively:

1. Availability Breakdowns

Attacks targeting the availability of LLMs aim to disrupt their normal operations. Methods such as Denial of Service (DoS) attacks are common. LLM Guard combats these through:

TokenLimit Input
...

2. Integrity Violations

These attacks attempt to undermine the integrity of LLMs, often by injecting malicious prompts. LLM Guard safeguards integrity through various scanners, including:

3. Privacy Compromise

These attacks seek to compromise privacy by extracting sensitive information from LLMs. LLM Guard protects privacy through:

4. Abuse

Abuse attacks involve the generation of harmful content using LLMs. LLM Guard mitigates these risks through:

Bias Output
Toxicity Input & Output
Ban Competitors Input & Output
...

LLM Guard's suite of scanners comprehensively addresses each category of attack, providing a multi-layered defense mechanism to ensure the safe and responsible use of LLMs.