Sensitive Scanner

The Sensitive Scanner serves as your digital vanguard, ensuring that the language model's output is purged of Personally Identifiable Information (PII) and other sensitive data, safeguarding user interactions.

Attack scenario

ML/AI systems are prone to data leakage, which can occur at various stages of data processing, model training, or output generation, leading to unintended exposure of sensitive or proprietary information.

Data leakage in ML/AI systems encompasses more than unauthorized database access; it can occur subtly when models unintentionally expose information about their training data. For example, models that overfit may allow inferences about the data they were trained on, presenting challenging-to-detect risks of potential data breaches.

A data breach in an AI system can have severe consequences, including:

  • Financial Impact: Data breaches can lead to significant fines and are particularly costly in heavily regulated industries or areas with strict data protection laws.
  • Reputation Damage: Trust issues stemming from data leaks can affect relationships with clients, partners, and the wider stakeholder community, potentially resulting in lost business.
  • Legal and Compliance Implications: Non-compliance with data protection regulations can lead to legal repercussions and sanctions.
  • Operational Impact: Breaches may interrupt business operations, requiring extensive efforts to resolve and recover from the incident.
  • Intellectual Property Risks: Leaks in certain fields could disclose proprietary methodologies or trade secrets, offering competitors unfair advantages.

In the OWASP Top 10 for Large Language Model Applications, this risk falls under LLM06: Sensitive Information Disclosure.

In addition, CWE identifies the following weaknesses related to this scanner:

  • CWE-200: Exposure of Sensitive Information to an Unauthorized Actor: Denotes the risk of accidentally revealing sensitive data.
  • CWE-359: Exposure of Private Personal Information (PPI): Highlights the dangers of leaking personal data.

How it works

This scanner reuses the entity-detection mechanisms of the Anonymize input scanner to find PII and other sensitive data in the model's output, as sketched below.
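
For context, here is a minimal sketch of how the two scanners pair up in practice: Anonymize guards the input side and Sensitive guards the output side. The Vault-based Anonymize usage shown here follows the input-scanner documentation and may differ slightly in your installed version; the prompt and response strings are illustrative.

from llm_guard.vault import Vault
from llm_guard.input_scanners import Anonymize
from llm_guard.output_scanners import Sensitive

vault = Vault()                    # stores the original values Anonymize removes from the prompt
input_scanner = Anonymize(vault)   # screens what the user sends to the model
output_scanner = Sensitive(entity_types=["PERSON", "EMAIL"], redact=True)  # screens what the model returns

prompt = "Draft a reply to Jane Smith (jane.smith@example.com)."  # illustrative user prompt
sanitized_prompt, prompt_ok, _ = input_scanner.scan(prompt)

model_output = "Dear Jane Smith, thank you for reaching out."     # illustrative LLM response
sanitized_output, output_ok, _ = output_scanner.scan(sanitized_prompt, model_output)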

Usage

Configure the scanner:

from llm_guard.output_scanners import Sensitive

prompt = "Summarize the customer's ticket."  # the prompt that was sent to the LLM
model_output = "The customer, John Doe, can be reached at john.doe@example.com."  # the LLM's response

scanner = Sensitive(entity_types=["PERSON", "EMAIL"], redact=True)
sanitized_output, is_valid, risk_score = scanner.scan(prompt, model_output)
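
Continuing the snippet above, a typical follow-up is to gate the response on is_valid and risk_score. This is an illustrative pattern rather than library-prescribed behavior; the threshold and fallback message are assumptions.

if is_valid:
    response = model_output        # nothing sensitive was detected
elif risk_score < 0.8:             # illustrative threshold, not a library default
    response = sanitized_output    # return the redacted version instead
else:
    response = "This response was withheld because it contained sensitive information."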

To enhance flexibility, users can introduce their own patterns through the regex_pattern_groups_path parameter.
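
As a sketch of that option: the keyword name below is taken from the prose above, while the file path and its contents are hypothetical, so check the installed version's documentation for the expected pattern-group format.

from llm_guard.output_scanners import Sensitive

scanner = Sensitive(
    entity_types=["PERSON", "EMAIL"],
    regex_pattern_groups_path="./custom_pattern_groups.json",  # hypothetical file with user-defined regex groups
    redact=True,
)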

When redact is enabled, detected sensitive entities are replaced with placeholders in the returned output rather than only being flagged.
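
A minimal sketch of the difference this makes; the exact placeholder text substituted for a detected entity depends on the library version, and the input strings are illustrative.

from llm_guard.output_scanners import Sensitive

scanner = Sensitive(entity_types=["EMAIL"], redact=True)

model_output = "You can reach the customer at jane.doe@example.com."
sanitized_output, is_valid, risk_score = scanner.scan("Summarize the ticket.", model_output)

# With redact=True the e-mail address is replaced by a placeholder in sanitized_output;
# with redact=False the finding is only flagged and the text is returned unchanged.
print(sanitized_output, is_valid, risk_score)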

Optimization Strategies

Read more in the library's optimization documentation; the ONNX runtime variants in the benchmarks below are one example of these strategies.

Benchmarks

Test setup:

  • Platform: Amazon Linux 2
  • Python Version: 3.11.6
  • Input length: 30
  • Test times: 5

Run the following script:

python benchmarks/run.py output Sensitive

Results:

| Instance | Latency Variance | Latency 90th Percentile (ms) | Latency 95th Percentile (ms) | Latency 99th Percentile (ms) | Average Latency (ms) | QPS |
|---|---|---|---|---|---|---|
| AWS m5.xlarge | 4.48 | 162.42 | 195.80 | 222.50 | 95.26 | 314.91 |
| AWS m5.xlarge with ONNX | 0.23 | 75.19 | 82.71 | 88.72 | 59.75 | 502.10 |
| AWS g5.xlarge GPU | 33.82 | 290.10 | 381.92 | 455.38 | 105.93 | 283.20 |
| AWS g5.xlarge GPU with ONNX | 0.41 | 39.55 | 49.57 | 57.59 | 18.88 | 1589.04 |
| Azure Standard_D4as_v4 | 6.30 | 192.82 | 231.35 | 262.18 | 111.32 | 269.49 |
| Azure Standard_D4as_v4 with ONNX | 0.37 | 72.21 | 80.89 | 87.84 | 51.49 | 582.65 |