Toxicity Scanner
The Toxicity scanner assesses the toxicity level of content generated by language models, acting as a safeguard against potentially harmful or offensive output.
Attack scenario
Language models, when interacting with users, can sometimes produce responses that may be deemed toxic or inappropriate. This poses a risk, as such output can perpetuate harm or misinformation. By monitoring and classifying the model's output, potential toxic content can be flagged and handled appropriately.
How it works
The scanner uses the unitary/unbiased-toxic-roberta model from Hugging Face for binary classification of the text as toxic or non-toxic.
- Toxicity Detection: If the text is classified as toxic, the toxicity score corresponds to the model's confidence in this classification.
- Non-Toxicity Confidence: For non-toxic text, the score is the inverse of the model's confidence, i.e., 1 − confidence score.
- Threshold-Based Flagging: Text is flagged as toxic if the toxicity score exceeds a predefined threshold (default: 0.5).
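The scoring rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not the scanner's actual implementation: the function names and the label strings "toxic" and "non-toxic" are assumptions made for the example.

```python
def toxicity_score(label: str, confidence: float) -> float:
    """Map a binary classifier result to a toxicity score in [0, 1].

    `label` and `confidence` stand in for the classifier's predicted class
    and its confidence; the label strings are assumed for illustration.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Toxic prediction: the score is the classifier's confidence.
    # Non-toxic prediction: the score is 1 - confidence.
    return confidence if label == "toxic" else 1.0 - confidence


def is_flagged(score: float, threshold: float = 0.5) -> bool:
    """Flag text only when the score strictly exceeds the threshold."""
    return score > threshold


print(is_flagged(toxicity_score("toxic", 0.92)))      # True
print(is_flagged(toxicity_score("non-toxic", 0.80)))  # False
```

Under these rules, a confident non-toxic prediction yields a low toxicity score, so both outcomes can be compared against the same threshold.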
Usage
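from llm_guard.output_scanners import Toxicity
from llm_guard.output_scanners.toxicity import MatchType

# Flag output whose toxicity score exceeds 0.5, checking sentence by sentence
scanner = Toxicity(threshold=0.5, match_type=MatchType.SENTENCE)
# prompt is the original user prompt; model_output is the model's response to check
sanitized_output, is_valid, risk_score = scanner.scan(prompt, model_output)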
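One way to act on the three values returned by scan is sketched below. It assumes that is_valid is False when the output was flagged; guard_response, FALLBACK_MESSAGE, and StubScanner are hypothetical names introduced only for this example, and the stub replaces the real scanner so the sketch runs without downloading a model.

```python
from typing import Protocol, Tuple

FALLBACK_MESSAGE = "The response was withheld by the toxicity filter."  # hypothetical


class OutputScanner(Protocol):
    """Anything with the scan(prompt, output) shape shown in the usage snippet."""

    def scan(self, prompt: str, output: str) -> Tuple[str, bool, float]: ...


def guard_response(scanner: OutputScanner, prompt: str, model_output: str) -> str:
    sanitized_output, is_valid, risk_score = scanner.scan(prompt, model_output)
    if not is_valid:
        # Assumed policy: withhold flagged output instead of returning it.
        print(f"Output withheld (risk score {risk_score:.2f})")
        return FALLBACK_MESSAGE
    return sanitized_output


class StubScanner:
    """Hypothetical stand-in for a real scanner, used only to run the sketch."""

    def __init__(self, is_valid: bool, risk_score: float) -> None:
        self.is_valid = is_valid
        self.risk_score = risk_score

    def scan(self, prompt: str, output: str) -> Tuple[str, bool, float]:
        return output, self.is_valid, self.risk_score


print(guard_response(StubScanner(True, 0.1), "Hi", "Hello there!"))
print(guard_response(StubScanner(False, 0.9), "Hi", "some flagged text"))
```

Whether to withhold, regenerate, or log flagged output is an application-level choice; the fixed fallback message here is only one option.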
Match Types:
- Sentence Type: In this mode (MatchType.SENTENCE), the scanner checks each sentence individually for toxicity.
- Full Text Type: In MatchType.FULL mode, the entire text is scanned as a whole.
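The difference between the two modes can be illustrated with a toy sketch. Everything here is an assumption for illustration: the regex sentence splitter, the keyword-based toy_score function (a stand-in for the real classifier), and the choice to report the highest per-sentence score. The scanner's own splitting and aggregation may differ.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def highest_score(text: str, score_fn: Callable[[str], float], per_sentence: bool) -> float:
    # per_sentence=True mimics sentence mode; False mimics full-text mode.
    units = split_sentences(text) if per_sentence else [text]
    return max((score_fn(u) for u in units), default=0.0)


def toy_score(text: str) -> float:
    # Hypothetical scorer: fraction of words found in a tiny blocklist.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in {"idiot"} for w in words) / len(words)


text = "Thanks for the question. You are an idiot. Here is the answer you asked for."
print("sentence mode:", highest_score(text, toy_score, per_sentence=True))
print("full-text mode:", highest_score(text, toy_score, per_sentence=False))
```

In this toy setup, a single offending sentence scores higher on its own than when it is diluted by the surrounding text, which is the usual motivation for sentence-level matching.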
Optimization Strategies
Benchmarks
Test setup:
- Platform: Amazon Linux 2
- Python Version: 3.11.6
- Input length: 217
- Test runs: 5
Run the following script:
python benchmarks/run.py output Toxicity
Results:
Instance | Latency Variance | Latency 90 Percentile | Latency 95 Percentile | Latency 99 Percentile | Average Latency (ms) | QPS |
---|---|---|---|---|---|---|
AWS m5.xlarge | 2.89 | 154.18 | 181.05 | 202.55 | 100.40 | 2161.43 |
AWS m5.xlarge with ONNX | 0.00 | 49.61 | 49.98 | 50.28 | 48.77 | 4449.47 |
AWS g5.xlarge GPU | 33.35 | 282.36 | 373.59 | 446.56 | 99.57 | 2179.37 |
AWS g5.xlarge GPU with ONNX | 0.01 | 8.00 | 9.56 | 10.81 | 4.85 | 44719.38 |
Azure Standard_D4as_v4 | 3.90 | 182.94 | 213.16 | 237.33 | 118.62 | 1829.38 |
Azure Standard_D4as_v4 with ONNX | 0.07 | 70.81 | 73.93 | 76.43 | 61.40 | 3534.14 |
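To make the table easier to compare, the following sketch computes how much ONNX reduces average latency on each instance, using only figures copied from the results above. It also checks an observation drawn from the numbers rather than from any stated definition: the QPS column appears to equal the input length (217) divided by the average latency in seconds. The helper names are hypothetical.

```python
# (average latency in ms, QPS) copied from the results table
results = {
    "AWS m5.xlarge": (100.40, 2161.43),
    "AWS m5.xlarge with ONNX": (48.77, 4449.47),
    "AWS g5.xlarge GPU": (99.57, 2179.37),
    "AWS g5.xlarge GPU with ONNX": (4.85, 44719.38),
    "Azure Standard_D4as_v4": (118.62, 1829.38),
    "Azure Standard_D4as_v4 with ONNX": (61.40, 3534.14),
}
INPUT_LENGTH = 217  # from the test setup


def onnx_speedup(instance: str) -> float:
    """Ratio of average latency without ONNX to average latency with ONNX."""
    base_ms, _ = results[instance]
    onnx_ms, _ = results[f"{instance} with ONNX"]
    return base_ms / onnx_ms


def qps_relative_error(instance: str) -> float:
    """Relative gap between reported QPS and INPUT_LENGTH / latency in seconds."""
    avg_ms, reported_qps = results[instance]
    estimated = INPUT_LENGTH / (avg_ms / 1000.0)
    return abs(estimated - reported_qps) / reported_qps


for name in ("AWS m5.xlarge", "AWS g5.xlarge GPU", "Azure Standard_D4as_v4"):
    print(f"{name}: {onnx_speedup(name):.1f}x lower average latency with ONNX")

# Largest mismatch between reported QPS and the length/latency estimate
print(f"max QPS relative error: {max(qps_relative_error(n) for n in results):.4f}")
```

If the observation holds, the QPS column is better read as input units processed per second than as requests per second, but that interpretation is an inference from the table, not a documented definition.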