Gibberish Scanner

This scanner is designed to identify and filter out gibberish or nonsensical inputs in English language text.

It proves invaluable in applications that require coherent and meaningful user inputs, such as chatbots and automated processing systems.

Attack scenario

Gibberish is defined as text that is either completely nonsensical or so poorly structured that it fails to convey a meaningful message. It includes random strings of words, sentences laden with grammatical or syntactical errors, and text that, while appearing structured, lacks logical coherence.

Instances of gibberish in user inputs can significantly disrupt the operation of digital platforms, potentially leading to degraded performance or exploitation of system vulnerabilities. By effectively identifying and excluding gibberish, the scanner helps maintain the platform's integrity and ensures a seamless user experience.

How it works

Utilizing the model madhurjindal/autonlp-Gibberish-Detector-492513457, this scanner is capable of distinguishing between meaningful English text and gibberish. This functionality is critical for enhancing the performance and reliability of systems that depend on accurate and coherent user inputs.

Warning

This model sometimes overtriggers on valid text with mild gibberish label. In that case, you can increase the threshold or patch the _gibberish_labels parameter.

Usage

from llm_guard.input_scanners import Gibberish
from llm_guard.input_scanners.gibberish import MatchType

scanner = Gibberish(match_type=MatchType.FULL)
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

Optimization Strategies

Benchmarks

Test setup:

Platform: Amazon Linux 2
Python Version: 3.11.6
Input Length: 248
Test Times: 5

Run the following script:

python benchmarks/run.py input Gibberish

Results:

Instance	Latency Variance	Latency 90 Percentile	Latency 95 Percentile	Latency 99 Percentile	Average Latency (ms)	QPS
AWS r6a.xlarge (AMD)	0.01	94.73	95.76	96.58	91.74	7161.76
AWS r6a.xlarge (AMD) with ONNX	0.07	87.77	91.84	95.10	79.40	8274.11