Gibberish Scanner

This scanner assesses LLM outputs to identify and flag gibberish or nonsensical content. Its role is to ensure that outputs are coherent and intelligible, free of meaningless or random text sequences.

Attack scenario

Gibberish is text that is either completely nonsensical or so poorly structured that it fails to convey a meaningful message. It includes random strings of words, sentences laden with grammatical or syntactical errors, and text that appears structured but lacks logical coherence.

The presence of gibberish can significantly undermine the quality and reliability of generated content. It can result from various factors, including model errors, insufficient training data, or misinterpretation of the input. This scanner mitigates these issues by scrutinizing LLM outputs for gibberish, ensuring that generated content maintains a high standard of clarity and relevance.

How it works

The scanner uses the madhurjindal/autonlp-Gibberish-Detector-492513457 model to distinguish meaningful English text from gibberish.
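
Under the hood, the check amounts to running the text through a Hugging Face text-classification pipeline built on this model. The sketch below calls the model directly rather than through LLM Guard, purely to illustrate the classification step; the label names are taken from the model card, and LLM Guard's own wrapper may map them to a risk score differently.

from transformers import pipeline

# Hypothetical direct use of the detector model, outside LLM Guard.
classifier = pipeline(
    "text-classification",
    model="madhurjindal/autonlp-Gibberish-Detector-492513457",
)

for text in ["The quick brown fox jumps over the lazy dog.", "asdf qwer zxcv uiop"]:
    result = classifier(text)[0]
    # Per the model card, labels other than "clean" indicate gibberish.
    print(f"{text!r} -> {result['label']} ({result['score']:.2f})")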

Usage

from llm_guard.output_scanners import Gibberish
from llm_guard.output_scanners.gibberish import MatchType

prompt = "Describe the water cycle."  # placeholder prompt sent to the LLM
model_output = "Water evaporates, condenses into clouds, and falls as rain."  # placeholder LLM response

scanner = Gibberish(match_type=MatchType.FULL)
sanitized_output, is_valid, risk_score = scanner.scan(prompt, model_output)
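
scan returns the (possibly unchanged) output, a boolean validity flag, and a numeric risk score. With MatchType.FULL the whole output is classified at once; the MatchType.SENTENCE variant, imported above, scores each sentence separately, which helps catch gibberish buried inside otherwise coherent text. The following sketch assumes that behavior and uses made-up example strings:

from llm_guard.output_scanners import Gibberish
from llm_guard.output_scanners.gibberish import MatchType

# Sentence-level matching, assumed to flag outputs where any single
# sentence is classified as gibberish. Example strings are illustrative.
scanner = Gibberish(match_type=MatchType.SENTENCE)
model_output = "The meeting covered Q3 goals. flurb grok zzzt mmmf."
sanitized_output, is_valid, risk_score = scanner.scan("Summarize the notes.", model_output)

if not is_valid:
    print(f"Gibberish detected (risk score: {risk_score})")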

Optimization Strategies

Read more in the optimization guide.

Benchmarks

Test setup:

  • Platform: Amazon Linux 2
  • Python Version: 3.11.6
  • Input length: 128
  • Test runs: 5

Run the following script:

python benchmarks/run.py output Gibberish

Results:

WIP