Regex Scanner
This scanner sanitizes LLM outputs using user-defined regular expression patterns. The patterns can describe either undesirable content, which is blocked or redacted, or desirable content that the output is expected to contain.
How it works
The scanner takes a list of regular expressions through the patterns parameter. These patterns identify specific formats, keywords, or phrases in the output.
- Matching Logic: The scanner evaluates the output against all provided patterns. If any pattern matches, the corresponding action (redaction or validation) is taken based on the is_blocked flag.
- Redaction: If enabled, the scanner redacts the portion of the output that matches any of the patterns.
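The redaction step can be illustrated with Python's standard re module alone. This is a minimal sketch, not llm-guard's implementation: the function name redact_matches and the [REDACTED] placeholder are hypothetical, and the real scanner may use a different replacement text.

```python
import re
from typing import List, Tuple

# Hypothetical placeholder; the real scanner's replacement text may differ.
REDACTED = "[REDACTED]"


def redact_matches(output: str, patterns: List[str]) -> Tuple[str, bool]:
    """Replace every match of every pattern and report whether any pattern matched."""
    found = False
    for pattern in patterns:
        # re.subn returns (new_string, number_of_substitutions_made).
        output, count = re.subn(pattern, REDACTED, output)
        found = found or count > 0
    return output, found


if __name__ == "__main__":
    text = "curl -H 'Authorization: Bearer abc.DEF-123' https://api.example.com"
    print(redact_matches(text, [r"Bearer [A-Za-z0-9\-._~+/]+"]))
```

Because every pattern is applied in turn, a single output can have several different secrets redacted in one pass.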
Usage
from llm_guard.output_scanners import Regex
from llm_guard.input_scanners.regex import MatchType
# Initialize the Regex scanner
scanner = Regex(
    patterns=[r"Bearer [A-Za-z0-9-._~+/]+"],  # List of regex patterns
    is_blocked=True,  # If True, patterns are treated as 'bad'; if False, as 'good'
    match_type=MatchType.SEARCH,  # Can be SEARCH or FULL_MATCH
    redact=True,  # Enable or disable redaction
)
# The prompt sent to the LLM and the LLM response to check
prompt = "How do I call the API?"
output = "Send the header Authorization: Bearer abc123"
# Scan an output
sanitized_output, is_valid, risk_score = scanner.scan(prompt, output)
In the example above, replace r"Bearer [A-Za-z0-9-._~+/]+" with your own regex pattern(s).
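The match_type argument in the usage example accepts MatchType.SEARCH or MatchType.FULL_MATCH. Assuming these behave like Python's re.search and re.fullmatch (an assumption; check the llm-guard source for the exact semantics), the difference can be shown with the standard library. The helper name matches is hypothetical.

```python
import re

PATTERN = re.compile(r"Bearer [A-Za-z0-9\-._~+/]+")


def matches(text: str, full_match: bool) -> bool:
    """Hypothetical helper: True if PATTERN matches text under the chosen mode."""
    if full_match:
        # FULL_MATCH-style (assumed): the entire string must fit the pattern.
        return PATTERN.fullmatch(text) is not None
    # SEARCH-style (assumed): the pattern may occur anywhere in the string.
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    line = "Authorization: Bearer abc123"
    print(matches(line, full_match=False))  # True: token found inside the line
    print(matches(line, full_match=True))   # False: extra text precedes 'Bearer'
    print(matches("Bearer abc123", full_match=True))  # True
```

SEARCH suits detecting secrets embedded in longer text, while FULL_MATCH suits checking that the entire output has an exact expected format.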
The is_blocked parameter determines how the patterns are treated. If is_blocked is True, any pattern match marks the output as invalid. If it is False, the output is considered valid only if it matches at least one of the patterns; otherwise it is marked invalid.
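The validity rule for both settings of is_blocked fits in a few lines. This sketch mirrors the rule as stated in this section and is not taken from llm-guard; the function name is_output_valid and the ticket-ID pattern in the usage example are hypothetical.

```python
import re
from typing import List


def is_output_valid(output: str, patterns: List[str], is_blocked: bool) -> bool:
    """Apply the is_blocked rule: block-list vs. allow-list semantics."""
    any_match = any(re.search(p, output) for p in patterns)
    if is_blocked:
        # Block-list: a single matching pattern makes the output invalid.
        return not any_match
    # Allow-list: the output is valid only if at least one pattern matches.
    return any_match


if __name__ == "__main__":
    ticket = [r"[A-Z]+-\d+"]  # hypothetical "must reference a ticket" rule
    print(is_output_valid("Fixed in PROJ-42.", ticket, is_blocked=False))  # True
    print(is_output_valid("Fixed it.", ticket, is_blocked=False))          # False
```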
Benchmarks
Run the following script:
python benchmarks/run.py output Regex
This scanner relies on built-in regular expression matching rather than an ML model, which makes it fast.