Regex Scanner

This scanner is designed to sanitize prompts based on predefined regular expression patterns. It offers flexibility in defining patterns to identify and process desirable or undesirable content within the prompts.

How it works

The scanner operates on a list of regular expressions, passed as patterns. These patterns are used to identify specific formats, keywords, or phrases in the prompt.

Matching Logic: The scanner evaluates the prompt against all provided patterns. If any pattern matches, the corresponding action (redaction or validation) is taken based on the is_blocked flag.
Redaction: If enabled, the scanner will redact the portion of the prompt that matches any of the patterns.
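
The matching and redaction steps above can be sketched with Python's standard re module. This is a simplified illustration, not the library's implementation: the function name scan_sketch, the "[REDACTED]" placeholder, and the 0.0 / 1.0 risk scores are assumptions made for the example.

```python
import re

REDACTED = "[REDACTED]"  # assumed placeholder text, chosen for this sketch


def scan_sketch(prompt, patterns, is_blocked=True, redact=True):
    """Hypothetical stand-in for the scanner logic.

    Returns (text, is_valid, risk_score); the score values are assumptions.
    """
    compiled = [re.compile(p) for p in patterns]
    matched = any(rx.search(prompt) for rx in compiled)

    if not is_blocked:
        # Patterns describe desirable content: valid only if something matched.
        return prompt, matched, 0.0 if matched else 1.0

    if not matched:
        # No 'bad' pattern found, so the prompt passes unchanged.
        return prompt, True, 0.0

    text = prompt
    if redact:
        # Replace every matching span of every pattern with the placeholder.
        for rx in compiled:
            text = rx.sub(REDACTED, text)
    return text, False, 1.0


print(scan_sketch("Authorization: Bearer abc123", [r"Bearer [A-Za-z0-9-._~+/]+"]))
```

In this sketch a blocked match always marks the prompt as invalid, and redaction only changes what text is returned alongside that verdict.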

Usage

from llm_guard.input_scanners import Regex
from llm_guard.input_scanners.regex import MatchType

# Initialize the Regex scanner
scanner = Regex(
    patterns=[r"Bearer [A-Za-z0-9-._~+/]+"],  # List of regex patterns
    is_blocked=True,  # If True, patterns are treated as 'bad'; if False, as 'good'
    match_type=MatchType.SEARCH,  # Can be SEARCH or FULL_MATCH
    redact=True,  # Enable or disable redaction
)

# Scan a prompt (example input; replace with your own)
prompt = "Authorization: Bearer abc123"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

In the example above, replace r"Bearer [A-Za-z0-9-._~+/]+" with your actual regex pattern. The is_blocked parameter determines how the patterns are treated. If is_blocked is True, any pattern match marks the prompt as invalid; if False, the prompt is considered valid only if it matches at least one of the patterns.
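
To illustrate the difference between the two match types, the sketch below uses Python's re.search and re.fullmatch directly. It assumes that MatchType.SEARCH looks for the pattern anywhere in the prompt and that MatchType.FULL_MATCH requires the whole prompt to match; the helper matches is hypothetical and not part of the library.

```python
import re


def matches(pattern, prompt, full_match=False):
    """Hypothetical helper: True if the pattern matches under the chosen mode."""
    if full_match:
        # The entire prompt must match the pattern.
        return re.fullmatch(pattern, prompt) is not None
    # The pattern may match anywhere inside the prompt.
    return re.search(pattern, prompt) is not None


pattern = r"\d{4}-\d{2}-\d{2}"  # an ISO-style date, used only as an example

print(matches(pattern, "Report for 2024-01-31, draft"))                   # True
print(matches(pattern, "Report for 2024-01-31, draft", full_match=True))  # False
print(matches(pattern, "2024-01-31", full_match=True))                    # True
```

Under these assumptions, a full match pairs naturally with is_blocked=False for allow-list checks (the prompt must be exactly a date), while a search suits is_blocked=True when a sensitive fragment may appear anywhere in the prompt.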

Benchmarks

Run the following script:

python benchmarks/run.py input Regex

This scanner relies only on Python's built-in regular expression functions and runs no model inference, which makes it fast.