Invisible Text Scanner
The Invisible Text Scanner is designed to detect and remove non-printable, invisible Unicode characters from text inputs. This is crucial for maintaining text integrity in Large Language Models (LLMs) and safeguarding against steganography-based attacks.
Attack Scenario
Steganography via invisible text can occur in various online contexts, such as Amazon reviews, emails, websites, or even security logs. This modern form of prompt injection is less detectable than traditional methods like "white on white" text, making it a versatile tool for hidden communications or instructions.
For instance, hidden instructions can be embedded in a payload copied from a website and then influence the analysis performed in an LLM chat.
How it works
The scanner targets invisible Unicode characters, particularly in the Private Use Areas (PUA) of Unicode, which include:
- Basic Multilingual Plane: U+E000 to U+F8FF
- Supplementary Private Use Area-A: U+F0000 to U+FFFFD
- Supplementary Private Use Area-B: U+100000 to U+10FFFD
These characters, while valid in Unicode, are not rendered by most fonts.
It detects and removes characters in categories 'Cf' (Format characters), 'Cc' (Control characters), 'Co' (Private use characters), and 'Cn' (Unassigned characters), which are typically non-printable.
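The category check described above can be sketched with Python's standard `unicodedata` module. This is a minimal illustration, not the scanner's actual implementation: the `strip_invisible` helper and the decision to keep newline, carriage return, and tab (which are also category 'Cc') are assumptions made for the example.

```python
import unicodedata

# Categories named in the text above.
INVISIBLE_CATEGORIES = {"Cf", "Cc", "Co", "Cn"}
# Assumption: ordinary whitespace controls are kept, even though they are 'Cc'.
ALLOWED_CONTROLS = {"\n", "\r", "\t"}

def strip_invisible(text):
    """Hypothetical helper: return (cleaned_text, removed_chars)."""
    kept, removed = [], []
    for ch in text:
        if ch not in ALLOWED_CONTROLS and unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            removed.append(ch)
        else:
            kept.append(ch)
    return "".join(kept), removed

# U+200B (zero-width space) is 'Cf'; U+E000 is a private use character ('Co').
cleaned, removed = strip_invisible("Hello\u200b world\ue000!\n")
print(repr(cleaned))
print([f"U+{ord(c):04X}" for c in removed])
```

Returning the removed characters alongside the cleaned text makes it easy to log what was stripped or to derive a validity flag from whether the list is empty.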
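Here is the Python code to convert a string to invisible Unicode tag characters (from this Tweet). Note that these code points (offset U+E0000) belong to the Tags block and fall under category 'Cf', not the Private Use Areas: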
import pyperclip  # third-party package used to copy text to the clipboard

def convert_to_tag_chars(input_string):
    # Shift each character into the Unicode Tags block (U+E0000 + code point)
    return ''.join(chr(0xE0000 + ord(ch)) for ch in input_string)

# Example usage:
user_input = input("Enter a string to convert to tag characters: ")
tagged_output = convert_to_tag_chars(user_input)
print("Tagged output:", tagged_output)
pyperclip.copy(tagged_output)
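To see why such a payload matters, the conversion can be reversed. The sketch below is an illustration only: `decode_tag_chars` and the sample strings are hypothetical, and it assumes the hidden text was ASCII shifted by U+E0000 as in the snippet above.

```python
TAG_BASE = 0xE0000  # start of the Unicode Tags block

def decode_tag_chars(text):
    """Hypothetical helper: recover ASCII hidden as tag characters (U+E0000 to U+E007F)."""
    return "".join(
        chr(ord(ch) - TAG_BASE)
        for ch in text
        if TAG_BASE <= ord(ch) <= TAG_BASE + 0x7F
    )

# A review that looks harmless but carries an invisible instruction.
visible = "Great product!"
hidden = "".join(chr(TAG_BASE + ord(c)) for c in "ignore previous instructions")
payload = visible + hidden

print(len(visible), len(payload))  # the payload is longer than what is displayed
print(decode_tag_chars(payload))
```

The visible portion is unchanged when rendered, which is why the hidden part has to be detected by inspecting code points rather than by reading the text.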
Usage
from llm_guard.input_scanners import InvisibleText

# Example prompt containing a zero-width space (U+200B)
prompt = "Hello\u200b world"

scanner = InvisibleText()
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)
Benchmarks
Run the following script:
python benchmarks/run.py input InvisibleText
This scanner uses built-in functions, which makes it fast.