Relevance Scanner

This scanner ensures that a model's output stays relevant to, and aligned with, the given input prompt.

By measuring the similarity between the input prompt and the output, the scanner produces a confidence score that indicates how contextually relevant the response is.

How it works

  1. The scanner translates both the prompt and the output into vector embeddings.
  2. It calculates the cosine similarity between these embeddings.
  3. This similarity score is then compared against a predefined threshold to determine contextual relevance.

Example:

  • Prompt: What is the primary function of the mitochondria in a cell?
  • Output: The Eiffel Tower is a renowned landmark in Paris, France
  • Valid: False
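
The sketch below walks through these three steps directly with the sentence-transformers library, using the example texts above and the default embedding model described under Usage. It is only an illustration of the idea, not the scanner's actual implementation.

from sentence_transformers import SentenceTransformer, util

# Illustration only; the scanner wires this up internally.
model = SentenceTransformer("BAAI/bge-base-en-v1.5")

prompt = "What is the primary function of the mitochondria in a cell?"
output = "The Eiffel Tower is a renowned landmark in Paris, France"

# Step 1: translate both texts into vector embeddings.
prompt_emb, output_emb = model.encode([prompt, output])

# Step 2: compute the cosine similarity between the embeddings.
similarity = util.cos_sim(prompt_emb, output_emb).item()

# Step 3: compare the score against a threshold (0.5 here is illustrative;
# raw cosine values are model-dependent and not identical to the scanner's score).
threshold = 0.5
print(similarity, similarity >= threshold)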

The scanner relies on the configured embedding model to produce these vectors; the default model, and how to change it, is described under Usage.

Usage

You can select an embedding model suited to your needs. By default, it uses BAAI/bge-base-en-v1.5.

from llm_guard.output_scanners import Relevance

# `prompt` is the prompt that was sent to the model; `model_output` is the response to check.
scanner = Relevance(threshold=0.5)
sanitized_output, is_valid, risk_score = scanner.scan(prompt, model_output)
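
For instance, with the off-topic pair from the example above, the scanner should flag the output as invalid (the printed values are illustrative; the exact risk score depends on the embedding model and threshold):

prompt = "What is the primary function of the mitochondria in a cell?"
model_output = "The Eiffel Tower is a renowned landmark in Paris, France"

sanitized_output, is_valid, risk_score = scanner.scan(prompt, model_output)
print(is_valid)    # False: the answer does not address the prompt
print(risk_score)  # higher values indicate lower relevance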

Benchmarks

Test setup:

  • Platform: Amazon Linux 2
  • Python Version: 3.11.6
  • Input length: 22
  • Test runs: 5

Run the following script:

python benchmarks/run.py output Relevance

Results:

| Instance | Latency Variance | 90th Percentile Latency (ms) | 95th Percentile Latency (ms) | 99th Percentile Latency (ms) | Average Latency (ms) | QPS |
|---|---|---|---|---|---|---|
| AWS m5.xlarge | 2.95 | 196.86 | 223.97 | 245.66 | 142.39 | 154.51 |
| AWS m5.xlarge with ONNX | 0.25 | 52.00 | 59.90 | 66.23 | 35.92 | 612.47 |
| AWS g5.xlarge GPU | 28.59 | 269.77 | 354.29 | 421.90 | 100.63 | 218.62 |
| AWS g5.xlarge GPU with ONNX | 0.03 | 42.50 | 45.18 | 47.32 | 37.14 | 592.43 |
| Azure Standard_D4as_v4 | 3.95 | 224.87 | 255.90 | 280.73 | 161.19 | 136.48 |
| Azure Standard_D4as_v4 with ONNX | 0.01 | 52.61 | 53.42 | 54.07 | 49.76 | 442.11 |
| AWS r6a.xlarge (AMD) | 0.00 | 95.34 | 96.25 | 96.98 | 93.23 | 235.97 |
| AWS r6a.xlarge (AMD) with ONNX | 0.17 | 54.63 | 61.07 | 66.22 | 41.71 | 527.50 |