Relevance
The Relevance detector measures how closely a language model's generated output aligns with the input prompt. Its primary purpose is to ensure that generated content is contextually relevant, staying close to the user's query or instructions.
Vulnerability
Irrelevant or off-topic responses can lead to misunderstandings, customer dissatisfaction, and impaired decision-making processes. This vulnerability can erode user trust and undermine the firm's reputation.
The Relevance detector is employed to identify responses that diverge significantly from the given prompt, enabling developers to refine the model and enhance its accuracy and usefulness.
Usage
This detector uses SentenceTransformers to encode both the input prompt and the model's output into vector embeddings, then computes the cosine similarity between the two. If the similarity falls below a specified threshold, the output is flagged as irrelevant.
A low cosine similarity score indicates a lack of relevance between the prompt and the output, triggering a warning. The detected risk score quantifies the deviation from relevancy, enabling developers to pinpoint and address the issue promptly.
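The threshold logic above can be sketched in a few lines. This is a toy illustration on hand-made vectors, not the detector's actual implementation: the real detector obtains its embeddings from a SentenceTransformers model, and the names `cosine_similarity` and `relevance_check` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors:
    # dot product divided by the product of their magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def relevance_check(prompt_emb, output_emb, threshold=0.7):
    # Flag the output as irrelevant when similarity falls below
    # the threshold; the risk score quantifies the deviation.
    score = cosine_similarity(prompt_emb, output_emb)
    risk = max(0.0, 1.0 - score)
    return score >= threshold, risk
```

With orthogonal embeddings the similarity is 0, so the check fails and the risk score is at its maximum of 1.0; identical embeddings give similarity 1 and risk 0.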
Configuration
To integrate the Relevance detector into your security framework, initialize the Relevance class with the desired threshold:
from guardrail.firewall import Firewall  # assumed import path for the Firewall class
from guardrail.firewall.output_detectors import Relevance

firewall = Firewall(no_defaults=True)
output_detectors = [Relevance(threshold=0.7)]
# sanitized_prompt and response_text come from the earlier input-scanning and generation steps
sanitized_response, valid_results, risk_score = firewall.scan_output(sanitized_prompt, response_text, output_detectors)
By leveraging the Relevance detector, firms can improve the precision of AI-generated responses, ensuring that interactions with users remain contextually relevant and fostering a positive user experience.