Regex Detector

The Regex Detector checks language model outputs against predefined regular expression patterns. You can define both desirable ("good") and undesirable ("bad") patterns to control which outputs are accepted as valid.

Usage

The detector takes two lists of regular expressions: good_patterns and bad_patterns.

Good Patterns: When the good_patterns list is provided, the model's output is considered valid if any of these patterns matches the output. This is useful when you expect specific formats or keywords in the output.
Bad Patterns: When the bad_patterns list is provided, the model's output is considered invalid if any of these patterns matches the output. This is useful for filtering undesired phrases, words, or formats out of the model's responses.

The detector can work with either list on its own.
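
The matching rules above can be sketched with Python's standard re module. This is an illustration only, not the detector's actual implementation: the function name is_valid is hypothetical, the use of re.search is an assumption, and so is the choice to let a bad-pattern match take precedence when both lists are supplied (the text above does not say how the two lists combine).

```python
import re

def is_valid(output, good_patterns=None, bad_patterns=None):
    """Hypothetical helper mirroring the rules described above."""
    # Any bad pattern that matches makes the output invalid.
    # (Assumption: bad patterns are checked first and take precedence.)
    for pattern in bad_patterns or []:
        if re.search(pattern, output):
            return False
    # If good patterns are supplied, at least one of them must match.
    if good_patterns:
        return any(re.search(pattern, output) for pattern in good_patterns)
    # No good patterns supplied and no bad pattern matched.
    return True
```

For example, is_valid("Order ID: 12345", good_patterns=[r"ID: \d+"]) accepts the output because the expected format is present, while is_valid("select * from users", bad_patterns=[r"\bselect\b"]) rejects it.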

Configuration

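from guardrail.firewall import Firewall  # assumed import path; the original snippet omits it
from guardrail.firewall.output_detectors import RegexOutput

firewall = Firewall()
# Raw string (r'...') so that \b stays a regex word boundary instead of a backspace character.
output_detectors = [RegexOutput(bad_patterns=[r'\b(union(\s+all)?|select|insert|update|delete|from|where)\b'], redact=True)]

# sanitized_prompt and response_text must already be defined (the prompt and the model's response).
sanitized_response, valid_results, risk_score = firewall.scan_output(sanitized_prompt, response_text, output_detectors)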
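
To see what the SQL keyword pattern in this configuration matches, it can be tried directly with Python's re module, independent of the firewall. The sketch below also shows why the pattern should be written as a raw string. The last step is only one plausible reading of redact=True: the source does not describe what redaction does, so replacing matches with the placeholder "[REDACTED]" is an assumption, as is the sample text.

```python
import re

# The pattern from the configuration, written as a raw string.
SQL_KEYWORDS = r'\b(union(\s+all)?|select|insert|update|delete|from|where)\b'

# Without the r prefix, Python converts \b into a backspace character (\x08),
# so the regex looks for literal backspaces instead of word boundaries.
NON_RAW = '\b(select)\b'

text = "select name from users where id = 1"

# Collect every keyword the raw-string pattern finds.
raw_hits = [m.group(0) for m in re.finditer(SQL_KEYWORDS, text)]

# The non-raw pattern finds nothing, because the text has no backspaces.
non_raw_hit = re.search(NON_RAW, text)

# Hypothetical redaction: replace each match with a placeholder.
redacted = re.sub(SQL_KEYWORDS, "[REDACTED]", text)

print(raw_hits)
print(non_raw_hit)
print(redacted)
```

Note that the pattern as written is case-sensitive under plain re, so uppercase keywords such as SELECT would only match if case-insensitive matching is enabled, for example with an inline (?i) flag.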