Stop Output Substrings
The StopOutputSubstrings detector checks LLM outputs for banned substrings and filters them out.
Vulnerability
Usage
The default dictionary includes common malware requests, eicar_signature, gtube_signature, gtphish_signature, and more.
Configuration
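from guardrail.firewall import Firewall  # assumed import path; the original snippet omits it
from guardrail.firewall.output_detectors import StopOutputSubstrings

# Custom substrings to block, checked in addition to the default dictionary
substrings = ["PHI Project 214", "Project Hermes", "Patent #718", "Hermes", "Chiron", "Jailbreak"]
firewall = Firewall()
output_detectors = [StopOutputSubstrings(substrings=substrings)]
# sanitized_prompt and response_text are assumed to come from an earlier input scan and the LLM call
sanitized_response, valid_results, risk_score = firewall.scan_output(sanitized_prompt, response_text, output_detectors)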
Here's what the options are for:
- `substrings` (List[str]): user-provided substrings, checked in addition to the default patterns.
- `case_sensitive` (bool): defaults to `False`.
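
To make the effect of `substrings` and `case_sensitive` concrete, here is a minimal, self-contained sketch of substring filtering. It is not the library's implementation: the function name `scan_for_substrings`, the `replacement` parameter, and the `[REDACTED]` placeholder are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Tuple


def scan_for_substrings(
    text: str,
    substrings: List[str],
    case_sensitive: bool = False,
    replacement: str = "[REDACTED]",  # hypothetical placeholder, not from the library
) -> Tuple[str, List[str]]:
    """Return (sanitized_text, matches) for the given banned substrings."""
    # Drop empty entries; an empty pattern would match at every position.
    candidates = [s for s in substrings if s]
    if not candidates:
        return text, []

    # Longest first, so "Project Hermes" wins over the shorter "Hermes"
    # when both could match at the same position.
    candidates.sort(key=len, reverse=True)
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile("|".join(re.escape(s) for s in candidates), flags)

    # Collect matches as they appear in the text, then redact them.
    matches = [m.group(0) for m in pattern.finditer(text)]
    sanitized = pattern.sub(replacement, text)
    return sanitized, matches


text = "Details of Project Hermes and the chiron trial."
banned = ["Project Hermes", "Hermes", "Chiron"]

# Case-insensitive (the default): "chiron" is caught by "Chiron".
print(scan_for_substrings(text, banned))
# Case-sensitive: lowercase "chiron" passes through untouched.
print(scan_for_substrings(text, banned, case_sensitive=True))
```

With the default `case_sensitive=False`, the sketch redacts both the project name and the lowercase "chiron"; with `case_sensitive=True`, only the exact-case match is removed. A real detector would also report a validity flag and a risk score, as the `scan_output` call above does.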