Home

Guardrail ML - Alignment toolkit for safeguarding LLMs

Guardrail ML is an alignment toolkit to use LLMs safely and securely. Our firewall scans prompts and LLM behaviors for risks to bring your AI app from prototype to production with confidence.

What is Guardrail?

Guardrail-Firewall

Our firewall wraps your GenAI apps with a protective layer, safeguarding malicious inputs and filtering model outputs. Our comprehensive toolkit has 20+ out-of-the-box detectors for robust protection of your GenAI apps.

From prompt injections, PII leakage, DoS to ungrounded additions (hallucinations) and harmful language detection, our firewall protects LLMs through a multi-layered defense.

Installation

$ pip install guardrail-ml

Quickstart

Go to app.useguardrail.com to get your API key and set it in your .env.

Guardrail-Quickstart

🛡️Features

Block CVEs with our extensible firewall that gets stronger the more that it's attacked. Our automatic evaluators log everything for compliance and debugging.

Our default Firewall covers most of the Detectors, except for the ones with * which needs to be further customized or requires an API_KEY.

Input Detectors
- Anonymize
- DoS Tokens
- Malware URL
- Prompt Injections
- Secrets
- Stop Input Substrings
- Toxicity * perspective_api_key required
- Harmful Moderation
- Text Quality
- Coding Language * coding language required
- Regex * patterns required
Output Detectors
- Factual Consistency
- Deanonymize
- Factuality Tool * openai_api_key required
- Sensitive PII
- Stop Output Substrings
- Toxicity * perspective_api_key required
- Harmful Moderation
- Text Quality
- Regex * patterns required
- Coding Language * coding language required
- Relevance

Multi-layered defense

Heuristic Detectors: detect and filter potentially malicious input before reaching LLMs
LLM-based detectors: use specialized LLMs to analyze inputs and outputs to identify potential attacks and safety risks
Vector Database: store previous attacks in a vector database to recognize and prevent similar attacks (coming soon)
Red Teaming: Adversarial Tests: auto-generate adversarial attacks to fortify defenses (coming soon)