Coding Language Detector

Our Coding Language Detector analyzes user prompts and identifies code snippets in them. This is useful for platforms that need to monitor or restrict programming-related queries and handle such prompts appropriately.

Vulnerability

Code snippets inserted into prompts can pose risks. A user might submit code to exploit vulnerabilities, to test scripts, or to pursue activities outside the platform's intended scope. Monitoring and controlling which kinds of code are accepted helps maintain system integrity and safety.

Usage

Built on the huggingface/CodeBERTa-language-id model, our Code Detector recognizes code snippets within prompts across several programming languages. Developers can configure the detector to whitelist or blacklist specific languages, which gives them control over the types of code permitted in user queries.
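
The whitelist and blacklist behavior described above can be pictured with a small sketch. This is only one plausible interpretation, not the detector's actual implementation: the function name `is_language_permitted` is hypothetical, and the rule that a denied language always loses to an allowed one is an assumption.

```python
from typing import Iterable, Optional


def is_language_permitted(
    detected: str,
    allowed: Optional[Iterable[str]] = None,
    denied: Optional[Iterable[str]] = None,
) -> bool:
    """Return True if a detected language passes the allow/deny policy.

    Hypothetical helper: the precedence rules below are assumptions.
    """
    name = detected.strip().lower()
    denied_set = {lang.lower() for lang in (denied or [])}
    allowed_set = {lang.lower() for lang in (allowed or [])}

    # Assumption: a denied language is always rejected.
    if name in denied_set:
        return False
    # Assumption: when an allow list exists, anything outside it is rejected.
    if allowed_set:
        return name in allowed_set
    # With no allow list, everything that is not denied passes.
    return True
```

Under these assumptions, a language that appears in neither list is rejected whenever an allow list is set, which is the stricter of the two possible defaults.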

Configuration

Note

The detector works best on code snippets embedded in Markdown, and it recognizes the following languages:

Go
Java
JavaScript
PHP
Python
Ruby
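
To illustrate what extracting code snippets from Markdown can look like, here is a minimal sketch that pulls fenced code blocks out of a prompt with a regular expression. It is an assumption about the general approach, not the detector's own code: `FENCE_RE` and `extract_code_blocks` are hypothetical names, and only triple-backtick fences are handled.

```python
import re
from typing import List, Tuple

# Matches a triple-backtick fence with an optional language label.
# Assumption: indented code blocks and tilde fences are ignored.
FENCE_RE = re.compile(r"```([\w+-]*)[ \t]*\n(.*?)```", re.DOTALL)


def extract_code_blocks(markdown: str) -> List[Tuple[str, str]]:
    """Return (label, code) pairs for each fenced block in the text.

    The label is lowercased and is an empty string when the fence has none.
    """
    return [
        (match.group(1).lower(), match.group(2).strip())
        for match in FENCE_RE.finditer(markdown)
    ]
```

Each extracted snippet could then be passed to a language classifier. The fence label is only a hint supplied by the user, so it should not be trusted in place of classification.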

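# Assumption: Firewall is importable from guardrail.firewall; adjust to your installation
from guardrail.firewall import Firewall
from guardrail.firewall.input_detectors import CodingLanguageInput

# Assumed setup: scan_input below needs a firewall instance and a prompt
firewall = Firewall()
prompt = "How do I reverse a list in Python?"  # example prompt for illustration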

# List of programming languages allowed in user prompts
allowed_languages = ["Python", "Go"]

# List of programming languages denied in user prompts
denied_languages = ["JavaScript", "PHP", "Ruby", "Java"]

# Instantiate CodingLanguageInput detector with allowed and denied programming languages
input_detectors = [CodingLanguageInput(allowed=allowed_languages, denied=denied_languages)]

sanitized_prompt, valid_results, risk_score = firewall.scan_input(prompt, input_detectors)