Overview
When you connect enterprise tools like Stripe, Jira, Salesforce, or internal databases to an AI assistant through MCP, every tool response passes through your gateway before reaching the model. The rules engine lets you inspect that data in real time and take action — blocking, redacting, or replacing sensitive content before it ever reaches the LLM.
This guide covers when to use rules, how they work, how to configure them, and practical patterns for common data protection scenarios.
When to Use Rules
Rules are content-level policies that sit between your MCP servers and the AI model. They're designed for situations where your tools return data that shouldn't reach the LLM — either because it's personally identifiable information, because it matches known attack patterns, or because it contains organization-specific secrets.
Common use cases include:
- Preventing PII from reaching the model. Your Jira tickets, Salesforce records, or customer support tools may contain names, email addresses, phone numbers, Social Security numbers, or credit card numbers embedded in free-text fields. Rules catch these before they leave your gateway.
- Detecting prompt injection and jailbreak patterns. If a tool response contains text like "ignore your previous instructions" or "you are now in developer mode," a rule can block or redact it before the model processes it. This protects against prompt injection attacks embedded in data returned by your MCP servers.
- Filtering sensitive data patterns. API keys, internal project IDs, database connection strings, AWS account numbers, or any other pattern specific to your organization can be matched with regular expressions and removed from tool responses.
- Meeting compliance requirements. If your organization must ensure that certain categories of data never leave your infrastructure, rules provide an auditable enforcement layer with real-time alerting.
The key concept: rules apply to information flowing from your MCP servers into Claude. They protect the data in your connected tools — Stripe payment records, IT Atlassian tickets, HR systems — from reaching the LLM if you don't want it to.
How Rules Work
Rules are evaluated by the MCP Manager gateway every time a tool call response passes through. Here's the flow:
- Claude calls a tool (e.g., "look up customer #4521 in Stripe").
- Your MCP server processes the request and returns a response containing the customer data.
- Before the response reaches Claude, the gateway's policy enforcer intercepts it.
- The enforcer runs the response text through every enabled rule on that gateway, in order.
- If a rule matches, the configured action is applied — block, redact, replace, mask, or hash.
- The (possibly modified) response is forwarded to Claude, or blocked entirely.
Rule Ordering Matters
Rules are evaluated in the order they appear in your gateway's rules list. You can drag rules to reorder them. This ordering is important because:
- A block rule stops processing immediately — no further rules are evaluated.
- Modification rules (redact, replace, mask, hash) are chained — each subsequent rule operates on the already-modified text from the previous rule.
- Place your most critical rules (like prompt injection blocking) first, and more permissive rules (like masking email addresses) later.
Available Actions
Every rule must specify an action to take when a match is found:
- Block — Prevent the entire tool response from reaching Claude. The model receives an error message indicating the response was blocked by policy. Use this for the most severe violations.
- Redact — Remove the matched text entirely, leaving no trace in the response. For example, a Social Security number would simply disappear from the text.
- Replace — Substitute the matched text with a constant placeholder:
<SENSITIVE>. Claude sees that something was there but cannot access the original value. - Mask — Replace each character of the matched text with an asterisk. A 16-digit credit card number becomes
****************. This preserves the length of the original data. - Hash — Replace the matched text with a truncated SHA-256 hash:
<HASH:a1b2c3d4e5f6g7h8>. This lets you correlate occurrences of the same value across responses without exposing the original data.
Alerts
Every rule has an independent alerts toggle. When enabled, a real-time alert is generated each time the rule matches — regardless of the action taken. This means you can set a rule to "replace" sensitive data while still being notified every time it fires. Alerts appear in the gateway's logging section and include the rule name, the action taken, and the tool that triggered it.
Configuration Walkthrough
Rules are configured per gateway. Navigate to Settings → Gateways, select the gateway you want to protect, and open the Rules tab.
Creating a Rule
- Click Add rule to open the rule creation modal.
- Enter a Rule name — something descriptive like "Block prompt injection" or "Redact SSNs".
- Choose a Detection method from the dropdown: Regular Expression or Microsoft Presidio.
- Configure the detection-specific settings (see below).
- Select an Action — what should happen when a match is found.
- Toggle Alerts on or off.
- Click Save.
After saving, the rule appears in the rules list. You can drag rules to reorder them, toggle them on and off with the enable switch, or click a rule to edit it.
Detection Method: Regular Expression
Regex rules let you define custom patterns using JavaScript regular expression syntax. This is the most flexible detection method — you can match anything from simple keywords to complex structural patterns.
When you select "Regular Expression" as the detection method, you'll see a Matching patterns section where you can add one or more regex patterns. Each pattern is evaluated independently against the tool response, and any match triggers the rule's action.
Key details:
- Patterns use JavaScript regex syntax and are evaluated with the case-insensitive (
i) and global (g) flags. - You can add multiple patterns per rule using the Add matching pattern button. All patterns are evaluated — think of them as an "OR" condition.
- If a pattern has invalid syntax, the modal will show an error with a link to Regex101 pre-filled with your pattern for debugging.
- All five actions are available for regex rules: block, redact, replace, mask, and hash.
Detection Method: Microsoft Presidio
Presidio rules use Microsoft's open-source PII detection engine, which combines regular expressions, checksums, and NLP models to identify personally identifiable information. This goes beyond simple pattern matching — Presidio uses contextual analysis to reduce false positives.
When you select "Microsoft Presidio" as the detection method, you'll configure:
- Entity types. Select which categories of PII to detect. If you don't select any, all entity types are detected. Available entity types include:
- Credit card numbers (pattern + checksum — Visa, Mastercard, Amex, Discover, and more)
- US Social Security Numbers (pattern + validation)
- Email addresses
- Phone numbers
- Person names (NLP-based detection)
- Locations including countries, cities, and states (NLP-based)
- IBAN codes (with country-specific checksum validation)
- IP addresses (IPv4 and IPv6)
- US passport numbers, driver's licenses, bank numbers, and ITINs
- Medical license numbers (DEA Certificate Numbers)
- Cryptocurrency wallet addresses (Bitcoin P2PKH, P2SH, Bech32)
- URLs and NHS numbers
- Confidence threshold. A value from 0.0 to 1.0 (default: 0.2). Lower values catch more entities but may produce more false positives. Higher values are more conservative.
- Failure mode. What happens if the Presidio service is unavailable: Allow (let the response through) or Block (block the response as a precaution).
Presidio rules support two actions: Block and Replace.
Note: Microsoft Presidio is available as an add-on. If you don't have the Presidio license enabled, selecting this detection method will show a consultation scheduling option.
Custom DLP or Guardrail via API
For enterprise customers with existing data loss prevention infrastructure, MCP Manager can integrate with external DLP platforms and LLM guardrails (such as AWS Bedrock) via API or webhook. This lets you apply your organization's existing content policies to MCP traffic without duplicating rule configurations. Contact your account team to set up a custom integration.
Practical Examples
Below are ready-to-use patterns and configurations for common data protection scenarios.
Blocking Prompt Injection Attempts
If an attacker embeds malicious instructions in data that your MCP server returns (for example, in a Jira ticket description or a customer support message), these patterns detect common injection phrases:
Rule name: Block prompt injection patterns
Detection method: Regular Expression
Action: Block
Patterns:
ignore\s+(all\s+)?(previous|prior|above|earlier)\s+(instructions|prompts|directives)you\s+are\s+now\s+(in\s+)?(developer|admin|debug|unrestricted)\s+modedisregard\s+(all\s+)?(your|the)\s+(previous|prior|safety|system)\s+(instructions|rules|guidelines|prompt)system\s*:\s*(you\s+are|from\s+now|new\s+instructions|override)
Enable alerts on this rule so you're notified whenever an injection attempt is detected.
Redacting Social Security Numbers
Rule name: Redact SSNs
Detection method: Regular Expression
Action: Replace
Pattern: \b\d{3}[-\s]?\d{2}[-\s]?\d{4}\b
This matches SSNs in formats like 123-45-6789, 123 45 6789, and 123456789. The matched value is replaced with <SENSITIVE>.
Masking Credit Card Numbers
Rule name: Mask credit cards
Detection method: Regular Expression
Action: Mask
Pattern: \b(?:\d[ -]*?){13,19}\b
This catches most credit card number formats. The entire match is replaced with asterisks, preserving the original length.
Alternatively, use a Presidio rule with the CREDIT_CARD entity type for better accuracy — Presidio validates credit card numbers with checksum verification, which eliminates false positives from random number sequences.
Detecting API Keys and Secrets
Rule name: Redact API keys and tokens
Detection method: Regular Expression
Action: Replace
Patterns:
(?:api[_-]?key|api[_-]?secret|access[_-]?token|auth[_-]?token)\s*[:=]\s*['"]?[A-Za-z0-9_\-\.]{20,}['"]?sk[-_]live[-_][A-Za-z0-9]{20,}(Stripe secret keys)ghp_[A-Za-z0-9]{36,}(GitHub personal access tokens)AKIA[0-9A-Z]{16}(AWS access key IDs)
Broad PII Protection with Presidio
Rule name: Detect all PII
Detection method: Microsoft Presidio
Entity types: (leave empty to detect all types)
Confidence threshold: 0.5 (balanced — catches most PII without excessive false positives)
Failure mode: Block (if the detection service goes down, block responses as a precaution)
Action: Replace
This is a catch-all rule that provides broad PII protection. Combine it with more targeted regex rules above it in the rule order for specific patterns you want to handle differently (e.g., block prompt injections, but replace PII).
Testing Rules Safely
All rule types support an alert-only approach for safe rollout:
- Start with Replace + Alerts enabled. Rather than blocking tool responses outright, start by replacing sensitive content and monitoring the alerts to see how often your rules fire and on what data.
- Review the alerts. Check the gateway's logging section for triggered alerts. Look for false positives — patterns that match benign content.
- Tune your patterns. Adjust regex patterns or Presidio confidence thresholds to reduce false positives while maintaining coverage.
- Escalate the action. Once you're confident in the rule's accuracy, switch the action to Block for high-severity rules or leave as Replace/Redact for lower-severity cases.
You can also toggle individual rules on and off at any time using the enable switch in the rules list, without deleting them.
Recommended Rule Order
For gateways that handle general-purpose MCP traffic, a good starting rule order is:
- Prompt injection blocking — Block tool responses containing injection/jailbreak patterns. Place this first because it stops processing immediately.
- API key and secret redaction — Replace credentials and tokens before they reach the model.
- Organization-specific patterns — Redact or replace internal identifiers, project codes, or proprietary data formats unique to your business.
- Broad PII detection (Presidio) — Catch any remaining personally identifiable information that wasn't covered by the more targeted rules above.
This layered approach ensures that the most dangerous content is caught first, while broader detection serves as a safety net.
Comments
0 comments
Please sign in to leave a comment.