Skip to content

Injection Guard

Injection Guard is a pre-execution safety layer that scans tool arguments for prompt injection patterns before the tool runs. It protects guarded tools from instruction hijacking, system prompt extraction, obfuscated payloads, and other injection techniques.

Injection Guard is separate from HMADS. HMADS protects against dangerous operations like file deletion and privilege escalation. Injection Guard protects against prompt injection in tool arguments.


When an agent calls a guarded tool, Injection Guard scans its text arguments against a set of detection rules. Each rule has a severity level and a category. If a match is found, the guard compares the severity against the agent’s minimum severity threshold and responds according to the agent’s configured mode.

These tools have their arguments scanned for injection:

  • write_file — content and file_path
  • str_replace — old_str, new_str, file_path
  • patch — patch content and file_path
  • read_file — file_path
  • bash — script content
  • runpy — script and code content
  • send_agent_message — message content

Super agents bypass Injection Guard entirely, just as they bypass HMADS.


Each detection rule has a severity level that determines how it is handled:

LevelScoreDescription
LOW0.2Minor suspicion, such as token mentioned with extraction verb
WARNING0.3Credential-related probes
MEDIUM0.4Context manipulation, obfuscation attempts
HIGH0.7Direct instructions to change behavior, payload splitting
CRITICAL1.0Instruction hijacking, system prompt leaking, tool hijacking

Rules below the agent’s minimum severity threshold are skipped. The default threshold is MEDIUM.


Injection Guard covers 15 categories of injection techniques:

CategoryExamples
Direct OverridePhrases telling the agent to reject its original system prompt
Prompt LeakingRequests to copy the full system prompt text
Context ManipulationHypothetical framing, indirect references, few-shot manipulation
ObfuscationBase64, hex, unicode escapes, leetspeak, spaced characters, ROT13
Payload SplittingToken concatenation, comment injection
HTML/Markdown InjectionSuspicious markdown link protocols such as javascript: or data:
Reasoning HijackDistorting chain-of-thought, reward hacking
Indirect InjectionSecond-order injection, document-embedded code
Data ExtractionCredential and token extraction attempts
Function Call HijackingTool call injection, output format hijacking
Multilingual AttackAttacks using non-English language phrasing

Injection Guard is configured per-agent in the Advanced Settings section of the agent detail page.

The Injection Guard toggle enables or disables scanning for the agent. It is enabled by default.

For programmatic configuration, Injection Guard supports three agent variables:

VariableValuesDefaultDescription
injection_guard_enabled1, 01Enable or disable the guard
injection_guard_min_severityLOW, WARNING, MEDIUM, HIGH, CRITICALMEDIUMMinimum severity to act on
injection_guard_modeblock, warn, logblockResponse mode when injection is detected
ModeBehavior
blockBlocks the tool call and returns a clear error with severity and score
warnBlocks the tool call with a warning message advising admin contact
logLogs the detection but allows the tool call to proceed

Injection Guard uses regex-based pattern matching, which can sometimes trigger on legitimate code. Consider disabling it if you experience false positives in:

  • Code generation that includes security-related terms
  • File operations with paths matching sensitive patterns
  • Scripts that construct commands dynamically

The toggle is located in Advanced Settings under the agent’s General tab, alongside the Safety Checker toggle.