Security Architecture¶

The security module is a cross-cutting concern that wraps the inference pipeline rather than replacing it. Scanners run on raw text strings independently of any model or agent, and the GuardrailsEngine decorator composes them around any InferenceEngine backend without changing the engine's public interface.

Design Principles¶

Composable, not mandatory. Security scanning is opt-in and composable. You wrap an engine with GuardrailsEngine; you do not configure a global interceptor.
Scanner-agnostic. The BaseScanner ABC defines a two-method interface (scan, redact). Any scanner can be plugged in, including user-defined ones.
Fail-safe modes. The three redaction modes (WARN, REDACT, BLOCK) cover a spectrum from visibility to enforcement, allowing gradual tightening without code changes.
Audit by default. The AuditLogger records security events to SQLite so that findings are traceable after the fact.

Scanner Pipeline¶

Each scan pass runs all registered scanners sequentially and merges their findings into a single ScanResult. The order of scanner execution does not affect correctness, only which patterns are reported first.

flowchart LR
    A[Raw Text] --> B[SecretScanner.scan]
    A --> C[PIIScanner.scan]
    B --> D{Merge findings}
    C --> D
    D --> E[ScanResult]
    E --> F{result.clean?}
    F -- Yes --> G[Return text unchanged]
    F -- No --> H{RedactionMode}
    H -- WARN --> I[Publish SECURITY_ALERT\nReturn text unchanged]
    H -- REDACT --> J[Run redact on all scanners\nReturn sanitized text]
    H -- BLOCK --> K[Publish SECURITY_BLOCK\nRaise SecurityBlockError]

The redaction step in REDACT mode applies each scanner's redact() method in sequence. Later scanners see the already-redacted output of earlier ones, so patterns do not interfere.

GuardrailsEngine Wrapper Pattern¶

GuardrailsEngine implements the full InferenceEngine ABC and delegates every call to a wrapped engine instance. This means any engine — OllamaEngine, VLLMEngine, LlamaCppEngine — can be made security-aware without modifying the engine itself.

classDiagram
    class InferenceEngine {
        <<abstract>>
        +generate(messages, model) dict
        +stream(messages, model) AsyncIterator
        +list_models() list
        +health() bool
    }
    class OllamaEngine {
        +generate(...)
        +stream(...)
    }
    class GuardrailsEngine {
        -_engine InferenceEngine
        -_scanners list
        -_mode RedactionMode
        +generate(messages, model) dict
        +stream(messages, model) AsyncIterator
        +list_models() list
        +health() bool
    }
    InferenceEngine <|-- OllamaEngine
    InferenceEngine <|-- GuardrailsEngine
    GuardrailsEngine o-- InferenceEngine : wraps

Because GuardrailsEngine is itself an InferenceEngine, it can be nested arbitrarily (for example, wrapped again in an instrumented engine) or passed to any code that accepts an engine.

generate() Call Sequence¶

sequenceDiagram
    participant C as Caller
    participant G as GuardrailsEngine
    participant S as Scanners
    participant E as Wrapped Engine

    C->>G: generate(messages, model)
    G->>S: scan(message.content) for each message
    S-->>G: ScanResult
    alt findings detected
        G->>G: _handle_findings(text, result, "input")
        note over G: WARN: publish event, pass through
        note over G: REDACT: run redact(), replace content
        note over G: BLOCK: raise SecurityBlockError
    end
    G->>E: generate(messages, model)
    E-->>G: response dict
    G->>S: scan(response["content"])
    S-->>G: ScanResult
    alt findings detected
        G->>G: _handle_findings(content, result, "output")
    end
    G-->>C: response dict (possibly sanitized)

stream() Behavior¶

For streaming, the engine yields tokens to the caller in real time. The security layer accumulates the full output and scans it after the stream ends. Because the scan is post-hoc, BLOCK mode cannot prevent delivery of streamed tokens — it only applies to the input side.

sequenceDiagram
    participant C as Caller
    participant G as GuardrailsEngine
    participant E as Wrapped Engine
    participant S as Scanners

    C->>G: stream(messages, model)
    G->>S: scan inputs (before streaming)
    G->>E: stream(messages, model)
    loop each token
        E-->>G: token
        G-->>C: yield token
    end
    G->>S: scan(accumulated output)
    alt findings detected
        G->>G: publish SECURITY_ALERT (stream_post_hoc)
    end

Event Flow¶

Security events flow through the EventBus using three event types:

Event	When Published	Payload Keys
`SECURITY_SCAN`	(Reserved for future use)	—
`SECURITY_ALERT`	Findings detected in WARN or REDACT mode	`direction`, `findings`, `mode`
`SECURITY_BLOCK`	Findings detected in BLOCK mode	`direction`, `findings`, `mode`

The direction field is either "input" or "output". The findings value is a list of dicts with keys pattern, threat, and description.

The AuditLogger subscribes to all three event types and writes them to SQLite. This subscription is established at construction time:

flowchart TB
    A[GuardrailsEngine] -->|SECURITY_ALERT| B[EventBus]
    A -->|SECURITY_BLOCK| B
    B --> C[AuditLogger._on_event]
    C --> D[SQLite audit.db]
    B --> E[Other subscribers\ne.g. logging, alerting]

File Policy Integration¶

The file policy (file_policy.py) operates independently of the scanner pipeline. It answers a single yes/no question: is this file path considered sensitive?

Integration Points¶

FileReadTool calls is_sensitive_file() before reading any path. If the path matches a sensitive pattern, the tool returns an error rather than the file contents. This cannot be bypassed at the tool level.

Memory ingest path (memory/ingest.py) uses filter_sensitive_paths() to remove sensitive files from a directory listing before indexing. Files matching sensitive patterns are silently skipped.

flowchart LR
    A[FileReadTool.execute] --> B{is_sensitive_file?}
    B -- Yes --> C[Return error: sensitive file blocked]
    B -- No --> D[Read and return file contents]

    E[memory ingest_path] --> F[glob directory]
    F --> G[filter_sensitive_paths]
    G --> H[Index remaining files]

The file policy does not publish events or use the event bus. It is a pure function — deterministic, stateless, and side-effect-free.

Audit Logging Architecture¶

AuditLogger maintains a single SQLite table (security_events) with the following schema:

Column	Type	Description
`id`	`INTEGER PRIMARY KEY`	Auto-increment row ID
`timestamp`	`REAL`	Unix timestamp of the event
`event_type`	`TEXT`	`SecurityEventType` value string
`findings_json`	`TEXT`	JSON-encoded list of `ScanFinding` dicts
`content_preview`	`TEXT`	Short preview of the scanned content
`action_taken`	`TEXT`	Mode string (`warn`, `redact`, `block`)

The database is written in append-only mode. There is no built-in rotation or truncation — manage retention externally by deleting old entries with SQLite tooling or by using a path-per-session audit log.

The default path is ~/.openjarvis/audit.db, configurable via security.audit_log_path in config.toml.

Relationship to Other Modules¶

Module	How Security Integrates
Engine	`GuardrailsEngine` wraps any `InferenceEngine`
Tools	`FileReadTool` calls `is_sensitive_file()`
Memory	Ingest path calls `filter_sensitive_paths()`
EventBus	Security events published to `SECURITY_ALERT`, `SECURITY_BLOCK`
Config	`SecurityConfig` dataclass loaded from `[security]` in `config.toml`