Skip to content

redaction

redaction

PII redaction for analytics property values.

This is the value-level half of the guardrail (:mod:openjarvis.analytics.events is the structural half).

For each property value we ship, we: 1. Drop strings longer than MAX_STR_LEN (no chunks of chat content). 2. Drop strings that match any known PII pattern (emails, IPs, MACs, $HOME paths, API keys, JWTs, bearer tokens, etc.). 3. Otherwise pass through unchanged.

Fail-closed: when in doubt, drop. Combined with the event-spec allowlist this gives two independent layers of protection.

Functions

looks_like_pii

looks_like_pii(s: str) -> bool

Return True if any PII pattern matches anywhere in s.

Source code in src/openjarvis/analytics/redaction.py
def looks_like_pii(s: str) -> bool:
    """Return True if any PII pattern matches anywhere in ``s``."""
    for pattern in _PII_PATTERNS:
        if pattern.search(s):
            return True
    return False

redact

redact(properties: dict[str, Any]) -> dict[str, Any]

Return a copy of properties with PII-bearing string values dropped.

Non-string values pass through unchanged. Strings exceeding MAX_STR_LEN are dropped. Strings matching any PII pattern are dropped. The event-spec validator (in :mod:events) runs after this and provides a second layer of structural enforcement.

Source code in src/openjarvis/analytics/redaction.py
def redact(properties: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ``properties`` with PII-bearing string values dropped.

    Non-string values pass through unchanged. Strings exceeding
    ``MAX_STR_LEN`` are dropped. Strings matching any PII pattern are
    dropped. The event-spec validator (in :mod:`events`) runs after
    this and provides a second layer of structural enforcement.
    """
    out: dict[str, Any] = {}
    for key, value in properties.items():
        if isinstance(value, str):
            if not value:
                # empty string is uninformative — drop
                continue
            if len(value) > MAX_STR_LEN:
                continue
            if looks_like_pii(value):
                continue
        elif isinstance(value, (list, dict, set, tuple)):
            # Composite values are never sent — keeps the surface tiny.
            continue
        out[key] = value
    return out

hash_id

hash_id(s: str) -> str

Return a 16-char sha256 prefix of s.

Used for model / tool / connector names that aren't on the public allowlist — we still want to see "uses-a-custom-model-X" cohorting without ever learning which model.

Source code in src/openjarvis/analytics/redaction.py
def hash_id(s: str) -> str:
    """Return a 16-char sha256 prefix of ``s``.

    Used for model / tool / connector names that aren't on the public
    allowlist — we still want to see "uses-a-custom-model-X" cohorting
    without ever learning which model.
    """
    if not s:
        return ""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()[:16]