// design decisions
Credibility requires honesty about what a system does and does not do. These are deliberate design decisions, not gaps.
Decision: The proxy blocks tool calls when OPA is unreachable.
Why: For governed agents, security takes precedence over availability. A missed governance evaluation is worse than a paused agent.
Implication: An OPA outage stops all governed agent activity until OPA recovers.
Mitigation: OPA health monitoring with sub-second alerts. Per-tenant override to fail-open for non-critical agent classes. OPA sidecar architecture minimizes network dependency.
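The fail-closed behavior can be sketched in a few lines. This is a minimal illustration, not Behavry's implementation: it assumes a standard OPA sidecar exposing its Data API on localhost:8181, and the policy path `governance/allow` plus the `fail_open` flag are hypothetical names standing in for the per-tenant override described above.

```python
import json
import urllib.request

# Hypothetical policy path on a local OPA sidecar (OPA's real Data API shape).
OPA_URL = "http://localhost:8181/v1/data/governance/allow"

def evaluate_tool_call(tool_call: dict, fail_open: bool = False) -> bool:
    """Ask OPA whether the tool call is allowed. Blocks (fails closed) by default."""
    body = json.dumps({"input": tool_call}).encode()
    req = urllib.request.Request(
        OPA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=0.5) as resp:
            decision = json.load(resp)
        # An undefined policy result is treated as deny, not allow.
        return bool(decision.get("result", False))
    except OSError:
        # OPA unreachable: deny, unless this agent class is configured fail-open.
        return fail_open
```

The key line is the exception handler: unreachability is an explicit decision point, not an accident of error handling.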
Decision: Every tool call passes through the proxy before execution.
Why: Pre-execution enforcement is the architectural foundation. Post-hoc detection cannot prevent actions.
Implication: Adds 5-30ms per tool call. OPA evaluation is sub-millisecond; the bulk is the network hop.
Mitigation: Connection pooling, local OPA sidecar, and async audit logging minimize overhead. For context, a typical LLM inference call takes 500ms-5s. The governance layer is negligible relative to agent processing time.
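The async audit logging mitigation boils down to a queue-and-worker shape: the request path enqueues and returns, and a background thread does the durable write. A minimal sketch with hypothetical names; the real pipeline is more involved:

```python
import json
import queue
import threading

audit_queue = queue.Queue()

def audit_worker(sink: list) -> None:
    """Drain audit events off the hot path; the proxy thread never blocks on I/O."""
    while True:
        event = audit_queue.get()
        if event is None:  # shutdown sentinel
            break
        sink.append(json.dumps(event))  # stand-in for a durable write
        audit_queue.task_done()

def record(event: dict) -> None:
    # Enqueue and return immediately: microseconds, not a network round trip.
    audit_queue.put(event)
```

Because the write happens off-thread, audit durability adds nothing to the 5-30ms per-call budget.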
Decision: Support both Streamable HTTP and stdio MCP transports.
Why: Many local development tools (filesystem, shell) use stdio. Excluding them would leave a governance gap.
Implication: stdio MCP backends are process-local and cannot be load-balanced or health-checked like HTTP backends.
Mitigation: HTTP backends recommended for production workloads. stdio supported for development and edge cases where local process binding is required.
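The process-local nature of stdio backends is visible in a minimal sketch. This is illustrative only: MCP's JSON-RPC framing and handshake are omitted, and the class name is hypothetical. The point is structural, in that the backend is a child process bound to this host, so there is no address to load-balance or health-check.

```python
import json
import subprocess

class StdioBackend:
    """A process-local MCP-style server spoken to over stdin/stdout.

    Unlike an HTTP backend, the child process lives and dies with the proxy
    host: there is no URL to put behind a load balancer or health check.
    """

    def __init__(self, command: list):
        self.proc = subprocess.Popen(
            command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )

    def send(self, message: dict) -> dict:
        # One JSON object per line; real MCP framing is omitted for brevity.
        self.proc.stdin.write(json.dumps(message) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())
```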
Decision: SHA-256 chain with previous_hash linkage, not blockchain or distributed consensus.
Why: Distributed consensus adds latency with no governance benefit for single-tenant audit chains. The threat model is tampering, not Byzantine fault tolerance.
Implication: Chain integrity depends on database access controls. A compromised database could theoretically rewrite history.
Mitigation: Chain verification endpoint validates integrity on demand. Nightly background verification task. SIEM export provides an independent backup outside Behavry's control. External audit anchoring (S3 WORM bucket) on roadmap.
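The hash-chain construction is standard, and a minimal sketch shows why rewriting history is detectable: each entry's hash covers both its content and the previous link, so tampering with any row invalidates every hash after it. Field names here are illustrative, not Behavry's schema.

```python
import hashlib
import json

def entry_hash(entry: dict, previous_hash: str) -> str:
    """SHA-256 over the canonical JSON of the entry plus the previous link."""
    payload = json.dumps(entry, sort_keys=True) + previous_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def append(chain: list, entry: dict) -> None:
    prev = chain[-1]["hash"] if chain else "0" * 64  # genesis link
    chain.append({**entry, "previous_hash": prev, "hash": entry_hash(entry, prev)})

def verify(chain: list) -> bool:
    """Recompute every link; any rewrite of history breaks a downstream hash."""
    prev = "0" * 64
    for row in chain:
        body = {k: v for k, v in row.items() if k not in ("hash", "previous_hash")}
        if row["previous_hash"] != prev or row["hash"] != entry_hash(body, prev):
            return False
        prev = row["hash"]
    return True
```

This is exactly why the residual risk is database access, not cryptography: an attacker who can rewrite one row must rewrite every subsequent hash too, which the SIEM export copy would still contradict.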
Decision: Anomaly detection compares against rolling behavioral baselines that accumulate over time.
Why: Static thresholds produce false positives. Behavioral baselines adapt to each agent's actual usage pattern.
Implication: New agents have no baseline for the first N hours. Anomaly detection is less effective during this period.
Mitigation: Configurable warm-up period. Manual baseline seeding for known agent profiles. Conservative alerting thresholds during warm-up. Policy-based enforcement (allow/deny/intercept) works from the first tool call regardless of baseline status.
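A rolling baseline with a warm-up period can be sketched as a z-score over a sliding window. The window size, warm-up count, and threshold below are illustrative defaults, not Behavry's tuning:

```python
from collections import deque
import statistics

class RollingBaseline:
    """Flags a metric as anomalous once it strays from the agent's own history."""

    def __init__(self, window: int = 100, warmup: int = 20, z_threshold: float = 3.0):
        self.values = deque(maxlen=window)  # rolling window of observations
        self.warmup = warmup
        self.z_threshold = z_threshold

    def observe(self, value: float) -> bool:
        """Return True if anomalous; always False during warm-up (no baseline yet)."""
        anomalous = False
        if len(self.values) >= self.warmup:
            mean = statistics.fmean(self.values)
            stdev = statistics.pstdev(self.values) or 1e-9  # guard flat baselines
            anomalous = abs(value - mean) / stdev > self.z_threshold
        self.values.append(value)
        return anomalous
```

The warm-up gate in `observe` is the implication above made concrete: until enough history accumulates, the detector abstains rather than guessing, while policy enforcement runs independently.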
Decision: DLP scanning uses 26 regex patterns across 7 categories. Deterministic, zero-latency, no external API dependency.
Why: Pattern matching is fast, auditable, and does not introduce a dependency on an external ML service in the critical path.
Implication: Sophisticated exfiltration that avoids pattern signatures may not be caught by DLP alone.
Mitigation: Inbound rules engine adds semantic matching for tool responses. Cross-session fragment reassembly detects credential splitting across sequential requests. Community Policy Library extends pattern coverage through collective intelligence. Behavioral anomaly detection catches volume-based exfiltration that individual pattern matches miss.
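The shape of deterministic pattern-based DLP is simple to illustrate. The three patterns below are public, well-known secret formats chosen for the example; they are not a sample of the actual 26-pattern library:

```python
import re

# Illustrative patterns only: publicly documented secret formats.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

def scan(text: str) -> list:
    """Deterministic, zero-latency scan: return the names of matching patterns."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]
```

Each check is a local regex match: no model call, no network hop, and the same input always produces the same verdict, which is what makes the results auditable.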
Decision: Single AWS EC2 with Docker Compose for the current deployment.
Why: Seed-stage operational simplicity. One region, one stack, fast iteration.
Implication: No geographic redundancy. Single point of failure at the infrastructure level.
Mitigation: Docker Compose architecture is portable to any cloud or on-prem environment. Multi-region and multi-cloud deployment on roadmap. BYOC and Self-Hosted deployment models allow customers to run the stack in their own redundant infrastructure today.
Decision: Unrecognized tools from newly connected MCP servers are hidden by default until an admin classifies them.
Why: Least privilege for cognition. An agent should not access tools that have not been evaluated and classified.
Implication: Agents cannot see or invoke a new tool until an admin has classified it, so workflows that depend on a freshly connected server may stall.
Mitigation: Admin notification on new tool detection. One-click approve in the dashboard. Bulk classification for servers with many tools. Auto-classification rules based on tool naming patterns.