jeremy.runtime
jeremy@agent: /write enterprise-ai-needs-boring-infrastructure

29 May 2025

Enterprise AI Needs Boring Infrastructure

The useful version of agentic AI depends on permissions, audit trails, recovery paths, sandboxes, observability, and human review.

Most public agent demos optimize for surprise. A browser opens, a tool gets called, a spreadsheet changes, and everyone sees the magical part. That is useful as a capability proof, but it is not the hard part of enterprise AI.

The enterprise version is less glamorous. It needs permissions. It needs audit trails. It needs recovery paths. It needs sandboxes. It needs observability. It needs a human to understand what happened after the fact, especially when the system touched data, called an API, changed a record, generated code, or took an action that costs money.

The question that matters is not “can the model do it?” It is “can the organization trust the system around the model?”

That system needs contracts. A tool should describe what it accepts, what it returns, what it can change, and what failure looks like. An agent runtime should know the difference between reading, planning, previewing, and executing. A product should expose those differences to the user instead of hiding everything behind a blank chat box.
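
To make that concrete, here is a minimal sketch of what such a contract could look like. Every name below is hypothetical, invented for illustration rather than taken from MCP, OpenAPI, or any real runtime.

  // Hypothetical tool contract: names are illustrative, not a real
  // MCP or OpenAPI type. The point is that the contract declares
  // inputs, outputs, side effects, and failure modes up front.
  type JsonSchema = Record<string, unknown>; // simplified stand-in

  interface ToolContract {
    name: string;
    description: string;
    inputSchema: JsonSchema;   // what the tool accepts
    outputSchema: JsonSchema;  // what the tool returns
    // What the tool is allowed to change: nothing, or named resources.
    sideEffects: "none" | { writes: string[] };
    // What failure looks like, so the runtime can plan recovery.
    failureModes: Array<{ code: string; retryable: boolean }>;
  }

  // The runtime treats reading, planning, previewing, and executing
  // as distinct phases it can expose to the user.
  type ExecutionPhase = "read" | "plan" | "preview" | "execute";

The details are negotiable. What matters is that the runtime can check a declaration like this before anything executes, and a product can render it to the user instead of a blank chat box.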

The boring pieces become product features:

  • Scoped permissions make the system usable by more than one team.
  • Execution logs make AI work inspectable instead of mystical.
  • Human approval creates a decision point where judgment belongs (see the sketch after this list).
  • Sandboxes reduce blast radius.
  • Evals catch regressions that demos miss.
  • Rollback paths make teams willing to try again after mistakes.

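Here is one way the approval piece could look in code. This is a sketch under assumed types, not any real platform's API; all the names are invented.

  // Hypothetical approval gate. Types and names are invented to make
  // the decision point concrete, not drawn from any real runtime.
  type Scope = "read" | "write" | "spend";

  interface PendingAction {
    tool: string;
    scope: Scope;
    preview: string;      // human-readable summary of what would change
    reversible: boolean;
  }

  interface Policy {
    autoApprove: Scope[]; // e.g. reads may proceed without a human
  }

  async function gate(
    action: PendingAction,
    policy: Policy,
    askHuman: (a: PendingAction) => Promise<boolean>,
  ): Promise<boolean> {
    // In-policy, reversible actions proceed on their own. Everything
    // else stops at a human, with the preview attached to the request.
    if (policy.autoApprove.includes(action.scope) && action.reversible) {
      return true;
    }
    return askHuman(action);
  }

The same shape carries the other items on the list: the preview feeds the execution log, the scope is what permissions constrain, and reversibility is what makes a rollback path worth offering.
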
This is why I care about MCP, OpenAPI tool surfaces, sandboxed execution, and human-in-the-loop controls. Those are not infrastructure chores. They are the shape of the product once AI starts doing real work.

A strong enterprise AI platform should make advanced capability feel less fragile. The user should be able to ask: what tools were available, what did the agent choose, what did it read, what did it write, what needed approval, what failed, and what can be undone?
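
If those questions have answers, they probably live in something like one audit record per agent step. Again, the field names below are invented for illustration; the point is that each question maps to a field.

  // One hypothetical audit record per agent step. Each question above
  // maps to a field; none of these names come from a real platform.
  interface AuditRecord {
    step: number;
    toolsAvailable: string[];  // what tools were available
    toolChosen: string;        // what the agent chose
    reads: string[];           // what it read
    writes: string[];          // what it wrote
    approval?: { approver: string; decision: "approved" | "denied" };
    error?: { code: string; message: string };         // what failed
    undo?: { available: boolean; procedure: string };  // what can be undone
  }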

If the answer is “trust the model,” the product is not done.

The teams that win here will treat agentic AI as a systems problem. Models matter enormously, but the model is only one actor in the workflow. The platform around it has to make capability governable, observable, repeatable, and safe enough for serious users.

That is the frontier I want to work on: turning impressive model behavior into infrastructure people can trust.