jeremy.runtime
jeremy@agent: /skills/agentic-systems

Agentic systems

Designing the runtime around model capability: tools, contracts, execution modes, approval points, logs, and recovery paths.

The core judgment

An agent is not useful because it can call a function once in a demo. It becomes useful when the surrounding system makes tool use understandable, bounded, recoverable, and measurable.

That means treating tool surfaces like product APIs: stable schemas, clear names, scoped permissions, reviewable outputs, logs, and careful separation between reading, planning, previewing, and executing.
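The shape of that separation can be sketched in a few dozen lines. This is a minimal illustration, not any specific platform's API: the names (`ToolContract`, `Registry`, the `billing:*` scopes) are invented for the example, but it shows the pieces the paragraph names, which are a stable input schema, a permission scope, an explicit mode, an audit log, and an approval gate that only execute-mode calls have to pass.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable

class Mode(Enum):
    READ = "read"        # no side effects
    PREVIEW = "preview"  # shows what execute would do, without doing it
    EXECUTE = "execute"  # mutates state; gated behind approval

@dataclass
class ToolContract:
    """A tool surface treated like a product API: stable name, schema, scope."""
    name: str
    input_schema: dict[str, type]   # stable, versionable argument schema
    scope: str                      # permission scope, e.g. "billing:read"
    mode: Mode
    handler: Callable[[dict], Any]

class Registry:
    def __init__(self, approver: Callable[[str, dict], bool]):
        self.tools: dict[str, ToolContract] = {}
        self.log: list[tuple[str, str]] = []  # audit log of (tool, outcome)
        self.approver = approver              # human or policy approval point

    def register(self, tool: ToolContract) -> None:
        self.tools[tool.name] = tool

    def call(self, name: str, args: dict) -> Any:
        tool = self.tools[name]
        # Validate args against the declared schema before anything runs.
        for key, typ in tool.input_schema.items():
            if not isinstance(args.get(key), typ):
                raise TypeError(f"{name}: {key} must be {typ.__name__}")
        # EXECUTE is the only mode that requires approval.
        if tool.mode is Mode.EXECUTE and not self.approver(name, args):
            self.log.append((name, "denied"))
            raise PermissionError(f"{name}: execution not approved")
        self.log.append((name, tool.mode.value))
        return tool.handler(args)

# Demo: reads pass through, executes hit the approval gate (denied here).
reg = Registry(approver=lambda name, args: False)
reg.register(ToolContract("get_balance", {"user": str}, "billing:read",
                          Mode.READ, lambda a: 42))
balance = reg.call("get_balance", {"user": "alice"})

reg.register(ToolContract("issue_refund", {"user": str}, "billing:write",
                          Mode.EXECUTE, lambda a: "refunded"))
try:
    reg.call("issue_refund", {"user": "alice"})
    refund_ran = True
except PermissionError:
    refund_ran = False
```

The point of the sketch is that the denied call still lands in the log: approval points and audit trails are part of the same contract, not separate systems bolted on later.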

Proof points

This shows up across DataRobot platform work, OpenClaw enablement, Codex workflow experimentation, Swoleby agent loops, and public writing on MCP, sandboxing, and behavior-based evals.

Claim: Production agent systems need bounded execution.
Evidence: DataRobot Global MCP and secure sandbox work use search/execute flows, per-call sandboxing, controls, and recovery paths.
Artifact: DataRobot AI Platform

Claim: Tool ecosystems need contracts, not demos.
Evidence: MCP, OpenAPI, skills, and tool catalogs show up as platform primitives instead of isolated prompts.
Artifact: What MCP changes about tool ecosystems

Claim: Agent quality has to be measured behaviorally.
Evidence: Eval harnesses, traces, and product-specific checks gate whether agents do useful work safely.
Artifact: Agent evals should measure behavior, not vibes