jeremy.runtime
jeremy@agent: /writing

Writing

Long essays and shorter notes from building AI products and the platforms they depend on: agentic systems, enterprise AI infrastructure, Swoleby, developer tooling, and practical product judgment.


OpenAPI is still one of the best bridges into agent tooling

Every company already has APIs. Many already have OpenAPI specs. The fastest path to practical AI tool use is often turning existing contracts into usable, governed tool surfaces. The useful work is the messy middle: auth, pagination, ambiguous endpoints, destructive operations, rate limits, and deciding which API actions should be exposed to an agent at all.
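A minimal sketch of that conversion step, assuming a spec dict in the OpenAPI 3.x `paths` shape; the governance policy shown (flagging destructive HTTP methods for approval) is an invented example of the kind of decision the essay means:

```python
# Sketch: turn OpenAPI operations into governed agent tool definitions.
# The approval rule below is illustrative, not a prescribed policy.

DESTRUCTIVE_METHODS = {"delete", "put", "patch"}

def tools_from_spec(spec: dict) -> list[dict]:
    """Build tool definitions from an OpenAPI spec, marking
    destructive operations as requiring human approval."""
    tools = []
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            tools.append({
                "name": op.get("operationId", f"{method}_{path}"),
                "description": op.get("summary", ""),
                "method": method.upper(),
                "path": path,
                "requires_approval": method in DESTRUCTIVE_METHODS,
            })
    return tools

spec = {
    "paths": {
        "/invoices/{id}": {
            "get": {"operationId": "get_invoice", "summary": "Fetch one invoice"},
            "delete": {"operationId": "delete_invoice", "summary": "Delete an invoice"},
        }
    }
}

tools = tools_from_spec(spec)
```

The messy middle starts right after this: which of those generated tools should an agent see at all.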

short · 2026

Human-in-the-loop is not a modal

Many products treat human review as a confirmation dialog. That is too shallow. Human-in-the-loop design is about choosing where judgment belongs, what context the human needs, and what the system should learn from approval or rejection. Review is a workflow primitive: summarize intent, show diffs, explain risk, preserve undo, and make approval data useful for future evals.
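One way to sketch review as a primitive rather than a dialog; the field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass

# Sketch: a review request carries intent, diff, risk, and undo,
# and every decision is logged so approvals feed future evals.

@dataclass
class ReviewRequest:
    intent: str     # plain-language summary of what the agent wants to do
    diff: str       # the concrete change being proposed
    risk: str       # why this needs a human, in the reviewer's terms
    undo_plan: str  # how to reverse the action if approved and wrong

@dataclass
class ReviewDecision:
    request: ReviewRequest
    approved: bool
    reviewer_note: str = ""  # captured for future evaluation data

decisions: list[ReviewDecision] = []

req = ReviewRequest(
    intent="Close 14 stale support tickets",
    diff="tickets #101-#114: status open -> closed",
    risk="Customers may still be waiting on two of these",
    undo_plan="Reopen any ticket closed in this batch",
)
decisions.append(ReviewDecision(req, approved=False, reviewer_note="Two tickets still active"))
```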

short · 2026

Personal operating systems for agent-led work

The next useful layer around coding agents is not one more prompt. It is a working system for plans, context, review queues, CI signals, rollback checks, and long-running follow-through. OpenClaw, Codex, review agents, babysitting workflows, cron-driven work, and PR triage all point toward that operating model.

short · 2026

How much memory should an AI coach have?

Memory is useful until it feels creepy or wrong. A coach needs enough context to avoid asking the same questions repeatedly, but not so much that the user feels surveilled or trapped by old information. A practical memory model separates stable profile facts, recent conversation state, explicit preferences, opt-out boundaries, and reviewable summaries the user can correct.
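The split described above can be sketched directly; the categories come from the post, while the specific fields and the opt-out behavior are invented for illustration:

```python
from dataclasses import dataclass, field

# Sketch: separate memory categories with opt-outs enforced at write time.

@dataclass
class CoachMemory:
    profile: dict = field(default_factory=dict)      # stable facts
    recent_state: list = field(default_factory=list) # short-horizon context
    preferences: dict = field(default_factory=dict)  # explicit user settings
    opt_outs: set = field(default_factory=set)       # topics never retained
    summaries: list = field(default_factory=list)    # reviewable, correctable

    def remember(self, topic: str, fact: str) -> bool:
        """Store a profile fact unless the user opted that topic out."""
        if topic in self.opt_outs:
            return False
        self.profile[topic] = fact
        return True

mem = CoachMemory(opt_outs={"weight"})
stored_goal = mem.remember("goal", "run a 10k in spring")
stored_weight = mem.remember("weight", "last logged 82kg")
```

Enforcing the boundary at write time, rather than filtering at recall time, is what keeps old information from becoming a trap.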

short · 2025

The reminder is the product

For behavior-change products, reminders are not notification plumbing. They are the main surface where product judgment shows up. A reminder can be useful, annoying, shaming, timely, irrelevant, or exactly what the user needed. The product surface includes cadence, quiet hours, reply handling, user control, and whether reminders produce action rather than engagement noise.

short · 2025

Prompt tuning is product tuning

Prompt work is often framed as model whispering. In a real product it is closer to product tuning. A prompt encodes tone, policy, assumptions, data contracts, tool expectations, and what the product considers a good next action. Swoleby and agent tooling make that concrete: short responses, direct calls to action, safe boundaries, and prompts evaluated against behavior checks.
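Behavior checks of that kind can be as plain as assertions over the response; the specific rules below (length cap, call to action, no shaming) are hypothetical examples of product rules, not anything from a real Swoleby eval suite:

```python
# Sketch: a prompt change ships only if responses still pass product rules.

def behavior_checks(response: str) -> dict[str, bool]:
    return {
        "short_enough": len(response.split()) <= 40,
        "has_call_to_action": any(
            cue in response.lower() for cue in ("try", "log", "reply", "schedule")
        ),
        "no_shaming": "should have" not in response.lower(),
    }

response = "Nice work yesterday. Try a 20-minute walk today and log it when you're done."
results = behavior_checks(response)
```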

short · 2025

Slash commands are underrated AI product design

Slash commands make capability visible. They give users a way to discover what the system can do, repeat useful actions, and build a mental model of the tool. Good AI interfaces should expose affordances instead of hiding everything behind a blank text box. A little structure can make the system feel more powerful, not less.
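The affordance argument has a simple mechanical core: a registry the interface can render as a menu. The command names and handlers below are invented for illustration:

```python
# Sketch: slash commands as a discoverable capability registry.

COMMANDS: dict[str, tuple] = {}

def command(name: str, description: str):
    """Register a handler so the UI can list what the system can do."""
    def wrap(fn):
        COMMANDS[name] = (description, fn)
        return fn
    return wrap

@command("/summarize", "Summarize the current thread")
def summarize(args: str) -> str:
    return f"summary of: {args}"

@command("/remind", "Set a reminder from this message")
def remind(args: str) -> str:
    return f"reminder set: {args}"

def help_menu() -> list[str]:
    # The menu itself is the affordance: capability made visible.
    return [f"{name} - {desc}" for name, (desc, _) in sorted(COMMANDS.items())]
```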

short · 2024

What I want from an AI coding runtime

AI coding tools are strongest when they preserve engineering discipline: read the code first, make scoped changes, test the actual surface, and explain tradeoffs. They are weakest when they become autocomplete with commit access. The through-line across OpenClaw, Codex, Claude Code, Cursor, and oh-my-codex is reusable prompts, skills, state, teams, browser verification, and better runtime ergonomics.

short · 2024

Pareto frontiers are a better metaphor for AI systems work

Most AI product decisions are tradeoffs: accuracy, cost, latency, safety, user control, and implementation complexity. A single best answer is often the wrong goal. The better question is which frontier you are moving. The best work expands the frontier instead of optimizing one metric in isolation.
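The frontier framing is concrete enough to compute. A minimal sketch over two axes, with invented example numbers (cost lower is better, accuracy higher is better):

```python
# Sketch: keep only non-dominated options on a two-axis frontier.

def pareto_frontier(options: list[dict]) -> list[dict]:
    """An option stays if no other option is at least as good on both
    axes and strictly better on one."""
    def dominated(a, b):
        return (b["cost"] <= a["cost"] and b["accuracy"] >= a["accuracy"]
                and (b["cost"] < a["cost"] or b["accuracy"] > a["accuracy"]))
    return [a for a in options if not any(dominated(a, b) for b in options)]

options = [
    {"name": "small", "cost": 1, "accuracy": 0.80},
    {"name": "large", "cost": 5, "accuracy": 0.92},
    {"name": "slow-and-weak", "cost": 5, "accuracy": 0.75},  # dominated
]
frontier = pareto_frontier(options)
```

Expanding the frontier means adding an option nothing else dominates; optimizing one metric in isolation usually just slides along it.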

short · 2024

Generative AI still needs predictive AI

LLMs changed the interface, but prediction did not stop mattering. Ranking, scoring, routing, detection, evaluation, personalization, and decision support are still central to useful AI systems. The model that writes the response is only one component; the system still decides what to retrieve, what to trust, what to show, and what to do next.

short · 2024

Domain-Driven Design: an overview

Domain-Driven Design is an approach to software design that keeps the software closely aligned with the needs of the business. The primary focus is the domain, the business problem the software aims to solve...

long · 16 Feb 2023

Scientific software made me care about metadata

At Harvard Medical School, the software challenge was not just computation. It was metadata, quality, workflow, and making complex data usable for researchers. AI products have the same failure mode: bad metadata creates bad retrieval, bad evaluation, and bad user trust.

short · 2022

From microcontrollers to agents: interfaces that reach the real world

The old NodeMCU and Alexa smart-home projects look far away from agentic AI, but the pattern is similar: connect software to a physical or behavioral workflow, then design for reliability at the boundary. The interface matters most when it changes what happens outside the screen.

short · 2018

Real Python taught me that developer education is product work

Developer education is not just documentation. It is product design for understanding. The examples, pacing, conceptual model, and failure modes all determine whether someone can actually use the technology. If developers cannot understand the system, debug it, and build confidence through small wins, the platform does not matter.

short · 2016

What consulting teaches you about enterprise AI

Running a consulting firm teaches you that the clean architecture diagram is never the whole story. The real system includes budgets, timelines, users, support, client politics, and the cost of production failure. Organizations need systems that fit how they actually operate, not demos that assume ideal users and ideal data.

short · 2016