jeremy.runtime
jeremy@agent: /writing

Writing

Long essays and shorter notes from building AI products and the platforms they depend on: agentic systems, enterprise AI infrastructure, Swoleby, developer tooling, and practical product judgment.


OpenAPI is still one of the best bridges into agent tooling

Every company already has APIs. Many already have OpenAPI specs. The fastest path to practical AI tool use is often turning existing contracts into usable, governed tool surfaces. The useful work is the messy middle: auth, pagination, ambiguous endpoints, destructive operations, rate limits, and deciding which API actions should be exposed to an agent at all.
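A minimal sketch of that conversion step, assuming a spec dict in the OpenAPI 3.x `paths` shape; the governance policy shown (flagging destructive HTTP methods for approval) is an invented example of the kind of decision the essay means:

```python
# Sketch: turn OpenAPI operations into governed agent tool definitions.
# The approval rule below is illustrative, not a prescribed policy.

DESTRUCTIVE_METHODS = {"delete", "put", "patch"}

def tools_from_spec(spec: dict) -> list[dict]:
    """Build tool definitions from an OpenAPI spec, marking
    destructive operations as requiring human approval."""
    tools = []
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            tools.append({
                "name": op.get("operationId", f"{method}_{path}"),
                "description": op.get("summary", ""),
                "method": method.upper(),
                "path": path,
                "requires_approval": method in DESTRUCTIVE_METHODS,
            })
    return tools

spec = {
    "paths": {
        "/invoices/{id}": {
            "get": {"operationId": "get_invoice", "summary": "Fetch one invoice"},
            "delete": {"operationId": "delete_invoice", "summary": "Delete an invoice"},
        }
    }
}

tools = tools_from_spec(spec)
```

The messy middle starts right after this: which of those generated tools should an agent see at all.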

short · 2026

Human-in-the-loop is not a modal

Many products treat human review as a confirmation dialog. That is too shallow. Human-in-the-loop design is about choosing where judgment belongs, what context the human needs, and what the system should learn from approval or rejection. Review is a workflow primitive: summarize intent, show diffs, explain risk, preserve undo, and make approval data useful for future evals.
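One way to sketch review as a primitive rather than a dialog; the field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass

# Sketch: a review request carries intent, diff, risk, and undo,
# and every decision is logged so approvals feed future evals.

@dataclass
class ReviewRequest:
    intent: str     # plain-language summary of what the agent wants to do
    diff: str       # the concrete change being proposed
    risk: str       # why this needs a human, in the reviewer's terms
    undo_plan: str  # how to reverse the action if approved and wrong

@dataclass
class ReviewDecision:
    request: ReviewRequest
    approved: bool
    reviewer_note: str = ""  # captured for future evaluation data

decisions: list[ReviewDecision] = []

req = ReviewRequest(
    intent="Close 14 stale support tickets",
    diff="tickets #101-#114: status open -> closed",
    risk="Customers may still be waiting on two of these",
    undo_plan="Reopen any ticket closed in this batch",
)
decisions.append(ReviewDecision(req, approved=False, reviewer_note="Two tickets still active"))
```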

short · 2026

Personal operating systems for agent-led work

The next useful layer around coding agents is not one more prompt. It is a working system for plans, context, review queues, CI signals, rollback checks, and long-running follow-through. OpenClaw, Codex, review agents, babysitting workflows, cron-driven work, and PR triage all point toward that operating model.

short · 2026

How much memory should an AI coach have?

Memory is useful until it feels creepy or wrong. A coach needs enough context to avoid asking the same questions repeatedly, but not so much that the user feels surveilled or trapped by old information. A practical memory model separates stable profile facts, recent conversation state, explicit preferences, opt-out boundaries, and reviewable summaries the user can correct.
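The split described above can be sketched directly; the categories come from the post, while the specific fields and the opt-out behavior are invented for illustration:

```python
from dataclasses import dataclass, field

# Sketch: separate memory categories with opt-outs enforced at write time.

@dataclass
class CoachMemory:
    profile: dict = field(default_factory=dict)      # stable facts
    recent_state: list = field(default_factory=list) # short-horizon context
    preferences: dict = field(default_factory=dict)  # explicit user settings
    opt_outs: set = field(default_factory=set)       # topics never retained
    summaries: list = field(default_factory=list)    # reviewable, correctable

    def remember(self, topic: str, fact: str) -> bool:
        """Store a profile fact unless the user opted that topic out."""
        if topic in self.opt_outs:
            return False
        self.profile[topic] = fact
        return True

mem = CoachMemory(opt_outs={"weight"})
stored_goal = mem.remember("goal", "run a 10k in spring")
stored_weight = mem.remember("weight", "last logged 82kg")
```

Enforcing the boundary at write time, rather than filtering at recall time, is what keeps old information from becoming a trap.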

short · 2025

The reminder is the product

For behavior-change products, reminders are not notification plumbing. They are the main surface where product judgment shows up. A reminder can be useful, annoying, shaming, timely, irrelevant, or exactly what the user needed. The product surface includes cadence, quiet hours, reply handling, user control, and whether reminders produce action rather than engagement noise.

short · 2025

Prompt tuning is product tuning

Prompt work is often framed as model whispering. In a real product it is closer to product tuning. A prompt encodes tone, policy, assumptions, data contracts, tool expectations, and what the product considers a good next action. Swoleby and agent tooling make that concrete: short responses, direct calls to action, safe boundaries, and prompts evaluated against behavior checks.
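Behavior checks of that kind can be as plain as assertions over the response; the specific rules below (length cap, call to action, no shaming) are hypothetical examples of product rules, not anything from a real Swoleby eval suite:

```python
# Sketch: a prompt change ships only if responses still pass product rules.

def behavior_checks(response: str) -> dict[str, bool]:
    return {
        "short_enough": len(response.split()) <= 40,
        "has_call_to_action": any(
            cue in response.lower() for cue in ("try", "log", "reply", "schedule")
        ),
        "no_shaming": "should have" not in response.lower(),
    }

response = "Nice work yesterday. Try a 20-minute walk today and log it when you're done."
results = behavior_checks(response)
```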

short · 2025

Slash commands are underrated AI product design

Slash commands make capability visible. They give users a way to discover what the system can do, repeat useful actions, and build a mental model of the tool. Good AI interfaces should expose affordances instead of hiding everything behind a blank text box. A little structure can make the system feel more powerful, not less.
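The affordance argument has a simple mechanical core: a registry the interface can render as a menu. The command names and handlers below are invented for illustration:

```python
# Sketch: slash commands as a discoverable capability registry.

COMMANDS: dict[str, tuple] = {}

def command(name: str, description: str):
    """Register a handler so the UI can list what the system can do."""
    def wrap(fn):
        COMMANDS[name] = (description, fn)
        return fn
    return wrap

@command("/summarize", "Summarize the current thread")
def summarize(args: str) -> str:
    return f"summary of: {args}"

@command("/remind", "Set a reminder from this message")
def remind(args: str) -> str:
    return f"reminder set: {args}"

def help_menu() -> list[str]:
    # The menu itself is the affordance: capability made visible.
    return [f"{name} - {desc}" for name, (desc, _) in sorted(COMMANDS.items())]
```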

short · 2024

What I want from an AI coding runtime

AI coding tools are strongest when they preserve engineering discipline: read the code first, make scoped changes, test the actual surface, and explain tradeoffs. They are weakest when they become autocomplete with commit access. The through-line across OpenClaw, Codex, Claude Code, Cursor, and oh-my-codex is reusable prompts, skills, state, teams, browser verification, and better runtime ergonomics.

short · 2024

Pareto frontiers are a better metaphor for AI systems work

Most AI product decisions are tradeoffs: accuracy, cost, latency, safety, user control, and implementation complexity. A single best answer is often the wrong goal. The better question is which frontier you are moving. The best work expands the frontier instead of optimizing one metric in isolation.
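The frontier framing is concrete enough to compute. A minimal sketch over two axes, with invented example numbers (cost lower is better, accuracy higher is better):

```python
# Sketch: keep only non-dominated options on a two-axis frontier.

def pareto_frontier(options: list[dict]) -> list[dict]:
    """An option stays if no other option is at least as good on both
    axes and strictly better on one."""
    def dominated(a, b):
        return (b["cost"] <= a["cost"] and b["accuracy"] >= a["accuracy"]
                and (b["cost"] < a["cost"] or b["accuracy"] > a["accuracy"]))
    return [a for a in options if not any(dominated(a, b) for b in options)]

options = [
    {"name": "small", "cost": 1, "accuracy": 0.80},
    {"name": "large", "cost": 5, "accuracy": 0.92},
    {"name": "slow-and-weak", "cost": 5, "accuracy": 0.75},  # dominated
]
frontier = pareto_frontier(options)
```

Expanding the frontier means adding an option nothing else dominates; optimizing one metric in isolation usually just slides along it.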

short · 2024

Generative AI still needs predictive AI

LLMs changed the interface, but prediction did not stop mattering. Ranking, scoring, routing, detection, evaluation, personalization, and decision support are still central to useful AI systems. The model that writes the response is only one component; the system still decides what to retrieve, what to trust, what to show, and what to do next.

short · 2024

Domain-Driven Design: an overview

Domain-Driven Design is an approach to software design that keeps the software closely aligned with the needs of the business. The primary focus is the domain, the business problem the software aims to solve...

long · 16 Feb 2023

Scientific software made me care about metadata

At Harvard Medical School, the software challenge was not just computation. It was metadata, quality, workflow, and making complex data usable for researchers. AI products have the same failure mode: bad metadata creates bad retrieval, bad evaluation, and bad user trust.

short · 2022

From microcontrollers to agents: interfaces that reach the real world

The old NodeMCU and Alexa smart-home projects look far away from agentic AI, but the pattern is similar: connect software to a physical or behavioral workflow, then design for reliability at the boundary. The interface matters most when it changes what happens outside the screen.

short · 2018

Real Python taught me that developer education is product work

Developer education is not just documentation. It is product design for understanding. The examples, pacing, conceptual model, and failure modes all determine whether someone can actually use the technology. If developers cannot understand the system, debug it, and build confidence through small wins, the platform does not matter.

short · 2016

What consulting teaches you about enterprise AI

Running a consulting firm teaches you that the clean architecture diagram is never the whole story. The real system includes budgets, timelines, users, support, client politics, and the cost of production failure. Organizations need systems that fit how they actually operate, not demos that assume ideal users and ideal data.

short · 2016