What changed when I stopped treating coding agents as autocomplete and started treating them as workers inside a software delivery system...
Writing
Long essays and shorter notes from building AI products and the platforms they depend on: agentic systems, enterprise AI infrastructure, Swoleby, developer tooling, and practical product judgment.
Agent-driven development starts with a seductive idea: give an agent a task, wait, and get a working feature back. That works often enough to be exciting. It also fails often enough to teach the real lesson...
When an AI system can execute code, install dependencies, edit files, call APIs, or operate tools, sandboxing is not just an implementation detail. It becomes part of the product...
OpenAPI is still one of the best bridges into agent tooling
Every company already has APIs. Many already have OpenAPI specs. The fastest path to practical AI tool use is often turning existing contracts into usable, governed tool surfaces. The useful work is the messy middle: auth, pagination, ambiguous endpoints, destructive operations, rate limits, and deciding which API actions should be exposed to an agent at all.
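As a concrete illustration of that messy middle, here is a minimal sketch of the filtering step: taking operations from a spec and tagging each with a governance mode. The toy spec, the `SAFE_METHODS` split, and the operation names are all hypothetical, not a real API.

```python
# Hypothetical OpenAPI fragment; real specs come from your existing contracts.
SPEC = {
    "paths": {
        "/invoices": {
            "get": {"operationId": "listInvoices", "summary": "List invoices"},
            "post": {"operationId": "createInvoice", "summary": "Create an invoice"},
        },
        "/invoices/{id}": {
            "delete": {"operationId": "deleteInvoice", "summary": "Delete an invoice"},
        },
    }
}

# Assumed governance policy: reads run autonomously, writes go through review,
# destructive operations are not exposed to the agent at all.
SAFE_METHODS = {"get"}
REVIEW_METHODS = {"post", "put", "patch"}


def tools_from_spec(spec):
    """Build agent tool definitions, tagging each with a governance mode."""
    tools = []
    for path, methods in spec["paths"].items():
        for method, op in methods.items():
            if method in SAFE_METHODS:
                mode = "auto"
            elif method in REVIEW_METHODS:
                mode = "needs_review"
            else:
                continue  # e.g. delete: never exposed
            tools.append({
                "name": op["operationId"],
                "description": op.get("summary", ""),
                "method": method.upper(),
                "path": path,
                "mode": mode,
            })
    return tools
```

The interesting product decisions live in the policy tables, not the loop: which methods are safe, which need a human, and which never reach the agent.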
Human-in-the-loop is not a modal
Many products treat human review as a confirmation dialog. That is too shallow. Human-in-the-loop design is about choosing where judgment belongs, what context the human needs, and what the system should learn from approval or rejection. Review is a workflow primitive: summarize intent, show diffs, explain risk, preserve undo, and make approval data useful for future evals.
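One way to make "review as a workflow primitive" concrete is a small data model that carries intent, diff, risk, and an undo plan, and records every decision for later evals. This is a sketch under assumed field names, not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewItem:
    intent: str      # plain-language summary of what the agent wants to do
    diff: str        # the concrete change the human will inspect
    risk: str        # e.g. "low" | "high"; assumed labels
    undo_plan: str   # how to roll back if approval was a mistake


@dataclass
class ReviewQueue:
    decisions: list = field(default_factory=list)

    def decide(self, item, approved, reviewer_note=""):
        # Every approval or rejection becomes labeled data for future evals.
        self.decisions.append({
            "intent": item.intent,
            "risk": item.risk,
            "approved": approved,
            "note": reviewer_note,
        })
        return approved
```

The point of the structure is that a confirmation dialog throws the decision away; a queue like this keeps it.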
Personal operating systems for agent-led work
The next useful layer around coding agents is not one more prompt. It is a working system for plans, context, review queues, CI signals, rollback checks, and long-running follow-through. OpenClaw, Codex, review agents, babysitting workflows, cron-driven work, and PR triage all point toward that operating model.
A lot of AI evaluation starts with a reasonable instinct: look at the answer and decide whether it seems good. That is not enough for agents...
The product bet behind Swoleby is simple: most people do not need another fitness dashboard. They need help closer to the moment where behavior happens...
How much memory should an AI coach have?
Memory is useful until it feels creepy or wrong. A coach needs enough context to avoid asking the same questions repeatedly, but not so much that the user feels surveilled or trapped by old information. A practical memory model separates stable profile facts, recent conversation state, explicit preferences, opt-out boundaries, and reviewable summaries the user can correct.
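That separation can be sketched as a small memory object: stable facts, a bounded recent window, explicit preferences, opt-out boundaries checked before anything is stored, and summaries the user can rewrite. Field names and the recency limit are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CoachMemory:
    profile: dict = field(default_factory=dict)      # stable facts, e.g. "injury: knee"
    recent: list = field(default_factory=list)       # rolling conversation state
    preferences: dict = field(default_factory=dict)  # explicit user choices
    opt_outs: set = field(default_factory=set)       # topics the user said never to keep
    summaries: list = field(default_factory=list)    # reviewable, user-correctable

    RECENT_LIMIT = 20  # assumed window size

    def remember(self, topic, value):
        if topic in self.opt_outs:
            return False  # boundaries are checked before storage, not after
        self.recent.append((topic, value))
        self.recent = self.recent[-self.RECENT_LIMIT:]
        return True

    def correct_summary(self, index, new_text):
        # The user can rewrite what the coach believes about them.
        self.summaries[index] = new_text
```

The opt-out check running before the write is the whole design: surveillance creep usually comes from storing first and filtering later.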
The reminder is the product
For behavior-change products, reminders are not notification plumbing. They are the main surface where product judgment shows up. A reminder can be useful, annoying, shaming, timely, irrelevant, or exactly what the user needed. The product surface includes cadence, quiet hours, reply handling, user control, and whether reminders produce action rather than engagement noise.
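Cadence and quiet hours are simple to state and easy to get wrong, because a quiet window usually wraps midnight. A minimal sketch, with assumed default hours and minimum gap:

```python
from datetime import datetime, time


def should_send(now, quiet_start=time(21, 0), quiet_end=time(7, 0),
                last_sent=None, min_gap_hours=20):
    """Send only when a reminder is timely: outside quiet hours and
    not crowding the previous one. Defaults are illustrative."""
    t = now.time()
    # The quiet window wraps midnight: 21:00 tonight through 07:00 tomorrow.
    in_quiet = (t >= quiet_start) or (t < quiet_end)
    if in_quiet:
        return False
    if last_sent is not None:
        gap_hours = (now - last_sent).total_seconds() / 3600
        if gap_hours < min_gap_hours:
            return False
    return True
```

Everything else in the section — reply handling, user control, shame avoidance — layers on top, but this gate is where "timely vs. annoying" first gets decided.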
Prompt tuning is product tuning
Prompt work is often framed as model whispering. In a real product it is closer to product tuning. A prompt encodes tone, policy, assumptions, data contracts, tool expectations, and what the product considers a good next action. Swoleby and agent tooling make that concrete: short responses, direct calls to action, safe boundaries, and prompts evaluated against behavior checks.
Most public agent demos optimize for surprise. A browser opens, a tool gets called, a spreadsheet changes, and everyone sees the magical part. That is useful as a capability proof, but it is not the hard part of enterprise AI...
The interesting thing about MCP is not that an agent can call a tool. Agents have been calling tools through ad hoc function definitions, plugins, scripts, browser automation, and API wrappers for a while...
Slash commands are underrated AI product design
Slash commands make capability visible. They give users a way to discover what the system can do, repeat useful actions, and build a mental model of the tool. Good AI interfaces should expose affordances instead of hiding everything behind a blank text box. A little structure can make the system feel more powerful, not less.
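The discoverability point can be shown with a tiny registry where every command carries its own description, so `/help` is generated rather than hand-maintained. The command names and dispatch shape here are invented for illustration:

```python
COMMANDS = {}


def command(name, description):
    """Register a slash command so capability stays discoverable."""
    def wrap(fn):
        COMMANDS[name] = {"fn": fn, "description": description}
        return fn
    return wrap


@command("/summarize", "Summarize the current thread")
def summarize(args):
    return f"summary of {args}"


def dispatch(text):
    name, _, args = text.partition(" ")
    if name not in COMMANDS:
        # Unknown input falls back to the capability list instead of a blank box.
        return "\n".join(f"{n} - {c['description']}"
                         for n, c in sorted(COMMANDS.items()))
    return COMMANDS[name]["fn"](args)
```

Because descriptions live next to the handlers, adding a command automatically extends what the user can discover.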
What I want from an AI coding runtime
AI coding tools are strongest when they preserve engineering discipline: read the code first, make scoped changes, test the actual surface, and explain tradeoffs. They are weakest when they become autocomplete with commit access. The through-line across OpenClaw, Codex, Claude Code, Cursor, and oh-my-codex is reusable prompts, skills, state, teams, browser verification, and better runtime ergonomics.
Pareto frontiers are a better metaphor for AI systems work
Most AI product decisions are tradeoffs: accuracy, cost, latency, safety, user control, and implementation complexity. A single best answer is often the wrong goal. The better question is which frontier you are moving. The best work expands the frontier instead of optimizing one metric in isolation.
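The frontier framing has a direct computational form: a candidate stays on the frontier unless some other candidate is at least as good on every axis and strictly better on one. A two-axis sketch (accuracy up, cost down; the candidate tuples are made up):

```python
def pareto_frontier(points):
    """Return names of candidates not dominated on (accuracy up, cost down).

    points: list of (name, accuracy, cost) tuples.
    """
    frontier = []
    for name, acc, cost in points:
        dominated = any(
            a2 >= acc and c2 <= cost and (a2 > acc or c2 < cost)
            for _, a2, c2 in points
        )
        if not dominated:
            frontier.append(name)
    return frontier
```

Anything the function drops is a configuration you should never ship, because another option beats it on every axis; the real product decision is choosing among the survivors.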
Generative AI still needs predictive AI
LLMs changed the interface, but prediction did not stop mattering. Ranking, scoring, routing, detection, evaluation, personalization, and decision support are still central to useful AI systems. The model that writes the response is only one component; the system still decides what to retrieve, what to trust, what to show, and what to do next.
Domain-Driven Design is a software design approach that aligns what developers build with the needs of the business. The primary focus is the domain: the business problem the software aims to solve...
Scientific software made me care about metadata
At Harvard Medical School, the software challenge was not just computation. It was metadata, quality, workflow, and making complex data usable for researchers. AI products have the same failure mode: bad metadata creates bad retrieval, bad evaluation, and bad user trust.
From microcontrollers to agents: interfaces that reach the real world
The old NodeMCU and Alexa smart-home projects look far away from agentic AI, but the pattern is similar: connect software to a physical or behavioral workflow, then design for reliability at the boundary. The interface matters most when it changes what happens outside the screen.
I am excited to share with you my journey of creating an entry for the Hackster.io Alexa Smart Home Challenge. The goal was to build a smart home device that can be controlled by Alexa...
MicroPython is a lightweight version of the popular Python programming language that runs on microcontrollers such as the NodeMCU, where it lets you create simple programs that control sensors, lights, and other devices...
Real Python taught me that developer education is product work
Developer education is not just documentation. It is product design for understanding. The examples, pacing, conceptual model, and failure modes all determine whether someone can actually use the technology. If developers cannot understand the system, debug it, and build confidence through small wins, the platform does not matter.
What consulting teaches you about enterprise AI
Running a consulting firm teaches you that the clean architecture diagram is never the whole story. The real system includes budgets, timelines, users, support, client politics, and the cost of production failure. Organizations need systems that fit how they actually operate, not demos that assume ideal users and ideal data.