long / developer tooling
Agent-driven development needs a control plane
The longer version covers the operating system around agentic development: architect/coder splits, review agents, babysitting, cron-driven follow-through, PR triage, low-risk autonomy, rollback testing, and feature flags.
Read the longer version
01 / agentic systems
Enterprise AI needs boring infrastructure
The public conversation around agents tends to reward demos: a tool call works, a browser clicks around, a spreadsheet gets edited. The enterprise version is less glamorous and more important. The hard parts are permissions, audit trails, recovery paths, sandboxing, observability, and human review.
The argument: useful agentic systems look less like magic and more like infrastructure. They need contracts for tools, typed inputs and outputs, execution modes, policy boundaries, and logs that let a human understand what happened after the fact.
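A contract-first tool can be sketched in a few lines. This is a minimal illustration, not any particular framework's API; the `search_orders` tool and its fields are hypothetical:

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolContract:
    """A tool is an API with a new caller: name it, type it, gate it."""
    name: str
    description: str
    input_schema: dict[str, Any]   # JSON-Schema-style typed inputs
    requires_approval: bool        # policy boundary, declared up front


def validate_input(contract: ToolContract, payload: dict[str, Any]) -> list[str]:
    """Return human-readable errors instead of letting bad calls through."""
    errors = []
    props = contract.input_schema.get("properties", {})
    for field in contract.input_schema.get("required", []):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    for field in payload:
        if field not in props:
            errors.append(f"unknown field: {field}")
    return errors


# Hypothetical tool: a read-only order search that never needs approval.
search_orders = ToolContract(
    name="search_orders",
    description="Search orders by customer email.",
    input_schema={
        "type": "object",
        "properties": {"email": {"type": "string"}},
        "required": ["email"],
    },
    requires_approval=False,
)
```

The point is not the validation logic; it is that every call either passes a declared contract or produces an error a human can read in the logs afterward.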
02 / mcp
What MCP changes about tool ecosystems
MCP is useful because it gives agent systems a way to discover and call capabilities without every integration becoming a bespoke prompt hack. That matters most when tool use becomes a platform surface rather than a demo-specific script.
The explanation centers on why tool discovery, schema discipline, execution context, and error handling matter. The thesis is simple: agent tools are APIs with a new caller, not an excuse to abandon API design.
03 / openapi tools
OpenAPI is still one of the best bridges into agent tooling
Every company already has APIs. Many of them already have OpenAPI specs. That means the fastest path to practical AI tool use is not always inventing a new tool protocol from scratch. It is turning existing API contracts into usable, governed tool surfaces.
The useful angle is the messy middle: auth, pagination, ambiguous endpoints, destructive operations, rate limits, and how to decide which API actions should be exposed to an agent at all.
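One governance default falls out of the spec itself: expose read operations, and keep destructive verbs behind an explicit allowlist. A minimal sketch over a hypothetical OpenAPI fragment:

```python
def expose_safe_tools(spec: dict) -> list[dict]:
    """Turn OpenAPI operations into agent tool definitions, exposing only
    GET by default; POST/PUT/DELETE need a deliberate governance decision."""
    tools = []
    for path, methods in spec.get("paths", {}).items():
        for verb, op in methods.items():
            if verb.lower() != "get":
                continue  # destructive operations stay off the agent surface
            tools.append({
                "name": op.get("operationId", f"get_{path.strip('/')}"),
                "description": op.get("summary", ""),
                "path": path,
                "method": "GET",
            })
    return tools


# Hypothetical spec fragment: one safe endpoint, one destructive one.
spec = {
    "paths": {
        "/invoices": {
            "get": {"operationId": "listInvoices", "summary": "List invoices"},
            "delete": {"operationId": "purgeInvoices", "summary": "Delete all invoices"},
        }
    }
}
```

Real specs add auth, pagination, and ambiguity on top of this, but the filter direction matters: opt endpoints in, never out.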
04 / secure execution
Sandboxing is a product feature, not an implementation detail
If an AI system can execute code or operate tools, sandboxing becomes part of the user experience. Users need to know what can happen, what cannot happen, what needs approval, and how to inspect the result.
The connection is between technical sandbox design and product trust: scoped filesystems, network controls, timeouts, secrets boundaries, preview-before-apply flows, and why "the model decided" is never a satisfying explanation.
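The user-facing contract can be shown in miniature: bounded time, no inherited secrets, an inspectable result. This sketch uses a subprocess with a hard timeout; real isolation needs containers or seccomp on top, so treat it as an illustration of the product shape, not a security boundary:

```python
import subprocess
import sys
import tempfile


def run_sandboxed(code: str, timeout_s: float = 2.0) -> dict:
    """Run untrusted code in a fresh working dir with a stripped
    environment and a hard timeout, returning an inspectable result."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=workdir,
                env={},                # no inherited secrets
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            return {"ok": proc.returncode == 0, "stdout": proc.stdout,
                    "stderr": proc.stderr, "timed_out": False}
        except subprocess.TimeoutExpired:
            return {"ok": False, "stdout": "", "stderr": "", "timed_out": True}
```

Whatever the implementation, the return value is the feature: a user can see what ran, what it printed, and whether the system stopped it.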
05 / human review
Human-in-the-loop is not a modal
Many products treat human review as a confirmation dialog. That is too shallow. Human-in-the-loop design is about choosing where judgment belongs, what context the human needs, and what the system should learn from approval or rejection.
The argument treats review as a workflow primitive: summarize intent, show diffs, explain risk, preserve undo, and make approval data useful for future evals.
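That primitive is mostly a data shape. A minimal sketch, with illustrative field names rather than any product's actual schema:

```python
from dataclasses import dataclass


@dataclass
class ReviewRequest:
    """Everything a human needs to exercise judgment, in one object."""
    intent: str      # what the agent is trying to do, in one line
    diff: str        # exactly what will change
    risk: str        # why a human should care
    undo_hint: str   # how to roll back if approved and wrong


decision_log: list[dict] = []


def decide(request: ReviewRequest, approved: bool, note: str = "") -> bool:
    """Record every approval/rejection so the system can learn from it."""
    decision_log.append({
        "intent": request.intent,
        "approved": approved,
        "note": note,   # rejection reasons are eval gold
    })
    return approved
```

The decision log is the part most products throw away, and it is exactly the approval data the previous paragraph says should feed future evals.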
06 / evals
Agent evals should measure behavior, not vibes
Agent evaluation gets weak when it only asks whether an answer sounds good. Real systems need to evaluate task completion, tool choice, safety boundaries, cost, latency, and whether the system recovered from messy intermediate states.
The Swoleby engagement experiments are a good public example: tone guardrails, concrete calls to action, maximum-length and non-empty-output checks, and stage-level scoring are small but real moves toward behavior-based quality.
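Checks in that spirit are cheap to write. This sketch is illustrative; the verb list and length cap are assumptions, not Swoleby's actual rules:

```python
def check_response(text: str, max_len: int = 320) -> dict[str, bool]:
    """Behavior-level checks: non-empty, bounded length, and a
    concrete call to action somewhere in the message."""
    action_verbs = ("try", "do", "send", "reply", "log", "schedule")
    lowered = text.lower()
    return {
        "non_empty": bool(text.strip()),
        "within_length": len(text) <= max_len,
        "has_call_to_action": any(verb in lowered for verb in action_verbs),
    }
```

None of these checks asks whether the answer sounds good; each asks whether the output has a property the product actually requires.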
07 / product design
Building Swoleby: why SMS beats another app
The product bet behind Swoleby is that behavior change does not start in a dashboard. It starts when the person is about to skip the thing they said they wanted to do. SMS is closer to that moment than a dashboard or a weekly report.
The product logic is reminders, context, accountability, tone, fallback plans, and the difference between a feature that is impressive and a loop that someone actually uses.
08 / memory
How much memory should an AI coach have?
Memory is useful until it feels creepy or wrong. A coach needs enough context to avoid asking the same questions repeatedly, but not so much that the user feels surveilled or trapped by old information.
A practical memory model separates stable profile facts, recent conversation state, explicit user preferences, opt-out boundaries, and reviewable summaries that let the user correct the system.
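Those tiers can be made literal in the data model. Field names here are illustrative, not Swoleby's actual schema:

```python
from dataclasses import dataclass, field


@dataclass
class CoachMemory:
    """Separate memory tiers so each gets its own retention and review rules."""
    profile: dict = field(default_factory=dict)       # stable facts, user-editable
    recent_state: list = field(default_factory=list)  # short-lived conversation state
    preferences: dict = field(default_factory=dict)   # explicit, opt-in
    opt_outs: set = field(default_factory=set)        # topics the coach must not raise

    def remember(self, topic: str, fact: str) -> bool:
        """Respect opt-out boundaries before storing anything."""
        if topic in self.opt_outs:
            return False
        self.profile[topic] = fact
        return True

    def summary_for_review(self) -> dict:
        """Show the user what the system holds so they can correct it."""
        return {"profile": dict(self.profile), "preferences": dict(self.preferences)}
```

The opt-out check sitting in front of the write, rather than in a policy document, is the difference between a boundary and a promise.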
09 / reminders
The reminder is the product
For behavior-change products, reminders are not notification plumbing. They are the main surface where product judgment shows up. A reminder can be useful, annoying, shaming, timely, irrelevant, or exactly what the user needed.
The product surface includes cadence, quiet hours, reply handling, user control, and evaluation of whether reminders produce action rather than just more engagement noise.
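Even quiet hours carry a small correctness trap: the window usually wraps past midnight. A minimal sketch with illustrative default times:

```python
from datetime import time


def should_send(now: time, quiet_start: time = time(21, 0),
                quiet_end: time = time(8, 0)) -> bool:
    """Suppress sends inside the quiet window, handling the common case
    where the window wraps midnight (e.g. 21:00 -> 08:00)."""
    if quiet_start <= quiet_end:
        in_quiet = quiet_start <= now < quiet_end
    else:  # window wraps midnight
        in_quiet = now >= quiet_start or now < quiet_end
    return not in_quiet
```

A reminder that fires at 3 a.m. because of a naive comparison is a vivid way to lose the user's trust in one message.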
10 / prompt systems
Prompt tuning is product tuning
Prompt work is often framed as model whispering. In a real product it is closer to product tuning. A prompt encodes tone, policy, assumptions, data contracts, tool expectations, and what the product considers a good next action.
Swoleby and agent tooling provide concrete examples: short responses, direct calls to action, safe boundaries, and prompts that are evaluated against concrete behavior checks.
11 / developer tools
Slash commands are underrated AI product design
Slash commands make capability visible. They give users a way to discover what the system can do, repeat useful actions, and build a mental model of the tool.
The broader argument: good AI interfaces should expose affordances instead of hiding everything behind a blank text box. A little structure can make the system feel more powerful, not less.
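A command registry makes the affordance argument concrete: every capability is named, listable, and repeatable. The commands below are hypothetical examples, not any product's actual set:

```python
from typing import Callable

COMMANDS: dict[str, Callable[[str], str]] = {}


def command(name: str, help_text: str):
    """Register a slash command so capability is discoverable, not hidden."""
    def wrap(fn: Callable[[str], str]):
        fn.help_text = help_text
        COMMANDS[name] = fn
        return fn
    return wrap


@command("/summarize", "Summarize the current thread")
def summarize(args: str) -> str:
    return f"summary requested for: {args or 'current thread'}"


def dispatch(line: str) -> str:
    """'/help' lists affordances; unknown commands teach the mental model."""
    name, _, args = line.partition(" ")
    if name == "/help":
        return "\n".join(f"{n} - {fn.help_text}" for n, fn in sorted(COMMANDS.items()))
    if name in COMMANDS:
        return COMMANDS[name](args)
    return f"unknown command {name}; try /help"
```

`/help` is the blank-text-box antidote: the user never has to guess what the system can do.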
12 / codex workflow
What I want from an AI coding runtime
AI coding tools are strongest when they preserve engineering discipline: read the code first, make scoped changes, test the actual surface, and explain tradeoffs. They are weakest when they become autocomplete with commit access.
The through-line runs across OpenClaw, Codex, Claude Code, Cursor, and oh-my-codex: reusable prompts, skills, state, teams, browser verification, and the need for better agent runtime ergonomics.
13 / pareto
Pareto frontiers are a better metaphor for AI systems work
Most AI product decisions are tradeoffs: accuracy, cost, latency, safety, user control, and implementation complexity. A single "best" answer is often the wrong goal. The better question is which frontier you are moving.
The article connects syftr-style optimization to career and product storytelling: the best work expands the frontier instead of optimizing one metric in isolation.
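The frontier itself is easy to compute, which makes it a practical artifact rather than just a metaphor. A sketch where both objectives are "higher is better" (the systems and scores are made up):

```python
def pareto_frontier(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return the non-dominated set: a point is dominated if some other
    point is at least as good on both axes and not identical to it."""
    frontier = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier


# Hypothetical systems scored on (accuracy, cost-efficiency). Moving the
# frontier means adding a point that no existing point dominates.
systems = [(0.90, 0.2), (0.85, 0.6), (0.70, 0.9), (0.60, 0.5)]
```

Plotting the frontier for a set of candidate systems replaces "which is best?" with "which tradeoff are we choosing, and can we beat it?"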
14 / predictive ai
Generative AI still needs predictive AI
LLMs changed the interface, but prediction did not stop mattering. Ranking, scoring, routing, detection, evaluation, personalization, and decision support are still central to useful AI systems.
Generative and predictive AI are complementary. The model that writes the response is only one component. The system still needs to decide what to retrieve, what to trust, what to show, and what to do next.
15 / technical writing
Real Python taught me that developer education is product work
Developer education is not just documentation. It is product design for understanding. The examples, pacing, conceptual model, and failure modes all determine whether someone can actually use the technology.
Real Python connects directly to current AI tooling: if developers cannot understand the system, debug it, and build confidence through small wins, the platform does not matter.
16 / consulting
What consulting teaches you about enterprise AI
Running a consulting firm teaches you that the clean architecture diagram is never the whole story. The real system includes budgets, timelines, users, support, client politics, and the cost of production failure.
Softworks is the background and enterprise AI is the point: organizations need systems that fit how they actually operate, not demos that assume ideal users and ideal data.
17 / scientific software
Scientific software made me care about metadata
At Harvard Medical School, the software challenge was not just computation. It was metadata, quality, workflow, and making complex data usable for researchers. That lesson transfers directly to AI systems.
AI products are only as useful as the surrounding context model. Bad metadata creates bad retrieval, bad evaluation, and bad user trust.
18 / iot
From microcontrollers to agents: interfaces that reach the real world
The old NodeMCU and Alexa smart-home projects look far away from agentic AI, but the pattern is similar: connect software to a physical or behavioral workflow, then design for reliability at the boundary.
The older hardware posts connect naturally to Swoleby: the interface matters most when it changes what happens outside the screen.
19 / ad agents
Ad management is a good agent benchmark
Meta ads are a useful agent domain because the workflow is repetitive but not trivial: monitor spend, spot fatigue, find winners, generate copy, upload variants, and ask for approval before money moves.
Meta-ads-kit is the public example for why the best agent workflows combine pattern recognition, tool execution, human approval, and closed-loop measurement.
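The loop's shape fits in one function: pattern check, proposed action, approval gate. The thresholds below are illustrative, not meta-ads-kit's or Meta's actual logic:

```python
def triage_ad(metrics: dict) -> dict:
    """Classify an ad and propose an action, but require human approval
    for anything that moves money."""
    fatigued = metrics["frequency"] > 3.5 and metrics["ctr"] < 0.01
    winner = metrics["roas"] >= 2.0 and metrics["ctr"] >= 0.015
    if fatigued:
        action = "pause_ad"
    elif winner:
        action = "raise_budget"
    else:
        action = "keep_running"
    return {
        "action": action,
        "requires_approval": action != "keep_running",  # money moves via a human
    }
```

The domain is a good benchmark precisely because both halves matter: the pattern recognition is measurable, and the approval gate is non-negotiable.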
20 / operating systems
Personal operating systems for agent-led work
The next useful layer around coding agents is not one more prompt. It is a working system for plans, context, review queues, CI signals, rollback checks, and long-running follow-through.
The article connects OpenClaw, Codex, review agents, babysitting workflows, cron-driven work, and PR triage into one operating model for getting useful work shipped.