<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[The Engineering Product]]></title><description><![CDATA[I write about building software in the age of AI. Most posts explore the intersection of engineering leadership, AI-assisted development, and what it means to be a useful generalist. I draw from a decade of full-stack work, open source contributions across 50+ repos, and a belief that design, product, and engineering are converging into a single builder role. Expect practical takes on tools like Claude Code, patterns like state machines and functional programming, and the organizational dynamics that shape how software actually ships.]]></description><link>https://blog.jcosta.tech</link><generator>RSS for Node</generator><lastBuildDate>Wed, 08 Apr 2026 15:06:16 GMT</lastBuildDate><atom:link href="https://blog.jcosta.tech/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[AI Is Bringing Developers Back to the Terminal. Ink Is Making It Beautiful.]]></title><description><![CDATA[The terminal is having a moment, and AI is the reason.
Claude Code. Gemini CLI. GitHub Copilot CLI. The most talked-about developer tools of the past year all live in the terminal. Not web apps or des]]></description><link>https://blog.jcosta.tech/ai-is-bringing-developers-back-to-the-terminal-ink-is-making-it-beautiful</link><guid isPermaLink="true">https://blog.jcosta.tech/ai-is-bringing-developers-back-to-the-terminal-ink-is-making-it-beautiful</guid><category><![CDATA[cli]]></category><category><![CDATA[React]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[terminal]]></category><category><![CDATA[Developer Tools]]></category><category><![CDATA[TypeScript]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Tue, 07 Apr 2026 15:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/80c7e666-2037-4a69-812f-44ec8935610b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The terminal is having a moment, and AI is the reason.</p>
<p>Claude Code. Gemini CLI. GitHub Copilot CLI. The most talked-about developer tools of the past year all live in the terminal. Not web apps or desktop GUIs, but terminal applications. Traditional developers leveraging AI assistance are spending more time in the terminal than ever before, and it's not just for git commands and SSH anymore. Meanwhile, a whole new generation of vibe coders is discovering the command line for the first time because that's where the AI tools are meeting them.</p>
<p>What most people don't realize is that all three of those tools, and many others besides, are built on the same library: <a href="https://github.com/vadimdemedes/ink">Ink</a>. Ink has <strong>36k stars</strong> on GitHub and <strong>2.8 million weekly downloads</strong> on npm.</p>
<p>Ink is the layer that makes terminal applications look and feel good. Colors, layouts, interactive elements, smooth scrolling. Without it, most CLI tools would just be raw text on a black screen. Ink is to the terminal what CSS and React are to the web browser.</p>
<p>But Ink's core only covers the basics. For richer interactive components, you need community-built libraries, and there's still a lot of room to build.</p>
<p>That's where I come in. I'm currently the project's <strong>third most active contributor</strong>, behind only the creator and primary maintainer. I added <a href="https://github.com/vadimdemedes/ink/pull/855">kitty keyboard protocol support</a>, built a <code>renderToString()</code> API, and fixed bugs in the reconciler and fullscreen rendering. Working on the internals showed me exactly what was missing.</p>
<h2>Six components I shipped</h2>
<h3><a href="https://github.com/costajohnt/ink-timer">ink-timer</a></h3>
<p>Ready-made timers, countdowns, and stopwatches for any CLI that needs them. Surprisingly, there wasn't a good one.</p>
<h3><a href="https://github.com/costajohnt/ink-tree-view">ink-tree-view</a></h3>
<p>Collapsible tree with keyboard navigation, virtual scrolling, multi-select, and async child loading. Think VS Code's file explorer, but in the terminal.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/e5ee2c26-4f73-43e5-af5e-c6b07927b1d4.gif" alt="" style="display:block;margin:0 auto" />

<h3><a href="https://github.com/costajohnt/ink-autocomplete">ink-combobox</a></h3>
<p>Fuzzy-search autocomplete that filters and ranks as you type. Supports both static option lists and async providers.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/061a8bd4-1e3d-4b42-b03e-2818b614c88d.gif" alt="" style="display:block;margin:0 auto" />

<h3><a href="https://github.com/costajohnt/ink-file-picker">ink-file-picker</a></h3>
<p>Filesystem browser where you can navigate directories, filter by glob, and select files. Any CLI that needs the user to pick a file benefits from an interactive picker instead of asking them to type a path.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/c2e777ee-0e8e-414a-9349-b75eab1728f3.gif" alt="" style="display:block;margin:0 auto" />

<h3><a href="https://github.com/costajohnt/ink-json-viewer">ink-json-viewer</a></h3>
<p>Interactive JSON tree with syntax coloring, expand/collapse, and virtual scrolling. Saves you from the <code>JSON.stringify(data, null, 2)</code> wall of text.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/e85af8f6-3bea-40ac-9909-08520174563e.gif" alt="" style="display:block;margin:0 auto" />

<h3><a href="https://github.com/costajohnt/ink-scrollable-box">ink-scrollable-box</a></h3>
<p>Scrollable container with keyboard navigation, vim bindings, and auto-follow for streaming content. Drop it around any content that might overflow and it just works.</p>
<hr />
<p>As more people spend more time in the terminal, these are the kinds of building blocks the ecosystem needs. The terminal isn't going anywhere. It's getting better.</p>
<p>If you want to follow along with what I'm working on, you can find me on <a href="https://github.com/costajohnt">GitHub</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Building a Self-Improving Trading System With AI]]></title><description><![CDATA[I built an automated paper trading system. Then I gave it an AI agent that reviews its own performance and deploys improvements. Here's what that looks like in practice, and what I learned about givin]]></description><link>https://blog.jcosta.tech/building-a-self-improving-trading-system-with-ai</link><guid isPermaLink="true">https://blog.jcosta.tech/building-a-self-improving-trading-system-with-ai</guid><category><![CDATA[ai agents]]></category><category><![CDATA[trading, ]]></category><category><![CDATA[autonomous agents]]></category><category><![CDATA[safety]]></category><category><![CDATA[GitHub Actions]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Tue, 31 Mar 2026 15:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/35b09890-fe4b-468c-95d5-226fca354cc1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I built an automated paper trading system. Then I gave it an AI agent that reviews its own performance and deploys improvements. Here's what that looks like in practice, and what I learned about giving an AI real autonomy over a system that matters.</p>
<h2>The System</h2>
<p>The trading system runs on Alpaca's paper trading API. It combines four strategies: mean reversion with weekly trend confirmation, AI-powered news sentiment, Finnhub insider trading signals, and momentum detection. Every trade goes through three layers of analysis before execution. Position sizing is conviction-based and capped. Sector exposure is tracked and limited.</p>
<p>It runs twice daily via GitHub Actions. The pipeline manages exits first, then scans for new opportunities across 11K+ equities, and logs everything to CSV for analysis later.</p>
<p>None of this is novel. Retail algo trading has been a popular hobby project for years. What made it interesting to me was the next step.</p>
<h2>The Agent</h2>
<p>Once the system was stable and generating data, I added an autonomous strategy agent. It's Claude Opus 4.6 running through OpenRouter, triggered weekly by a separate GitHub Actions workflow.</p>
<p>The agent has a structured reasoning process: observe performance data, compare against previous changes, diagnose the biggest problem, validate a fix via backtesting, and either deploy or wait. It can modify strategy parameters, adjust filters, create shadow experiments, and roll back failures.</p>
<p>This is the part where most people get nervous. An AI modifying production code autonomously? On a system that trades real money (well, paper money)?</p>
<p>Fair concern. Here's how I handled it.</p>
<h2>The Safety Architecture</h2>
<p>The agent has tools. Those tools have hardcoded constraints that the agent cannot override, because they live in a file the agent cannot modify.</p>
<p>Minimum sample sizes are enforced at the tool level, not the prompt level. The agent can't deploy a parameter change unless there are at least 50 trades worth of data backing the decision. It can't modify filters without 30 data points. These aren't suggestions in a system prompt. They're Python assertions that throw errors.</p>
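<p>A minimal sketch of what tool-level enforcement can look like. The constant and function names here are invented for illustration, not taken from the actual project; the point is that the check is an assertion in code, not a sentence in a prompt:</p>

```python
# Hypothetical guard module: lives in a file the agent cannot modify.
MIN_TRADES_FOR_PARAM_CHANGE = 50   # hardcoded constant, not prompt text
MIN_POINTS_FOR_FILTER_CHANGE = 30

def deploy_parameter_change(change: dict, trade_count: int) -> dict:
    """Refuse any deploy that is not backed by enough trade data."""
    assert trade_count >= MIN_TRADES_FOR_PARAM_CHANGE, (
        f"refusing deploy: {trade_count} trades is below the "
        f"{MIN_TRADES_FOR_PARAM_CHANGE}-trade minimum sample size"
    )
    return {"deployed": True, "change": change}
```

<p>However persuasively the model argues for a change, a call with too little data raises before anything ships.</p>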
<p>The agent can't touch its own code, the risk guard, the resilience layer, tests, or workflows. It gets a maximum of 2 deploys per week. Any file it changes enters a 14-day cooling period before it can be changed again. All tests must pass before any deploy goes through.</p>
<p>Position size is capped at 5% of equity. The max-loss stop can never be disabled. These are hardcoded constants, not configurable parameters.</p>
<p>If the agent's run ends without it calling <code>log_decision</code> (because of an API error or hitting the turn limit), the system forces a minimal log entry. Every run produces an audit trail.</p>
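<p>That guarantee is straightforward to enforce with a <code>try/finally</code> around the agent loop. A sketch under assumed names (none of these are the project's actual API):</p>

```python
# Hypothetical harness: if the agent loop exits without logging a
# decision (API error, turn limit, crash), force a minimal entry
# so every run still produces an audit trail.
def run_agent_session(agent_loop, changelog: list) -> list:
    logged = {"done": False}

    def log_decision(entry: dict) -> None:
        changelog.append(entry)
        logged["done"] = True

    try:
        agent_loop(log_decision)  # may raise, or simply never log
    finally:
        if not logged["done"]:
            changelog.append({"decision": "none", "reason": "forced minimal log"})
    return changelog
```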
<p>If the changelog file is corrupt, the agent halts entirely. It doesn't try to recover or start fresh. A corrupt changelog means something unexpected happened, and the right response is to stop and let a human look at it.</p>
<h2>What I Learned</h2>
<p><strong>The prompt is not the safety layer.</strong> Early on, I had safety rules in the system prompt: "never exceed 50 trades minimum sample size." That's a suggestion. The model can ignore it, misinterpret it, or forget it in a long conversation. The real safety layer is the code that executes the tools. If the tool throws an error when you try to deploy without enough data, it doesn't matter what the model thinks.</p>
<p><strong>Shadow experiments are underrated.</strong> The agent can create experiments where it logs what a parameter change <em>would</em> have done without actually changing anything. This lets it gather evidence before committing to a change. Most of the time, the right decision is to wait and observe. Giving the agent a structured way to "wait and observe" prevents it from deploying premature changes just because it feels like it should do something.</p>
<p><strong>First run behavior matters.</strong> I had to explicitly handle the case where the agent runs for the first time and there's no historical data. Without guidance, it would try to "fix" the lack of data by deploying changes. The system prompt now says: if this is your first run, establish a baseline and do NOT make changes. And the minimum sample size enforcement backs this up at the tool level.</p>
<p><strong>Atomic operations everywhere.</strong> Every file write goes through <code>tempfile.mkstemp()</code> + <code>os.replace()</code>. If the process crashes mid-write, you get either the old file or the new file, never a half-written one. This matters more than you'd think when you have concurrent GitHub Actions runs writing to the same repository.</p>
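<p>A self-contained sketch of that write path (the helper name is mine):</p>

```python
# Write-then-rename: os.replace() is atomic, so a crash mid-write
# leaves either the old file or the new file, never a torn one.
import json
import os
import tempfile

def atomic_write_json(path: str, data) -> None:
    dir_name = os.path.dirname(os.path.abspath(path))
    # Temp file must be in the same directory (same filesystem)
    # for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp_path, path)  # the atomic swap
    except BaseException:
        os.unlink(tmp_path)  # don't leave temp files behind on failure
        raise
```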
<p><strong>The git push race condition.</strong> Two cron schedules can overlap if one runs long. The first run pushes, the second run's push gets rejected. I lost a day's worth of state data before adding <code>git pull --rebase</code> before every push in the workflow. Boring problem. Real consequence.</p>
<h2>The Engineering Management Parallel</h2>
<p>I've <a href="https://blog.jcosta.tech/why-ai-assisted-development-feels-like-engineering-management">written before</a> about how AI-assisted development feels like engineering management. This project made that analogy feel very literal.</p>
<p>The strategy agent is like a junior developer with good instincts but no judgment about when to act. The safety rails are the code review process. The minimum sample sizes are the "show me the data" conversations. The cooling period is the "let's see how the last change lands before making another one."</p>
<p>I still review its changelog. I still check what it deployed. I still override it when I disagree. But I don't have to be there for every decision. That's the point.</p>
<h2>The Stack</h2>
<p>For anyone curious:</p>
<ul>
<li><strong>Trading</strong>: Alpaca API (paper trading), Python, pandas</li>
<li><strong>Strategies</strong>: Mean reversion, sentiment (via OpenRouter), insider signals (Finnhub), momentum</li>
<li><strong>Agent</strong>: Claude Opus 4.6 via OpenRouter, OpenAI-compatible SDK</li>
<li><strong>Infrastructure</strong>: GitHub Actions (daily pipeline + weekly agent review)</li>
<li><strong>Safety</strong>: Hardcoded tool constraints, atomic writes, walk-forward backtesting with holdout validation</li>
<li><strong>Tests</strong>: 654 tests across 30 files</li>
</ul>
<p>The whole thing runs on free-tier GitHub Actions. No servers to maintain. The project is <a href="https://github.com/costajohnt/alpaca-trader">open source</a>.</p>
<h2>Is It Making Money?</h2>
<p>Not yet. It's paper trading, and the portfolio is down about 3% over its first six weeks. The agent hasn't deployed any live parameter changes because it hasn't hit the minimum sample sizes required to justify one. That's the correct behavior.</p>
<p>What it has done is run shadow experiments. It noticed that positions were showing poor risk/reward asymmetry (6% take-profit vs 12% stop-loss) and created an experiment to test tighter trailing stops without actually changing anything in production. It also paused the screener strategy entirely after the win rate dropped below 30%, which is exactly the kind of tactical decision it's designed to make.</p>
<p>So the system is working as intended. It's observing, gathering data, making small low-risk moves, and waiting for evidence before committing to real changes. It's not making money, but it's not supposed to yet.</p>
<p>The idea is that over time, as it accumulates enough trades and enough data, it starts proving gains in paper trading. If it can demonstrate consistent improvement over months of autonomous operation, then maybe it earns the right to trade with real money. That's the experiment.</p>
]]></content:encoded></item><item><title><![CDATA[Automate Your Life With Cron Jobs, GitHub Actions, and Telegram]]></title><description><![CDATA[I check my phone in the morning and there are two Telegram messages waiting. One is a summary of trades a bot made overnight. The other is a heads-up that a new open source bounty just got posted.
I d]]></description><link>https://blog.jcosta.tech/automate-your-life-with-cron-jobs-github-actions-and-telegram</link><guid isPermaLink="true">https://blog.jcosta.tech/automate-your-life-with-cron-jobs-github-actions-and-telegram</guid><category><![CDATA[automation]]></category><category><![CDATA[github-actions]]></category><category><![CDATA[telegram bot]]></category><category><![CDATA[AI]]></category><category><![CDATA[cronjob]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Thu, 26 Mar 2026 15:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/d8f440fb-31b5-4942-9255-81226e523436.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I check my phone in the morning and there are two Telegram messages waiting. One is a summary of trades a bot made overnight. The other is a heads-up that a new open source bounty just got posted.</p>
<p>I didn't set an alarm or check a dashboard. A few scheduled jobs and a Telegram bot handle all of this, running on GitHub Actions free tier. No servers or hosting costs.</p>
<p>Here's what I've built with that setup.</p>
<h2>Trade notifications (and a bot that improves itself)</h2>
<p>I have an automated trading system that runs on Alpaca's paper trading API. A cron job runs the trading pipeline twice a day on GitHub Actions. When it finishes, it sends me a Telegram message with what it did.</p>
<p>The more interesting part is a separate weekly job. Once a week, an AI agent reviews the bot's performance, looks at what's working and what isn't, and if it finds a justified improvement, it actually modifies the trading strategy and deploys the change. The bot is tweaking its own code based on how it performs.</p>
<p>There's a lot of safety architecture around that part (full post on this coming soon). But the trigger for all of it is just a cron schedule in a config file.</p>
<h2>Bounty alerts</h2>
<p>Open source bounties are GitHub issues with cash rewards attached. Anyone can work on them, but you usually need to claim the issue first. Popular ones get grabbed fast.</p>
<p>I built a tool that polls GitHub and Algora every few minutes for new bounty issues matching my watchlist. When one appears, Telegram buzzes. Seeing a bounty 5 minutes after it's posted vs. 5 hours later is often the difference between getting it and not.</p>
<p>There's also an AI-assisted mode that investigates the codebase and drafts a proposal when I decide to go after one. But the core of it is just a polling script and a Telegram message.</p>
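<p>The Telegram half really is that small: one authenticated POST to the Bot API's <code>sendMessage</code> endpoint. A sketch using only the standard library, with the token and chat ID assumed to come from repository secrets (helper names mine):</p>

```python
# Build and send a Telegram notification. build_request is split out
# so the request can be inspected without touching the network.
import json
import urllib.request

def build_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

def notify(token: str, chat_id: str, text: str) -> None:
    # Network call: raises on HTTP errors, which fails the Actions job loudly.
    with urllib.request.urlopen(build_request(token, chat_id, text)) as resp:
        resp.read()
```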
<h2>A website that updates itself</h2>
<p>I rebuilt my personal portfolio site as plain HTML. No frameworks, no build step. But it has a <a href="https://blog.jcosta.tech/your-portfolio-site-deserves-a-pipeline">data pipeline behind it</a>.</p>
<p>A daily cron job pulls my latest blog posts from Hashnode and my open source contribution stats from GitHub. If anything changed, it updates the HTML and commits. There's also a webhook path for real-time updates when I publish, but the cron job is the safety net for when webhooks fail.</p>
<p>I published a blog post last week and didn't think about my portfolio at all. The next morning, it was already there.</p>
<h2>Claude Code from your phone</h2>
<p>This one isn't a cron job, but it fits the theme.</p>
<p>Claude Code has a feature called Channels that connects a running session to a Telegram bot. Once paired, you message Claude from your phone and it works with your actual dev environment. Files, terminal, tools. It replies back in Telegram.</p>
<p>Your machine needs to stay on, but as long as it does, you've got a dev environment in your pocket.</p>
<h2>The pattern</h2>
<p>Something runs on a schedule. It does useful work. It tells you about it, or just handles it quietly.</p>
<p>You don't need to understand cron syntax or YAML deeply. Describe what you want to an AI and it'll help you wire it up. The hardest part is the idea. Once you build your first one and your phone buzzes with something useful that happened while you weren't looking, you start seeing them everywhere.</p>
]]></content:encoded></item><item><title><![CDATA[Your Portfolio Site Deserves a Pipeline]]></title><description><![CDATA[I recently rebuilt my portfolio site from scratch. No React. No Next.js. No build step. Just HTML, CSS, and a few scripts.
It loads in under 200ms. There's nothing to hydrate, no bundle to download, n]]></description><link>https://blog.jcosta.tech/your-portfolio-site-deserves-a-pipeline</link><guid isPermaLink="true">https://blog.jcosta.tech/your-portfolio-site-deserves-a-pipeline</guid><category><![CDATA[Web Development]]></category><category><![CDATA[portfolio]]></category><category><![CDATA[GitHub Actions]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[automation]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Sun, 15 Mar 2026 04:11:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/4fb15c38-828d-4200-a6d0-02dc6b7578fc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I recently rebuilt my portfolio site from scratch. No React. No Next.js. No build step. Just HTML, CSS, and a few scripts.</p>
<p>It loads in under 200ms. There's nothing to hydrate, no bundle to download, no client-side rendering. It's the simplest thing I could have built.</p>
<p>But here's the thing: it updates itself.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/b2bbf38a-2264-4a5a-a8ab-93eeab70a369.png" alt="" style="display:block;margin:0 auto" />

<h2><strong>The problem with portfolio sites</strong></h2>
<p>Every developer has built a portfolio site and then abandoned it. The blog section still shows posts from 2022. The "recent projects" haven't been updated since you launched. It was supposed to represent who you are, but it represents who you were.</p>
<p>The reason is obvious. Updating a static site means opening the HTML, editing content by hand, and pushing to GitHub. It takes ten minutes. You'll never do it.</p>
<p>The site I replaced had exactly this problem. I'd publish a blog post on Hashnode and forget to update my portfolio. I'd merge a PR into a major open source project and my site had no idea. The portfolio was a snapshot, not a reflection.</p>
<p>So when I rebuilt it, I decided the site itself could stay simple. The engineering would go into keeping it alive.</p>
<h2><strong>What stays fresh automatically</strong></h2>
<p>My site tracks four things without me touching it:</p>
<ul>
<li><p><strong>Blog posts</strong> pulled from Hashnode's GraphQL API</p>
</li>
<li><p><strong>OSS contribution stats</strong>: total PRs merged, repos contributed to, languages used (GitHub API)</p>
</li>
<li><p><strong>Recently merged PRs</strong> with repo name, star count, and language</p>
</li>
<li><p><strong>Notable repos</strong> ranked by a composite score of project size and contribution depth</p>
</li>
</ul>
<p>None of this is hardcoded. If I merge a PR into a new repo tomorrow morning, it shows up on my site by tomorrow afternoon.</p>
<h2><strong>The architecture</strong></h2>
<p>The system has three layers: data fetching, HTML injection, and triggers.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/7ece78c2-c03b-48ef-a138-837ce990d3ef.png" alt="" style="display:block;margin:0 auto" />

<h3><strong>Layer 1: Data fetching</strong></h3>
<p>Two Node.js scripts pull data from external APIs and write JSON files. One hits Hashnode's GraphQL API for blog posts. The other uses GitHub's search API to find all my merged PRs across public repos.</p>
<p>The OSS stats script is <strong>incremental</strong>. It maintains a full list of merged PRs in <code>data/merged-prs.json</code> and only fetches PRs newer than the most recent one on file. This matters because GitHub's search API caps results at 1000. If you have more than that and try to fetch them all at once, you'll silently lose data. The incremental approach sidesteps this entirely: each run only needs to fetch what's new since yesterday.</p>
<p>For each new PR, the script fetches repo metadata (star count, primary language), filters out repos below 50 stars, handles rate limiting by checking <code>X-RateLimit-Remaining</code> and backing off when it gets low, and caches repo metadata to avoid redundant API calls.</p>
<p>From the merged PR list, a <code>deriveOssStats()</code> function builds everything the frontend needs: total counts, per-repo breakdowns, language distribution, and a ranked "notable repos" list. The ranking was an interesting problem.</p>
<p>I didn't want it to be pure star count (that would just show whichever massive repo I happened to send one PR to). I also didn't want pure PR count (that would over-index on small repos I had a drive-by fix for). The composite score:</p>
<pre><code class="language-plaintext">score = stars * sqrt(prCount)
</code></pre>
<p><code>sqrt</code> dampens the PR count so that quantity matters, but with diminishing returns. In practice this means <a href="https://github.com/vadimdemedes/ink">ink</a> (35.6k stars, 15 PRs, score ~137k) ranks above <a href="https://github.com/Homebrew/brew">Homebrew</a> (47k stars, 2 PRs, score ~66k). Depth of contribution wins over drive-by fame.</p>
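<p>Plugging the post's example numbers back into the formula confirms that ordering (the function name is mine):</p>

```python
# score = stars * sqrt(prCount), with the post's own numbers:
# ink at 35.6k stars / 15 PRs, Homebrew at 47k stars / 2 PRs.
from math import sqrt

def notable_score(stars: int, pr_count: int) -> float:
    # sqrt dampens PR count: quantity helps, with diminishing returns
    return stars * sqrt(pr_count)

ink = notable_score(35_600, 15)      # ~137.9k
homebrew = notable_score(47_000, 2)  # ~66.5k
```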
<h3><strong>Layer 2: Marker-based HTML injection</strong></h3>
<p>The site is plain HTML. There's no templating engine, no static site generator, no build step. So how does data get into the page?</p>
<p>Comment markers. An embed script reads each JSON data file, generates HTML snippets, and replaces everything between matching <code>&lt;!-- BEGIN:X --&gt;</code> and <code>&lt;!-- END:X --&gt;</code> comment pairs. The core is a single regex replacement function:</p>
<pre><code class="language-javascript">function replaceBetweenMarkers(html, beginMarker, endMarker, newContent) {
  const pattern = new RegExp(
    `(${escaped(beginMarker)})([\\s\\S]*?)( *${escaped(endMarker)})`, 'g'
  );
  return html.replace(pattern, `$1\n${newContent}\n$3`);
}
</code></pre>
<img src="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/664e7b07-5189-4a97-bb7e-d398d03c8331.png" alt="" style="display:block;margin:0 auto" />

<p>The regex captures the begin marker, everything between, and the end marker. The replacement keeps both markers in place and swaps the middle. This makes it idempotent: you can run it repeatedly and it produces the same result. You can also view source on the live site and see exactly where the dynamic content lives.</p>
<p>I use six marker pairs across two HTML files: four in <code>index.html</code> (stats, repos, recent PRs, blog posts) and two in <code>contributions.html</code> (stats summary, full PR list). Adding a new dynamic section means adding a marker pair and a generate function. That's it.</p>
<p>Since the pipeline fetches from external APIs I don't control, every field gets run through <code>escapeHtml()</code> before interpolation. If someone names their repo <code>&lt;script&gt;alert(1)&lt;/script&gt;</code>, it renders as text.</p>
<h3><strong>Layer 3: Triggers (and why there are two)</strong></h3>
<p>When I publish a blog post on Hashnode, I want it on my portfolio immediately. Hashnode supports webhooks. So the primary path is real-time: I publish, Hashnode fires a webhook, my site updates within 30 seconds.</p>
<p>But I've been building production systems long enough to know that you can't trust third parties. Webhooks fail silently. Hashnode might change their webhook format. Cloudflare might have an outage. The GitHub API might rate-limit the dispatch. Any link in a three-service chain can break, and you won't know until someone visits your site and sees stale data.</p>
<p>So the webhook is the fast path, and a daily cron job is the safety net. If the webhook fires, great, the site updates in seconds. If it doesn't, the cron job catches it within 24 hours. The same update workflow runs either way, and it's idempotent, so it doesn't matter if both fire on the same day. The cron doesn't check whether the webhook already ran. It just fetches the latest data and embeds it. If nothing changed, no commit is created.</p>
<p>This is the same pattern you'd use in any distributed system: optimistic real-time updates with a periodic reconciliation loop as a backstop.</p>
<h4><strong>The webhook relay</strong></h4>
<p>GitHub Actions can't receive arbitrary webhooks directly. You need something in the middle. I used a Cloudflare Worker as a relay: it receives the Hashnode webhook, validates the HMAC signature using <code>crypto.subtle.verify()</code> for timing-safe comparison, and dispatches a <code>repository_dispatch</code> event to GitHub. The whole worker is about 40 lines. It validates the signature format before touching crypto, and error responses don't leak any details about the downstream GitHub integration.</p>
<p>Deploy is a single command (<code>npx wrangler deploy</code>), and secrets are set in the Cloudflare dashboard.</p>
<h4><strong>The cron fallback</strong></h4>
<p>The GitHub Actions workflow accepts all three triggers in one file:</p>
<pre><code class="language-yaml">on:
  schedule:
    - cron: '0 6 * * *'    # Daily at 6 AM UTC
  workflow_dispatch:         # Manual trigger for debugging
  repository_dispatch:
    types: [hashnode-post-published]  # Webhook path
</code></pre>
<p>The job runs the same steps regardless of which trigger fired: run tests, fetch data, embed into HTML, commit if changed. The conditional commit (<code>git diff --staged --quiet</code>) is what makes the dual-trigger approach safe. If the webhook already ran and embedded the same data, the cron creates no commit. No harm, no noise.</p>
<p>The OSS stats workflow is similar but only has cron and manual triggers. There's no webhook equivalent for "you merged a PR on GitHub," so the daily cron is the only automated path there.</p>
<h2><strong>Testing</strong></h2>
<p>The pipeline generates HTML that gets served to real users, so I test it the same way I'd test production code.</p>
<p>The test suite has 27 tests across four layers: pure function correctness (escaping, date formatting, star display), marker replacement logic, structural integrity (verifying that every expected marker pair exists in the HTML files), and data schema validation (confirming the JSON files have the fields the embed script expects).</p>
<p>The structural tests are the most valuable. If someone accidentally deletes a marker comment while editing the HTML, the test suite catches it before the embed script silently skips that section. Both GitHub Actions workflows run the full test suite before fetching or embedding any data. If tests fail, the pipeline stops before touching anything.</p>
<h2><strong>Why not a framework?</strong></h2>
<p>I seriously considered Next.js or Astro. Both would have made the data integration more conventional. But I kept coming back to the same question: what does the framework actually give me here?</p>
<p>My site has no client-side interactivity beyond a hamburger menu. There's no routing (it's two pages). There's no state management. The "dynamic" parts update once a day, not on every page load.</p>
<p>A framework would add a build step, a <code>node_modules</code> folder, version upgrades to maintain, and a hosting dependency beyond static file serving. For a site that's fundamentally static content with periodic data refreshes, that's overhead in exchange for convenience I don't need.</p>
<p>The pipeline approach keeps the runtime dead simple (it's just files served by GitHub Pages) and moves the complexity into CI, where I have full Node.js, can test properly, and failures don't affect what's already deployed. If a cron job fails, the site still serves yesterday's data. If a Next.js build fails, you might have no site at all.</p>
<h2><strong>What I'd do differently</strong></h2>
<p><strong>Start with the data pipeline, not the design.</strong> I designed the site first and retrofitted the automation. It would have been cleaner to define the data shapes first and design the HTML around them. I ended up refactoring the marker positions multiple times as the data model evolved.</p>
<p><strong>The marker approach has limits.</strong> It works well for discrete sections, but if you needed the same data woven into multiple places (like a sidebar, a footer stat, and a header badge all showing the same PR count), you'd want a real templating engine. For my use case, each section is self-contained, so markers are the right tool.</p>
<p><strong>Seed your data first.</strong> The incremental update script assumes existing data exists. I wrote a separate <code>seed-merged-prs.mjs</code> that does a one-time full fetch to bootstrap the dataset. If you're adapting this pattern, plan for that bootstrapping step. Don't try to make your daily update script also handle the initial load.</p>
<h2><strong>How to build your own</strong></h2>
<p>If you want to try this approach, here's the pattern:</p>
<ol>
<li><p><strong>A</strong> <code>data/</code> <strong>directory</strong> with one JSON file per data source. This is your single source of truth.</p>
</li>
<li><p><strong>Fetch scripts</strong> (one per data source) that pull from APIs and write JSON. Make them incremental if you're near any API limits.</p>
</li>
<li><p><strong>An embed script</strong> with the <code>replaceBetweenMarkers</code> pattern. Every external string goes through <code>escapeHtml()</code>.</p>
</li>
<li><p><strong>Comment markers</strong> in your HTML wherever you want dynamic content.</p>
</li>
<li><p><strong>GitHub Actions</strong> with cron triggers. Run tests first. Only commit if something changed.</p>
</li>
<li><p><strong>Tests</strong> that validate your pure functions, data schemas, and marker integrity.</p>
</li>
<li><p><strong>A webhook relay</strong> if any of your data sources support webhooks, with a cron fallback because webhooks aren't reliable.</p>
</li>
</ol>
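<p>To make steps 3 and 4 concrete, here's a minimal sketch of the marker-replacement idea. The function names mirror the ones mentioned above, but this is an illustration of the pattern, not my actual implementation:</p>

```typescript
// Escape every external string before it touches the HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Swap the content between two HTML comment markers, failing loudly
// if either marker is missing (so CI stops instead of silently skipping).
function replaceBetweenMarkers(
  html: string,
  marker: string,
  content: string
): string {
  const start = `<!-- ${marker}:start -->`;
  const end = `<!-- ${marker}:end -->`;
  const startIdx = html.indexOf(start);
  const endIdx = html.indexOf(end);
  if (startIdx === -1 || endIdx === -1) {
    throw new Error(`Missing marker: ${marker}`);
  }
  return (
    html.slice(0, startIdx + start.length) +
    "\n" + content + "\n" +
    html.slice(endIdx)
  );
}
```

<p>The important detail is that a missing marker throws instead of silently skipping, which is what lets a test suite catch a deleted marker before the embed script runs.</p>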
<p>The whole thing is <a href="https://github.com/costajohnt/costajohnt.github.io">open source</a>. Fork it, rip out my content, keep the pipeline.</p>
<p>The site is at <a href="http://jcosta.tech">jcosta.tech</a>. It looks like a simple static page. Under the hood, it's quietly keeping itself current. That's the whole point. The best infrastructure is the kind nobody notices.</p>
]]></content:encoded></item><item><title><![CDATA[Claude Code Tips and Tricks ]]></title><description><![CDATA[Plan First, Code Second
The single most impactful habit is planning before implementation.
Use Plan Mode
Press Shift+Tab twice to enter Plan Mode. This is where you should spend the most time.

Be spe]]></description><link>https://blog.jcosta.tech/claude-code-tips-and-tricks</link><guid isPermaLink="true">https://blog.jcosta.tech/claude-code-tips-and-tricks</guid><category><![CDATA[claude-code]]></category><category><![CDATA[engineering]]></category><category><![CDATA[AI-assisted development]]></category><category><![CDATA[Developer Tools]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[AI Coding Assistant]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Sun, 08 Mar 2026 05:18:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/c6d9dfdf-6bb6-4daa-86cf-42d51040ce3f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong>Plan First, Code Second</strong></h2>
<p>The single most impactful habit is planning before implementation.</p>
<h3><strong>Use Plan Mode</strong></h3>
<p>Press <strong>Shift+Tab twice</strong> to enter Plan Mode. This is where you should spend the most time.</p>
<ul>
<li><p><strong>Be specific about the behavior, constraints, and edge cases you care about.</strong></p>
</li>
<li><p><strong>Instruct Claude to ask clarifying questions.</strong></p>
</li>
<li><p><strong>Read the plan carefully before approving and push back when you disagree.</strong></p>
</li>
</ul>
<h3><strong>Use Constraints to Keep Claude on Track</strong></h3>
<p>Being specific isn't just about what you say in prompts. It's about the structural constraints in your codebase. The more precisely you express your intent, the better the results.</p>
<ul>
<li><p><strong>Ask Claude to write tests.</strong></p>
</li>
<li><p><strong>Ask Claude to write documentation.</strong></p>
</li>
<li><p><strong>Use typed languages.</strong></p>
</li>
<li><p><strong>Use state machines.</strong></p>
</li>
</ul>
<h3><strong>Iterative Review Loops</strong></h3>
<p>Ask Claude to review its own code, fix issues, and review again, in a loop until no new issues are found.</p>
<blockquote>
<p>"Review this code. Fix any issues you find. Then review again. Keep going until you stop finding new problems."</p>
</blockquote>
<p>Each pass catches things the previous pass introduced or missed.</p>
<h2><strong>Set Up Your Project for Success</strong></h2>
<h3><strong>CLAUDE.md Files</strong></h3>
<p>Every project should have a <code>CLAUDE.md</code> file. It's Claude's persistent memory for your project: coding conventions, architecture decisions, common pitfalls.</p>
<ul>
<li><p><strong>Start with</strong> <code>/init</code><strong>.</strong> This generates an initial <code>CLAUDE.md</code> by analyzing your codebase.</p>
</li>
<li><p><strong>Update it when Claude makes a mistake.</strong> Wrong naming convention? Note it. Misunderstood a pattern? Note it.</p>
</li>
<li><p><strong>The better your</strong> <code>CLAUDE.md</code><strong>, the less time you spend correcting Claude.</strong></p>
</li>
</ul>
<p>Keep each file under 200 lines. If it gets longer, use sub-directory <code>CLAUDE.md</code> files to scope instructions. Claude loads the relevant ones automatically.</p>
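<p>A few lines are enough to start. Here's a hypothetical example (the conventions are illustrative; swap in your own):</p>

```markdown
# Project conventions
- Ruby uses snake_case; TypeScript uses camelCase.
- Every new endpoint needs a request spec before merging.

## Common pitfalls
- The `users` table is soft-deleted; always scope queries by `deleted_at`.
```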
<h3><strong>Commit Often</strong></h3>
<p>Commit after every completed task. You get a clean rollback point if Claude goes sideways, and it makes parallel work with sub-agents and worktrees easier.</p>
<h2><strong>Calibrate Your Level of Oversight</strong></h2>
<p>These models are getting better all the time. Push them, but calibrate how much autonomy you give Claude based on the stakes.</p>
<h3><strong>Tier 1: Maximum Autonomy. Private Experiments.</strong></h3>
<p>Personal projects, private repos, exploratory prototyping.</p>
<p>Longest leash. Let Claude work autonomously. Test the output, try things, skip detailed manual code review.</p>
<h3><strong>Tier 2: Moderate Oversight. Public Personal Work.</strong></h3>
<p>Public repos, open source contributions, side projects where your reputation is visible.</p>
<p>Claude still does most of the work. You may not review every line, but make sure things are tested before they go out.</p>
<h3><strong>Tier 3: Full Scrutiny. Professional Work.</strong></h3>
<p>Anything at your job, production systems, team codebases.</p>
<p>Shortest leash. Work slower. Work in smaller pieces. Fully review all code. Fully test before shipping.</p>
<p><strong>Watch what Claude is generating in real time.</strong> If it can't solve a problem cleanly, it will sometimes take a shortcut. It will say things like "let's just do this for now and come back to it later," or silently switch to a different approach.</p>
<p>When that happens, press <strong>Escape twice</strong> to interrupt and discuss. Don't let Claude wander.</p>
<p><strong>In all three tiers, Claude writes the code.</strong> What changes is the level of review, testing, and oversight. As models improve, gradually extend the leash.</p>
<h2><strong>Manage Your Context Window</strong></h2>
<p>Context rot is real. As your context window fills up, quality degrades. This matters most at Tier 3.</p>
<h3><strong>Monitor Your Context Usage</strong></h3>
<p>Ask Claude to set up a <strong>status line</strong> that shows your context usage in the bottom left of the terminal, color-coded green, yellow, and red. Green means plenty of room. Yellow means start wrapping up. Red means quality is about to degrade.</p>
<h3><strong>Clear Sessions, Not Your Progress</strong></h3>
<p>When your context window is filling up but you're mid-task, don't power through. Tell Claude: "I need to clear this session, but I want to continue where we left off. Set that up for me." Claude will write to its memory or generate a handoff prompt. Then run <code>/clear</code>, paste the prompt, and keep going with a fresh context window.</p>
<h2><strong>Built-in Features Worth Using</strong></h2>
<h3><strong>Sub-Agents and Agent Teams</strong></h3>
<p>Claude can spawn sub-agents. Separate Claude instances that work on tasks independently and report back. <strong>Use them liberally.</strong></p>
<p>They parallelize independent tasks. They specialize, so one agent researches while another implements and another reviews. They also protect your context window. Only the summary comes back.</p>
<p>Once you're comfortable with sub-agents, the next step is <strong>git worktrees</strong>. If you're running multiple Claude sessions on different features, they'll step on each other in the same working directory. Worktrees give each session its own isolated copy of the repo.</p>
<h3><strong>Hooks</strong></h3>
<p>Hooks let you run custom commands at specific points in Claude's workflow. Before a tool executes, after it completes, when a session starts, when Claude finishes responding.</p>
<ul>
<li><p><strong>Linting and formatting.</strong> Auto-run Prettier or ESLint after Claude writes code.</p>
</li>
<li><p><strong>Safety guardrails.</strong> Block dangerous commands like <code>rm -rf</code> before they execute.</p>
</li>
<li><p><strong>Custom validation.</strong> Run your own checks before Claude commits or deploys.</p>
</li>
</ul>
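<p>As one illustration, a formatting hook might be configured in <code>.claude/settings.json</code> along these lines. The exact schema varies between Claude Code versions, so treat this as a sketch and check the current hooks documentation:</p>

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx prettier --write ." }
        ]
      }
    ]
  }
}
```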
<h3><strong>Turn Repeated Tasks into Skills</strong></h3>
<p>If you find yourself asking Claude to do the same thing more than twice, ask it to create a <strong>skill</strong> instead.</p>
<blockquote>
<p>"I keep asking you to do X. Create a skill for this so I can just invoke it next time."</p>
</blockquote>
<h3><strong>Stay Up to Date</strong></h3>
<p>Claude Code ships updates frequently. Some are workflow-changing. Make it a habit to update and explore what's new. Features like <code>/voice</code>, <code>/remote-control</code>, and agent teams were recent additions that changed how I work.</p>
<h2><strong>Plugins: Start Official, Then Expand</strong></h2>
<p>Start with the official plugins, then add third-party as needed.</p>
<h3><strong>Official Plugins</strong></h3>
<ul>
<li><p><strong>Code Simplifier</strong> simplifies and refines code for clarity and maintainability after you write it. Catches over-engineering.</p>
</li>
<li><p><strong>PR Review Toolkit</strong> is a suite of specialized review agents. Code reviewer, silent failure hunter, test analyzer, comment analyzer. Run these before committing.</p>
</li>
</ul>
<h3><strong>Third-Party Plugins</strong></h3>
<ul>
<li><p><strong>Context7</strong> fetches up-to-date documentation for external libraries. Instead of Claude relying on potentially outdated training data, Context7 pulls the actual current docs.</p>
</li>
<li><p><strong>Serena MCP Server</strong> provides semantic code search and navigation. It understands symbols, references, and code structure rather than just text.</p>
</li>
</ul>
<h3><strong>CLI Tools vs. MCP Servers</strong></h3>
<p><strong>Prefer CLI tools over MCP servers when both options exist.</strong> CLI tools are simpler and use less context. Use MCP servers when they offer capabilities CLI tools can't, like Serena's semantic code understanding. For things like Playwright and GitHub, use the CLI.</p>
<h2><strong>External Tools That Complement Claude Code</strong></h2>
<h3><strong>Obsidian as a Dev Brain</strong></h3>
<p><a href="https://obsidian.md/">Obsidian</a> is a markdown-based knowledge management tool. Ask Claude to set up an Obsidian vault as a "dev brain" for your work. It gives you structured, browsable project context across sessions and a place to track decisions and maintain living architecture docs. Useful for managing multiple projects or work that spans many sessions.</p>
<h3><strong>Wispr Flow</strong></h3>
<p><a href="https://wisprflow.com/">Wispr Flow</a> is a voice-to-text tool. Instead of typing long prompts, just talk. Useful for describing complex requirements naturally and staying in flow without switching to a keyboard.</p>
<h2><strong>Principles</strong></h2>
<ol>
<li><p><strong>Plan first, always.</strong> Time spent in Plan Mode is almost never wasted.</p>
</li>
<li><p><strong>Teach Claude, don't just correct it.</strong> Every mistake is an opportunity to update <code>CLAUDE.md</code>.</p>
</li>
<li><p><strong>Constrain with structure.</strong> Tests, types, docs, and state machines keep Claude focused.</p>
</li>
<li><p><strong>Iterate to convergence.</strong> Don't settle for a single review pass. Loop until stable.</p>
</li>
<li><p><strong>Protect your context window.</strong> Clear often, delegate research to sub-agents, monitor usage.</p>
</li>
<li><p><strong>Match scrutiny to stakes.</strong> More rope on experiments, less on professional work.</p>
</li>
</ol>
<p>These practices compound. The more of them you adopt, the more you can trust Claude to do, and the faster you move.</p>
]]></content:encoded></item><item><title><![CDATA[The Biggest Bottleneck in Enterprise Software Isn't Technical]]></title><description><![CDATA[Enterprise projects don't move slowly because the engineering is hard. They move slowly because of coordination.
You know the pattern. You're building a feature that touches three systems. You own one]]></description><link>https://blog.jcosta.tech/the-coordination-tax</link><guid isPermaLink="true">https://blog.jcosta.tech/the-coordination-tax</guid><category><![CDATA[AI]]></category><category><![CDATA[software development]]></category><category><![CDATA[Enterprise AI]]></category><category><![CDATA[engineering-management]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Thu, 05 Mar 2026 18:04:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/a4d41c21-b62b-43d7-8e0d-c4edc82536d3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Enterprise projects don't move slowly because the engineering is hard. They move slowly because of coordination.</p>
<p>You know the pattern. You're building a feature that touches three systems. You own one of them. The other two are managed by developers with their own priorities, their own sprints, their own backlogs. So you schedule a meeting. You explain what you need. They agree it's important but can't get to it for two weeks. You context-switch to something else. Two weeks later, that piece is done but now you need infrastructure changes, and that developer is mid-sprint on something else. Another meeting. Another wait.</p>
<p>The engineering work might take two weeks. The coordination adds months.</p>
<p>I think of it as the coordination tax. Every cross-functional project pays it. And for most enterprise teams, it's the single biggest reason features take as long as they do. Not complexity. Not technical debt. Coordination.</p>
<p>I'm in the middle of a project right now that would normally be drowning in it.</p>
<h2><strong>The System</strong></h2>
<p>I work at a B2B company that sells professional development content to organizations. We have hundreds of products in the catalog, and our clients serve them to their employees.</p>
<p>The sales cycle is long. Salespeople get on calls with prospects, ask about their needs, and figure out which products are the right fit. If we close the deal, it gets handed off to our client success team. Client success picks it up and naturally ends up covering a lot of the same ground, because the context from sales doesn't transfer cleanly. From there, either they manually set up the client's account, or they walk the client through our self-service tooling. Either way, there's a lot of repeated work and manual setup before the client is live.</p>
<p>I'm building a system to automate most of this.</p>
<p>We already use Gong to record sales calls, and those call summaries are stored in Salesforce. Every time a new call comes in, a Salesforce Flow identifies all the calls linked to that opportunity and sends them to an AWS Lambda function through an API Gateway. The Lambda function is a Python service that calls out to an LLM. It has access to our complete product catalog and metadata. The model analyzes the call summaries against the catalog, understands what the prospect actually needs, and generates a set of product recommendations. Those recommendations get written back to a custom object in Salesforce.</p>
<p>Every new call refines the recommendations. The system gets smarter about what the client needs as the sales process progresses. When the opportunity is marked closed-won, the recommendations are finalized.</p>
<p>From there, Salesforce syncs with our Rails monolith. The platform automatically creates a new organization for the client, builds a product package from the finalized recommendations, and grants access. Client success no longer needs to start from scratch. Instead of rebuilding context from the sales cycle and manually assembling everything, there's a tailored setup already waiting. The client will still customize things over time, but they're up and running on day one.</p>
<p>That's the system.</p>
<h2><strong>The Wall</strong></h2>
<p>I'm a full-stack developer. I work primarily in Rails and React with TypeScript. Python isn't even my main language, but I've been deploying production Python to Lambda functions for the past year with the help of AI coding tools. I can build the sync logic. I can handle the front-end work.</p>
<p>But this project also requires creating custom objects and triggers in Salesforce. I've never worked in Salesforce before. And it requires Terraform to stand up the infrastructure: a Docker container for the Lambda function, the API Gateway, all the AWS plumbing. I've worked with Docker, I've deployed Lambda functions, I've used API Gateway. But I've never written the Terraform to provision it all.</p>
<p>Normally, this means coordination. I'd ask our Salesforce developer to create the custom objects I need. He'd say yes, but he's in the middle of his own work. We'd need to loop in leadership to prioritize it. Same story with our platform engineer for the Terraform work. Meetings. Slack threads. Prioritization discussions. Context-switching on both sides.</p>
<p>I know what I need built. I understand the requirements completely. But I can't move forward because the work lives in someone else's domain.</p>
<p>This is the coordination tax.</p>
<h2><strong>What Changed</strong></h2>
<p>The Salesforce developer couldn't get to my work yet. So I got access to a Salesforce sandbox, opened Claude Code, and did it myself.</p>
<p>The work wasn't conceptually hard. I needed a custom object to store product recommendations. I understood the data model. I understood what fields it needed and how it related to the opportunity. I just hadn't worked in Salesforce's environment before. The barrier wasn't competence. It was familiarity. AI closed that gap in an afternoon.</p>
<p>Now I'm doing the same with the Terraform. I understand the infrastructure I need. I've worked with every component individually. I just haven't written the Terraform to wire it all together. Same pattern: I know what I need, I understand the architecture, and AI helps me work productively in a tool I haven't used before.</p>
<p>I'm not claiming to be a Salesforce developer or a Terraform expert. But adding a custom object to Salesforce isn't rocket science. Writing Terraform for a Lambda function behind an API Gateway isn't building Kubernetes clusters. These are straightforward tasks in unfamiliar tools. The kind of tasks that would normally require weeks of coordination but only hours of actual work.</p>
<h2><strong>The Bigger Picture</strong></h2>
<p>The coordination tax is one of the main reasons enterprise software moves slowly. When a single developer can work across stack boundaries, you don't just save that developer's time. You eliminate the meetings. You eliminate the prioritization discussions. You eliminate the context-switching for everyone involved. The Salesforce developer stays focused on his priorities. The platform engineer stays focused on hers. And the feature ships anyway.</p>
<p>This favors a certain kind of developer. Generalists who think in systems. Developers who are product-minded, who understand the full pipeline from sales call to user experience, and who are willing to step into unfamiliar territory. AI doesn't make you an expert in every tool. But it makes you productive enough in unfamiliar tools that you can stop waiting and start building.</p>
<p>The bottleneck in most organizations was never the engineering. It was the coordination. And that bottleneck is starting to disappear.</p>
]]></content:encoded></item><item><title><![CDATA[Why Functional Programming Is the Most Important Skill for the AI Era]]></title><description><![CDATA[Boris Cherny, the creator of Claude Code at Anthropic, recently appeared on Lenny's Podcast and said something that stopped me in my tracks. When asked whether coding skills still matter, he was unequ]]></description><link>https://blog.jcosta.tech/why-functional-programming-is-the-most-important-skill-for-the-ai-era</link><guid isPermaLink="true">https://blog.jcosta.tech/why-functional-programming-is-the-most-important-skill-for-the-ai-era</guid><category><![CDATA[Functional Programming]]></category><category><![CDATA[AI]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[claude-code]]></category><category><![CDATA[#Domain-Driven-Design]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Tue, 24 Feb 2026 14:46:29 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/62fdd4ef1ebeafdc3d32f72e/60ed79c4-75d0-4c9b-92f3-582c40c3a050.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Boris Cherny, the creator of Claude Code at Anthropic, recently appeared on <a href="https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens">Lenny's Podcast</a> and said something that stopped me in my tracks. When asked whether coding skills still matter, he was unequivocal: coding is "solved." Claude Code writes 100% of his code now. He doesn't miss the manual work, and he doesn't care if those skills atrophy.</p>
<p>I don't either. And I think Boris and I arrived at that conclusion through remarkably similar paths.</p>
<p>Boris studied economics, not computer science. He dropped out to start startups at 18. He got into coding because he wanted to build things, not because he loved the act of writing code itself. I came from philosophy. My first lines of code were JavaScript macros in Photoshop, automating the tedious parts of product photography. Neither of us set out to become software engineers. We set out to solve problems, and code was the tool we reached for.</p>
<p>Along the way, we both fell in love with functional programming. Boris discovered it after a motorcycle accident broke both his arms. He needed languages with fewer keystrokes, which led him from CoffeeScript to Haskell to Scala to TypeScript. He calls <em>Functional Programming in Scala</em> the most important technical book of his career. For me, the book that changed everything was <em>Domain Modeling Made Functional</em> by Scott Wlaschin. I came from an object-oriented background and genuinely didn't know another paradigm existed. When functional programming finally clicked, it fundamentally rewired how I think about building software.</p>
<p>Here's why that matters right now: the specific mental models that functional programming teaches you are exactly the skills you need to build effectively with AI. Not because AI writes functional code. It mostly doesn't. But because directing AI is itself a functional act. There are three reasons why.</p>
<h2>1. Strong Types Communicate Intent</h2>
<p>Boris says something in his Peterman Pod interview that I think is quietly profound: "I think in types when I code. The type signatures are more important than the code itself."</p>
<p>This is the first pillar. When you design a system by writing your types first, you're building a contract. You're defining what's possible, what's impossible, and what the boundaries of behavior look like. A well-designed type system is a set of constraints that eliminates entire categories of invalid states before a single line of implementation is written.</p>
<p>AI responds incredibly well to this. When you hand an AI a codebase with strong, expressive types, including well-defined state machines expressed through those types, you're giving it clear guardrails. The types communicate your intent in a way that's unambiguous. The AI doesn't have to guess what you want because the type system already constrains the universe of valid outcomes.</p>
<p>Think about it from the AI's perspective. If it's generating code in a loosely typed or untyped environment, it has enormous degrees of freedom. That means enormous potential to produce something that technically works but doesn't match your intent. Strong types collapse that space. They're a forcing function toward correctness.</p>
<p>This is why "thinking in types" isn't just a nice engineering practice anymore. It's becoming a prerequisite for effective AI-directed development. When you set up your type system and your state machines first, you're not just designing your software. You're designing the constraints that will guide the AI toward the right implementation.</p>
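<p>Here's a small TypeScript sketch of what that looks like in practice (a toy order workflow, not from any real codebase):</p>

```typescript
// A hypothetical order state machine. The discriminated union makes
// invalid states unrepresentable: a shipped order *must* carry a
// tracking number, and a draft order *can't* have one.
type Order =
  | { status: "draft"; items: string[] }
  | { status: "paid"; items: string[]; paidAt: Date }
  | { status: "shipped"; items: string[]; paidAt: Date; trackingNumber: string };

// Only a paid order can ship, and shipping requires a tracking number.
// The compiler enforces both, so an AI generating calls to this
// function has no room to produce an invalid transition.
function ship(
  order: Extract<Order, { status: "paid" }>,
  trackingNumber: string
): Order {
  return { ...order, status: "shipped", trackingNumber };
}
```

<p>An AI asked to "mark this order shipped" can't forget the tracking number or ship an unpaid order. The type checker rejects both before you ever read the diff.</p>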
<h2>2. Declarative Thinking Describes Outcomes, Not Instructions</h2>
<p>The second reason functional programming prepares you for the AI era is more fundamental: functional programming is primarily declarative.</p>
<p>Consider SQL. When you write a SQL query, you don't tell the database engine how to search through the data. You don't specify which index to use, in what order to traverse the rows, or how to join the tables internally. You describe the outcome you want. Give me all records where this condition is true, grouped by this field, sorted by that one. The database engine figures out how to get there.</p>
<p>The underlying system is free to change and improve. The database might be rewritten tomorrow. None of that breaks your query, because your query never specified the <em>how</em>. It only specified the <em>what</em>.</p>
<p>Functional programming works the same way. When you compose pure functions, when you map and filter and reduce, you're describing transformations, not step-by-step procedures. You're saying "take this data and produce this shape" rather than "first do this, then check that, then loop through these."</p>
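<p>The contrast fits in a few lines. A toy example, summing discounted prices:</p>

```typescript
const prices = [12, 5, 40, 7];

// Imperative: explicit ordering, mutation, index bookkeeping.
// The "how" is pinned down step by step.
let totalImperative = 0;
for (let i = 0; i < prices.length; i++) {
  if (prices[i] > 10) {
    totalImperative += prices[i] * 0.9;
  }
}

// Declarative: describe the shape of the result and leave the
// traversal to the runtime.
const totalDeclarative = prices
  .filter((p) => p > 10)
  .map((p) => p * 0.9)
  .reduce((sum, p) => sum + p, 0);
```

<p>Both produce the same number, but only the second leaves the underlying system free to change how it gets there.</p>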
<p>Now apply this to how we work with AI. When you prompt an AI tool to build something, you are, at its core, performing a declarative act. You're describing the outcome you want. You're telling the AI <em>what</em> you need, and letting it figure out <em>how</em> to get there.</p>
<p>If you come from an imperative, object-oriented background, your instinct is to think in step-by-step instructions. You want to tell the computer exactly what to do and in what order. That instinct works against you when directing AI, because the AI might know a better way to reach the outcome. And as models improve, the <em>how</em> gets better. But only if you've left room for it. When you over-specify the implementation, you're constraining the AI in ways that will become increasingly counterproductive as the tools evolve.</p>
<p>Boris makes this point explicitly: build for the model six months from now, not the model you have today. Declarative specs are how you do that. The <em>what</em> you need doesn't change. The AI's ability to figure out the <em>how</em> only gets better.</p>
<p>Functional programmers have been training this muscle for years. We already think in terms of inputs, transformations, and outputs. We already describe outcomes rather than procedures. That's exactly the skill that AI-directed development demands.</p>
<h2>3. Domain Modeling Is the Real Skill</h2>
<p>The third piece is the one I think is most underappreciated, and it's the one that ties the whole argument together.</p>
<p><em>Domain Modeling Made Functional</em> taught me that code should model the domain so faithfully that a domain expert, the person the software is actually built for, should be able to read the code and understand what it means. The code and the business logic shouldn't speak different languages. Your types should map to real-world concepts. Your functions should describe real-world operations. If the domain expert says "a customer places an order" and your code says <code>processTransaction(entityRef, ctx)</code>, something has gone wrong.</p>
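<p>A toy sketch of the difference (the names are illustrative, in the spirit of Wlaschin's book rather than taken from it):</p>

```typescript
// Domain-speak: a domain expert can read this aloud.
type Customer = { name: string };
type Order = { customer: Customer; items: string[] };

// "A customer places an order" — the code says what the business says.
function placeOrder(customer: Customer, items: string[]): Order {
  return { customer, items };
}

// Versus the version that has drifted from the domain:
// processTransaction(entityRef, ctx) — same behavior, but the expert
// can no longer recognize "a customer places an order" in it.
```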
<p>This principle has always been good engineering practice. But now it's becoming essential for a different reason: English is becoming the primary programming language.</p>
<p>When you're working with AI, you're describing what you want to build in natural language. You're explaining the problem, the constraints, the user's needs, the desired behavior. You're doing domain modeling. In English. If you've spent years training yourself to think about software from the domain expert's perspective, to understand the problem space deeply and express it clearly, you're already fluent in the most important language of AI-directed development.</p>
<p>The engineers who struggle with AI tools are often the ones who think about code in terms of code. In terms of patterns, abstractions, and implementation details. The engineers who thrive are the ones who think about code in terms of the problem it solves. They can articulate what the software should do because they deeply understand <em>why</em> it exists and <em>who</em> it's for.</p>
<p>Boris talks about this instinct on the Lenny's Podcast episode. He describes "latent demand" as the most important principle in product. You can't get people to do something they don't already want to do. You find the intent they already have and build around it. That's product thinking, but it's also domain modeling. It's understanding the problem before you write a single line of code.</p>
<h2>The Punchline</h2>
<p>Types define the boundaries, declarative thinking describes the outcomes, and domain modeling ensures you're solving the right problem. None of these are about writing code. They're about thinking clearly, and then letting the AI do the writing.</p>
<p>Boris and I both came to engineering from liberal arts backgrounds. We both fell in love with functional programming. We both arrived at the same conclusion: the act of coding is being commoditized. But the skills that functional programming taught us are becoming more valuable, not less.</p>
<p>If you're an engineer wondering which skills to invest in right now, my advice is counterintuitive: don't practice writing code faster. Practice thinking about problems more clearly. The engineers who will thrive in the AI era aren't the ones who can write the most code. They're the ones who can think the most clearly about what needs to be built, and then let the machines do the building.</p>
<p>Shout out to Ryan Bell for introducing me to functional programming. It changed everything.</p>
<hr />
<h2>Resources</h2>
<ul>
<li><p><a href="https://www.lennysnewsletter.com/p/head-of-claude-code-what-happens">Lenny's Podcast: "Head of Claude Code: What happens after coding is solved | Boris Cherny"</a> (Feb 19, 2026)</p>
</li>
<li><p><a href="https://www.developing.dev/p/boris-cherny-creator-of-claude-code">The Peterman Pod: "Boris Cherny (Creator of Claude Code) On How His Career Grew"</a> (Dec 15, 2025)</p>
</li>
<li><p><em>Functional Programming in Scala</em> by Paul Chiusano and Runar Bjarnason</p>
</li>
<li><p><em>Programming TypeScript</em> by Boris Cherny</p>
</li>
<li><p><em>Domain Modeling Made Functional</em> by Scott Wlaschin</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[How to Have a Career in 2026]]></title><description><![CDATA[I've been evangelizing AI across our technology team for a while now. Recently, it's started to spill over into the rest of the organization. Colleagues and friends outside of engineering are asking me how they can use AI in their daily work, how the...]]></description><link>https://blog.jcosta.tech/how-to-have-a-career-in-2026</link><guid isPermaLink="true">https://blog.jcosta.tech/how-to-have-a-career-in-2026</guid><category><![CDATA[AI]]></category><category><![CDATA[Career Growth]]></category><category><![CDATA[Futureofwork]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[claude.ai]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Sat, 14 Feb 2026 23:26:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771112165436/f6d5f83c-045c-47e4-aa94-effe757b5f2a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've been evangelizing AI across our technology team for a while now. Recently, it's started to spill over into the rest of the organization. Colleagues and friends outside of engineering are asking me how they can use AI in their daily work, how they can automate parts of their workflow, and how they can do things faster and better.</p>
<p>Here's the thing. When I get these questions, all I'm doing is feeding them back into Claude to help me generate responses. I don't have deep knowledge of sales, finance, or creative workflows. But Claude does. That's the whole point.</p>
<p>So instead of trying to give you function-specific advice, I'm going to teach you the process. This is a simple step-by-step guide for getting started with AI and figuring out how to leverage it in your job. This is exactly what I would do if I were starting from scratch.</p>
<h2 id="heading-step-1-invest-in-compute">Step 1: Invest in Compute</h2>
<p>You need access to a capable AI model. For me, that means Claude from Anthropic. Here's what I'd do:</p>
<ul>
<li>Go to <a target="_blank" href="https://claude.ai">claude.ai</a> and create an account</li>
<li>Download <a target="_blank" href="https://claude.ai/download">Claude Desktop</a></li>
</ul>
<p>Once you have Claude Desktop, set up <a target="_blank" href="https://claude.com/connectors">Connectors</a>. This is critical. Connectors let Claude access, search, and take actions inside the tools you already use for work. Anthropic has a directory of 50+ integrations, including Slack, Gmail, Google Drive, Notion, Asana, Figma, Canva, and many more. If your workflow lives in Google Suite, Slack, or any common productivity tool, there's likely a connector for it.</p>
<p>This is what makes Claude more than a chatbot. With connectors, it can read your documents, search your messages, pull data from your project management tools, and actually do things inside those tools on your behalf. All without leaving the Claude window.</p>
<h2 id="heading-step-2-let-claude-set-itself-up">Step 2: Let Claude Set Itself Up</h2>
<p>I'm not going to get into the exact specifics of how to configure everything. You know why? Because you're going to ask Claude how to do it. Not only can you ask Claude how to set things up, you're going to ask Claude to set things up for you.</p>
<p>This is an important habit to build early. Sometimes the AI will give you instructions for doing something manually when it could just do it for you. Ask it to try. It will sometimes say that it can't do something. Ask it to make sure, because it tends to undersell what it's actually capable of. You'd be surprised.</p>
<h2 id="heading-step-3-explain-your-job-in-extreme-detail">Step 3: Explain Your Job in Extreme Detail</h2>
<p>This is the most important step, and most people skip it or rush through it.</p>
<p>You need to long-form explain exactly what you do in your job function. Go into extreme detail. There is no detail too trivial. Try not to make assumptions about what the AI already knows.</p>
<p>Cover everything:</p>
<ul>
<li>What tools do you use?</li>
<li>What does your day-to-day look like?</li>
<li>What are your recurring tasks?</li>
<li>Who do you work with?</li>
<li>What are your inputs and outputs?</li>
<li>What takes you the most time?</li>
<li>What's tedious and repetitive?</li>
<li>What requires the most thought and judgment?</li>
</ul>
<p>You don't need to know what you want to automate yet. This step is purely about giving the AI a complete picture of your work. Tell it to ask you clarifying questions about anything that's unclear. Keep going until it has a thorough understanding of your role.</p>
<p>I can't emphasize this enough. The quality of everything that follows depends on how well you do this step.</p>
<p><strong>Pro tip:</strong> Use a speech-to-text tool for this. I use <a target="_blank" href="https://www.wispr.flow">Wispr Flow</a>. It lets you input three to five times as much information as you could by typing. When you're trying to dump your entire job description and daily workflow into the AI, speaking is dramatically faster.</p>
<h2 id="heading-step-4-ask-how-to-leverage-ai">Step 4: Ask How to Leverage AI</h2>
<p>Only once the AI has a solid understanding of your job function do you start asking:</p>
<ul>
<li>How can I leverage AI to make my life easier?</li>
<li>Which of my processes could be automated?</li>
<li>How can I be more productive?</li>
<li>How can I increase the quality of my output?</li>
<li>How can I ship more value in my function?</li>
</ul>
<p>It will give you ideas. Then you keep iterating and ideating. Ask follow-up questions. Go deeper on the ideas that interest you. There's nothing you can't at least try getting the AI to do.</p>
<p>When you're set up with Claude Desktop and the connectors you've configured, the AI has access to your file system, can search the internet, and can work directly inside your tools. It can do an incredible amount of work for you. It can figure out how to automate your workflows and how you can best leverage it.</p>
<h2 id="heading-tools-i-recommend">Tools I Recommend</h2>
<p><strong><a target="_blank" href="https://claude.ai">Claude</a></strong> for the AI itself. It's the best model available right now.</p>
<p><strong><a target="_blank" href="https://www.wispr.flow">Wispr Flow</a></strong> for speech-to-text. Speaking is the fastest way to get large amounts of context into the AI.</p>
<p><strong><a target="_blank" href="https://obsidian.md">Obsidian</a></strong> for note-taking. It stores everything in Markdown format, which is easy for you to read and, more importantly, easy for the AI to search and read as well. If you're going to be working with AI regularly, keeping your notes in a format the AI can consume is a force multiplier.</p>
<h2 id="heading-software-engineers-are-the-canary-in-the-coal-mine">Software Engineers Are the Canary in the Coal Mine</h2>
<p>I want to be direct about something.</p>
<p>Software engineers are the early adopters of AI. We've been leveraging these tools the most because AI development has been focused on automating software development first. It's highly technical, and we're the ones using AI to build AI. We've had a head start.</p>
<p>But here's the reality. If you have a desk job and you think your job is safe from AI, you're wrong. Your job is likely less technical than software engineering. Once AI gets specifically targeted at your function, it can be automated faster than you think. I'm not trying to be provocative. That's just the truth.</p>
<p>Matt Shumer's recent essay <a target="_blank" href="https://shumer.dev/something-big-is-happening">"Something Big Is Happening"</a> puts it plainly: "If your job happens on a screen, AI is coming for significant parts of it." He compares this moment to early February 2020, when most people dismissed warnings about COVID. The people paying attention acted early. The rest were caught off guard.</p>
<p>To stay ahead, you need to figure out how to increase your productivity and ship more value, better and faster. The steps above are how you start.</p>
<h2 id="heading-resources">Resources</h2>
<ul>
<li><a target="_blank" href="https://shumer.dev/something-big-is-happening">Something Big Is Happening</a> by Matt Shumer. The viral essay on why AI is about to change everything for knowledge workers.</li>
<li><a target="_blank" href="https://www.interconnects.ai/p/opus-46-vs-codex-53">Opus 4.6, Codex 5.3, and the Post-Benchmark Era</a> by Nathan Lambert. A technical comparison of the latest models from Anthropic and OpenAI.</li>
<li><a target="_blank" href="https://claude.ai/download">Claude Desktop</a>. Download and get started.</li>
<li><a target="_blank" href="https://www.wispr.flow">Wispr Flow</a>. Speech-to-text that actually works.</li>
<li><a target="_blank" href="https://obsidian.md">Obsidian</a>. Markdown-based note-taking.</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[What It Takes to Be a Software Engineer in 2026]]></title><description><![CDATA[It's no longer enough to be an individual contributor who pulls cards off the board, completes tasks, raises PRs, and merges them. AI has made implementation easier. What once took days now takes hours.
This means implementation is becoming commoditi...]]></description><link>https://blog.jcosta.tech/what-it-takes-to-be-a-software-engineer-in-2026</link><guid isPermaLink="true">https://blog.jcosta.tech/what-it-takes-to-be-a-software-engineer-in-2026</guid><category><![CDATA[Software Engineering]]></category><category><![CDATA[AI]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[tech careers]]></category><category><![CDATA[Future of work]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[engineering]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Tue, 27 Jan 2026 16:17:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769530996481/51712556-a00a-477d-8504-cd282af98e3f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It's no longer enough to be an individual contributor who pulls cards off the board, completes tasks, raises PRs, and merges them. AI has made implementation easier. What once took days now takes hours.</p>
<p>This means implementation is becoming commoditized. To stay valuable, you need to focus on everything else. And you can't wait to be asked.</p>
<p>Here's what I'm doing.</p>
<h2 id="heading-1-use-and-evangelize-ai">1. Use and Evangelize AI</h2>
<p>Be the person who's forward-thinking and actually using these tools. Show your tech team and the broader organization what's possible. Lead others in adopting AI to streamline work and bring more value to the business.</p>
<p>Being the AI guy wasn't in my job description. I just started using the tools, talking about them, and showing people what was possible. Recently I presented at a company all-hands about Agentic AI. The productivity gains and value we're shipping are starting to get noticed at the C-suite level.</p>
<p>You get hands-on experience with what works. And as your colleagues start adopting these tools, they see you as the expert. That makes you more valuable.</p>
<h2 id="heading-2-become-a-generalist">2. Become a Generalist</h2>
<p>Specialization made sense when implementation was the bottleneck. If building anything took significant effort, you needed deep expertise just to be productive.</p>
<p>That's changed. I'm no longer just working on the core web app. I'm reaching into DevOps, writing Terraform. I'm looking at QA processes, evaluating whether Playwright could replace manual tools like Ghost Inspector.</p>
<p>The breadth of what a single engineer can contribute to has expanded. Take advantage of it.</p>
<h2 id="heading-3-automate-more">3. Automate More</h2>
<p>We've always automated the core application. But engineer time was expensive, so the cost-benefit analysis rarely justified going beyond that.</p>
<p>The value proposition is different now. It makes sense to apply engineering effort to places we used to ignore.</p>
<p><strong>Start by automating your own workflow.</strong> My interface is Claude Code. I manage my Jira cards and GitHub PRs through it. I respond to PR comments, address QA feedback, and incorporate PM input in real time.</p>
<p>Here's what this enables: when a PM finds an issue after delivery, I can address it immediately instead of creating a follow-up card for later. The implementation cycle is short enough now that this makes sense. The result is more complete features shipping faster.</p>
<p><strong>Then see what you can automate for your team.</strong> I automated our deploy process. It's now a simple <code>/deploy</code> command in Claude that finds the diff between main and production, posts to Slack showing the commits about to be released, and performs the deployment. The developer doesn't have to do any of it manually.</p>
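<p>As a sketch, the heart of that kind of deploy command is a ref-range log between the two branches. This isn't the actual implementation; the branch names and the throwaway demo repo are assumptions for illustration:</p>
<pre><code class="lang-shell"># Illustrative: list the commits on main that production doesn't have yet.
# A throwaway repo stands in for a real project with origin/* refs.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "initial release"
git branch production
git commit -q --allow-empty -m "feat: work since last deploy"
# The commits about to be released:
git log --oneline production..main
</code></pre>
<p>In a real pipeline you'd run <code>git fetch origin</code> first and diff <code>origin/production..origin/main</code>, then post that list to Slack before deploying.</p>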
<p>Another example: I saw QA asking in Slack to change the session timeout on staging. No interface for it, so they needed a developer to push a code change. Back and forth, waiting on availability, then again to revert it when done.</p>
<p>I built a feature that lets QA manage it themselves and posted to Slack asking if there was buy-in. There was. Now QA doesn't wait on anyone.</p>
<p><strong>Finally, work cross-functionally.</strong> Start looking at other teams in the company and what you can automate for them. This is where the huge value comes in. Developers adding automation across the business, not just within the tech team.</p>
<h2 id="heading-4-build-more-complete-features">4. Build More Complete Features</h2>
<p>Robust error handling. Full test coverage. Rate limiting. Extensive documentation.</p>
<p>These things weren't always worth the investment. Our team didn't prioritize a ton of tests. We definitely didn't prioritize documentation because there wasn't time to write it properly, let alone maintain it. We were applying the Pareto principle. Get 80% of the way there, accept diminishing returns after that.</p>
<p>That calculus has changed. These are trivial to add now, so add them. Tests and documentation also serve as guardrails for AI during implementation. They help ensure what gets built actually matches the requirements.</p>
<h2 id="heading-5-expand-your-product-knowledge">5. Expand Your Product Knowledge</h2>
<p>When you're not spending all your time on implementation, you have more cognitive bandwidth to focus on other parts of the software development life cycle. Use that bandwidth.</p>
<p>Stop relying on product managers to write detailed specs. Get comfortable operating with ambiguity. Focus on what the business really needs and how to ship the right value.</p>
<p>When you deeply understand the product domain, you make better judgment calls. You anticipate edge cases. You push back when requirements don't make sense. You become a partner in product development, not just an executor.</p>
<h2 id="heading-6-manage-more-work-in-progress">6. Manage More Work in Progress</h2>
<p>Before AI, it was smart to limit yourself to one or two code changes at a time. Context switching was expensive.</p>
<p>Now I have more changes in flight at once than I ever could before. AI tooling holds context for each workstream and helps me switch between tasks without losing momentum.</p>
<p>When you're waiting on code review or blocked on a dependency, you can pivot to something else without losing your place. The throughput increase is real.</p>
<h2 id="heading-the-point">The Point</h2>
<p>The engineers who thrive in 2026 won't be the ones who resist these tools or use them passively. They'll be the ones who rethink what it means to be an engineer when implementation is no longer the hard part.</p>
<p>You can't just be reactive, pulling cards off the board and completing them. Look at the business and find places to contribute.</p>
<p>These are the strategies I'm using. They're working.</p>
]]></content:encoded></item><item><title><![CDATA[AI Drives, You Direct]]></title><description><![CDATA[I've tried contributing to open source before. It's a lot of work, and when your PRs aren't getting traction, it's easy to give up.
Building oss-autopilot
I've written before about how AI-assisted development feels like engineering management. Direct...]]></description><link>https://blog.jcosta.tech/ai-drives-you-direct</link><guid isPermaLink="true">https://blog.jcosta.tech/ai-drives-you-direct</guid><category><![CDATA[AI]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[claude-code]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[Developer Tools]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[llm]]></category><category><![CDATA[GitHub]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Tue, 20 Jan 2026 01:14:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768870996369/e9b82e64-7f65-438e-9a59-3401bfa2831e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've tried contributing to open source before. It's a lot of work, and when your PRs aren't getting traction, it's easy to give up.</p>
<h2 id="heading-building-oss-autopilot">Building oss-autopilot</h2>
<p>I've <a target="_blank" href="https://blog.jcosta.tech/why-ai-assisted-development-feels-like-engineering-management">written before</a> about how AI-assisted development feels like engineering management. Directing, reviewing, course correcting. That framing got me thinking: what if I could build tooling that handles the tedious parts of open source, and I just direct?</p>
<p>That's <a target="_blank" href="https://github.com/costajohnt/oss-autopilot">oss-autopilot</a>.</p>
<p>It tracks all my open PRs and fetches comments so I know when someone's waiting on me. It keeps a history of my contributions, which repos I've had success with, which ones have ignored me. When I'm looking for something new to work on, it searches for issues but filters them through that history. It steers me toward repos with active maintainers who've actually merged my stuff before.</p>
<p>When a maintainer leaves feedback on one of my PRs, it reads the comment and the relevant code and drafts a response. I review it, tweak it if needed, and approve it before anything gets posted. My reputation stays in my hands.</p>
<p>The whole thing runs through Claude Code. I type <code>/oss</code> and it goes to work. Checks my PRs, pulls in new comments, finds opportunities, drafts responses. I review what it surfaced and make decisions. That's it.</p>
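<p>Under the hood, that kind of PR tracking maps onto plain GitHub CLI calls. A hedged sketch, not the tool's actual code (the PR number and repo are placeholders):</p>
<pre><code class="lang-shell"># All my open PRs across repos, with the fields a tracker would store:
gh search prs --author=@me --state=open --json number,title,url,repository

# The comment thread on one PR (number and repo are placeholders):
gh pr view 123 --repo someowner/somerepo --comments
</code></pre>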
<h2 id="heading-what-changed">What changed</h2>
<p>In the past few weeks I've had six PRs merged. I do most of this while playing Fortnite in the evenings.</p>
<p>The difference isn't that I got better at coding. It's that I stopped guessing and losing track of things. The tool remembers what I'd forget. I just make the calls.</p>
<h2 id="heading-the-agentic-shift">The agentic shift</h2>
<p>Tools like Claude Code are agentic now. They can take initiative, not just respond. They can check on things, fetch data, draft responses, keep track of state. That's a real capability shift.</p>
<p>The interface has changed too. You chat with Claude Code in plain English. Tell it what you want, and it goes off and does the work. That's it. No complex commands, no context switching between tools. Just conversation.</p>
<p>But most people still use AI the old way. You ask, it answers. You lead, it follows. That works, but it leaves a lot on the table.</p>
<p>The opportunity is building tooling that actually uses the agentic capabilities. Give Claude enough context and access to drive things forward on its own. Then you direct instead of do.</p>
<p>That's what oss-autopilot is for me. Yours might look different. But the principle is the same: if your AI can take initiative, build tooling that lets it.</p>
<hr />
<p><em><a target="_blank" href="https://github.com/costajohnt/oss-autopilot">oss-autopilot</a> is open source if you want to try it or build something similar.</em></p>
]]></content:encoded></item><item><title><![CDATA[Why AI-Assisted Development Feels Like Engineering Management]]></title><description><![CDATA[In 2021, I was grinding to hit a deadline. It was stressful. Multiple developers were working on the same parts of the codebase simultaneously, which meant constant merge conflicts that needed constant management. We took on a lot of tech debt and cu...]]></description><link>https://blog.jcosta.tech/why-ai-assisted-development-feels-like-engineering-management</link><guid isPermaLink="true">https://blog.jcosta.tech/why-ai-assisted-development-feels-like-engineering-management</guid><category><![CDATA[AI]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[Career]]></category><category><![CDATA[#ai-tools]]></category><category><![CDATA[engineering-management]]></category><category><![CDATA[code review]]></category><category><![CDATA[claude-code]]></category><category><![CDATA[llm]]></category><category><![CDATA[tech leadership]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Tue, 06 Jan 2026 15:13:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767497971755/5a86f283-a861-498b-9448-f96d580b35cc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In 2021, I was grinding to hit a deadline. It was stressful. Multiple developers were working on the same parts of the codebase simultaneously, which meant constant merge conflicts that needed constant management. We took on a lot of tech debt and cut features just to ship. The result was a product no one was really happy with. The business was moving fast with limited developer resources, and we were on to the next project before we could clean up the mess.</p>
<p>I was unhappy with the quality of the work we were putting out despite the extra time and effort required to ship. I've always leaned toward reliability and maintainability. I understand the need to balance these tradeoffs based on business needs, but shipping janky code fast has never sat well with me. I was getting burned out on feature work and looking to pivot.</p>
<p>Luckily, I had a great manager who wanted to help. I joined our SRE subgroup and stepped away from features. One of the responsibilities of our small group was to handle all code review for the entire development team.</p>
<p>I spent the next two years diving into code review. I enforced standards, mentored developers, wrote code snippets to demonstrate what I was looking for, and paired when needed. I kept PRs moving because I didn't want the team blocked.</p>
<p>I started mentoring junior developers more formally and later took over leading a team of offshore developers working across the tech stack.</p>
<p>At this point it had been a while since I was really shipping my own code. I focused on planning, architecture, and delegation. I still personally reviewed code from about 30 developers. I was promoted to Associate Director of Engineering.</p>
<p>Soon after, the company (like many others at the time) began shrinking. Now the development team is quite small and I'm back in an IC role.</p>
<p>But things are different.</p>
<h2 id="heading-the-skills-transfer-directly">The Skills Transfer Directly</h2>
<p>I still do all the planning and high-level architecture. But now instead of delegating to a team, I'm delegating to Claude.</p>
<p>The experience I gained focusing primarily on code review for years has become invaluable. I'm constantly reviewing AI-generated code. There is a strong correlation between communicating requirements to junior and mid-level developers and doing the same with an AI assistant. Clear requirements, well-defined scope, and strong architectural guidance matter just as much.</p>
<p>By leveraging tools like git worktrees, I'm able to work on multiple tasks simultaneously. I'm not writing every line of code. I'm directing, reviewing, and course correcting. That's management.</p>
<h2 id="heading-the-point">The Point</h2>
<p>Real AI-driven development is more akin to a tech lead, staff engineer, or engineering manager role than traditional IC work. You're planning. You're reviewing. You're making architectural decisions. You're unblocking.</p>
<p>I used to think stepping away from code was a detour. Now it feels like preparation.</p>
]]></content:encoded></item><item><title><![CDATA[AI-Assisted Home Automation in 2026]]></title><description><![CDATA[Home automation is a lot of fun, but once it becomes a serious hobby, you start running into some challenging problems.
Device fragmentation, for one. Ring app for the alarm. Hue app for lights. Aqara app for the lock. SwitchBot app for curtains. My ...]]></description><link>https://blog.jcosta.tech/ai-assisted-home-automation-in-2026</link><guid isPermaLink="true">https://blog.jcosta.tech/ai-assisted-home-automation-in-2026</guid><category><![CDATA[Home Assistant]]></category><category><![CDATA[claude-code]]></category><category><![CDATA[claude.ai]]></category><category><![CDATA[proxmox]]></category><category><![CDATA[home automation]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Sat, 03 Jan 2026 19:10:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767467327244/e12d53b9-d246-482b-ae9e-ba4016830e9d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Home automation is a lot of fun, but once it becomes a serious hobby, you start running into some challenging problems.</p>
<p>Device fragmentation, for one. Ring app for the alarm. Hue app for lights. Aqara app for the lock. SwitchBot app for curtains. My wife's not going to download a dozen apps on her phone to control the house.</p>
<p>Then there's automation complexity. Most smart home platforms make simple things easy and complex things impossible. "Turn on lights at sunset" is fine. "Turn on lights at sunset, but only if someone's home, unless the alarm is armed, and also dim them if the TV is on" requires either a PhD in YAML or a tangled mess of Node-RED flows.</p>
<p>So here's my setup for 2026. I'll run through the hardware and services, but the part I really want to focus on is this: not editing YAML files by hand, not clicking through the Home Assistant UI, but building a strong connection between <a target="_blank" href="https://github.com/anthropics/claude-code">Claude Code</a> and Home Assistant so I can use natural language to build out my automations.</p>
<h2 id="heading-the-stack">The Stack</h2>
<h3 id="heading-hardware">Hardware</h3>
<ul>
<li><strong>Beelink Mini PC</strong> (~$200) - Low power, always-on server [1]</li>
<li><strong><a target="_blank" href="https://www.proxmox.com/en/">Proxmox</a></strong> - Free virtualization layer [2]</li>
<li><strong><a target="_blank" href="https://www.docker.com/">Docker</a></strong> - Container runtime inside LXC containers [3]</li>
</ul>
<h3 id="heading-services">Services</h3>
<p>Everything runs in Docker containers:</p>
<ul>
<li><strong><a target="_blank" href="https://www.home-assistant.io/">Home Assistant</a></strong> - Central hub for all devices</li>
<li><strong>Pi-hole</strong> - Network-wide ad blocking</li>
</ul>
<p>This is expandable. Jellyfin for media, Frigate for NVR, whatever you need.</p>
<h2 id="heading-the-key-part-claude-code-home-assistant">The Key Part: Claude Code + Home Assistant</h2>
<p>Here's where it gets interesting. Claude Code is Anthropic's CLI tool that gives Claude direct access to your filesystem and terminal. Point it at your home automation documentation, and it becomes an intelligent assistant that knows your setup.</p>
<h3 id="heading-documentation-folder">Documentation Folder</h3>
<p>Keep a local folder with markdown files that document your setup. This is the key to making Claude Code useful. Include:</p>
<ul>
<li><strong>Architecture notes</strong> - How your system is set up, what containers run where, common commands</li>
<li><strong>Project logs</strong> - What you've built, what broke, how you fixed it</li>
<li><strong>Config copies</strong> - Local copies of your configuration.yaml for reference</li>
</ul>
<p>This folder is context. When Claude Code reads these files, it stops being a generic assistant and becomes one that knows your specific setup.</p>
<h3 id="heading-connection-method">Connection Method</h3>
<p>Claude Code connects to Home Assistant through three methods, each serving a different purpose:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Method</td><td>What it's for</td></tr>
</thead>
<tbody>
<tr>
<td><strong>SSH</strong></td><td>File operations, container management</td></tr>
<tr>
<td><strong>REST API</strong></td><td>State queries, service calls</td></tr>
<tr>
<td><strong>WebSocket</strong></td><td>Registry operations</td></tr>
</tbody>
</table>
</div><p>In practice, SSH is the workhorse. Most of what we do is edit YAML, upload it, restart or reload. REST API is for quick checks and triggering actions. WebSocket is niche, only needed for certain registry operations like renaming entities.</p>
<p><strong>Setup:</strong></p>
<ol>
<li>SSH config entry pointing to your Home Assistant server with key-based auth</li>
<li>Long-lived access token created in Home Assistant UI (Profile → Long-lived access tokens)</li>
<li>Token stored in an environment variable (<code>HASS_TOKEN</code>) so Claude Code can use it</li>
</ol>
<p>How do you set this up? Ask Claude Code to do it. It can generate SSH keys, update your SSH config, and set up the environment variables.</p>
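<p>A quick way to confirm the token works once it's set, assuming Home Assistant's default port (the hostname is an assumption; adjust for your network):</p>
<pre><code class="lang-shell"># Sanity-check the REST API with the long-lived token.
curl -s -H "Authorization: Bearer $HASS_TOKEN" \
  http://homeassistant.local:8123/api/
</code></pre>
<p>A healthy instance answers with a small JSON message confirming the API is running.</p>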
<p><strong>Bonus:</strong> Pair this with an AI voice-to-text tool [4] and you can just talk to Claude Code. Describe what you want out loud, it types for you, Claude Code does the rest.</p>
<h2 id="heading-the-payoff">The Payoff</h2>
<p>Instead of clicking through GUIs or memorizing YAML syntax, you describe what you want:</p>
<blockquote>
<p>"Create an automation that dims the kitchen lights when the Apple TV starts playing after sunset"</p>
</blockquote>
<p>Claude Code:</p>
<ol>
<li>Reads your configuration to understand existing entities</li>
<li>Writes the YAML automation</li>
<li>Uploads it to the server</li>
<li>Validates the config</li>
<li>Reloads Home Assistant</li>
</ol>
<p>When something breaks, you describe the problem:</p>
<blockquote>
<p>"The Ring sensors show unavailable"</p>
</blockquote>
<p>Claude Code investigates, finds that the ring-mqtt container stopped, restarts it, re-authenticates if needed, and documents the fix.</p>
<h2 id="heading-real-examples">Real Examples</h2>
<p><strong>Apple Home as unified interface.</strong> Home Assistant exposes a HomeKit bridge. All devices, regardless of manufacturer, appear in Apple Home with Siri support. Claude Code configured the bridge, filtered which entities to expose, and set friendly names.</p>
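<p>For reference, that filtering lives under the <code>homekit</code> key in Home Assistant's configuration. A minimal sketch, with entity names borrowed from the examples in this post and the friendly name as an assumption:</p>
<pre><code class="lang-yaml"># Minimal HomeKit bridge config with entity filtering.
homekit:
  filter:
    include_domains:
      - light
      - lock
    include_entities:
      - alarm_control_panel.oakland_alarm
  entity_config:
    lock.aqara_smart_lock_u100:
      name: Front Door   # friendly name shown in Apple Home (assumed)
</code></pre>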
<p><strong>Conditional Ring alarm automation.</strong> "Lock all doors when the alarm is armed" sounds simple. Implementation requires knowing the entity IDs, the correct service calls, and handling both armed_home and armed_away states. Claude Code wrote it in seconds:</p>
<pre><code class="lang-yaml"><span class="hljs-bullet">-</span> <span class="hljs-attr">alias:</span> <span class="hljs-string">"Lock All Doors When Alarm Armed"</span>
  <span class="hljs-attr">trigger:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">platform:</span> <span class="hljs-string">state</span>
      <span class="hljs-attr">entity_id:</span> <span class="hljs-string">alarm_control_panel.oakland_alarm</span>
      <span class="hljs-attr">to:</span> <span class="hljs-string">"armed_home"</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">platform:</span> <span class="hljs-string">state</span>
      <span class="hljs-attr">entity_id:</span> <span class="hljs-string">alarm_control_panel.oakland_alarm</span>
      <span class="hljs-attr">to:</span> <span class="hljs-string">"armed_away"</span>
  <span class="hljs-attr">action:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">service:</span> <span class="hljs-string">lock.lock</span>
      <span class="hljs-attr">target:</span>
        <span class="hljs-attr">entity_id:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">lock.aqara_smart_lock_u100</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">lock.back_door</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">lock.side_gate</span>
</code></pre>
<h2 id="heading-what-this-changes">What This Changes</h2>
<p>The old workflow: SSH into the container, open a YAML file in Nano, make your edits, save, reload. If your spacing is off, it breaks. Fix it, try again. None of this is hard, it's just time-consuming. When building an automation takes 20 minutes of fiddling, it's harder to justify the time.</p>
<p>Claude Code compresses that. I describe what I want, it writes the YAML, uploads it, validates the config, reloads the integration. The documentation folder gives it context about my specific setup, so it knows my entity names, my server IPs, how my containers are organized. That's the difference.</p>
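<p>Under the hood, that loop is a handful of commands. A hedged sketch of its shape (the SSH host alias, file paths, and the <code>ha</code> CLI are assumptions about my setup; a Docker install would validate differently):</p>
<pre><code class="lang-shell"># Upload the edited automation file to the Home Assistant config volume.
scp automations.yaml homeassistant:/config/automations.yaml

# Validate the configuration before applying it.
ssh homeassistant "ha core check"

# Reload automations without a full restart, via the REST API.
curl -s -X POST -H "Authorization: Bearer $HASS_TOKEN" \
  http://homeassistant.local:8123/api/services/automation/reload
</code></pre>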
<p>That's the stack for 2026.</p>
<hr />
<p>[1] The hardware doesn't matter much. An old laptop, a Raspberry Pi, or any always-on computer will work. I went with a mini PC for the performance headroom, but Home Assistant runs fine on minimal hardware.</p>
<p>[2] Proxmox lets you run multiple isolated environments on one machine. I can run Home Assistant in one container, Pi-hole in another, and spin up test environments without affecting production. If something breaks, I can snapshot and restore. Overkill for some, but nice to have.</p>
<p>[3] Why containers instead of bare metal? I can run multiple services on the same hardware without them stepping on each other. Home Assistant, Pi-hole, a media server, whatever. Each runs in its own container with its own dependencies. Updates are clean, backups are simple, and if a container misbehaves, you restart it without touching anything else.</p>
<p>[4] Voice-to-text options: <a target="_blank" href="https://www.wispr.ai/">Wispr Flow</a> and <a target="_blank" href="https://www.voiceink.app/">VoiceInk</a> both work well. They transcribe in real-time as you speak, so you can dictate directly into the terminal.</p>
]]></content:encoded></item><item><title><![CDATA[Modeling React State as a Finite State Machine]]></title><description><![CDATA[Back in 2022, I wrote Building a Traffic Light React App exploring how to model UI state as a finite state machine. The key insight was naming states by their business meaning rather than their visual representation, like PriorityStraight instead of ...]]></description><link>https://blog.jcosta.tech/modeling-react-state-as-a-finite-state-machine</link><guid isPermaLink="true">https://blog.jcosta.tech/modeling-react-state-as-a-finite-state-machine</guid><category><![CDATA[TypeScript]]></category><category><![CDATA[state-machines]]></category><category><![CDATA[State Management ]]></category><category><![CDATA[React]]></category><category><![CDATA[react hooks]]></category><category><![CDATA[Functional Programming]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[Frontend Development]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Tue, 30 Dec 2025 21:38:44 GMT</pubDate><content:encoded><![CDATA[<p>Back in 2022, I wrote <a target="_blank" href="https://johntcosta.hashnode.dev/building-a-traffic-light-react-app">Building a Traffic Light React App</a> exploring how to model UI state as a finite state machine. The key insight was naming states by their <strong>business meaning</strong> rather than their visual representation, like <code>PriorityStraight</code> instead of <code>red-light-green-arrow</code>.</p>
<p>That post used class components and MobX. This post takes the same core ideas (state machines and domain-driven naming) and shows how to apply them in modern React with <strong>function components, hooks, and TypeScript discriminated unions</strong>.</p>
<h2 id="heading-the-problem-boolean-soup">The Problem: Boolean Soup</h2>
<p>Here's a typical React component managing a blog post editor:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> [isLoading, setIsLoading] = useState(<span class="hljs-literal">false</span>);
<span class="hljs-keyword">const</span> [isSaving, setIsSaving] = useState(<span class="hljs-literal">false</span>);
<span class="hljs-keyword">const</span> [isPublishing, setIsPublishing] = useState(<span class="hljs-literal">false</span>);
<span class="hljs-keyword">const</span> [hasError, setHasError] = useState(<span class="hljs-literal">false</span>);
<span class="hljs-keyword">const</span> [errorMessage, setErrorMessage] = useState(<span class="hljs-string">''</span>);
<span class="hljs-keyword">const</span> [showPublishModal, setShowPublishModal] = useState(<span class="hljs-literal">false</span>);
<span class="hljs-keyword">const</span> [showDiscardModal, setShowDiscardModal] = useState(<span class="hljs-literal">false</span>);
<span class="hljs-keyword">const</span> [showSuccessMessage, setShowSuccessMessage] = useState(<span class="hljs-literal">false</span>);
<span class="hljs-keyword">const</span> [isDirty, setIsDirty] = useState(<span class="hljs-literal">false</span>);
</code></pre>
<p>Nine <code>useState</code> calls. Nine independent booleans.</p>
<p><strong>Here's the problem:</strong> With 9 booleans, there are 2⁹ = <strong>512 possible combinations</strong>. But how many are actually valid UI states? Maybe 9 or 10.</p>
<p>That means <strong>~98% of possible states are invalid</strong>. Can both modals be open? Can we be saving AND publishing? The type system allows it. Your UI doesn't. This gap is where bugs live.</p>
<h2 id="heading-the-solution-one-state-to-rule-them-all">The Solution: One State to Rule Them All</h2>
<p>What are the actual states our editor can be in?</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">type</span> EditorState =
  | { kind: <span class="hljs-string">'editing'</span>; draft: PostData; original: PostData }
  | { kind: <span class="hljs-string">'saving-draft'</span>; draft: PostData; original: PostData }
  | { kind: <span class="hljs-string">'draft-saved'</span>; draft: SavedPostData; original: PostData }
  | { kind: <span class="hljs-string">'save-error'</span>; draft: PostData; original: PostData; error: <span class="hljs-built_in">string</span> }
  | { kind: <span class="hljs-string">'confirming-publish'</span>; draft: PostData; original: PostData }
  | { kind: <span class="hljs-string">'publishing'</span>; draft: PostData; original: PostData }
  | { kind: <span class="hljs-string">'publish-error'</span>; draft: PostData; original: PostData; error: <span class="hljs-built_in">string</span> }
  | { kind: <span class="hljs-string">'confirming-discard'</span>; draft: PostData; original: PostData }
  | { kind: <span class="hljs-string">'published'</span>; post: SavedPostData };
</code></pre>
<p>Nine explicit states. One <code>useState</code> call:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> [state, setState] = useState&lt;EditorState&gt;({
  kind: <span class="hljs-string">'editing'</span>,
  draft: { title: <span class="hljs-string">''</span>, content: <span class="hljs-string">''</span> },
  original: { title: <span class="hljs-string">''</span>, content: <span class="hljs-string">''</span> },
});
</code></pre>
<h2 id="heading-why-this-works">Why This Works</h2>
<h3 id="heading-1-impossible-states-are-impossible">1. Impossible States Are Impossible</h3>
<p>With boolean soup, nothing prevents <code>showPublishModal</code> and <code>showDiscardModal</code> from both being <code>true</code>. With a discriminated union, you can only be in ONE state at a time. The type system enforces it.</p>
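<p>A minimal sketch of that guarantee, trimmed to two variants (the names mirror the union above, but this is not the full editor code):</p>

```typescript
// Two variants are enough to demonstrate the guarantee.
type EditorState =
  | { kind: 'editing'; draft: string }
  | { kind: 'save-error'; draft: string; error: string };

// Valid: the object matches exactly one variant.
const ok: EditorState = { kind: 'editing', draft: 'hello' };

// @ts-expect-error -- 'editing' has no 'error' field, so this does not compile.
const bad: EditorState = { kind: 'editing', draft: 'hello', error: 'oops' };

console.log(ok.kind); // → editing
```

<p>The same goes for two modals at once: there is simply no variant whose shape allows both.</p>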
<h3 id="heading-2-domain-driven-state-names">2. Domain-Driven State Names</h3>
<p>Just like naming traffic light states <code>PriorityStraight</code> instead of <code>red-light-green-arrow</code>, we name our states by what they <strong>mean</strong>, not what they <strong>look like</strong>:</p>
<ul>
<li><code>'saving-draft'</code> instead of <code>isSaving &amp;&amp; !isPublishing &amp;&amp; !showModal</code></li>
<li><code>'confirming-publish'</code> instead of <code>showPublishModal &amp;&amp; !showDiscardModal</code></li>
</ul>
<p>When you read <code>state.kind === 'confirming-discard'</code>, you know exactly what's happening.</p>
<h3 id="heading-3-each-state-carries-its-context">3. Each State Carries Its Context</h3>
<p>Notice how <code>save-error</code> includes an <code>error</code> property, but <code>editing</code> doesn't? Each state carries only the data it needs. No stale <code>errorMessage</code> hanging around from a previous failed save.</p>
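<p>That narrowing is enforced by the compiler, not by discipline. A small sketch (the <code>statusLine</code> helper is hypothetical, not part of the editor above):</p>

```typescript
type PostData = { title: string; content: string };

type EditorState =
  | { kind: 'editing'; draft: PostData; original: PostData }
  | { kind: 'save-error'; draft: PostData; original: PostData; error: string };

// Hypothetical helper: TypeScript narrows `state` in each branch, so
// `state.error` only type-checks inside the 'save-error' branch.
const statusLine = (state: EditorState): string =>
  state.kind === 'save-error' ? `Save failed: ${state.error}` : 'Editing';

const draft: PostData = { title: 'Hi', content: '...' };
console.log(statusLine({ kind: 'editing', draft, original: draft }));
// → Editing
console.log(statusLine({ kind: 'save-error', draft, original: draft, error: 'timeout' }));
// → Save failed: timeout
```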
<h3 id="heading-4-exhaustive-switch-statements">4. Exhaustive Switch Statements</h3>
<p>TypeScript's exhaustive checking ensures you handle every state:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> assertNever = (x: <span class="hljs-built_in">never</span>): <span class="hljs-function"><span class="hljs-params">never</span> =&gt;</span> {
  <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">`Unexpected state: <span class="hljs-subst">${x}</span>`</span>);
};

<span class="hljs-keyword">switch</span> (state.kind) {
  <span class="hljs-keyword">case</span> <span class="hljs-string">'editing'</span>:
    <span class="hljs-comment">// ...</span>
    <span class="hljs-keyword">break</span>;
  <span class="hljs-keyword">case</span> <span class="hljs-string">'saving-draft'</span>:
    <span class="hljs-comment">// ...</span>
    <span class="hljs-keyword">break</span>;
  <span class="hljs-comment">// If you forget a case, TypeScript errors!</span>
  <span class="hljs-keyword">default</span>:
    <span class="hljs-keyword">return</span> assertNever(state);
}
</code></pre>
<p>Add a new state to the union? TypeScript shows errors everywhere you forgot to handle it.</p>
<h3 id="heading-5-explicit-transitions">5. Explicit Transitions</h3>
<p>State transitions become clear, intentional functions:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> openPublishConfirmation = <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">switch</span> (state.kind) {
    <span class="hljs-keyword">case</span> <span class="hljs-string">'editing'</span>:
    <span class="hljs-keyword">case</span> <span class="hljs-string">'draft-saved'</span>:
      setState({
        kind: <span class="hljs-string">'confirming-publish'</span>,
        draft: state.draft,
        original: state.original,
      });
      <span class="hljs-keyword">return</span>;

    <span class="hljs-comment">// All other states: can't open publish modal</span>
    <span class="hljs-keyword">case</span> <span class="hljs-string">'saving-draft'</span>:
    <span class="hljs-keyword">case</span> <span class="hljs-string">'publishing'</span>:
    <span class="hljs-comment">// ...</span>
      <span class="hljs-keyword">return</span>;

    <span class="hljs-keyword">default</span>:
      assertNever(state);
  }
};
</code></pre>
<p>You can see exactly which states allow transitioning to <code>confirming-publish</code>. This is your state machine, defined in code.</p>
<h2 id="heading-try-it-yourself">Try It Yourself</h2>
<p>I've built a live demo with both approaches side-by-side:</p>
<p><strong><a target="_blank" href="https://costajohnt.github.io/state-machine-blog-post/">View Demo →</a></strong></p>
<p><strong><a target="_blank" href="https://github.com/costajohnt/state-machine-blog-post">View Source →</a></strong></p>
<p>Interact with both editors. Notice how the "After" version displays its current state, so you always know exactly what's happening.</p>
<h2 id="heading-when-to-use-this-pattern">When To Use This Pattern</h2>
<p>This pattern shines when:</p>
<ul>
<li>You have <strong>3+ boolean states</strong> that interact</li>
<li>States are <strong>mutually exclusive</strong> (one modal at a time, one async operation)</li>
<li>You're tracking <strong>loading/success/error cycles</strong></li>
<li>You find yourself writing conditions like <code>if (!isLoading &amp;&amp; !isSaving &amp;&amp; !hasError)</code></li>
</ul>
<p>For a single <code>isOpen</code> boolean? Overkill. For anything resembling "boolean soup"? Worth it.</p>
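<p>And if your render code still wants booleans, derive them from the single state rather than storing them. A sketch (state names abridged from the editor above):</p>

```typescript
type EditorState =
  | { kind: 'editing' }
  | { kind: 'saving-draft' }
  | { kind: 'publishing' }
  | { kind: 'confirming-publish' };

// Computed at render time, not stored: these can never disagree with `kind`.
const isBusy = (s: EditorState): boolean =>
  s.kind === 'saving-draft' || s.kind === 'publishing';

const showPublishModal = (s: EditorState): boolean =>
  s.kind === 'confirming-publish';

console.log(isBusy({ kind: 'publishing' }));        // → true
console.log(showPublishModal({ kind: 'editing' })); // → false
```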
<h2 id="heading-what-about-xstate">What About XState?</h2>
<p><a target="_blank" href="https://xstate.js.org/">XState</a> is excellent for complex state machines with visualization tools and formal semantics. But this pattern requires:</p>
<ul>
<li>Zero dependencies</li>
<li>Just TypeScript + <code>useState</code></li>
<li>Knowledge you already have</li>
</ul>
<p>Start here. Graduate to XState when you need it.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The next time you reach for a fourth <code>useState&lt;boolean&gt;</code>, stop and ask: <strong>what are the actual states my UI can be in?</strong></p>
<p>Model those states explicitly. Name them by their domain meaning. Let TypeScript enforce that impossible states are impossible.</p>
<hr />
<p><em>See also: <a target="_blank" href="https://johntcosta.hashnode.dev/building-a-traffic-light-react-app">Building a Traffic Light React App</a>, the original exploration of domain-driven state naming.</em></p>
]]></content:encoded></item><item><title><![CDATA[Building a Traffic Light React App]]></title><description><![CDATA[I set out to build an example that demonstrates how to manage state in a React TypeScript application using state machines. Traffic lights are commonly used as examples to describe how state machines work, so that seemed like a good place to start. T...]]></description><link>https://blog.jcosta.tech/building-a-traffic-light-react-app</link><guid isPermaLink="true">https://blog.jcosta.tech/building-a-traffic-light-react-app</guid><category><![CDATA[React]]></category><category><![CDATA[TypeScript]]></category><category><![CDATA[domain]]></category><category><![CDATA[Functional Programming]]></category><category><![CDATA[solution modeling]]></category><dc:creator><![CDATA[John Costa]]></dc:creator><pubDate>Mon, 26 Sep 2022 04:22:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/_U-x3_FYxfI/upload/v1664165420100/xrFgQC07B.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><img src="https://j.gifs.com/gpNkr6.gif" alt="GIF" /></p>
<p>I set out to build an example that demonstrates how to manage state in a React TypeScript application using state machines. Traffic lights are commonly used as examples to describe how state machines work, so that seemed like a good place to start. The repo is available <a target="_blank" href="https://github.com/costajohnt/stoplight-state-machine">here</a>.</p>
<h2 id="heading-getting-started">Getting Started</h2>
<p>The first step was to take a look at prior work, and a quick Google search returned <a target="_blank" href="https://medium.com/well-red/state-machines-for-everyone-part-1-introduction-b7ac9aaf482e">this article</a>, which explains state machines pretty well and has good examples. I set out to recreate the author's design with React, TypeScript, and Functional Programming techniques. The states and transitions look like this (image from the author's article):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1664149792664/4a46D8s-4.png" alt="image.png" /></p>
<h2 id="heading-somethings-not-right">Something's Not Right</h2>
<p>The code was working as implemented, but parts of the design were concerning:</p>
<ul>
<li>The types may not fully represent a finite state machine for a traffic light. If the power is out, the light will be completely off. This is a valid, real-world case that should be handled as a valid state.</li>
<li>Defining the power outage state with the current naming conventions is troublesome. <code>Off</code> does not fit with the color convention, and <code>black</code> doesn't seem appropriate in this context.</li>
<li>A newly installed light would be in a powered-down state before being switched on for the first time. Visually, that state is identical to the power outage state, but the allowed transitions differ: a light that is off because of a power outage must be repaired before it can be powered back on, while a newly installed light can go directly into a powered-on state.</li>
</ul>
<p>At this point, I wasn't quite sure how to refactor. Maybe building the app out a bit more would bring the problems to light. I decided to extend the state machine by adding a left turn indicator.</p>
<h2 id="heading-coming-into-focus">Coming Into Focus</h2>
<p>Adding the left turn arrow meant adding a lot more states to the state machine. I started adding states like <code>red-light-green-arrow</code> and <code>flashing-red-arrow-off</code>. Yikes! Each new state exemplified the design issues. These names were getting long, and would only grow longer as the app's complexity developed. The fundamental flaw is that the <strong>states were defined by their visual representation</strong> in the user interface. This convention is not scalable and does not convey the business intent properly.</p>
<h2 id="heading-solution">Solution</h2>
<p>First, it helps to think about what a traffic light really represents. Most of us learn how to read a stop light at a young age and take it for granted, but suppose you had to explain it to a small child.</p>
<p>You might say, "Green means go." This implies movement and priority to proceed through an intersection. Yellow is a warning that movement will be prohibited soon. Red means stop: stay immobile, movement is not allowed. These words describe the domain logic of a traffic light. <a target="_blank" href="https://en.wikipedia.org/wiki/Domain-driven_design">Domain Driven Design</a> can be a powerful tool for building systems using a language that both the domain expert and developer can share.</p>
<h2 id="heading-extension">Extension</h2>
<p>Now we have a <code>Warning</code> state and a <code>Prohibited</code> state for yellow and red lights. When the left turn signal is green, we represent this as <code>PriorityLeft</code> instead of <code>red-light-green-arrow</code>. <code>Green-light-flashing-yellow-arrow</code> is <code>PriorityStraight</code>. Here is what our type for <code>PriorityStraight</code> looks like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1664153482746/gOkPlK-5s.jpeg" alt="Screen Shot 2022-09-25 at 5.49.52 PM.jpeg" /></p>
<p>And here is how our finite state machine is represented in the types:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1664153593712/4Dqf-0cJd.jpeg" alt="PNG image.jpeg" /></p>
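<p>In text form, the machine in those screenshots might be sketched like this. The state names are the ones from this post; the exact field shapes are an assumption for illustration, not the repo's actual code:</p>

```typescript
// Sketch of the domain-named states from the post. The names (Priority,
// PriorityLeft, PriorityStraight, Warning, Prohibited) are the article's;
// the field shapes are assumed for illustration.
type TrafficLightState =
  | { kind: 'Priority'; light: 'green'; arrow: 'off' }
  | { kind: 'PriorityLeft'; light: 'red'; arrow: 'green' }
  | { kind: 'PriorityStraight'; light: 'green'; arrow: 'flashing-yellow' }
  | { kind: 'Warning'; light: 'yellow'; arrow: 'off' }
  | { kind: 'Prohibited'; light: 'red'; arrow: 'off' };

// Domain name in, visual representation out: the UI layer maps meaning to pixels.
const lamps = (s: TrafficLightState): string => `${s.light} light, ${s.arrow} arrow`;

console.log(lamps({ kind: 'PriorityLeft', light: 'red', arrow: 'green' }));
// → red light, green arrow
```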
<h2 id="heading-conclusion">Conclusion</h2>
<p>Modeling the domain in terms of the business function helps control complexity and aids maintainability and extensibility as an application scales.</p>
]]></content:encoded></item></channel></rss>