Dylan Etkin

April 17th, 2026

Agent Skills move too fast for git

Last month I was making a change to sx, our CLI. I updated a core flow, adding external catalogs as a source for sx add. Small change. Then came the testing.

I knew I was messing with a core flow and wanted to be sure I hadn't broken anything. I spent about forty-five minutes setting up an isolated environment. Spinning up Docker. Fighting with tmux. Getting a clean install state I could run through the TUI a few times. Forty-five minutes of my afternoon that produced zero code.

I complained in Slack. Don, my co-founder, said: "yeah, that's a pain, I made a skill for that."

Five minutes later I had a full test report. It had exercised every branch and the LLM had found a bug I missed in my manual pass.

The obvious response to this story is: great, so check the skill into the sx repo so the next person doesn't hit the same wall. And you'd be right. Most teams put their skills in a repo eventually. That and Claude plugins are the closest thing we have to a distribution mechanism for
Agent Skills.

But "eventually" is where this blog lives.

The ceremony is too expensive

Don's skill was in draft. Not because he's sloppy. Because that's how skills are.

A skill is usually half a page. You write one to solve an annoyance. A week later you hit a new edge case and you add a block. The week after, you notice the TUI library renders badly below 120 columns, so you hard-code the terminal size. A month later you add an OAuth handoff because that flow finally matters to you. The skill is alive. You change it almost every time you use it.

Git is designed for blessed artifacts. Commit, push, PR, review, merge, pull. That ceremony is worth it for a feature, a fix, a release. For a skill you're still iterating on, it is way too much. You'd be committing half-done work five times a day, or pushing once a week and leaving your
teammates several versions behind.

So the honest pattern on most teams is this: the skill stays in someone's personal .claude/skills/ directory for weeks. Useful, evolving, not ready for ceremony. The author gets value from it immediately. Nobody else does, until somebody complains at the right moment.

That's not a failure of git. It is a mismatch between git's lifecycle and skills' lifecycle. Git is for things that ship. Skills, most of the time, aren't shipped — they're lived with.

The cross-repo problem is worse

Assume you push through the ceremony tax and get your skill into a repo. Which repo?

Our docker-interactive-testing skill tests sx, so it's easy. It belongs in the sx repo. But plenty of skills aren't repo-specific.

"Review this PR using our conventions." "Draft a changelog entry." "Check this migration is reversible." "Open a Jira ticket in the right format." None of these have a natural repo home. You can put them in a central team-skills repo and have every engineer clone it, and keep it pulled so their agent sees the latest. That's an extra checkout on every laptop that has to stay fresh. You can copy the skills into every repo and watch them drift. You can use git submodules and discover what git submodules are actually like in practice.

Everything about this is solvable in theory. In practice it's a tax most engineers decide isn't worth paying. The cross-cutting skills stay in personal directories too, for the same reason the drafts do — the friction of doing it right is higher than the friction of just copy-pasting when asked.

Then there's discovery

Say you've solved both of those. You have a slick central skills repo. Everyone is pulling it. Things are committed, versioned, and reviewed.

Now you need a skill. A teammate wrote one. You don't know its filename, or its folder, or whether it's in the central skills repo or still in a PR, or whether it's still in someone's local directory. You vaguely remember hearing about it in Slack three weeks ago.

Git gives you grep. It doesn't answer "what skills exist on our team that would help with what I'm trying to do right now, and which of them are relevant to this specific task my agent is about to work on?"

That's a different retrieval primitive. Skills will be retrieved by agents, not just humans, dozens of times a day, based on task context that wasn't anticipated when the skill was written. Grep isn't enough. You need something that knows what's available, what each skill is actually for, and
how to hand the right subset to the right agent at the right moment.

What Don's skill actually did

Here it is, for readers who want to see the work. Lightly trimmed.

---
name: docker-interactive-testing
description: Use this skill when manually testing sx CLI flows
that require interactive terminal input (TUI prompts, arrow keys,
confirmations). Uses Docker + tmux to simulate a real user session
with a PTY. Don't use for automated test suites or non-interactive
CLI testing.
---
# Docker Interactive Testing for sx
## Overview
Test sx in isolated Docker containers by driving a tmux session
step-by-step. This simulates a real user with a PTY — arrow keys,
Enter, TUI menus all work. You act as the user, interpreting output
and deciding what to input next.
**When testing against a local Pulse/Sleuth backend**, read
`local-backend-setup.md` in this skill directory first for container
networking, credentials, and OAuth flow details.
## When to Use
- Testing onboarding flows (`sx init`)
- Verifying interactive TUI prompts work correctly
- Testing first-time user experience in a clean environment
- Reproducing bugs that require interactive input
- Testing `sx add`, `sx install`, scope configuration

The skill is about 150 lines. The value isn't in the lines. It's in the sharp edges it encodes:

  • You can't test a real TUI through a subprocess pipe. The escape sequences get eaten. You need a PTY, which is why tmux is in the picture.
  • The TUI library (bubbletea/huh) doesn't error on small terminals. It renders garbage. The skill uses -x 120 -y 40.
  • tmux capture-pane only shows the current screen. Long OAuth flows scrolled past, and the agent reported "nothing here" until Don added -S -100 for scrollback.
  • When sx init prints a device code, the skill hands off to Chrome DevTools MCP to complete the OAuth in a real browser. No Playwright test can do that.
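The first three of those sharp edges can be sketched as a shell session. This is illustrative, not lifted from Don's skill: the container name, the image (assumed to have sx and tmux preinstalled), and the keystrokes are all placeholders.

```shell
# Illustrative sketch of the pattern the skill encodes. Container name,
# image, and keystrokes are placeholders, not from the actual skill.

# Sketch only: skip cleanly on machines without docker.
command -v docker >/dev/null 2>&1 || { echo "docker not available; sketch only"; exit 0; }

# 1. Clean install state: a throwaway container. The image name is a
#    placeholder for one with sx and tmux already installed.
docker run -d --name sx-test sx-test-image sleep infinity

# 2. tmux inside the container gives the CLI a real PTY, so escape
#    sequences survive and arrow keys work. -x/-y pin the terminal
#    size, since the TUI renders garbage below 120 columns.
docker exec sx-test tmux new-session -d -s t -x 120 -y 40

# 3. Drive the TUI like a user: send keys, wait, then read the screen.
docker exec sx-test tmux send-keys -t t 'sx init' Enter
sleep 2
docker exec sx-test tmux send-keys -t t Down Down Enter

# 4. capture-pane alone shows only the current screen; -S -100 pulls
#    100 lines of scrollback so long flows (like OAuth) aren't lost.
docker exec sx-test tmux capture-pane -t t -p -S -100

# Teardown.
docker rm -f sx-test
```

The agent runs the send-keys / capture-pane loop repeatedly, interpreting each captured screen and deciding what to type next, which is what makes it a stand-in for a real user rather than a scripted test.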

Each of those is a thing Don had to discover and encode. The skill was still evolving when he shared it with me. That's why it wasn't in a repo yet — not sloppiness. He wasn't done learning from it.

What a team skills layer needs

I don't have the complete answer. I have a list of properties I'm sure it has to have.

Friction-free sharing. If it costs me more to make a skill available to my team than it cost me to write the skill, the skill stays local. Every time.

Cross-repo by default. A skill isn't beholden to any one repository. It should be usable from any codebase the team works in.

Discovery by task, not by filename. "I'm testing a TUI" should surface the testing skill. I shouldn't have to know what Don named it or where he put it.

Scoped to the agent. When an agent takes on a subtask it shouldn't get every skill on the team. It should get the ones relevant to what it is actually doing, or it wastes tokens and gets confused.

Versioned, but loosely. I don't want every tweak to be a PR. I do want to see when something changed, and by whom.

That's what we're building skills.new to be. Not a replacement for git. A layer alongside it, sized for the kinds of assets (small, fast-moving, cross-cutting) that git doesn't serve well.

A question

What skills has your team written in the last month? Who on the team could give me a complete answer?

I've been asking engineering leaders this question for a while. So far nobody can. I'd like to know if the shape of my problem matches yours.