
Dylan Etkin
May 15th, 2026
There's an npm-shaped hole in the AI tooling stack

I've had this same conversation with 60+ engineering teams in the last six months. A team adopts AI tooling. One developer figures out how to use it well, builds up a vault of skills, MCP configs, and slash commands that 10x their output. The rest of the team has whatever they can scavenge from a shared Notion doc.
The basics of sharing AI knowledge work fine. Check a CLAUDE.md into a repo and your teammates get it on clone. The limitations show up the moment you try anything more complex. Share a single skill across 30 repositories without copying it 30 times. Define a set of skills for a marketing or legal team that doesn't use repositories at all. Track which skills are getting invoked across multiple AI clients. Atlassian's engineering leadership has said publicly that there's "no effective way to share agent rules" past the basics. Rippling wrote a custom Go service to push agent configs across 800+ repos because nothing existed to do it. Every team that builds past the trivial case ends up reinventing the same wheel.
sx is an open-source package manager for AI assets: skills, MCP server configs, slash commands, agents, hooks, and rule files, all treated as versioned packages. You define them once, push them to a vault, and install them where they belong. This post walks through the three design decisions in sx that I think matter most: the scope model, the interoperability layer, and the governance surface.
Scope
The default way teams share AI knowledge today is git. Check CLAUDE.md or .cursorrules into a repo, and everyone who clones gets the same context. That gives you exactly one scope: the repository. Everything in the repo applies to every user, every project, every agent that touches it. If you want a skill to apply only to the whole company, only to one team, only to one engineer trying a new pattern, or to an automated bot working as part of a team, git can't model any of that. You can layer Claude Code plugins or vendor marketplaces on top, which is a real step up over raw CLAUDE.md files. But each plugin is scoped to its own publishing repo, so teams end up shipping one plugin per team and duplicating the same skills (Go formatting, brand voice, the internal API conventions) across multiple plugins because plugins don't have a team scope. It's also still tied to one vendor's client. And if your team doesn't use repositories at all (marketing, legal, HR, sales, ops), there's no on-ramp to skills for them in the first place.
sx treats scope as part of the asset's installation, not a property you bolt on afterward. Four scopes that matter in practice, top-down:
Organization. Things that apply to everyone in the company. Brand colors. Voice and tone. The product naming convention. The legal-approved one-liner for what the company does. The "do not promise X in external copy" rule. Org-scoped assets get distributed to every user automatically, regardless of which team they're on. Update once, propagate everywhere.
Team. This is where most of the day-to-day value lives, and where the model diverges most sharply from git. A team is a collection of individuals and repositories. Scoping an asset at the team level lets you share it without checking it into every repo that uses it.
In engineering terms: all your Go repositories share one blessed Go code-formatting skill. All your Python repositories share a different one. The platform team has a set of skills about how to write a new service that applies across the five services they own. Update the skill in the vault once, and every Go repo, every Python repo, every team member's local setup picks it up. On my own team, the first PR a new hire opens follows our service-layer pattern without a single review comment about it. The agent gets the convention because the skill is there at the right scope. No PR storm across N repos to keep them in sync. Confluent's leadership has described the resulting variance in productivity across teams as a "power law distribution of effectiveness," and the absence of a team scope is exactly what produces that power law.
The angle that's harder to see if you only think about engineering: non-engineering teams use the same model. Your marketing team shares its writing skills — brand voice, copy style, the audience persona doc — via sx, and those skills land in the team's chatgpt.com and claude.ai sessions. Your legal team shares its contract-review skills. Your sales team shares its objection-handling playbook. The team is the unit of shared expertise, and the team's interface to AI doesn't have to be git.
Bots get assigned to teams. An automated bot is just an extra teammate. It picks up the team's skill set, operates within the team's permissions, and shows up in the team's audit log like any other member. Want a nightly maintenance bot that behaves like part of the platform team? Assign it there. Add a new skill to the team, the bot gets it on its next run. No separate config to maintain.
Repository. Roughly equivalent to checking the asset into the repo today. The asset lives with the code, version-controlled alongside it, scoped to anything inside that repository. Useful for the things that genuinely belong to a single codebase and would be noise anywhere else: "this service uses an unusual auth model," "this repo has its own deploy script," "the tests here use an in-house framework documented inline." One thing worth flagging at repo scope: code commit cadence is usually the wrong cadence for skill updates. More on that tension here.
Within a single repo, sx also supports per-path scoping in monorepos. Different directories get different asset sets, so the agent's loaded context shifts as you cd around. Frontend skills in the frontend package, infra skills in the Terraform directory, and the two don't pollute each other.
Individual. Personal scope is the iteration loop. You try a new skill yourself. You refine it across a few weeks of actual use. When it's ready, you promote it to team or org. Until then, it doesn't pollute anyone else's context. The point of individual scope isn't to be a permanent home for the asset. It's to let you experiment without making a public commitment.
Individual scope is also where personal customization lives. You write differently than your teammates. You have an idiomatic way of structuring tests. You like the agent to over-comment a particular kind of operation. None of that needs to become team policy. Individual scope is where it stays.
The reason any of this matters in practice is context. Every skill you install and every MCP server you wire in goes into the agent's context. More skills mean more context, more tools for the agent to consider, more ways for it to get distracted from the task you actually want done. An agent with 50 skills loaded performs worse than the same agent with the 5 it needs for the job. Scoping is the difference between "the agent has my team's patterns ready" and "the agent has to wade through fifty unrelated patterns to find the one that applies." A secondary but real benefit: right-sized scope also keeps the surface area small. An accountant doesn't need the GitHub MCP. A developer doesn't need the Stripe MCP. Less of what you don't need is its own kind of safety.
A few concrete commands:
```shell
# Install everything declared at the current scope (reads the lockfile)
sx install

# Add a skill from the team vault
sx add platform-team/service-layer-pattern

# Add a skill from the community directory at skills.sh
sx add anthropics/skills/frontend-design

# Switch profile (same machine, different identity)
sx profile use experimental
```
The lockfile is the contract. Run sx install on a fresh laptop, in a CI runner, or inside a bot's sandbox, and you get the same set of assets at the same versions every time. Git gives you the files. sx gives you scope, identity, pinning, and reproducibility on top.
Interoperability
Every AI client expects a different format. Claude Code reads skills as folders containing SKILL.md. Cursor reads .cursorrules in the project root and has its own commands directory. Copilot uses Spaces and a different rule mechanism. claude.ai uses Projects. chatgpt.com uses its own Custom GPT and context conventions. Cline, Codex, Gemini (across CLI, VS Code, JetBrains, and Android Studio surfaces), and Kiro each have their own conventions on top.
Every team I talk to is using two or more of these tools at the same time, and once you include non-engineering, the count goes up sharply. Marketing lives in chatgpt.com. Engineering is in Claude Code and Cursor. Legal is in claude.ai. The data team is in Gemini. A skill defined directly in any one of these is invisible to all the others. Airtable's engineering team has described their MCP setup as "manual and tedious" despite high daily AI usage, and that manual reconciliation is exactly the work the translation layer is meant to eliminate.
sx defines an asset once in a client-agnostic shape and translates to the right place at install time. The prompt content you wrote stays exactly as you wrote it. sx wraps it with metadata (name, version, dependencies, type) and handles knowing how to install it into each client in the right way. One sx install lays down the same brand-voice skill in your marketing team's chatgpt.com workspace and your engineering team's Claude Code, with the right naming and the right format for each.
Supported clients:
| Client | Surface |
|---|---|
| Claude Code | Developer (terminal) |
| claude.ai | Non-developer (web) |
| chatgpt.com | Non-developer (web) |
| Cursor | Developer (IDE) |
| GitHub Copilot | Developer (IDE / GitHub) |
| Cline | Developer (IDE plugin) |
| Codex | Developer (CLI) |
| Gemini (CLI / VS Code / JetBrains / Android Studio) | Developer |
| Kiro | Developer (IDE) |
The translation layer is what turns "define once" from a slogan into something that survives a team switching tools, a new hire showing up with their preferred client, or a non-engineering team adopting AI at all. If a vendor changes their rules format next quarter, sx updates the translation; your assets don't move.
AI assets need the same kind of abstraction npm and pip give code packages, and there isn't a vendor whose incentives are aligned with delivering it neutrally. I think the vendor-specific marketplaces (Cursor's rules directory, Copilot Spaces, Anthropic's plugin registry) are a dead end for any team running more than one client, and most teams I talk to are running three or four. Multi-client translation has to live at the package-manager layer, and the package manager has to be neutral about which client wins.
Governance
The third design decision is governance, and it has two parts. Knowing what's being used. Knowing who used it. Git gives you neither.
Usage tracking. You'd think the skills that get invoked most would be the well-named, well-documented, well-promoted ones. That's not what happens. What happens is that a third of the skills get used constantly, a third occasionally, and a third never get invoked at all. Sometimes the unused third is the one a senior engineer spent two days perfecting. Sometimes the most-used skill is a five-line prompt someone wrote on a Wednesday afternoon.
You can't fix this without data. Most teams running an internal AI marketplace today are flying blind on which assets actually get used. I was confident our team's service-layer skill was the most-used asset in our vault. Usage data has a way of cutting that kind of confidence down fast. sx tracks per-asset invocation counts, per-user adoption, version usage, and which assets are in active use across the fleet, regardless of vault type (local, git, or hosted). Why it matters:
- Dead-skill detection. A skill published six months ago with zero invocations in the last 90 days is a candidate for retirement, not the next big push. Cutting dead assets reduces context bloat for the ones that work.
- Pre-publish validation by usage. When two teams propose competing skills for the same job, ship both and let invocation counts decide which one becomes canonical. A/B testing for prompts.
- Adoption-gap visibility. You assume your service-layer skill is used by everyone. The data says it's used by 4 of 23 engineers. That's an enablement problem, not a tool problem, and you can't see it without instrumentation.
- Cost attribution. Token spend is a real budget line now. Tying spend to the invoked skill is how you learn which assets earn their context cost.
Audit log. The other half of governance is knowing where your knowledge is going and who's reading it. An audit log records who installed which asset, who invoked it, when, from which client, and, for bots, under which identity. Git tells you who committed a file; it doesn't tell you who read it inside an AI session, or whether that session was a teammate's Claude Code or a bot's nightly run.
This matters in two directions. Security first. An internal pricing playbook, a customer escalation handbook, a list of sensitive product flags should not leave the team that owns them, and the audit log tells you when an asset ended up in a context it shouldn't have. Attribution second. If your marketing team's brand-voice skill is being heavily used by sales, that's a signal that cross-team enablement is worth investing in, and you only see it if the data exists.
Most teams have nothing here right now. They assume their AI assets are working because no one is complaining, and that no one is misusing them because there's nothing to look at. The teams that get governance data first will make better decisions about everything downstream.
Architecture
```
┌──────────────────────────────────────────────────────────┐
│ Vault (where assets live; usage + audit on all types)    │
│  • Local path        one developer                       │
│  • Git repository    team-shared, PR-gated               │
│  • skills.new        org-shared, RBAC, centralized UI    │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│ sx CLI (resolve, lock, scope)                            │
│  • Reads asset metadata + dependency graph               │
│  • Writes a lockfile per project                         │
│  • Applies scope (org / team / repo / path / individual) │
│  • Switches profiles per identity (human or bot)         │
│  • Emits invocation + access events                      │
└──────────────────────────────────────────────────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
Developer surfaces   Non-developer        Bots / agents
(Claude Code,        surfaces (claude.ai, (any client,
 Cursor, Copilot,    chatgpt.com)          scoped + audited)
 Cline, Codex,
 Gemini, Kiro)
```
Everything sx does sits between the vault and the client. It reads the vault, resolves the dependency graph against your current scope, writes a lockfile so installs are reproducible, then installs each asset in the format the target client expects. Governance data comes from a small set of hooks sx installs into each supported client, which let the CLI intercept invocations and access events as they happen and feed them back as usage and audit data.
What's open, what's hosted
sx is Apache-2.0, written in Go, available via brew install sleuth-io/tap/sx or the install script in the README. Local and git vaults are the core workflow, and you can run the full set of package-management primitives without a skills.new account. Even with local and git vaults, the one place the CLI touches skills.new is as a thin MCP gateway, which is how those assets reach web clients like chatgpt.com and claude.ai that can't read your filesystem directly. The hosted skills.new vault itself is our commercial offering, adding RBAC, audit trails, org-wide usage analytics, and a UI for discovery and approval workflows.
If sx becomes the primitive teams standardize on, I'd rather have it be an open standard with multiple vault implementations than a SaaS we sit on top of. The commercial layer is the governance and analytics backend, not the package manager itself.
What I want to learn
Here's what I want opinions on:
- Is the scope model the right shape? Org → team → repo → path → individual is what's emerged from the teams we've talked to so far. Sub-team and environment scopes (staging vs. prod assets) are likely candidates we haven't fully modeled yet.
- What governance signals do teams actually want? Invocation counts and a basic audit log are the obvious starting point. Token cost per invocation, latency, success and failure signals from the agent, data-egress events. Still an open question.
- Do you agree that we need a more nuanced system for skills and MCP, or are git and the existing plugin systems sufficient? If we do need a new layer, how do we bring along the non-technical teams who don't live in repositories?
I keep coming back to this: AI assets are about to matter more than the code that calls them. The teams that figure out how to manage, distribute, and govern those assets first are going to pull ahead fast, and the ones still passing CLAUDE.md files around in Slack are going to spend the next year wondering why their AI rollout stalled.
If you're working on this inside your team, the most useful issue you can file is the one that says "this is the wrong shape, here's why." If sx is right, great. If it's wrong, I want to know. Drop a comment, file an issue, or star the repo.
