Building an internal agent: Adding support for Agent Skills
When Anthropic introduced Agent Skills, I was initially a bit skeptical of the problem they solved (can't we just use prompts and tools?), but I've subsequently come to appreciate them, and have explicitly implemented skills in our internal agent framework. This post covers the problem skills solve, how the engineering team at Imprint implemented them, how well they've worked for us, and where we might take them next.
This is part of the Building an internal agent series.
What problem do Agent Skills solve?
Agent Skills are a series of techniques that solve three important workflow problems:
- use progressive disclosure to make more effective use of constrained context windows
- minimize conflicting or unnecessary context in the context window
- provide reusable snippets for recurring problems, so that individual workflow creators don't each have to solve things like Slack formatting or dealing with large files
All three of these problems seemed insignificant when we started building out our internal workflows, but once the number of workflows reached into the dozens, all three became difficult to manage.
Without reusable snippets, I lost the leverage to improve all workflows at once, and without progressive disclosure the agents would get a vast amount of irrelevant content that could confuse them, particularly around inconsistencies between Markdown and Slack's mrkdwn formatting language, both of which are important to different tools used by our workflows.
How we implemented Agent Skills
As a disclaimer, I recognize that it's not strictly necessary to implement Agent Skills yourself, as you can integrate with e.g. Claude's API support for Agent Skills. However, one of our design decisions is to remain largely platform agnostic, so that we can switch across model providers, and consequently we decided to implement skills within our own framework.
With that out of the way, we started by reviewing the Agent Skills documentation at agentskills.io and cloning their Python reference implementation, `skills-ref`, into our repository to make it accessible to Claude Code.
The resulting implementation has these core features:
- Skills are in a `skills/` repository, with each skill consisting of its own sub-directory containing a `SKILL.md`. Each skill is a Markdown file with metadata along these lines:

  ```yaml
  ---
  name: pdf-processing
  description: Extract text and tables...
  metadata:
    author: example-org
    version: "1.0"
  ---
  ```

- The list of available skills, including their descriptions from metadata, is injected into the system prompt at the beginning of each workflow, and a `load_skills` tool is available to the agent to load the entire file into the context window.
- Updated the workflow configuration to optionally specify required, allowed, and prohibited skills, which modifies the list of exposed skills injected into the system prompt.
My guess is that requiring specific skills for a given workflow is a bit of an anti-pattern ("just let the agent decide!"), but it was trivial to implement and the sort of thing I could imagine being useful in the future.
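To make that concrete, here's a minimal sketch of the discovery and injection loop, assuming PyYAML for parsing the frontmatter; the function names (`discover_skills`, `build_skill_list`) and the exact prompt wording are illustrative, not our production code:

```python
from pathlib import Path

import yaml  # PyYAML; used here to parse SKILL.md frontmatter


def discover_skills(root: Path = Path("skills")) -> dict[str, dict]:
    """Scan skills/*/SKILL.md and collect each skill's name and description."""
    skills = {}
    for skill_md in sorted(root.glob("*/SKILL.md")):
        # Frontmatter sits between the first two '---' delimiters.
        _, frontmatter, _body = skill_md.read_text().split("---", 2)
        meta = yaml.safe_load(frontmatter)
        skills[meta["name"]] = {"description": meta["description"], "path": skill_md}
    return skills


def build_skill_list(skills: dict[str, dict],
                     allowed: set[str] | None = None,
                     prohibited: frozenset[str] = frozenset()) -> str:
    """Render the skill list injected into the system prompt, honoring
    the per-workflow allowed/prohibited configuration."""
    lines = ["You can load these skills with the load_skills tool:"]
    for name, meta in skills.items():
        if name in prohibited:
            continue
        if allowed is not None and name not in allowed:
            continue
        lines.append(f"- {name}: {meta['description']}")
    return "\n".join(lines)


def load_skills(skills: dict[str, dict], name: str) -> str:
    """Tool handler: return the full SKILL.md so the agent can read it in context."""
    return skills[name]["path"].read_text()
```

The nice property of this shape is that only names and descriptions cost context up front; the full skill body is paid for only when the agent decides it needs it.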
Finally, we used the Notion MCP to retrieve all the existing prompts in our prompt repository, identify the implicit skills in the prompts we had created, write those initial skills, and identify which Notion prompts to edit to eliminate the now-redundant sections.
Then we shipped it into production.
How they’ve worked
Humans make mistakes all the time. For example, I’ve seen many dozens of JIRA tickets from humans that don’t explain the actual problem they are having. People are used to that, and when a human makes a mistake, they blame the human. However, when agents make a mistake, a surprising percentage of people view it as a fundamental limitation of agents as a category, rather than thinking that, “Oh, I should go update that prompt.”
Skills have been extremely helpful as the tool for refining away these edge cases, where we'd previously relied on implicit behavior because specifying the exact behavior was simply overwhelming.
As one example, we ask that every Slack message end with a link to the prompt that drove the response. That always worked, but the details of the formatting would vary in an annoying, distracting way: sometimes it would be the equivalent of `[title](link)`, sometimes a bare `link`, sometimes `[link](link)`. With skills, it is now (almost always) consistent, without anyone having to think to include those instructions in their workflow prompts.
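For illustration, the skill that fixes this might look something like the file below; this is a hypothetical example (and the URL is a placeholder), not our exact skill:

```markdown
---
name: slack-formatting
description: Rules for formatting Slack messages, including the trailing prompt link
---

Slack messages use mrkdwn, not Markdown: links are written as <url|title>,
never [title](url).

Every message must end with a link to the prompt that drove the response,
formatted as <https://notion.example.com/prompt-page|prompt title>.
```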
Similarly, handling large files requires a series of different tools that benefit from in-context learning (aka ICL, a fancy term for including a handful of examples of correct and incorrect usage), which absolutely no one is going to add to their workflow prompt, but which is extremely effective at improving how the workflow uses those tools.
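As a sketch of what that ICL looks like inside a skill (the `grep_file`, `read_file_range`, and `read_file` tool names and the examples here are invented for illustration):

```markdown
---
name: large-files
description: How to inspect large files without flooding the context window
---

Never read a large file in a single call; search first, then read ranges.

Correct:
- grep_file(path="logs/app.log", pattern="ERROR")
- read_file_range(path="logs/app.log", start_line=4200, end_line=4260)

Incorrect:
- read_file(path="logs/app.log")  # dumps the entire file into context
```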
For something that I was initially deeply skeptical about, I now wish I had implemented skills much earlier.
Where we might go next
While our skills implementation is working well today, there are a few opportunities I’d like to take advantage of in the future:
- Add a `load_subskill` skill to support files in `skills/{skill}/*` beyond the `SKILL.md`. So far this hasn't been a major blocker, but as some skills get more sophisticated, the ability to split varied use-cases into distinct files would improve our ability to use skills for progressive disclosure (a sketch follows this list).
- One significant advantage that Anthropic has over us is their sandboxed Python interpreter, which allows skills to include entire Python scripts for tools to run. For example, a skill might include a script for parsing PDFs, which is extremely handy. We don't currently have a sandboxed interpreter available to our agents, but one could, in theory anyway, significantly cut down on the number of custom skills we need to implement. At a minimum, it would do a much better job at operations that require reliable math than relying on the LLM to do its best at math-y operations.
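The `load_subskill` tool would be a small extension of the loader described earlier; this sketch assumes the same directory layout and is illustrative only:

```python
from pathlib import Path


def load_subskill(skill: str, filename: str, root: Path = Path("skills")) -> str:
    """Tool handler: load a supporting file from skills/{skill}/ beyond SKILL.md."""
    skill_dir = (root / skill).resolve()
    path = (skill_dir / filename).resolve()
    # Refuse paths that escape the skill's directory (e.g. "../other/SKILL.md").
    if not path.is_relative_to(skill_dir):
        raise ValueError(f"{filename!r} is outside the {skill} skill directory")
    return path.read_text()
```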
I think both of these are actually pretty straightforward to implement. The first is a simple feature that Claude could implement in a few minutes. The latter feels annoying, but could also be done in less than an hour by running a second Lambda running Node.js with Pyodide, and exposing access to that Lambda as a tool. It's just so inelegant for a Python process to call a Node.js process to run sandboxed Python that I haven't done it quite yet.
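The Python side of that inelegant setup would be a thin tool wrapper; since we haven't built it, the endpoint URL and the response shape below are assumptions:

```python
import requests

# Hypothetical endpoint for the Node.js-with-Pyodide Lambda; not a real URL.
SANDBOX_URL = "https://sandbox.internal.example.com/run"


def run_python(code: str, timeout: float = 30.0) -> str:
    """Tool handler: run untrusted Python in the sandbox and return its output."""
    resp = requests.post(SANDBOX_URL, json={"code": code}, timeout=timeout)
    resp.raise_for_status()
    result = resp.json()
    # Assumed response shape: {"stdout": "...", "error": null}.
    if result.get("error"):
        return f"Sandbox error: {result['error']}"
    return result.get("stdout", "")
```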