Building an internal agent: Adding support for Agent Skills
When Anthropic introduced Agent Skills, I was initially a bit skeptical of the problem they solved (can't we just use prompts and tools?), but I've subsequently come to appreciate them, and have explicitly implemented skills in our internal agent framework. This post covers the problem skills solve, how the engineering team at Imprint implemented them, how well they've worked for us, and where we might take them next.
This is part of the Building an internal agent series.
What problem do Agent Skills solve?
Agent Skills are a series of techniques that solve three important workflow problems:
- use progressive disclosure to make more effective use of constrained context windows
- minimize conflicting or unnecessary context in the context window
- provide reusable snippets for recurring problems, so that individual workflow creators don't each have to solve things like Slack formatting or dealing with large files
All three of these problems seemed insignificant when we started building out our internal workflows, but once the number of workflows reached into the dozens, all three became difficult to manage.
Without reusable snippets, I lost the leverage to improve all workflows at once, and without progressive disclosure the agents would get a vast amount of irrelevant content that could confuse them, particularly around inconsistencies between Markdown and Slack's mrkdwn formatting language, both of which are important to different tools used by our workflows.
How we implemented Agent Skills
As a disclaimer, I recognize that it's not strictly necessary to implement Agent Skills yourself, as you can integrate with e.g. Claude's API support for Agent Skills. However, one of our design decisions is to remain largely platform agnostic, so that we can switch across model providers, and consequently we decided to implement skills within our own framework.
With that out of the way, we started by reviewing the Agent Skills documentation at agentskills.io and cloning their Python reference implementation, `skills-ref`, into our repository to make it accessible to Claude Code.
The resulting implementation has these core features:
- Skills are in a `skills/` repository, with each skill consisting of its own sub-directory containing a `SKILL.md`. Each skill is a Markdown file with metadata along these lines:

  ```yaml
  ---
  name: pdf-processing
  description: Extract text and tables...
  metadata:
    author: example-org
    version: "1.0"
  ---
  ```

- The list of available skills, including their descriptions from metadata, is injected into the system prompt at the beginning of each workflow, and a `load_skills` tool is available to the agent to load the entire file into the context window.
- Updated the workflow configuration to optionally specify required, allowed, and prohibited skills, which modifies the list of exposed skills injected into the system prompt.
My guess is that requiring specific skills for a given workflow is a bit of an anti-pattern ("just let the agent decide!"), but it was trivial to implement and the sort of thing I could imagine being useful in the future.
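To make that concrete, here's a minimal sketch of the discovery and injection loop, assuming PyYAML for parsing the frontmatter; the function names (`discover_skills`, `build_skill_list`) and the exact prompt wording are illustrative, not our production code:

```python
from pathlib import Path

import yaml  # PyYAML; used here to parse SKILL.md frontmatter


def discover_skills(root: Path = Path("skills")) -> dict[str, dict]:
    """Scan skills/*/SKILL.md and collect each skill's name and description."""
    skills = {}
    for skill_md in sorted(root.glob("*/SKILL.md")):
        # Frontmatter sits between the first two '---' delimiters.
        _, frontmatter, _body = skill_md.read_text().split("---", 2)
        meta = yaml.safe_load(frontmatter)
        skills[meta["name"]] = {"description": meta["description"], "path": skill_md}
    return skills


def build_skill_list(skills: dict[str, dict],
                     allowed: set[str] | None = None,
                     prohibited: frozenset[str] = frozenset()) -> str:
    """Render the skill list injected into the system prompt, honoring
    the per-workflow allowed/prohibited configuration."""
    lines = ["You can load these skills with the load_skills tool:"]
    for name, meta in skills.items():
        if name in prohibited:
            continue
        if allowed is not None and name not in allowed:
            continue
        lines.append(f"- {name}: {meta['description']}")
    return "\n".join(lines)


def load_skills(skills: dict[str, dict], name: str) -> str:
    """Tool handler: return the full SKILL.md so the agent can read it in context."""
    return skills[name]["path"].read_text()
```

The nice property of this shape is that only names and descriptions cost context up front; the full skill body is paid for only when the agent decides it needs it.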
Finally, we used the Notion MCP to retrieve all the existing prompts in our prompt repository, identify the implicit skills in the prompts we had created, write those initial skills, and identify which Notion prompts to edit to eliminate the now-redundant sections.
Then we shipped it into production.
How they’ve worked
Humans make mistakes all the time. For example, I’ve seen many dozens of JIRA tickets from humans that don’t explain the actual problem they are having. People are used to that, and when a human makes a mistake, they blame the human. However, when agents make a mistake, a surprising percentage of people view it as a fundamental limitation of agents as a category, rather than thinking that, “Oh, I should go update that prompt.”
Skills have been extremely helpful as the tool for refining away these edge cases, where we'd previously relied on implicit behavior because specifying the exact behavior was simply overwhelming.
As one example, we ask that every Slack message end with a link to the prompt that drove the response. That always worked, but the details of the formatting would vary in an annoying, distracting way: sometimes it would be the equivalent of `[title](link)`, sometimes a bare `link`, sometimes `[link](link)`. With skills, it is now (almost always) consistent, without anyone having to think to include those instructions in their workflow prompts.
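For illustration, the skill that fixes this might look something like the file below; this is a hypothetical example (and the URL is a placeholder), not our exact skill:

```markdown
---
name: slack-formatting
description: Rules for formatting Slack messages, including the trailing prompt link
---

Slack messages use mrkdwn, not Markdown: links are written as <url|title>,
never [title](url).

Every message must end with a link to the prompt that drove the response,
formatted as <https://notion.example.com/prompt-page|prompt title>.
```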
Similarly, handling large files requires a series of different tools that benefit from in-context learning (aka ICL, a fancy term for including a handful of examples of correct and incorrect usage), which absolutely no one is going to add to their workflow prompt, but which is extremely effective at improving how the workflow uses those tools.
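As a sketch of what that ICL looks like inside a skill (the `grep_file`, `read_file_range`, and `read_file` tool names and the examples here are invented for illustration):

```markdown
---
name: large-files
description: How to inspect large files without flooding the context window
---

Never read a large file in a single call; search first, then read ranges.

Correct:
- grep_file(path="logs/app.log", pattern="ERROR")
- read_file_range(path="logs/app.log", start_line=4200, end_line=4260)

Incorrect:
- read_file(path="logs/app.log")  # dumps the entire file into context
```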
For something that I was initially deeply skeptical about, I now wish I had implemented skills much earlier.
Where we might go next
While our skills implementation is working well today, there are a few opportunities I’d like to take advantage of in the future:
- Add a `load_subskill` skill to support files in `skills/{skill}/*` beyond the `SKILL.md`. So far this hasn't been a major blocker, but as some skills get more sophisticated, the ability to split varied use-cases into distinct files would improve our ability to use skills for progressive disclosure (a sketch follows this list).
- One significant advantage that Anthropic has over us is their sandboxed Python interpreter, which allows skills to include entire Python scripts for tools to run. For example, a skill might include a script for parsing PDFs, which is extremely handy. We don't currently have a sandboxed interpreter available to our agents, but one could, in theory anyway, significantly cut down on the number of custom skills we need to implement. At a minimum, it would do a much better job at operations that require reliable math than relying on the LLM to do its best at math-y operations.
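The `load_subskill` tool would be a small extension of the loader described earlier; this sketch assumes the same directory layout and is illustrative only:

```python
from pathlib import Path


def load_subskill(skill: str, filename: str, root: Path = Path("skills")) -> str:
    """Tool handler: load a supporting file from skills/{skill}/ beyond SKILL.md."""
    skill_dir = (root / skill).resolve()
    path = (skill_dir / filename).resolve()
    # Refuse paths that escape the skill's directory (e.g. "../other/SKILL.md").
    if not path.is_relative_to(skill_dir):
        raise ValueError(f"{filename!r} is outside the {skill} skill directory")
    return path.read_text()
```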
I think both of these are actually pretty straightforward to implement. The first is a simple feature that Claude could implement in a few minutes. The latter feels annoying, but could also be done in less than an hour by running a second Lambda running Node.js with Pyodide, and exposing access to that Lambda as a tool. It's just so inelegant for a Python process to call a Node.js process to run sandboxed Python that I haven't done it quite yet.
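The Python side of that inelegant setup would be a thin tool wrapper; since we haven't built it, the endpoint URL and the response shape below are assumptions:

```python
import requests

# Hypothetical endpoint for the Node.js-with-Pyodide Lambda; not a real URL.
SANDBOX_URL = "https://sandbox.internal.example.com/run"


def run_python(code: str, timeout: float = 30.0) -> str:
    """Tool handler: run untrusted Python in the sandbox and return its output."""
    resp = requests.post(SANDBOX_URL, json={"code": code}, timeout=timeout)
    resp.raise_for_status()
    result = resp.json()
    # Assumed response shape: {"stdout": "...", "error": null}.
    if result.get("error"):
        return f"Sandbox error: {result['error']}"
    return result.get("stdout", "")
```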