Building an internal agent: Code-driven vs LLM-driven workflows
When I started this project, I knew deep in my heart that we could get an LLM plus tool-usage to solve arbitrarily complex workflows. I still believe this is possible, but I’m no longer convinced this is actually a good solution. Some problems are just vastly simpler, cheaper, and faster to solve with software. This post talks about our approach to supporting both code and LLM-driven workflows, and why we decided it was necessary.
This is part of the Building an internal agent series.
Why determinism matters
When I joined Imprint, we already had a channel where folks would share pull requests for review. It wasn’t required to add pull requests to that channel, but it was often the fastest way to get them reviewed, particularly for cross-team pull requests.
I often start my day by skimming that channel for pull requests that need review, and I quickly realized that a pull request would often get reviewed and merged without anyone adding the :merged: reacji to the message. This felt inefficient, but also extraordinarily minor, and not the kind of thing I wanted to complain about.
Instead, I pondered how I could solve it without requiring additional human labor.
So, I added an LLM-powered workflow to solve this. The prompt was straightforward:
- Get the last 10 messages in the Slack channel
- For each one, if there was exactly one GitHub pull request URL, extract that URL
- Use the GitHub MCP to check the status of each of those URLs
- Add the :merged: reacji to messages where the associated pull request was merged or closed
This worked so well! So, so well. Except, ahh, except that it sometimes decided to add :merged:
to pull requests that weren’t merged. Then no one would look at those pull requests.
So, it worked in concept (so much smart tool usage!) but in practice it didn’t actually solve the problem I was trying to solve: erroneous additions of the reacji meant folks couldn’t use its presence to decide whether a given pull request in the channel still needed a look.
(As an aside, some people really don’t like the term reacji.
Don’t complain to me about it, this is what Slack calls them.)
How we implemented support for code-driven workflows
Our LLM-driven workflows are orchestrated by a software handler. That handler works something like:
- Trigger comes in, and the handler selects which configuration corresponds with the trigger
- Handler uses that configuration and trigger to pull the associated prompt, load the approved tools, and generate the available list of virtual files (e.g. files attached to a Jira issue or Slack message)
- Handler sends the prompt and available tools to an LLM, then coordinates tool calls based on the LLM’s response, including e.g. making virtual files available to tools. The handler also has termination conditions to prevent excessive tool usage, and so on
- Eventually the LLM stops recommending tools, and its final response is used or discarded depending on the configuration (e.g. the configuration can determine whether the final response is sent to Slack); a rough sketch of this loop follows below
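To make that loop concrete, here is a minimal sketch of an LLM-driven coordination loop. The names (handle_trigger, call_llm, run_tool, post_to_slack, MAX_TOOL_CALLS) are hypothetical and the callables are injected to keep the sketch self-contained; this is not our actual implementation.

from dataclasses import dataclass, field

MAX_TOOL_CALLS = 25  # termination condition: cap on tool usage per run

@dataclass
class LLMResponse:
    text: str
    tool_calls: list = field(default_factory=list)

def handle_trigger(trigger, config, call_llm, run_tool, post_to_slack):
    # The trigger selected this config; pull the prompt, approved tools,
    # and virtual files (e.g. files attached to a Jira issue or Slack message).
    prompt = config["prompt_template"].format(**trigger)
    tools = config["approved_tools"]
    virtual_files = trigger.get("files", [])

    messages = [{"role": "user", "content": prompt}]
    tool_calls_made = 0

    while True:
        response = call_llm(messages, tools=tools, files=virtual_files)

        # Termination conditions: the LLM stops asking for tools, or we hit the cap.
        if not response.tool_calls or tool_calls_made >= MAX_TOOL_CALLS:
            break

        for tool_call in response.tool_calls:
            result = run_tool(tool_call, files=virtual_files)
            messages.append({"role": "tool", "content": result})
            tool_calls_made += 1

    # Use or discard the final response depending on configuration.
    if config.get("send_final_response_to_slack"):
        post_to_slack(trigger["channel"], response.text)
    return response.text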
We updated our configuration to allow each workflow to run in one of two coordinator modes:
# this is default behavior if omitted
coordinator: llm
# this is code-driven workflow
coordinator: script
coordinator_script: scripts/pr_merged.py
When the coordinator is set to script, custom Python determines which tools are called, rather than the LLM-driven handler loop. That Python code has access to the same tools, trigger data, and virtual files as the LLM-driven workflows. It can use the subagent tool to invoke an LLM where useful (and that subagent can have full access to tools as well), but LLM control only occurs when it’s explicitly desired.
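The dispatch between the two modes might look roughly like the sketch below. The run(context) entry-point convention and the module loading are assumptions for illustration; context stands for the bundle of trigger data, approved tools, and virtual files, and run_llm_coordinator is the LLM-driven loop sketched earlier, injected to keep this self-contained.

import importlib.util

def run_workflow(config, context, run_llm_coordinator):
    # context bundles the trigger data, approved tools, and virtual files,
    # so both coordinators see exactly the same capabilities.
    if config.get("coordinator", "llm") == "script":
        # Load the checked-in script named by coordinator_script ...
        spec = importlib.util.spec_from_file_location(
            "coordinator_script", config["coordinator_script"]
        )
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # ... and hand control to its (hypothetical) run(context) entry point.
        return module.run(context)

    # Default: the LLM-driven coordination loop described above.
    return run_llm_coordinator(config, context)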
This means that these scripts, which are written and checked in by our software engineers and go through code review and so on, have the same permissions and capabilities as the LLM, although since it’s just code, any given commit could also introduce a new dependency, etc.
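And to close the loop on the :merged: example, a coordinator script for that workflow might look roughly like this sketch. The context.slack and context.github interfaces here are hypothetical stand-ins for whatever tool wrappers the handler actually exposes, not our real ones.

import re

# Hypothetical sketch of scripts/pr_merged.py; the tool interfaces on
# `context` are illustrative stand-ins.
PR_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+")

def run(context):
    # Deterministically walk the last 10 messages in the review channel.
    for message in context.slack.recent_messages(context.trigger["channel"], limit=10):
        urls = PR_URL.findall(message.text)
        if len(urls) != 1:
            continue  # only act on messages with exactly one pull request URL

        state = context.github.pr_state(urls[0])  # e.g. "open", "merged", "closed"
        if state in ("merged", "closed"):
            # Unlike the LLM version, this only fires on a confirmed state.
            context.slack.add_reaction(message, "merged")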
How’s it working? / Next steps?
Altogether, this has worked very well for complex workflows. I would describe it as a “solution of frequent resort”: we use code-driven workflows as a progressive enhancement when LLM prompts and tools aren’t reliable or quick enough. We still start all workflows with the LLM, which works for many cases, and when we do rewrite one, Claude Code can almost always rewrite the prompt into the code workflow in one shot.
Even as models get more powerful, relying on them narrowly, only where we truly need intelligence rather than to drive iterative workflows, seems like a long-term addition to our toolkit.