Building an internal agent: Evals to validate workflows
