Diversion #1: Agents

June 11, 2023

The idea of an Agent is something that has developed over the course of applying Large Language Models (LLMs) to problem solving. You can think of an Agent as a structured approach to problem solving, applied using a loop that feeds the output of the LLM back into the agent until a certain end condition is met.
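To make that concrete, here’s a minimal sketch of such a loop in Python; the helper names (call_llm, run_tool) and the message handling are illustrative assumptions rather than SuperAGI’s actual code:

# A minimal sketch of an agent loop; call_llm and run_tool are hypothetical helpers.
def run_agent(call_llm, run_tool, messages):
    while True:
        reply = call_llm(messages)                      # the LLM chooses the next tool (parsed JSON)
        tool = reply["tool"]
        if tool["name"] == "finish":                    # the end condition
            return tool["args"].get("response", "")
        result = run_tool(tool["name"], tool["args"])   # execute the chosen tool
        messages.append({"role": "system",              # feed its output back to the LLM
                         "content": result})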

This is an implementation of Chain of Thought, with the structure of the thoughts guided by a specific syntax. Let’s take a look at a structure used by SuperAGI, which is very similar to the structures used in other projects like AutoGPT, BabyAGI, etc.

SuperAGI is a collection of Docker images, coordinated with docker-compose, that runs a series of cooperating servers. It puts a simple web UI on top of defining and running agents, and comes with a handful of pre-built “tools” for the agents to use. As of 2023-06-11 it’s fairly limited, but it makes it very easy for folks with even rudimentary experience to launch a real, local, multi-agent environment.

For this post, I need to let you know how I extend the idea of “working memory” to LLMs:

  • “working memory” - this is traditionally a measure of cognitive capacity: the number of things we can hold in our mind while working on a problem without having to go back and look them up again. I think of this as a parallel to the number of tokens an LLM can work with, e.g. about 2,000 for many open and early models, 4,000 for ChatGPT-3.5, and about 8,000 for ChatGPT-4. Because of the token limitation, we must be careful how large our prompts become, and how much room we have left for the generated response, as sketched just below.
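To give a flavour of what that budgeting looks like, here’s a rough sketch using the tiktoken library; the context-window and response-budget numbers are just illustrative:

import tiktoken

CONTEXT_WINDOW = 4000    # e.g. roughly the ChatGPT-3.5 token limit
RESPONSE_BUDGET = 1000   # tokens we want to leave free for the generated response

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def prompt_fits(messages):
    """Roughly check that the messages leave room for the response.

    Ignores the small per-message overhead the API adds, so treat this as an estimate.
    """
    used = sum(len(enc.encode(m["content"])) for m in messages)
    return used <= CONTEXT_WINDOW - RESPONSE_BUDGET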

It is also useful now to expand how we think of a “prompt.” I’ve been using this word fairly imprecisely until now, and we need to get a bit finer in how we define it.

During a chat conversation, each message provided to or from the LLM has a role. This role can be one of:

  • system - a message provided by the developer to control the model’s behaviour and style of responses
  • user - a message provided by a user, the content of the chat dialog provided by the user
  • assistant - a message provided by the model, the content of the chat dialog provided by the LLM

When calling an LLM API for a chat-style interaction, you provide a list of these messages.
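For example, with the OpenAI Python client (as it looked in mid-2023), those role-tagged messages go straight into the request; the message content here is abbreviated from the SuperAGI prompts shown below:

import openai

# Assumes openai.api_key is already configured.
# Each message carries a role: system, user, or assistant.
messages = [
    {"role": "system",
     "content": "You are SuperAGI, an AI assistant to solve complex problems."},
    {"role": "user",
     "content": "Determine which next tool to use, and respond using the format specified above:"},
]

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(response["choices"][0]["message"]["content"])   # the assistant's reply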

These are the messages SuperAGI sends as its first request into a new Agent-driven chat:

Message 1 (system):

You are SuperAGI an AI assistant to solve complex problems. Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications. If you have completed all your tasks or reached end state, make sure to use the “Finish” tool.

GOALS:

  1. write an application to help me coordinate finances for the coaches in my organization (the only thing I provided to the agent)

CONSTRAINTS:

  1. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
  2. Ensure the command and args are as per current plan and reasoning
  3. Exclusively use the commands listed in double quotes e.g. “command name”

TOOLS:

  1. ThinkingTool: Intelligent problem-solving assistant that comprehends tasks, identifies key variables, and makes efficient decisions, all while providing detailed, self-driven reasoning for its choices., args json schema: {“task_description”: {“title”: “Task Description”, “description”: “Task description which needs reasoning.”, “type”: “string”}}
  2. Read File: Reads the file content in a specified location, args json schema: {“file_name”: {“title”: “File Name”, “description”: “Path of the file to read”, “type”: “string”}}
  3. Write File: Writes text to a file, args json schema: {“file_name”: {“title”: “File Name”, “description”: “Name of the file to write. Only include the file name. Don’t include path.”, “type”: “string”}, “content”: {“title”: “Content”, “description”: “File content to write”, “type”: “string”}}
  4. GoogleSearch: A tool for performing a Google search and extracting snippets and webpages.Input should be a search query., args json schema: {“query”: {“title”: “Query”, “description”: “The search query for Google search.”, “type”: “string”}}
  5. finish: use this to signal that you have finished all your objectives, args: “response”: “final response to let people know you have finished your objectives”

PERFORMANCE EVALUATION:

  1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
  2. Constructively self-criticize your big-picture behavior constantly.
  3. Reflect on past decisions and strategies to refine your approach.
  4. Every tool has a cost, so be smart and efficient.
  5. Aim to complete tasks in the least number of steps.

I should only respond in JSON format as described below.

Response Format:

{
    "thoughts": {
        "text": "thought",
        "reasoning": "reasoning",
        "plan": "- short bulleted\n- list that conveys\n- long-term plan",
        "criticism": "constructive self-criticism",
        "speak": "thoughts summary to say to user"
    },
    "tool": {
        "name": "tool name/task name",
        "description": "tool or task description",
        "args": {
            "arg name": "value"
        }
    }
}

Ensure the response can be parsed by Python json.loads.

Message 2 (system):

The current time and date is Sun Jun 11 12:29:45 2023

Message 3 (user):

Determine which next tool to use, and respond using the format specified above:

Note the list of “TOOLS” in the first system message - in SuperAGI you select which tools to make available to the agent, and that selection is what informs this list. Each tool has a name, a description, and a schema for the parameters it needs to run. Two tools, ThinkingTool and finish, are internal tools that SuperAGI adds itself; a rough sketch of what such a tool definition boils down to follows after the next note.

Note also the “PERFORMANCE EVALUATION” section in the first system message; it helps the LLM fill out the thoughts area of the JSON data structure that the prompt directs it to respond with.
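To make the TOOLS note concrete: each tool boils down to a name, a description, an args schema that can be rendered into the system prompt, and a function to run. Here’s a hypothetical sketch, not SuperAGI’s actual classes:

from dataclasses import dataclass
from pathlib import Path
from typing import Callable

@dataclass
class Tool:
    """Hypothetical tool definition: enough to render the TOOLS list and to run the tool."""
    name: str
    description: str
    args_schema: dict            # JSON schema for the tool's arguments
    run: Callable[..., str]

read_file = Tool(
    name="Read File",
    description="Reads the file content in a specified location",
    args_schema={"file_name": {"title": "File Name",
                               "description": "Path of the file to read",
                               "type": "string"}},
    run=lambda file_name: Path(file_name).read_text(),
)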

Here is an example of how the LLM might respond. Note that it uses the JSON format it was directed to use in the first system message.

{
    "thoughts": {
        "text": "I need to start working on the financial coordination application.",
        "reasoning": "The first goal is to write an application to help coordinate finances for the coaches in the organization.",
        "plan": "- Define the application requirements\n- Identify key variables\n- Develop the application",
        "criticism": "I should make sure to keep the application simple and avoid legal complications.",
        "speak": "I will begin working on the financial coordination application for the coaches in your organization."
    },
    "tool": {
        "name": "ThinkingTool",
        "description": "Intelligent problem-solving assistant that comprehends tasks, identifies key variables, and makes efficient decisions.",
        "args": {
            "task_description": "Write an application to help coordinate finances for the coaches in the organization."
        }
    }
}
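What the Agent does with a reply like this is mechanical: it parses the text with json.loads (as the prompt instructed) and dispatches on the named tool. A rough sketch, not SuperAGI’s actual parsing code:

import json

def parse_reply(raw_reply):
    """Pull the chosen tool (and the thoughts) out of the LLM's JSON reply."""
    data = json.loads(raw_reply)              # raises an error if the LLM broke the format
    tool = data["tool"]
    return tool["name"], tool.get("args", {}), data["thoughts"]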

Notice how the first thing the LLM wants to do is to employ the ThinkingTool. By its own description, the ThinkingTool is an “Intelligent problem-solving assistant that comprehends tasks, identifies key variables, and makes efficient decisions, all while providing detailed, self-driven reasoning for its choices.”

The ThinkingTool sends its own prompt to the LLM in a fresh, zero-shot session, collects the generated response, and feeds that response back into the Agent chat context as a system message.

To give you a flavour of what this looks like, this is the prompt used by the ThinkingTool:

Given the following overall objective

Objective:

  1. write an application to help me coordinate finances for the coaches in my organization (the original goal I provided to the agent)

and the following task, Write an application to help coordinate finances for the coaches in the organization. (provided in the JSON of the LLM’s first response, via tool/args/task_description)

Perform the task by understanding the problem, extracting variables, and being smart and efficient. Provide a descriptive response, make decisions yourself when confronted with choices and provide reasoning for ideas / decisions.

And the response of the ThinkingTool’s Zero-Shot interaction with a new LLM session:

To develop an application that helps coordinate finances for the coaches in your organization, we first need to understand the requirements and the key functionalities that the application should have. Here’s a breakdown of the steps to create the application:

  1. Identify the key stakeholders: In this case, the main stakeholders are the coaches, the finance team, and potentially the management.

  2. Determine the main features and functionalities: Based on the objective, the application should have the following features:

    a. Expense tracking: Allow coaches to input their expenses, such as travel, equipment, and other costs. This can be done through a simple form or by integrating with popular expense tracking tools.

    b. Budget management: Set budgets for each coach or team and track their spending against the budget. This will help the finance team monitor and control costs.

    c. Reporting: Generate reports on expenses, budgets, and other financial metrics for the management to review and make informed decisions.

    d. Approval workflows: Implement approval processes for certain expenses or budget changes to ensure proper oversight and control.

    e. Notifications and alerts: Notify the relevant stakeholders when expenses are submitted, budgets are exceeded, or other important financial events occur.

  3. Choose the technology stack: Decide on the programming languages, frameworks, and tools that will be used to develop the application. This decision will depend on factors such as the existing technology stack in your organization, the expertise of your development team, and the desired performance and scalability of the application.

  4. Design the user interface (UI) and user experience (UX): Create wireframes and mockups of the application’s interface, focusing on usability and simplicity. Ensure that the design is consistent with your organization’s branding and style guidelines.

  5. Develop the application: With the requirements, features, and design in place, your development team can start building the application. This will involve setting up the database, implementing the backend logic, and creating the frontend interface.

  6. Test the application: Conduct thorough testing of the application, including functional, performance, and security testing. This will help identify and fix any issues before the application is deployed.

  7. Deploy the application: Once the application has been tested and is ready for use, deploy it to your organization’s servers or a cloud hosting provider.

  8. Train the users: Provide training and support to the coaches, finance team, and other stakeholders on how to use the application effectively.

  9. Monitor and maintain the application: Continuously monitor the application’s performance and address any issues that arise. Regularly update the application with new features and improvements based on user feedback and changing requirements.

By following these steps, you can create an efficient and user-friendly application that helps coordinate finances for the coaches in your organization. This will ultimately lead to better financial management and decision-making within the organization.

My, GPT-4 is chatty, isn’t it! This response from the Zero-Shot request is added to the Agent chat as a system message, and provided to the LLM so that it can have its next thought.
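Putting that together, the ThinkingTool’s whole job can be sketched as one zero-shot call plus an append; the template below is abbreviated from the prompt quoted above, and the helper names (and the role used for the zero-shot call) are illustrative assumptions rather than SuperAGI’s actual code:

# Hypothetical sketch of the ThinkingTool: one zero-shot LLM call on its own prompt.
THINKING_TEMPLATE = (
    "Given the following overall objective\n\nObjective:\n{goals}\n\n"
    "and the following task, {task_description}\n\n"
    "Perform the task by understanding the problem, extracting variables, "
    "and being smart and efficient. ..."      # abbreviated; the full text is quoted above
)

def run_thinking_tool(call_llm, goals, task_description, agent_messages):
    prompt = THINKING_TEMPLATE.format(goals=goals, task_description=task_description)
    # Fresh zero-shot session: only this prompt, none of the Agent's chat history.
    result = call_llm([{"role": "user", "content": prompt}])
    # The result goes back into the Agent chat as a system message for the next iteration.
    agent_messages.append({"role": "system", "content": result})
    return result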

You’ll notice that the last tool provided in the list of TOOLS is the Finish tool. When the response from the LLM indicates to use the Finish tool, the Agent considers the LLM’s work complete, and thus breaks out of the loop.

You can see that the Agent might spin in this loop for quite some time. The traditional (can we call this traditional given how young this field is?!) approach is to limit this to a maximum number of iterations of the loop, so if the Agent never resolves to feeling that it’s finished, it doesn’t spin forever.
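In terms of the loop sketch from earlier, that guard is just an iteration cap; again, this is illustrative rather than SuperAGI’s actual code:

# The same loop as before, but bounded: stop after max_iterations even without "finish".
def run_agent(call_llm, run_tool, messages, max_iterations=25):
    for _ in range(max_iterations):
        reply = call_llm(messages)
        tool = reply["tool"]
        if tool["name"] == "finish":                    # the agent says it's done
            return tool["args"].get("response", "")
        result = run_tool(tool["name"], tool["args"])
        messages.append({"role": "system", "content": result})
    return "Stopped: reached the maximum number of iterations."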

Next Up

Now that we’ve covered some of how Agents work, we can start digging into how we can expand the way we use prompting to generate code, and some tools we might provide that would be useful to an LLM in coding out a system.

