Building Your First LLM Agent Application

When building a large language model (LLM) agent application, there are four key components you need: an agent core, a memory module, agent tools, and a planning module. Whether you are designing a question-answering agent, multi-modal agent, or swarm of agents, you can consider many implementation frameworks—from open-source to production-ready. For more information, see Introduction to LLM Agents.

For those experimenting with developing an LLM agent for the first time, this post provides the following:

An overview of the developer ecosystem, including available frameworks and recommended readings to get up-to-speed on LLM agents

A beginner-level tutorial for building your first LLM-powered agent

Developer ecosystem overview for agents

Most of you have probably read articles about LangChain or LLaMa-Index agents. Here are a few of the implementation frameworks available today:

LangChain Agents

LLaMaIndex Agents

HayStack Agents

AutoGen

AgentVerse

ChatDev

Generative Agents

So, which one do I recommend? The answer is, “It depends.”

Single-agent frameworks

There are several frameworks built by the community to further the LLM application development ecosystem, offering you an easy path to develop agents. Some examples of popular frameworks include LangChain, LlamaIndex, and Haystack. These frameworks provide a generic agent class, connectors, and features for memory modules, access to third-party tools, as well as data retrieval and ingestion mechanisms.

The choice of framework largely comes down to the specifics of your pipeline and your requirements. In cases where you must build complex agents with a directed acyclic graph (DAG)-like logical flow, or agents with unique properties, these frameworks offer a good reference point for prompts and general architecture for your own custom implementation.

Multi-agent frameworks

You might ask, “What’s different in a multi-agent framework?” The short answer is a “world” class. To manage multiple agents, you must architect the world, or rather the environment in which they interact with each other, the user, and the tools in the environment. 

The challenge here is that for every application, the world will be different. What you need is a toolkit custom-made to build simulation environments and one that can manage world states and has generic classes for agents. You also need a communication protocol established for managing traffic amongst the agents. The choice of OSS frameworks depends on the type of application that you are building and the level of customization required.
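To make the "world" class idea concrete, here is a minimal sketch. All names (`World`, `Agent`, `register`, `send`) are illustrative assumptions, not the API of any particular framework; real toolkits add turn-taking, broadcasts, and tool mediation on top of this kind of skeleton.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Bare-bones agent: just a name and an inbox of received messages."""
    name: str
    inbox: list = field(default_factory=list)

class World:
    """Minimal environment: holds agents, shared state, and routes messages."""
    def __init__(self):
        self.agents = {}
        self.state = {}  # shared world state visible to all agents

    def register(self, agent):
        self.agents[agent.name] = agent

    def send(self, sender, recipient, message):
        # Simple point-to-point protocol; a production framework would
        # enforce a richer communication protocol here.
        self.agents[recipient].inbox.append((sender, message))

world = World()
world.register(Agent("planner"))
world.register(Agent("researcher"))
world.send("planner", "researcher", "Find Q2 revenue")
```

Even this toy version shows why multi-agent frameworks differ so much: the routing rules and shared state are application-specific.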

Recommended reading list for building agents

There are plenty of resources and materials that you can use to stimulate your thinking around what is possible with agents, but the following resources are an excellent starting point to cover the overall ethos of agents:

AutoGPT: This GitHub project was one of the first true agents that was built to showcase the capabilities that agents can provide. Looking at the general architecture and the prompting techniques used in the project can be quite helpful.

Voyager: This project from NVIDIA Research touches upon the concept of self-improving agents that learn to use new tools or build tools without any external intervention.

OlaGPT: Conceptual frameworks for agents, like OlaGPT, are a great starting point to stimulate ideas on how to go beyond simple agents, which have the basic four modules.

MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning: This paper first suggested the core mechanism for using tools with language models to execute complex tasks.

Generative Agents: Interactive Simulacra of Human Behavior: This was one of the first projects to build a true swarm of agents: a solution made up of multiple agents interacting with each other in a decentralized manner.

If you are looking for more reading material, I find the Awesome LLM-Powered Agent list to be useful. If you have specific queries, drop a comment on this post.

Tutorial: Build a question-answering agent

For this tutorial, you build a question-answering (QA) agent that can help you talk to your data.

To show that a fairly simple agent can tackle fairly hard challenges, you build an agent that can mine information from earnings calls. You can view the earnings call transcripts. Figure 1 shows the general structure of the earnings call so that you can understand the files used for this tutorial.

Figure 1. Conceptual breakdown of an earnings call

By the end of this post, the agent you build will answer complex and layered questions like the following:

How much did revenue grow between Q1 of 2024 and Q2 of 2024?

What were the key takeaways from Q2 of FY24?

Figure 2. Example question and answer for the agent you are building

 As described in part 1 of this series, there are four agent components:

Tools

Planning module

Memory

Agent core

Tools

To build an LLM agent, you need the following tools:

RAG pipeline: You can’t solve the talk-to-your-data problem without RAG. So, one of the tools that you need is a RAG pipeline. For the purposes of this discussion, assume that the RAG pipeline has 100% accuracy for basic or atomic questions.

Mathematical tool: You also require a mathematical tool for performing any type of analysis. To keep it simple for this post, I use an LLM to answer math questions, but tools like WolframAlpha are the ones that I recommend for production applications.
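The two tools can be sketched as plain Python functions. The `retrieve` and `llm_call` functions below are stand-ins so the sketch runs; in practice you would wire in your actual RAG retriever and LLM client (and, for math, a service like WolframAlpha).

```python
# Stand-in implementations so the sketch runs; replace with a real
# retriever and LLM client in your application.
def retrieve(question: str) -> str:
    return "Q1 revenue was $10M. Q2 revenue was $12M."

def llm_call(prompt: str) -> str:
    return f"[LLM answer to: {prompt[:40]}...]"

def rag_tool(question: str) -> str:
    """RAG tool: retrieve relevant transcript chunks, then ask the LLM."""
    context = retrieve(question)
    return llm_call(f"Context: {context}\nQuestion: {question}")

def math_tool(question: str) -> str:
    """Math tool: here delegated to the LLM to keep things simple;
    use WolframAlpha or a safe evaluator for production accuracy."""
    return llm_call(f"Solve this step by step: {question}")
```

Keeping each tool behind a single-function interface makes it easy for the agent core to dispatch on tool name later.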

Planning module

With this LLM agent, you will be able to answer questions such as: “How much did the revenue grow between Q1 of 2024 and Q2 of 2024?” Fundamentally, these are three questions rolled into one:

What was the revenue in Q1?

What was the revenue in Q2?

And, what’s the difference between the two?

The answer is that you must build a question decomposition module:

decomp_template = """GENERAL INSTRUCTIONS
You are a domain expert. Your task is to break down a complex question into simpler sub-parts.

USER QUESTION
{{user_question}}

ANSWER FORMAT
{"sub-questions":[""]}"""

As you can see, the decomposition module is prompting the LLM to break the question down into less complex parts. Figure 3 shows what an answer looks like.

Figure 3. Planning module and prototyping decomposition
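Wiring the decomposition prompt to an LLM is a matter of filling the template, calling the model, and parsing its JSON answer. The sketch below uses a stubbed `fake_llm` returning a canned response to show the expected shape; with a real model you would also add validation and retries, since models do not always follow the answer format.

```python
import json

decomp_template = """GENERAL INSTRUCTIONS
You are a domain expert. Your task is to break down a complex question into simpler sub-parts.

USER QUESTION
{{user_question}}

ANSWER FORMAT
{"sub-questions":[""]}"""

def decompose(user_question, llm_call):
    """Fill the template, call the LLM, and parse the JSON answer."""
    prompt = decomp_template.replace("{{user_question}}", user_question)
    return json.loads(llm_call(prompt))["sub-questions"]

# Stub LLM returning a canned decomposition, to show the expected shape:
def fake_llm(prompt):
    return ('{"sub-questions": ["What was the revenue in Q1 of 2024?", '
            '"What was the revenue in Q2 of 2024?", '
            '"What is the difference between the two?"]}')

subs = decompose("How much did revenue grow between Q1 and Q2 of 2024?", fake_llm)
```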

Memory

Next, you must build a memory module to keep track of all the questions being asked: in the simplest form, a list of all the sub-questions and the answers to each.

class Ledger:
    def __init__(self):
        self.question_trace = []
        self.answer_trace = []

You do this with a simple ledger made up of two lists: one to keep track of all the questions and one to keep track of all the answers. This helps the agent remember the questions it has answered and has yet to answer.
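Using the ledger is as simple as appending to both lists in lockstep. A minimal sketch (re-declaring the class for completeness):

```python
class Ledger:
    def __init__(self):
        self.question_trace = []
        self.answer_trace = []

ledger = Ledger()
ledger.question_trace.append("What was the revenue in Q1 of 2024?")
ledger.answer_trace.append("$10M")

# Zipping the two lists reconstructs the agent's working memory,
# ready to be serialized back into the prompt context:
for question, answer in zip(ledger.question_trace, ledger.answer_trace):
    print(question, "->", answer)
```

Because the two lists stay index-aligned, the agent can also tell which questions are still unanswered by comparing their lengths.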

Evaluate the mental model

Before you build an agent core, evaluate what you have right now:

Tools to search and do mathematical calculations

A planner to break down the question

A memory module to keep track of questions asked

At this point, you can tie these together to see if it works as a mental model (Figure 4). 

template = """GENERAL INSTRUCTIONS
Your task is to answer questions. If you cannot answer the question, request a helper or use a tool. Fill with Nil where no tool or helper is required.

AVAILABLE TOOLS
- Search Tool
- Math Tool

AVAILABLE HELPERS
- Decomposition: Breaks complex questions down into simpler sub-parts

CONTEXTUAL INFORMATION

QUESTION
How much did the revenue grow between Q1 of 2024 and Q2 of 2024?

ANSWER FORMAT
{"Tool_Request": "", "Helper_Request": ""}"""

Figure 4 shows the answer received from the LLM.

Figure 4. Putting all the modules together

You can see that the LLM requested the use of a search tool, which is a logical step as the answer may well be in the corpus. That said, you know that none of the transcripts contain the answer. In the next step (Figure 5), you provide the input from the RAG pipeline that the answer wasn’t available, so the agent then decides to decompose the question into simpler sub-parts.

Figure 5. Adding an answer to the sub-contextual question

With this exercise, you validated that the core mechanism of logic is sound. The LLM is selecting tools and helpers as and when required. 

Now, all that is left is to neatly wrap this in a Python function, which would look something like the following code example:

def agent_core(question):
    answer_dict = prompt_core_llm(question, memory)
    update_memory()
    if answer_dict["tools"]:
        execute_tool()
        update_memory()
    if answer_dict["planner"]:
        questions = execute_planner()
        update_memory()
    if no_new_questions() and no_tool_request():
        return generate_final_answer(memory)

Agent core

You just saw the example of an agent core, so what’s left? Well, there is a bit more to an agent core than just stitching all the pieces together. You must define the mechanism by which the agent is supposed to execute its flow. There are essentially three major choices: 

Linear solver

Single-thread recursive solver

Multi-thread recursive solver

Linear solver

This is the type of execution that I discussed earlier. There is a single linear chain of solutions where the agent can use tools and do one level of planning. While this is a simple setup, true complex and nuanced questions often require layered thinking. 

Single-thread recursive solver

You can also build a recursive solver that constructs a tree of questions and answers until the original question is answered. This tree is solved in a depth-first traversal. The following code example shows the logic:

def Agent_Core(Question, Context):
    Action = LLM(Context + Question)

    if Action == "Decomposition":
        Sub_Questions = LLM(Question)
        for Sub_Question in Sub_Questions:
            Agent_Core(Sub_Question, Context)

    elif Action == "Search Tool":
        Answer = RAG_Pipeline(Question)
        Context = Context + Answer
        Agent_Core(Question, Context)

    elif Action == "Gen Final Answer":
        return LLM(Context)

Multi-thread recursive solver

Instead of iteratively solving the tree, you can spin off parallel execution threads for each node on the tree. This method adds execution complexity but yields massive latency benefits as the LLM calls can be processed in parallel.
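The fan-out step can be sketched with Python's standard-library thread pool. LLM and RAG calls are I/O-bound, so threads deliver real latency wins despite the GIL; the `answer` function below is a stand-in for a single-question solver (RAG call, math tool, and so on).

```python
from concurrent.futures import ThreadPoolExecutor

def answer(question: str) -> str:
    # Stand-in for a single-question solver (RAG pipeline, math tool, ...)
    return f"answer to: {question}"

def solve_parallel(sub_questions):
    """Fan sub-questions out across worker threads and collect answers
    in the original order (pool.map preserves input ordering)."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(answer, sub_questions))

results = solve_parallel([
    "What was the revenue in Q1 of 2024?",
    "What was the revenue in Q2 of 2024?",
])
```

Note that nodes whose answers depend on each other (such as the final "difference" question) must still wait for their children, so the parallelism applies per tree level rather than across the whole tree.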

What’s next?

Congratulations! You are now armed with the knowledge you need to build fairly complex agents. One next step is adapting the principles discussed earlier to your own problems.

To develop more tools for LLM agents, see the NVIDIA Deep Learning Institute page. To build, customize, and deploy an LLM for your use case, see the NVIDIA NeMo Framework.
