Conversational Memory and Tool Use AI Agents
By: Samuel Chan · October 16, 2024
Generative AI Series
Create RAG systems and AI agents with Sectors Financial API, LangChain and state-of-the-art LLM models -- capable of producing fact-based financial analysis and financial-specific reasoning. **Continually updated** to keep up with the latest major versions of the tools and libraries used in the series.
Generative AI Series: Table of Contents
Generative AI for Finance
Tool-Use Retrieval Augmented Generation (RAG)
Structured Output from AIs
Tool-use ReAct Agents w/ Streaming
Conversational Memory AI Agents
This article is part 5 of the Generative AI for Finance series, and is written using LangChain 0.3.2.
For best results, it is recommended to consume the series in order, starting from chapter 1.
For continuity purposes, I will point out the key differences between the current version (LangChain 0.3.2, using runnables) and the older implementations featuring `LLMChain` and `ConversationChain`.
Conversational AI with Memory
Oftentimes, we design our AI agents to be conversational, allowing them to interact with users in a more human-like manner. Part 5 of the Generative AI series is on building a conversational AI agent with memory capabilities, which can “remember” past interactions in the conversation and use that information to generate more contextually relevant responses.
A memory system requires three essential components:
1. Memory Storage: A mechanism to store and retrieve information.
2. Memory Update: A mechanism to update the memory based on new information.
3. Memory Retrieval: A mechanism to retrieve information from memory.
Instead of operating in a stateless manner, we will construct a system where the prompt is augmented with memory information before being passed to the model, and the memory is subsequently updated with the agent’s response. In other words, the chain interacts with the memory system twice in any given conversation turn: once to perform (3) Memory Retrieval and once to perform (2) Memory Update.
Observe where the memory system is integrated into the agent’s workflow. Also note how the chain:
- Augments the user input with memory information before passing it to the model. This happens after receiving the user input but before the agent performs any processing.
- Updates the memory with the agent’s response after the model has generated a response, typically before returning the response to the user. This adds information to the memory storage that future conversation turns can refer to.
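Stripped of any framework, these two touchpoints can be sketched in a few lines of plain Python. This is only an illustration of the flow described above, not LangChain code; `fake_model` is a stand-in for a real LLM call:

```python
# Minimal, framework-free sketch of the two memory touchpoints in one
# conversation turn. `fake_model` stands in for a real LLM call.

memory: list[str] = []  # (1) memory storage: a simple in-process buffer

def fake_model(prompt: str) -> str:
    return f"(model reply to: {prompt.splitlines()[-1]})"

def converse(user_input: str) -> str:
    history = "\n".join(memory)                       # (3) memory retrieval
    prompt = f"{history}\n{user_input}" if history else user_input
    response = fake_model(prompt)                     # model sees the augmented prompt
    memory.append(f"Human: {user_input}")             # (2) memory update
    memory.append(f"AI: {response}")
    return response

converse("Tell me about Indonesia's economy.")
converse("What is its central bank?")
```

By the second call, `memory` already holds the first exchange, so the second prompt carries the earlier context with it.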
The storage underlying this memory system can range from a simple key-value store to more complex storage systems that offer persistence and authentication features.
In the past, memory-backed AI agents were typically implemented with either an `LLMChain` or a `ConversationChain`, and the simplicity of these classes made it easy to showcase the memory system. I will first demonstrate how that is done before moving on to the newer, more flexible `RunnableWithMessageHistory` class recommended in the latest version of LangChain (0.3.2).
Memory in LLMChain and ConversationChain
This sub-section demonstrates the memory system in LangChain’s `LLMChain` and `ConversationChain` classes.
As of LangChain 0.3.0 (mid-October ‘24), these two classes will yield a `LangChainDeprecationWarning`:
- The class `LLMChain` was deprecated in LangChain 0.1.17 and will be removed in 1.0. Use a `RunnableSequence`, e.g. `prompt | llm`, instead.
- The class `ConversationChain` was deprecated in LangChain 0.2.7 and will be removed in 1.0. Use [`RunnableWithMessageHistory`](https://python.langchain.com/v0.2/api_reference/core/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html) instead.
The rest of this article outside of this sub-chapter will be using the newest, recommended classes (as of October ‘24).
There are two key components in the code above, irrespective of which class you choose to use:
- The `PromptTemplate` class, which is used to define the template for the prompt. How we name the variables in the template is important, as they will be matched against the keys in the memory system.
- The `ConversationBufferMemory` class, which is a simple memory system that stores the conversation history in a buffer. It requires a `memory_key` that matches the key in the prompt template.
Since we have `{history}` in the prompt template, the memory system will store the conversation history under the key `history`, which will be used to augment the prompt before passing it to the model.
If desired, one can also manipulate the memory system by adding user or AI messages to the conversation history through the `chat_memory` attribute.
Once you have the `ConversationChain` or `LLMChain` set up, you can interact with it as you would with any other chain. The memory system will automatically update the conversation history with each turn, and the model will be able to access this history in subsequent turns.
Notice that the answers to each prompt are contextually relevant to the conversation history. The AI agent could not have understood the question about the “central bank of that country” or “with whom does this country compete” without this information being injected into the prompt from the conversation history.
Conversational Agents through RunnableWithMessageHistory
If you’re going through the Generative AI series on your own, you’re probably reading this article closer to the end of 2024 or later. In that case, you should be using the `RunnableWithMessageHistory` class along with LCEL (LangChain Expression Language) to build your conversational AI agents. ReAct agents and LCEL are topics covered in Chapter 4: Tool-Use ReAct Agents of the series.
The key changes with LangChain 0.3.2 and above are the use of `RunnableWithMessageHistory` to construct a runnable, consistent with what we’ve learned in previous chapters of this series, and a more explicit way of handling message history through `InMemoryChatMessageHistory`. `RunnableWithMessageHistory` wraps around a runnable (like the ones we’ve seen before) but adds the capability of working with chat message history, thus allowing the runnable to read and update the message history in a conversation.

Unlike other runnables, `RunnableWithMessageHistory` must always be invoked with a `config` that contains the parameters for the chat message history.
Let’s start with the imports and set up a runnable chain much like you’ve done in the previous chapters.
Again, pay special attention to the variable names in the prompt template. We have decided to call them `history` and `question`, but your use-case may vary.
The big picture idea isn’t much different from the previous examples, where we are creating these variables to allow the memory system to augment the prompt before passing it to the model.
Syntactic differences aside, the key idea is to inject, or “copy-paste”, past conversational rounds into the prompt so the prompt is contextually informative.
In a production environment, you might use a persistent key-value store implementation for this message history, like `RedisChatMessageHistory` or `MongoDBChatMessageHistory`.
View the full list of integration packages and providers on LangChain Providers.
With our runnable chain set up, let’s now:
- Create an in-memory dictionary to store the message history based on a unique session id
- Wrap our `chain` with `RunnableWithMessageHistory` to handle the message history through matching the variables in the prompt template.
The `get_session_history_by_id` function retrieves the message history based on a unique session id. If the `session_id` is not found in the store, it means the user has not interacted with the agent before, so a new `InMemoryChatMessageHistory` object is created and stored in the dictionary.
Runnable with Message History in Action
With all of that in place, let us now interact with our `with_memory` runnable to see how it performs in a conversation.
Because `supertype` is not present in `store`, a new `InMemoryChatMessageHistory` object is created in our memory store under the `supertype` key.
Subsequent interactions with the agent using this `session_id` will refer to this key (pointing to an object containing the conversation history).
Just as we initialized `store` as an empty dictionary, `print(store)` will show you that the structure of this dictionary is as follows:
And since our `store` has been updated with this new key, let’s also print out the content of this new key-value pair:
So far, it’s looking good! The agent has provided a detailed, on-point response to the user’s question. Now, let’s test the agent’s memory by asking a follow-up question that relies on the information provided in the previous response.
Different session_id for different Conversations
It does look like our AI agent handled that follow-up question well!
By matching the `session_id`, it was able to identify which companies were being referred to and inject the right context from our memory store.
Now that our conversation has grown a little longer, let’s see if it still maintains context in the next question.
It seems that the AI agent performs admirably in this conversation, providing contextually relevant responses based on the conversation history. Just to test its ability to order these message histories sequentially, I’ve asked it for the second question as well:
For the most part, the AI agent’s ability to store and retrieve these message histories, and the quality of that ability, will depend on the way we set up the memory system as well as on the LLM model itself. If you have been following along with your own LLM model, you might notice a difference in the quality of responses compared to the examples above.
It should come as no surprise that when we try to access a different `session_id`, the agent will not be able to retrieve the conversation history from the `store` dictionary and will promptly create a new `InMemoryChatMessageHistory` object for that `session_id`, as implemented in the `get_session_history_by_id` function.
Advanced configuration for message history tracking
Recall that this is our current implementation carried over from the previous sections:
This function, in fact, also accepts an optional parameter, `history_factory_config`, that expects a list of `ConfigurableFieldSpec` objects.
Notice that I’ve also changed the `get_session_history` to this new function that I have yet to create, so let’s go ahead and create it:
I have also slightly modified my `prompt` for this example, even though it’s not necessary for the `history_factory_config` to work.
Now, to invoke our runnable with the new `history_factory_config`, your `config` will have to match the specifications constructed with the `ConfigurableFieldSpec` objects.
We are going to pretend that we have some internal database that provides us with the stocks owned by the respective users, and mock it up for now. Here’s my implementation of `_get_stocks_of_user` and `_get_user_settings_preferences`:
And now I can initiate a chat, first as user Sam (id `001`), and then as user Anonymous (id `002`).
Since I did not specify any `conversation_id`, it will default to `1`. This is verified by printing the `store` dictionary after the first chat:
Now, with the Anonymous user, we are going to issue a `conversation_id` of `1` explicitly, but due to the implementation of `get_session_history_by_uid_and_convoid`, it will still create a new `InMemoryChatMessageHistory` object.
Let’s verify that asking the AI for the name (user 1 introduces himself as Sam) will not work for user 2.
Notice that even though the `conversation_id` is the same, our function is implemented in such a way that the AI agent will treat it as a separate conversation.
Whenever Sam is ready to continue the conversation, he can do so with the same `conversation_id` of `1`:
SQLChatMessageHistory
Memory implementations vary from simple in-memory dictionaries to more complex, persistent storage systems. The exact implementation will depend on your specific use case, requirements, as well as the library you choose.
To demonstrate a more persistent memory system, I will show you how to use `SQLChatMessageHistory` with SQLite.
Start by installing the `langchain-community` package, which contains the `SQLChatMessageHistory` class. As always, I recommend doing this in a virtual environment.
Now, import the `SQLChatMessageHistory` class and modify your `get_session_history_by_uid_and_convoid` function to use it, swapping out `InMemoryChatMessageHistory` for `SQLChatMessageHistory`.
With SQLite, if a database of that name is not found, it will be created for you. There is no separate setup required for the creation of this database.
The rest of your code should remain the same, but now when we call `chat(user_id, input)` for the first time, it will create a new `memory.db` file in the same directory as your script.
Exploring the database, we can see a table named `message_store` created for us, with the following schema:
Executing `SELECT * FROM message_store` will show you the conversation history stored in the database:
New to SQL and SQLite?
Once you’ve run the code above, a new database is created on your behalf by SQLite. This database exists on your local machine and can be accessed using a SQLite client, or directly queried using SQL commands.
You can learn more about SQL in the SQL Essentials guide I wrote, but it is beyond the scope of this article.
Adding memory to prebuilt ReAct agents
We’ve learned about the prebuilt ReAct agents in the previous chapter. Adding in-memory capabilities to these agents is fairly straightforward, so let’s see a bare-minimum example of how to do this.
The key difference here is the addition of the `MemorySaver` class, which LangChain describes as an in-memory checkpoint saver. Just like the `store={}` dictionary we used in the previous examples, this class stores its checkpoints in memory, using a `defaultdict`.
I’ve mentioned that `create_react_agent` really requires only two arguments, the `llm` model and the `tools` list, but it accepts additional keyword arguments. If you want to, you can also pass in a `state_modifier` that acts almost like a prompt (we’ve also seen this earlier):
The rest of the code remains the same from earlier chapters.
I will leave it as an exercise for you to implement the other tools using the `@tool` decorator, but this serves as a sufficient example to demonstrate the use of a tool-using (“function calling”) ReAct agent with memory capabilities.
A quick glance at Sectors’ report on Adaro Energy Indonesia Tbk (ADRO) will confirm that the information provided by the AI agent is accurate, and that it was able to retrieve this information from the `get_company_overview` tool.
In fact, if we so desire, we can also break down each intermediary message contained in the `out['messages']` list for inspection:
- The first message is a `HumanMessage` object, which is the user’s input (e.g. “Give me an overview of ADRO”).
- The second message is an `AIMessage`, in which the model reads the user’s input and decides on the right tool to call.
- The third message is a `ToolMessage`, which carries the result of the tool call (e.g. `get_company_overview`).
- The fourth message is another `AIMessage`, which is the AI agent’s response to the user’s input, in plain human language.
Sequentially, the messages are as follows:
Given how often we want to interact with the AI agent, I’ve wrapped the invocation logic into `chat()`, and we will now proceed to ask a few follow-up questions to see the memory in action.
To both of these follow-up questions, the AI agent was able to access and draw from its memory to provide contextually relevant and correct responses.
Challenge
Using what you’ve learned in this chapter, try to implement an end-to-end financial agent that is fun to use and can provide you with the latest stock information, company overviews, and even more.
Here are some ideas to get you started:
- Implement 3 or more tools, each leveraging an external API to retrieve financial data
- Implement a CLI interface for your agent, or a simple web interface using any tools of your choice
- Implement a memory system that can store and retrieve conversation histories, and use it to provide contextually relevant responses
Here is an example conversation of a passing submission for this challenge: