Add Memory to LLMs
Make the LLMs remember previous interactions
LLMs do not hold an internal state, and many applications require tracking previous interactions with the LLM as part of the interface (e.g. chatbots). To this end, you can add memory to an LLM node under the Stack AI tool by clicking on the gear icon of the LLM node.
Some quick facts:
-
All the LLM memory is encrypted end-to-end in the Stack AI database.
-
This data can be self-hosted under the Stack AI enterprise plan.
-
The LLM memory is user-dependent and an instance of the LLM memory.
-
Once the deployed as an API, you can specify the user_id for the LLM memory for each user (see Deployer Guide ).
-
By default, the “Sliding Window Input” memory is selected when a new LLM node is added to the flow.
We offer three types of memory modalities:
-
Sliding Window
-
Stores all LLM prompts and completions.
-
The modality may consume many tokens as the LLM prompts can often occupy thousands of tokens.
-
Loads a window of the previous prompts and completions as part of the LLM conversation memory, up to the number of messages in the window.
-
In non-chat models (e.g. Davinci), the memory is added as part of the prompt as a list of messages at the end of the prompt.
-
-
Sliding Window Input
-
Stores all LLM completions and one LLM input parameter (e.g. in-0). The modality is more token efficient and aligned with many applications (e.g. when you only need to store the user message from in-0)
-
Loads a window of the previous inputs and completions as part of the LLM conversation memory, up-to the number of messages in the window. In non-chat models (e.g. davinci-003-text), the memory is added as part of the prompt as a list of messages at the end.
-
-
VectorDB
- Stores all of the inputs and outputs to the LLM in a Vector Database and retrieves the most relevant messages to use as LLM memory.
- This is especially useful if you expect some of the information to be needed at a later time but not in a sequential manner.
- Allows the LLM to access older, contextually relevant interactions without the constraint of a fixed window size.