URLs + Search
Upload URLs and search through them
What is a URLs + Search node?
Node in Stack AI
A URLs + Search node is a knowledge base that you can use to store information from URLs. The URLs that you upload are indexed and stored in a vector database - don’t worry, we do this for you in the background! This process is done once, so that you can later query the knowledge base and retrieve only the pieces of information that are most relevant to your query. It’s the most efficient way to manage a long list of URLs without having to re-index them every time you run the workflow (i.e., embeddings are generated only once, when you upload the URLs).
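Conceptually, the pattern is "embed once, query many times". The sketch below illustrates it with a toy embedding function and an in-memory index; it is not Stack AI’s actual implementation, just a picture of what happens behind the scenes.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real embeddings model (Stack AI uses the model you
    # select in the node settings, e.g. text-embedding-ada-002).
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# 1. Indexing happens once, when the URLs are uploaded.
documents = ["text scraped from URL 1", "text scraped from URL 2"]
index = [(doc, embed(doc)) for doc in documents]

# 2. Each workflow run only embeds the query and searches the existing index.
def search(query: str, top_k: int = 10):
    q = embed(query)
    scored = [(doc, float(q @ vec)) for doc, vec in index]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]

print(search("what does URL 1 say?"))
```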
Status options for URLs + Search node
You will see a status label next to each URL that you upload, with the following meaning:
- Loading: the URL is being processed and indexed.
- Ok: the URL was successfully indexed.
- Error: the URL could not be indexed (e.g., due to a formatting issue).
How to use it?
How to connect with other nodes
To use the URLs + Search node, you must connect both its input and output edges:
- Input: this node requires a text input. Typically, you would connect the input to an LLM or Input node.
- Output: this node outputs chunks of information. Typically, you would connect the output to an LLM or Output node.
By clicking on the settings icon, you can access the settings modal. Here you can do the following:
- Select the vector database: right now, only Weaviate is available.
- Select the model for embeddings: by default, the text-embedding-ada-002 model from OpenAI is selected. You can also choose azure-text-embedding-ada-002, bert-base-cased, all-mpnet-base, or palm2.
- Chunking algorithm: by default, the system uses sentence. You can also choose naive.
- Retrieval algorithm: by default, the system uses chunks. You can also choose docs or pages.
- Top K: by default, the system retrieves the 10 most relevant chunks. You can change this value by clicking the number and editing it.
- Chunk overlap: by default, 500. You can change this value by clicking the number and editing it (see the sketch after this list for how overlap and chunk length interact).
- Chunk length: by default, 1500. You can change this value by clicking the number and editing it.
- Result length: by default, 5000. You can change this value by clicking the number and editing it.
- Post-retrieval rerank: by default, 5000. You can change this value by clicking the number and editing it.
- Unstructured.io: deselected by default. Enable it to extract unstructured data from documents.
- Text in imgs (OCR): deselected by default. Enable it if you want to extract text from images that are present in your documents.
- Embeddings API key: empty by default, which means Stack AI’s API key is used. If you would like to use your own, enter your API key in this text field.
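To make the chunking parameters concrete, here is a minimal sketch of fixed-size chunking with overlap, using the node’s default Chunk length (1500) and Chunk overlap (500). It assumes character-based splitting and the naive strategy purely for illustration; the sentence chunker and Stack AI’s actual implementation differ in the details.

```python
def naive_chunks(text: str, chunk_length: int = 1500, overlap: int = 500):
    # Each new chunk starts (chunk_length - overlap) characters after the
    # previous one, so consecutive chunks share `overlap` characters.
    step = chunk_length - overlap
    return [text[start:start + chunk_length] for start in range(0, len(text), step)]

page = "x" * 4000  # stand-in for the text scraped from one URL
chunks = naive_chunks(page)
print(len(chunks), [len(c) for c in chunks])  # 4 [1500, 1500, 1500, 1000]
```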
Example of usage
A practical application of the URLs + Search node is to integrate it with an AI assistant designed to answer user queries based on the context data stored in the node.
Answering questions from users using URLs + Search data
To construct this workflow, use an LLM to answer the users’ questions. Add an Input node to collect the users’ submissions, and connect a URLs + Search node. Upload as many URLs as you want to the node, and wait until they are indexed (i.e., an Ok label appears next to every URL).
System prompt (LLM)
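The exact system prompt is up to you. Something along the following lines works well for this kind of assistant (an illustrative example, not wording prescribed by Stack AI):

```
You are a helpful assistant that answers questions about the content of the
uploaded URLs. Use only the chunks retrieved from the URLs + Search node as
context. If the answer is not contained in that context, say that you don't
know instead of guessing. When possible, mention which URL a piece of
information came from.
```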
Upload URLs via API
There is also the option to add URLs to a URLs + Search node through an API endpoint. Below, you will find the steps to do so.
This endpoint requires the following variables:
- flow_id: the ID displayed in the URL of your flow.
- node_id: the ID of the node to upload the URLs to (e.g., urlemb-0).
- org: your organization name.
Additionally, the request needs to be signed with your API key. Example of usage:
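A minimal sketch of such a request is shown below, using Python and the requests library. Only flow_id, node_id, org, and the API key come from this page; the base URL, endpoint path, header, and payload field names are placeholders that you should replace with the ones given in the API reference.

```python
import requests

API_BASE = "https://YOUR_STACK_AI_API_BASE"  # placeholder: take the real base URL from the API reference
STACK_API_KEY = "YOUR_API_KEY"               # used to sign the request
FLOW_ID = "YOUR_FLOW_ID"                     # the ID displayed in the URL of your flow
NODE_ID = "urlemb-0"                         # the URLs + Search node to upload to
ORG = "YOUR_ORG_NAME"                        # your organization name

# NOTE: the path and JSON field names below are illustrative assumptions,
# not the documented endpoint; check the API reference for the exact ones.
response = requests.post(
    f"{API_BASE}/upload_urls",
    params={"flow_id": FLOW_ID, "node_id": NODE_ID, "org": ORG},
    headers={"Authorization": f"Bearer {STACK_API_KEY}"},
    json={"urls": ["https://example.com/page-to-index"]},
)
response.raise_for_status()
print(response.json())
```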