What is a URLs + Search node?

Node in Stack AI

A URLs + Search node is a knowledge base that stores information from URLs. The URLs you upload are indexed and stored in a vector database; Stack AI handles this for you in the background. Indexing happens once, so you can later query the knowledge base and retrieve only the pieces of information most relevant to your query. This is the most efficient way to manage a long list of URLs, because they do not have to be re-indexed every time you run the workflow (i.e., embeddings are generated only once, when you upload the URLs).
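The index-once, query-many idea can be illustrated with a toy example. This is only a sketch: the `embed` function below is a bag-of-words stand-in for a real embedding model, and `ToyIndex` is a hypothetical in-memory substitute for the vector database, not Stack AI's implementation.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ToyIndex:
    """Hypothetical in-memory index (assumption: the real node uses a vector database)."""

    def __init__(self):
        self.entries = []  # list of (chunk_text, vector)

    def add(self, chunks):
        # Embeddings are computed once, at upload time.
        for chunk in chunks:
            self.entries.append((chunk, embed(chunk)))

    def query(self, question, top_k=2):
        # Only the question is embedded at query time.
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]

index = ToyIndex()
index.add([
    "Stack AI lets you build LLM applications",
    "Weaviate is a vector database",
    "Bananas are rich in potassium",
])
top = index.query("what is a vector database", top_k=1)
print(top)
```

Each call to `query` reuses the stored vectors, which is why running the workflow repeatedly does not require re-indexing the URLs.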

Status options for URLs + Search node

Each document (URL) that you upload shows a status label with one of the following meanings:

  • Loading: the document is being processed and indexed.
  • Ok: the document was successfully indexed.
  • Error: the document could not be indexed (e.g., due to a formatting issue).

How to use it?

How to connect with other nodes

To use the URLs + Search node, connect both its input and output edges:

  • Input: This node requires a text input. Typically, you would connect the input to an LLM or Input node.
  • Output: This node outputs chunks of information. Typically, you would connect the output to an LLM or Output node.

By clicking on the settings icon, you can access the settings modal. Here you can do the following:

  • Select the vector database: right now, only Weaviate is available.
  • Select the model for embeddings: by default, the text-embedding-ada-002 model from OpenAI is selected. You can also select one of the following: azure-text-embedding-ada-002, bert-base-cased, all-mpnset-base, palm2.
  • Chunking algorithm: by default, the system uses sentence. You can also choose naive.
  • Retrieval algorithm: by default, the system uses chunks. You can also choose docs or pages.
  • Top K: by default, the system uses 10. To set a different value, click the number and edit it.
  • Chunk overlap: by default, the system uses 500. To set a different value, click the number and edit it.
  • Chunk length: by default, the system uses 1500. To set a different value, click the number and edit it.
  • Result length: by default, the system uses 5000. To set a different value, click the number and edit it.
  • Post-retrieval rerank: by default, the system uses 5000. To set a different value, click the number and edit it.
  • Unstructured.io: by default, this option is deselected. Enable it to extract unstructured data from documents.
  • Text in imgs (OCR): by default, this option is deselected. Enable it if you want to extract text from images that are present in your documents.
  • Embeddings API key: by default, the text field is empty and Stack AI’s API keys are used. If you would like to use your own, enter your API key in this text field.
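To make the interaction between chunk length and chunk overlap concrete, here is a minimal sliding-window sketch. It assumes both settings are measured in characters and that chunks are fixed-size windows; the node's actual units and its sentence and naive algorithms are not specified here, so treat `chunk_text` as a hypothetical illustration only.

```python
def chunk_text(text, chunk_length=1500, chunk_overlap=500):
    """Split text into windows of chunk_length characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_length:
        raise ValueError("chunk_overlap must be smaller than chunk_length")
    step = chunk_length - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_length])
        # Stop once the current window reaches the end of the text.
        if start + chunk_length >= len(text):
            break
    return chunks

sample = "x" * 4000
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

With the default values, each new chunk starts 1000 characters after the previous one, so a larger overlap produces more chunks for the same document.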

Example of usage

A practical application of the URLs + Search node is to integrate it with an AI assistant that answers user queries based on the context data stored in the node.

Answering questions from users using URLs + Search data

To build this workflow, use an LLM to answer the users’ questions. Add an Input node to pass in the users’ messages, and connect a URLs + Search node. Upload as many URLs as you want to the node, and wait until they are indexed (i.e., an Ok label appears next to every URL).

System prompt (LLM)

You are the customer support assistant for a company called Stack AI, a tool to build LLM (Large Language Model) applications. You are given some extra context on Stack AI taken from a few websites. You speak in a friendly and professional tone and you are brief in your answers. You try not to be repetitive.

If you do not know the answer to a question or need to refer the customer to someone else, share one of the following resources:

1) support email: support@stack-ai.com
2) discord: https://discord.gg/sSbwawtNsV
3) Calendly link: https://calendly.com/baceituno
4) Documents: http://docs.stack-ai.com/

Some context to answer questions:

{urlemb-0}

My message: {in-0}
Your response:
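In the prompt above, {urlemb-0} stands for the chunks returned by the URLs + Search node and {in-0} for the user's message. The sketch below shows one way such placeholders could be filled, assuming plain string replacement; `fill_prompt` is a hypothetical helper, and Stack AI's actual substitution mechanism is not described here.

```python
def fill_prompt(template, values):
    """Replace each {node-id} placeholder with the text produced by that node."""
    for node_id, text in values.items():
        template = template.replace("{" + node_id + "}", text)
    return template

template = (
    "Some context to answer questions:\n\n"
    "{urlemb-0}\n\n"
    "My message: {in-0}\n"
    "Your response:"
)

# Illustrative values only: one retrieved chunk and one user message.
retrieved_chunks = ["Stack AI is a tool to build LLM applications."]
prompt = fill_prompt(template, {
    "urlemb-0": "\n".join(retrieved_chunks),
    "in-0": "What is Stack AI?",
})
print(prompt)
```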

Upload URLs via API

You can also add URLs to a URLs + Search node through an API endpoint:

https://stack-intext.onrender.com/index_urls_api

This endpoint requires the following variables:

  1. flow_id: the ID displayed in the URL of your flow.
  2. node_id: the ID of the node to upload the URLs to (e.g., urlemb-0).
  3. org: your organization name.

Additionally, the request must be signed with your API key. Example of usage:

import requests

# Replace these placeholders with your own values
flow_id = "YOUR-FLOW-ID"
node_id = "NODE-ID"  # e.g. urlemb-0
org = "YOUR-ORGANIZATION"

# API endpoint URL
url = "https://stack-intext.onrender.com/index_urls_api"

# Sign the request with your private API key
headers = {
    "Authorization": "Bearer PRIVATE-API-KEY",
}

# URLs to index
url_list = ["https://www.mixpanel.com", "https://github.com/"]

# Make the API request; flow_id, node_id and org are sent as query parameters
response = requests.post(
    url,
    params={"flow_id": flow_id, "node_id": node_id, "org": org},
    json={"urls": url_list},
    headers=headers,
)

# Check the response
if response.status_code == 200:
    print("API request successful")
else:
    print("API request failed:", response.text)
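Before sending a batch, it may help to drop malformed or duplicate entries so that fewer URLs risk ending up in the Error state. The sketch below uses only the Python standard library; `clean_url_list` is a hypothetical helper, and whether the endpoint performs its own validation is not stated here.

```python
from urllib.parse import urlparse

def clean_url_list(urls):
    """Keep only absolute http(s) URLs, dropping duplicates while preserving order."""
    seen = set()
    cleaned = []
    for raw in urls:
        candidate = raw.strip()
        parts = urlparse(candidate)
        is_web_url = parts.scheme in ("http", "https") and bool(parts.netloc)
        if is_web_url and candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned

url_list = clean_url_list([
    "https://www.mixpanel.com",
    "https://github.com/",
    "github.com",           # no scheme: dropped
    "https://github.com/",  # duplicate: dropped
])
print(url_list)
```

The cleaned list can then be passed as the urls field of the request body.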