What is a Websites node?

Node in Stack AI

A Websites node is a knowledge base that stores information from URLs. The URLs you upload are indexed and stored in a vector database (don’t worry, we do this for you in the background!). Indexing happens only once, so you can later query this knowledge base and retrieve only the pieces of information most relevant to your query. This is the most efficient way to manage a long list of URLs: embeddings are generated once, when you upload the URLs, rather than every time you run the workflow.
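To make the index-once, query-many idea concrete, here is a minimal in-memory sketch. It is not how Stack AI implements the node: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `TinyIndex` is an illustrative name. The point is structural: chunk vectors are computed once in the constructor, while each query only embeds the question and ranks the stored vectors by cosine similarity.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (hypothetical stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """Embed chunks once at construction; answer many queries afterwards."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks
        # The expensive step runs once per chunk, at indexing time.
        self.vectors = [embed(chunk) for chunk in chunks]

    def query(self, question: str, top_k: int = 3) -> list[str]:
        # At query time only the question is embedded; stored vectors are reused.
        q = embed(question)
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:top_k]]


index = TinyIndex([
    "Mixpanel tracks product analytics events.",
    "GitHub hosts git repositories and pull requests.",
    "Our pricing plans start at ten dollars.",
])
print(index.query("What are the pricing plans?", top_k=1))
# → ['Our pricing plans start at ten dollars.']
```

Swapping `embed` for a real embedding model changes the quality of the ranking, not the structure: the expensive step still happens once per document.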

Status options for Websites node

Each document you upload shows a status label with one of the following values:

  • Loading: the document is being processed and indexed.
  • Ok: the document was successfully indexed.
  • Error: the document could not be indexed (e.g., due to a formatting issue).

How to use it?

How to connect with other nodes

To use the Websites node, you must connect both its input and output edges:

  • Input: This node requires a text input. Typically, you connect the input to an LLM or Input node.
  • Output: This node outputs chunks of information. Typically, you connect the output to an LLM node.
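The wiring above can be pictured as a small pipeline: text from the Input node (or an upstream LLM) becomes the search query, and the retrieved chunks are placed into the downstream LLM's prompt. The sketch below shows only that last step. The prompt template and the function name `build_llm_prompt` are assumptions for illustration, not Stack AI's actual prompt format.

```python
def build_llm_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical helper: number the retrieved chunks and place them before the question."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Chunks as they might come out of the Websites node (illustrative text).
chunks = [
    "Mixpanel is a product analytics tool.",
    "GitHub hosts git repositories.",
]
print(build_llm_prompt("What is Mixpanel?", chunks))
```

Numbering the chunks is one common choice because it lets the LLM refer back to a specific source in its answer.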

Click the settings icon to open the settings modal, where you can configure the following:

  • Output format: options are chunks, pages, and docs.
  • Top Results: the number of search results to return, ranked by relevance.
  • Max Characters: the maximum number of characters sent to the LLM.
  • Answer Multiple Questions: enable to answer multiple questions in parallel.
  • Search Algorithm: semantic search is used by default. You can also choose keyword or hybrid.
  • Advanced Q&A: handle questions to compare or summarize documents.
  • Rerank: enable to get more precise information retrieval.
  • Transform Query: enable to get more precise information retrieval.
  • Model for Embeddings: by default, OpenAI’s text-embedding-3-large model is selected. You can also select other models, such as azure-text-embedding-ada-002, bert-base-cased, all-mpnet-base, palm2, and more.
  • Chunking algorithm: by default, the system uses sentence. You can also choose naive.
  • Chunk overlap: the default is 500. You can set any value up to 4500 by clicking the number and editing it.
  • Chunk length: the default is 2500. You can set any value up to 4500 by clicking the number and editing it.
  • Advanced Data Extraction: enable to extract complex data such as tables, images, and charts.
  • Text in images (OCR): disabled by default. Enable it to extract text from images in your documents.
  • Embeddings API key: empty by default, in which case Stack AI’s API keys are used. To use your own key, enter it in this text field.
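Chunk length, chunk overlap, Top Results and Max Characters interact in a simple way, sketched below under two assumptions that the settings list does not confirm: lengths are measured in characters, and the "naive" chunking algorithm is a fixed-size sliding window. `naive_chunks` and `pack_results` are hypothetical helper names. The sketch also rejects an overlap that is not smaller than the chunk length, since the window would otherwise never advance.

```python
def naive_chunks(text: str, chunk_length: int = 2500, chunk_overlap: int = 500) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    if not 0 <= chunk_overlap < chunk_length:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_length")
    step = chunk_length - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_length])
        if start + chunk_length >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def pack_results(ranked_chunks: list[str], top_results: int, max_characters: int) -> str:
    """Keep the top_results best chunks, then cut the joined text to max_characters."""
    context = "\n\n".join(ranked_chunks[:top_results])
    return context[:max_characters]


print(naive_chunks("abcdefghij", chunk_length=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
print(pack_results(["aaa", "bbb", "ccc"], top_results=2, max_characters=6))
```

With the defaults (2500 and 500), consecutive chunks share 500 characters, so each new chunk starts 2000 characters after the previous one.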

Upload URLs via API

You can also add URLs to a URLs+Search node through an API endpoint. The steps are described below.

https://stack-intext.onrender.com/index_urls_api

This endpoint requires the following query parameters:

  1. flow_id: the ID shown in your flow’s URL.
  2. node_id: the ID of the node to which the URLs are added (e.g., urlemb-0).
  3. org: your organization name.

Additionally, the request must be authenticated with your private API key, sent as a Bearer token in the Authorization header. Example of usage:

import requests

# API endpoint; flow_id, node_id and org are sent as query parameters
url = "https://stack-intext.onrender.com/index_urls_api"
params = {
    "flow_id": "YOUR-FLOW-ID",
    "node_id": "NODE-ID",  # e.g. urlemb-0
    "org": "YOUR-ORGANIZATION",
}

# Authenticate with your private API key
headers = {
    "Authorization": "Bearer PRIVATE-API-KEY",
}

# URLs to add to the node
url_list = ["https://www.mixpanel.com", "https://github.com/"]

# Make the API request (the URL list is sent as a JSON body)
response = requests.post(url, params=params, json={"urls": url_list}, headers=headers)

# Check the response
if response.status_code == 200:
    print("API request successful")
else:
    print("API request failed:", response.status_code, response.text)
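If you call the endpoint from your own code repeatedly, it can help to wrap it in a function and separate building the request from sending it. The sketch below is a hypothetical helper, not an official Stack AI client: it uses the same endpoint, query parameters, Bearer header and `{"urls": [...]}` body as the example above, but relies on Python's standard-library `urllib` instead of `requests` so it runs without extra dependencies. The names `build_index_request` and `index_urls` are illustrative.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

ENDPOINT = "https://stack-intext.onrender.com/index_urls_api"


def build_index_request(flow_id: str, node_id: str, org: str,
                        api_key: str, urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) the POST request that adds `urls` to a node."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    # flow_id, node_id and org travel as URL-encoded query parameters.
    query = urllib.parse.urlencode({"flow_id": flow_id, "node_id": node_id, "org": org})
    # The URL list is sent as a JSON body: {"urls": [...]}.
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def index_urls(flow_id: str, node_id: str, org: str,
               api_key: str, urls: list[str]) -> bool:
    """Send the request; return True only on HTTP 200, as in the example above."""
    request = build_index_request(flow_id, node_id, org, api_key, urls)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        # Error status codes raise HTTPError; print the body for debugging.
        # Network failures (URLError) are left to the caller.
        print("API request failed:", err.code, err.read().decode("utf-8", "replace"))
        return False
```

For example, `index_urls("YOUR-FLOW-ID", "urlemb-0", "YOUR-ORGANIZATION", "PRIVATE-API-KEY", ["https://github.com/"])` returns True when the endpoint answers with HTTP 200. Keeping `build_index_request` separate lets you inspect or test the request without sending it.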