Building an AI assistant for your enterprise data warehouse

Bernardo Aceituno
@bernaceituno

As data continues to grow in scale and complexity, accessing and analyzing it efficiently has become a significant challenge for many organizations. Data warehouses like Snowflake, BigQuery, and Redshift are powerful tools, but they often require specialized technical expertise to extract valuable insights.

Recent advancements in Large Language Models (LLMs) and Text-to-SQL translation offer a promising solution. By integrating these technologies with data warehouses, we can create more intuitive and accessible interfaces for data analysis.

In this post, we'll explore how to build an AI-powered analytics tool using Stack AI and Snowflake. We'll walk through the process of setting up the environment, integrating with Snowflake, and deploying an AI assistant that can translate natural language queries into SQL and provide human-readable results.

Architecting a solution

[Image: solution architecture]

The key requirements for a solution are as follows:

  • Input: receiving a user question about the data warehouse in natural language.
  • Text-2-SQL: analyzing the user question, along with the database schema, to generate a SQL query that answers the request.
  • Data Warehouse: executing the SQL query on the database and retrieving the relevant data.
  • Reasoning: reviewing the data warehouse results and extracting insights in natural language.

These capabilities must be bundled into a user interface that can be privately shared with the business analytics team. They translate into the following components:

  • Text-2-SQL: the ability to translate natural language requests into SQL queries.
  • Data Warehouse: a system that hosts and manages the data, allowing SQL queries to be executed at scale.
  • Large Language Model: the core of the system, an AI model that aggregates information from the Data Warehouse to synthesize responses to user requests.

Combining these components results in a reliable system to synthesize data-driven insights.
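
Before mapping these components onto a specific platform, it helps to see the flow end to end. The outline below is a minimal, illustrative sketch in Python; generate_sql, run_query, and summarize are placeholder names for the Text-2-SQL, Data Warehouse, and Reasoning stages rather than any particular library.

# Illustrative outline of the pipeline; each helper stands in for one
# of the components described above.

def generate_sql(question: str, schema: str) -> str:
    # Text-2-SQL: turn a natural-language question into a SQL query,
    # guided by a description of the warehouse schema.
    raise NotImplementedError

def run_query(sql: str) -> list[tuple]:
    # Data Warehouse: execute the query and return the rows.
    raise NotImplementedError

def summarize(question: str, rows: list[tuple]) -> str:
    # Reasoning: turn the raw rows into a natural-language answer.
    raise NotImplementedError

def answer(question: str, schema: str) -> str:
    sql = generate_sql(question, schema)
    rows = run_query(sql)
    return summarize(question, rows)

Stack AI provides each of these stages as a configurable node, as described next.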

Using Stack AI

The key requirements to implement this system are readily available as modules and settings inside the Stack AI platform. More specifically, we need to leverage:

  1. Inputs: a text input with the user request.

  2. Text-to-SQL: a node that receives the user input and translates it into a SQL query. The node is configured with the database schema and the SQL dialect (see the end-to-end sketch after this list for an example schema).

  3. Data Loader: a node to send a SQL query to the data warehouse and return the response.

  4. Large Language Model: an AI model that combines the context from the user request with the data warehouse responses. Since this model must perform a heavy reasoning task, we require a model with a large context window. In this case, we select llama-3-70b from Meta, since it offers top-tier reasoning capabilities and can be self-hosted on our infrastructure. We configure the language model with:

    • Instructions: three sentences describing the task, plus one sentence describing the data sources involved:
You are an analytics AI assistant.
 
1- Be brief.
2- Be polite.
3- Be helpful.
 
You receive a user request and a query to the company's data warehouse, related to the request.
Answer the user request in natural language targeted to the C-suite.
 
Answer succinctly.
The user does not have access to the warehouse and cannot interact with it.
 
Always respond with Markdown formatting for tables and headers.
    • Prompt: a simple message with the incoming data and the user request to fulfill.
Data Warehouse query used to respond to the user request.
 
<SQL>
{data-warehouse-0}
</SQL>
 
 
If the query is incorrect, tell the user that you made a mistake.
 
User Request: {in-0}
  5. Output: the response of the language model, streamed back to the user.
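
Stack AI wires these nodes together for us, but it helps to see roughly what each node does under the hood. The sketch below is an illustrative, self-contained version of the earlier outline: the schema and table names are hypothetical, the warehouse call uses the snowflake-connector-python package, and the Text-to-SQL and reasoning steps call a self-hosted llama-3-70b model through an OpenAI-compatible endpoint whose URL is a placeholder.

import os
import snowflake.connector
from openai import OpenAI  # assumes the self-hosted model exposes an OpenAI-compatible API

# Hypothetical schema description, as you would paste it into the Text-to-SQL node.
SCHEMA = """
CREATE TABLE orders (
    order_id     NUMBER,
    order_date   DATE,
    region       VARCHAR,
    total_amount NUMBER(12, 2)
);
"""

# Placeholder endpoint for the self-hosted llama-3-70b deployment.
llm = OpenAI(base_url="http://llm.internal:8000/v1", api_key="not-needed")

def generate_sql(question: str) -> str:
    # Text-to-SQL: ask the model for a single Snowflake query answering the question.
    response = llm.chat.completions.create(
        model="llama-3-70b",
        messages=[
            {"role": "system",
             "content": f"Translate the user question into one Snowflake SQL query.\nSchema:\n{SCHEMA}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

def run_query(sql: str) -> list[tuple]:
    # Data Loader: execute the generated SQL against Snowflake and return the rows.
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="ANALYTICS_WH",  # placeholder warehouse name
        database="ANALYTICS_DB",   # placeholder database name
    )
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()

def summarize(question: str, sql: str, rows: list[tuple]) -> str:
    # Reasoning: turn the raw rows into a succinct, C-suite-friendly answer.
    prompt = (
        "Data Warehouse query used to respond to the user request.\n\n"
        f"<SQL>\n{sql}\n</SQL>\n\nResult rows:\n{rows}\n\n"
        f"User Request: {question}"
    )
    response = llm.chat.completions.create(
        model="llama-3-70b",
        messages=[
            {"role": "system", "content": "You are an analytics AI assistant. Be brief, polite, and helpful."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

def answer(question: str) -> str:
    sql = generate_sql(question)
    return summarize(question, sql, run_query(sql))

In the actual project, the schema description goes into the Text-to-SQL node, the connection credentials into the Data Loader, and the two prompts above into the LLM node.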

Putting it together

We implement this workflow in Stack AI using the instructions above, resulting in the following project:

[Image: the Stack AI workflow for the data warehouse assistant]

Once we have set up the credentials for our data warehouse and added the schema to our Text-to-SQL node, we can save the project and publish it. Then, we proceed to export it.

We recommend exporting this application as a Chat Interface.

In the export section, we will:

  1. Select Chat Interface.
  2. Go to the 'General' tab and give a name to our assistant.

Then, access the resulting URL to interact with our assistant as shown below:

[Image: chatting with the published assistant]

Conclusion: Empowering Decision-Making with AI

Congratulations! You've now integrated an AI assistant into your enterprise data warehouse. By simplifying the process of querying data through natural language, you can democratize access to insights, enabling decision-makers at all levels to ask complex questions and receive answers in real-time.

The implications of this technology are several: it reduces the dependency on data analysts for generating reports, speeds up the decision-making process, and ensures that your strategies are backed by the most accurate and up-to-date information available.

As you continue to refine and expand your AI assistant's capabilities, remember that the ultimate goal is to create a culture where data is not just accessible but integral to every decision made.

Advanced: Including Custom Business Knowledge

To get the most out of your AI assistant, you will need to integrate further context on your business, financials, and the structure of the data warehouse. A few resources that would drastically improve the performance of this AI assistant include:

  1. Data Warehouse Manual: a document outlining the structure of the data inside Snowflake and examples of common queries used to answer questions. This document can be used to augment the user request with additional context.
  2. Business reports: an outline of the business performance, its growth, and strategy. These reports can be used to give the LLM a reference for the business strategy and goals.
  3. Financials: a set of documents that describe the financial performance of the business (e.g. 10-K reports). These documents can be used to give the LLM a reference on the performance of the business.

To integrate these documents, we will add three additional nodes to our workflow:

  • Document Knowledge Base 1: a knowledge base hosting the data warehouse manual and potentially a list of example queries. This knowledge base should be queried based on the user request.
  • Python node: to augment the user question, we combine the output of the knowledge base with the user input. This node composes a new user request from both elements (a sketch of its code follows this list).
  • Document Knowledge Base 2: a knowledge base hosting company financials and reports as context. This knowledge base should be queried based on the user request and its response will be sent as context to the LLM.
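
The Python node itself only needs a few lines: it takes the user request and the excerpts returned by the first knowledge base and composes an augmented request for the Text-to-SQL node. The sketch below is illustrative; the input names are hypothetical and depend on how the nodes are wired in your project.

def compose_request(user_request: str, manual_snippets: str) -> str:
    # Augment the user's question with the relevant excerpts from the
    # data warehouse manual before it reaches the Text-to-SQL node.
    return (
        "Relevant excerpts from the data warehouse manual:\n"
        f"{manual_snippets}\n\n"
        f"User request: {user_request}"
    )

# Illustrative usage inside the node (input names are hypothetical):
# augmented = compose_request(in_0, knowledge_base_0)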

Including these three nodes and adjusting the LLM prompt to aggregate the business knowledge results in the pipeline below:

[Image: the extended pipeline with knowledge bases and the Python node]

Now, your AI assistant can navigate the nuances of your data structure and operate with context on the business strategy and performance.