Available LLMs
A guide for choosing the best LLM for your use case
Comparing models in different use cases
Stack AI offers a long list of models. Before diving into the description of each one, let’s review some key use cases and which models are preferred for each.
USE CASE | DESCRIPTION | PREFERRED MODELS |
---|---|---|
Formatting a prompt | Sometimes an LLM can improve the prompt given by the user before sending it to another LLM that will perform the task. In this case, a lighter, less costly, and faster model is preferred (see the sketch below this table). | gpt-3.5-turbo, davinci, claude-instant-v1 |
Summarizing a large document set or websites | In this case, receiving broader context is crucial for the LLM to understand the overall meaning of the document and to summarize it correctly. A model with larger context window is preferred. | claude-3-opus |
Performing complex tasks | Imagine an LLM that receives input from the user, data from a list of documents, and instructions on how to behave, and needs to use an external tool to retrieve additional information before finally answering the user. In this case, the more powerful the model, the better. | gpt-4o |
Performing complex tasks requiring large context | Same as the use case above, but requiring a larger context window to read entire documents. To deploy this use case at scale, it is preferred to use a well-trained, large window type of model that has been around for some time. | claude-3-opus, gpt-4o |
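As an illustration of the first use case, here is a minimal sketch of a two-stage chain in which a lighter model rewrites the user’s prompt before a stronger model answers it. It uses the OpenAI Python SDK directly rather than a Stack AI node, and the prompts and model choices are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_with_reformatting(user_prompt: str) -> str:
    # Stage 1: a light, fast model cleans up the raw prompt.
    rewrite = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Rewrite the user's request as one clear, specific instruction."},
            {"role": "user", "content": user_prompt},
        ],
    )
    improved_prompt = rewrite.choices[0].message.content

    # Stage 2: a more capable model performs the actual task.
    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": improved_prompt}],
    )
    return answer.choices[0].message.content

print(ask_with_reformatting("summrize q3 earnings, bullet pts, short"))
```

The same pattern maps onto two chained LLM nodes in a Stack AI workflow, with the cheaper model feeding its output into the stronger one.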
List of all available LLMs
GPT-4
GPT-4 is a large multimodal model (accepting text inputs and emitting text outputs, with image inputs supported by its newer variants) that can solve difficult problems with greater accuracy than OpenAI’s previous models, thanks to its broader general knowledge and advanced reasoning capabilities.
MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
---|---|---|---|
gpt-4 | More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. | 8,192 tokens | Up to Sep 2021 |
gpt-4-32k | Same capabilities as the base gpt-4 model but with 4x the context length. | 32,768 tokens | Up to Sep 2021 |
gpt-4-turbo-preview | Better capabilities than gpt-4 and gpt-4-32k. Large context window and quick inferences. | 128,000 tokens | Up to April 2023 |
gpt-4o | The latest and most advanced GPT model from OpenAI. Ideal for complex tasks requiring long contexts. | 128,000 tokens | Up to Oct 2023 |
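Because the Max Tokens column caps the prompt and the model’s reply combined, it can help to count tokens before choosing a model. Below is a minimal sketch using OpenAI’s tiktoken library; the input file and the headroom value are assumptions.

```python
import tiktoken

def fits_in_context(text: str, model: str = "gpt-4",
                    context_window: int = 8192,
                    reserved_for_output: int = 1024) -> bool:
    # Count the tokens this prompt consumes under the model's tokenizer.
    encoding = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoding.encode(text))
    # Leave headroom for the reply inside the same context window.
    return prompt_tokens + reserved_for_output <= context_window

document = open("report.txt").read()  # hypothetical input file
if not fits_in_context(document):
    print("Too large for gpt-4; consider gpt-4o (128,000-token window).")
```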
GPT-3.5
GPT-3.5 is a mid-generation upgrade of GPT-3 with fewer parameters. It includes a fine-tuning process based on reinforcement learning from human feedback (RLHF), which helps improve the accuracy of its responses.
MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
---|---|---|---|
gpt-3.5-turbo | Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with the latest model iteration 2 weeks after it is released. | 4,096 tokens | Up to Sep 2021 |
gpt-3.5-turbo-16k | Same capabilities as the standard gpt-3.5-turbo model but with 4 times the context. | 16,384 tokens | Up to Sep 2021 |
Anthropic
Anthropic’s model, Claude, is a transformer-based LLM, much like GPT-3, that leverages large-scale machine learning techniques. The model is trained on a diverse range of internet text, giving it the ability to generate text that is coherent, contextually relevant, and remarkably human-like.
Below is a comparison of the different versions of Claude.
LATEST MODEL | DESCRIPTION | MAX TOKENS |
---|---|---|
claude-3-opus | The best-performing Claude 3 model, capable of longer responses, with one of the largest context windows available on the market. | 200,000 tokens |
claude-3-sonnet | Faster than OpenAI’s GPT-4 and almost as good, with the same 200K context window. | 200,000 tokens |
claude-3-haiku | Lighter, less expensive, and much faster option. | 200,000 tokens |
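To put Claude’s large context window to work, for example to summarize a long document, Anthropic’s Python SDK can be called roughly as below. Note that the API expects a dated model identifier (e.g. claude-3-opus-20240229); the file name and prompt are assumptions.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

long_document = open("annual_report.txt").read()  # hypothetical input

message = client.messages.create(
    model="claude-3-opus-20240229",  # dated ID required by the API
    max_tokens=1024,                 # cap on the length of the reply
    messages=[{
        "role": "user",
        "content": "Summarize this document in five bullet points:\n\n"
                   + long_document,
    }],
)
print(message.content[0].text)
```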
Google
Stack AI offers Google’s Gemini models alongside PaLM 2, the Large Language Model (LLM) previously released by Google. These models are highly capable in advanced reasoning, coding, and mathematics. They are also multilingual, supporting more than 100 languages. PaLM 2 is the successor to the earlier Pathways Language Model (PaLM) launched in 2022.
The available models are the following.
LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
---|---|---|---|
gemini-1.5-pro | Fine-tuned to follow natural language instructions and is suitable for a variety of language tasks | 1 million tokens | Up to Feb 2024 |
gemini-pro | Fine-tuned to follow natural language instructions and is suitable for a variety of language tasks | 30,720 tokens | Up to Feb 2023 |
text-bison-001 | Fine-tuned to follow natural language instructions and is suitable for a variety of language tasks | 8,192 tokens | Up to Feb 2023 |
chat-bison-001 | Fine-tuned for multi-turn conversation use cases. | 4,096 tokens | Up to Feb 2023 |
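Outside of Stack AI, the Gemini models can be reached through Google’s google-generativeai Python package; a minimal sketch (the API key and prompt are placeholders):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_GOOGLE_API_KEY")  # placeholder key

# gemini-1.5-pro offers the 1-million-token context window noted above.
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content("Explain PaLM 2 in two sentences.")
print(response.text)
```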
Meta
LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
---|---|---|---|
Llama-3-70b-chat | Llama 3 is a state-of-the-art large language model designed for enhanced reasoning, coding, and broad application across multiple languages and tasks. | 8,000 tokens | Up to March 2023 |
Llama-3-8b-chat | A smaller version of Llama 3 that allows for faster inference. | 8,000 tokens | Up to March 2023 |
Llama-2-70b-chat | Fine-tuned to follow natural language instructions and is suitable for a variety of language tasks | 4,096 tokens | Up to Sep 2022 |
Llama-2-13b-chat | A smaller version of llama-2 that allows for faster inference. | 4,096 tokens | Up to Sep 2022 |
Mistral
LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
---|---|---|---|
mistral-large | Mistral’s flagship model, ideal for complex tasks that require strong reasoning capabilities or are highly specialized (Synthetic Text Generation, Code Generation, RAG, or Agents). | 32,000 tokens | Up to Dec 2021 |
mistral-medium | Ideal for intermediate tasks that require moderate reasoning (Data extraction, Summarizing a Document, Writing emails, Writing a Job Description, or Writing Product Descriptions) | 32,000 tokens | Up to Dec 2021 |
mistral-small | Suitable for simple tasks that one can do in bulk (Classification, Customer Support, or Text Generation) | 32,000 tokens | Up to Dec 2021 |
mistral-8x22b-instruct | A sparse Mixture-of-Experts (SMoE) model built from 22B experts. Uses only 39B active parameters out of 141B total. | 64,000 tokens | Up to Dec 2021 |
mistral-8x7b-instruct | A sparse Mixture-of-Experts (SMoE) model built from 7B experts. Uses 12.9B active parameters out of 45B total. | 32,000 tokens | Up to Dec 2021 |
mistral-7b-instruct | Mistral’s very first model. A 7B transformer model. Small, yet very powerful for a variety of use cases. | 32,000 tokens | Up to Dec 2021 |
Perplexity
The Perplexity node lets you use Perplexity’s RAG-fine-tuned models. These models are ideal when you need to perform real-time web search within your workflow.
MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA |
---|---|---|---|
llama-3-sonar-large-32k-online | Perplexity’s most powerful model, built on top of llama-3-70b. | 32,000 tokens | Online model |
llama-3-sonar-small-32k-online | A smaller but faster model, built on top of llama-3-8b. | 32,000 tokens | Online model |
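Perplexity also exposes an OpenAI-compatible REST endpoint, so outside of Stack AI these models can be called with the OpenAI SDK by overriding the base URL; a sketch (the API key and question are placeholders):

```python
from openai import OpenAI

# Perplexity's API is OpenAI-compatible; only the base URL and key change.
client = OpenAI(
    api_key="YOUR_PERPLEXITY_API_KEY",  # placeholder
    base_url="https://api.perplexity.ai",
)

response = client.chat.completions.create(
    model="llama-3-sonar-large-32k-online",
    messages=[{"role": "user",
               "content": "What happened in the markets today? Cite sources."}],
)
print(response.choices[0].message.content)
```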
TogetherAI
TogetherAI provides a high-performance inference engine that ensures rapid processing speeds for large language models (LLMs). Renowned for its exceptional performance, scalability, seamless integration, and extensive support services, TogetherAI stands out in the industry. With StackAI, you can leverage the robust TogetherAI infrastructure through the TogetherAI node, accessing a diverse range of models from various families, including Mistral, LLaMA, Snowflake, and more.
Groq
Groq leverages custom hardware and infrastructure to deliver high-speed inference for a variety of large language models (LLMs). This allows for efficient processing, making it an ideal choice for applications requiring low latency and high throughput. Among the models available in this node, you will find llama3-70b, llama3-8b, mixtral-8x7b, and gemma-7b.
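Groq likewise serves its models behind an OpenAI-compatible endpoint, so the same base-URL pattern applies when calling it directly; a minimal sketch (the API key and prompt are placeholders, and llama3-70b-8192 is Groq’s identifier for llama3-70b):

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GROQ_API_KEY",  # placeholder
    base_url="https://api.groq.com/openai/v1",
)

response = client.chat.completions.create(
    model="llama3-70b-8192",  # Groq's ID for llama3-70b
    messages=[{"role": "user",
               "content": "Name three applications that need low-latency inference."}],
)
print(response.choices[0].message.content)
```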
Azure
Microsoft Azure offers the ability to host private clouds with OpenAI models. You can add these models in Stack AI using an “Azure” node. Hosting models in Azure has a few benefits:
- Lower and consistent latency: models in Azure are not affected by traffic on OpenAI’s public API.
- Higher rate limits: models in Azure offer higher rate limits of up to 240,000 tokens per minute and 1,440 requests per minute.
- Data privacy and compliance: data sent to Azure stays within the private cloud and is not sent to OpenAI or any external service. These models are covered under Azure’s Business Associate Agreement (BAA) and are HIPAA compliant.
Enterprise users of Stack AI have access to models hosted in Azure.
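Calling an Azure-hosted deployment uses the AzureOpenAI client from the same openai Python package; the endpoint, key, API version, and deployment name below are placeholders:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR_AZURE_API_KEY",                             # placeholder
    api_version="2024-02-01",
)

# In Azure, "model" is the name you gave your deployment,
# not the raw OpenAI model ID.
response = client.chat.completions.create(
    model="my-gpt-4o-deployment",  # placeholder deployment name
    messages=[{"role": "user", "content": "Hello from a private cloud."}],
)
print(response.choices[0].message.content)
```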
Bedrock
Access a wide variety of LLMs from different providers hosted on AWS Bedrock. You may also provide your own API keys and use your own models hosted on your VPC.
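For direct access outside of Stack AI, Bedrock models are invoked through the AWS SDK; a minimal sketch with boto3, where the region and prompt are assumptions and the model ID is Bedrock’s identifier for Claude 3 Sonnet:

```python
import json
import boto3

# Assumes AWS credentials are configured (environment or ~/.aws).
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user",
                      "content": "Summarize AWS Bedrock in one sentence."}],
    }),
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```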