Image to Text

Stack AI’s platform integrates several image-to-text models. The table below lists each model with a brief description and a link to further information.

LATEST MODEL	DESCRIPTION	LINK
text2prompt	Generates approximate text prompts that can be used with Stable Diffusion to recreate similar-looking versions of an image or painting.	More Info
instructblip-vicuna	This model generates text that is conditioned on both text and image prompts. Unlike standard multi-modal models, it has also been fine-tuned to follow human instructions.	More Info
mini-gpt4	Combines a vision encoder (a pretrained ViT and Q-Former) with a Vicuna large language model through a single linear projection layer.	More Info
