Multimodal models
Image to Text
Generate text descriptions from images
Stack AI's platform integrates several image-to-text models. The table below lists each model with a brief description and a link to further information.
LATEST MODEL | DESCRIPTION | LINK |
---|---|---|
text2prompt | Generates approximate text prompts that can be used with Stable Diffusion to recreate similar-looking versions of an image or painting. | More Info |
instructblip-vicuna | Generates text conditioned on both text and image prompts. Unlike standard multimodal models, it has also been fine-tuned to follow human instructions. | More Info |
mini-gpt4 | Connects a vision encoder (a pretrained ViT and Q-Former) to a Vicuna large language model through a single linear projection layer. | More Info |
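
The mini-gpt4 row describes a single linear projection layer sitting between the vision encoder and the language model. The following is a minimal sketch of that idea only, not the model's actual implementation: the dimensions, the random weights, and the names `linear_projection`, `visual_tokens`, and `text_embeddings` are all assumptions made for illustration.

```python
import random


def linear_projection(tokens, weight, bias):
    """Apply y = W x + b to each visual token (a list of floats)."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b for row, b in zip(weight, bias)]
        for token in tokens
    ]


# Toy sizes chosen for illustration only (assumed, not the real model dimensions).
NUM_VISUAL_TOKENS = 4  # tokens emitted by the vision encoder
VISION_DIM = 6         # width of each visual token
LLM_DIM = 8            # width of the language model's embeddings

rng = random.Random(0)

# Stand-in for the vision encoder output (random values, hypothetical).
visual_tokens = [
    [rng.uniform(-1, 1) for _ in range(VISION_DIM)]
    for _ in range(NUM_VISUAL_TOKENS)
]

# The projection layer: one weight row per output dimension.
weight = [[rng.uniform(-1, 1) for _ in range(VISION_DIM)] for _ in range(LLM_DIM)]
bias = [0.0] * LLM_DIM

# Map visual tokens into the language model's embedding width.
projected = linear_projection(visual_tokens, weight, bias)

# Hypothetical placeholder for the embedded text prompt (3 tokens).
text_embeddings = [[0.0] * LLM_DIM for _ in range(3)]

# The language model would receive the projected image tokens followed by the text.
llm_input = projected + text_embeddings
print(len(llm_input), len(llm_input[0]))
```

The point of the sketch is the shape change: each visual token goes from the encoder's width to the language model's width, so image and text tokens can share one input sequence.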