Image to Text

Several image-to-text models are integrated into Stack AI’s platform. Listed below are the models, each with a brief description and a link to further information.

| Latest Model | Description | Link |
| --- | --- | --- |
| text2prompt | Provides approximate text prompts that can be used with Stable Diffusion to recreate similar-looking versions of the image or painting. | More Info |
| instructblip-vicuna | Generates text conditioned on both text and image prompts. Unlike standard multimodal models, it has also been fine-tuned to follow human instructions. | More Info |
| mini-gpt4 | Combines a vision encoder (a pretrained ViT and Q-Former) with an advanced Vicuna large language model through a single linear projection layer. | More Info |