PaliGemma Image Captioning

Gradio demo for PaliGemma 2, a vision-language model for image understanding and text generation. The demo generates a natural-language caption for an uploaded image. To use it, upload an image, set the parameters (maximum tokens and language) or keep the defaults, and click 'Submit'. You can also pick one of the examples to load a predefined image. For more information, see the links below.
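For readers who want to reproduce the demo's behaviour outside the browser, the following is a minimal sketch, not the demo's actual source. It assumes the Hugging Face transformers PaliGemma classes, a captioning prompt of the form "caption <language code>", and the checkpoint name google/paligemma2-3b-ft-docci-448; the function names, the default of 64 tokens, and the checkpoint choice are illustrative assumptions, and the real demo may use a different model or settings.

```python
def build_caption_prompt(language: str = "en") -> str:
    """Build a captioning prompt for a given language code.

    Assumption: the model accepts the PaliGemma-style task prefix
    "caption <lang>", preceded by the "<image>" placeholder token.
    """
    lang = language.strip().lower()
    if not lang:
        raise ValueError("language code must not be empty")
    return f"<image>caption {lang}"


def caption_image(
    image_path: str,
    language: str = "en",
    max_tokens: int = 64,  # hypothetical default; the demo's default is not stated
    model_id: str = "google/paligemma2-3b-ft-docci-448",  # assumed checkpoint
) -> str:
    """Generate a caption for one image (downloads the model on first use)."""
    # Heavy dependencies are imported lazily so the prompt helper above
    # can be used without them installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_caption_prompt(language), images=image, return_tensors="pt"
    )
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_tokens, do_sample=False)

    # Drop the prompt tokens so only the generated caption is decoded.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True).strip()
```

Calling caption_image("photo.jpg", language="en", max_tokens=64) mirrors the two parameters the demo exposes; "photo.jpg" is a placeholder path.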

Examples

Image	Max Tokens	Language

PaliGemma 2: A Family of Versatile VLMs for Transfer | Model Page