You can use Gemma 3 and later models to analyze and understand the content of images. This capability includes tasks like describing image content, identifying objects, recognizing scenes, and even inferring the context from visual information.
While Gemma release 3 and later models demonstrate strong general visual prompting capabilities, certain tasks require more specialized tools. For precise object detection or image segmentation, models like PaliGemma or Convolutional Neural Networks (CNNs) often provide superior accuracy and performance.
Here are some examples demonstrating these capabilities.
Image Captioning
Input: A funny image of a cat
Text Input: "caption this image"
This prompt would produce the following response: |
---|
Here are a few captions for the image of the surprised black and white cat: Funny Options: * "Did someone say...treats?" * "Wait, what was that noise?!" * "My face when I realize it's Monday." * "I swear I didn't do that!" Cute Options: * "Caught mid-surprise!" * "So much curiosity!" * "A little startled, but adorable." Simple & Descriptive: * "Shocked!" * "Wide-eyed surprise." --- To help me give you an even better caption, could you tell me: * What kind of tone are you going for (funny, cute, etc.)? |
Visual Question Answering
Input: A funny image of a cat with the sign
Text Input: "What's the sign says?"
This prompt would produce the following response: |
---|
Here's what the sign says in Japanese: 猫に注意 (Neko ni chūi) - "Caution for cats" 何かします (Nanika shimasu) - "Something will happen" So, the overall message is a playful warning to be careful around the cat! |
Image Comparison
Input: Two images, one of a fantasy game landscape during the day, and one at
night
Text Input: "Compare and contrast the two images."
This prompt would produce the following response: |
---|
Okay, let's compare and contrast the two images of the landscape. Similarities: * Overall Scene: Both images depict a very similar landscape – a winding river flowing through a lush, green valley bordered by mountains and a dense forest. The composition is nearly identical, with the same placement of trees, rocks, and the river's path. * Style: Both images have a similar illustrative style, leaning towards a digital painting with a slightly stylized and idealized look. They both have a soft, painterly quality with blended colors and a focus on creating a serene and beautiful atmosphere. * Elements: The core elements are the same: the river, the mountains, the forest, the grassy meadow, and the scattered rocks. Differences: * Time of Day/Lighting: This is the most significant difference. * Top Image: It's a daytime scene, bathed in bright sunlight. The sky is blue with fluffy white clouds, and the light is warm and vibrant. * Bottom Image: It's a nighttime scene. The sky is dark with a full moon and scattered stars. The lighting is much softer and cooler, with a subtle glow along the river. * Atmosphere: * Top Image: The atmosphere is bright, cheerful, and inviting. It feels like a pleasant summer day. * Bottom Image: The atmosphere is more mysterious, tranquil, and perhaps a little magical. The moonlight and stars create a sense of wonder and quietness. * Details: * Top Image: There are more visible details like wildflowers and brighter colors in the grass. * Bottom Image: The details are slightly muted, with a focus on the overall mood and the subtle glow of the river. There are a few flying birds in the sky, adding to the nighttime feel. In essence: The two images are essentially the same scene, but transformed by the change in lighting and time of day. One is a vibrant daytime landscape, while the other is a serene and magical nighttime scene. --- Would you like me to analyze any specific aspect of the images in more detail, such as the color palette, composition, or the mood they evoke? |
Object Identification
Input: An image of a kitchen with various utensils
Text Input: "List all the objects in the image and their colors."
This prompt would produce the following response: |
---|
Okay, let's break down the objects in the image and their colors: Kitchen Elements: * Countertops: Green (light and slightly muted) * Cabinets: Light Green (similar to the countertops) * Hood Vent: Silver/Gray * Light Fixture: Orange/Terracotta * Wall Tiles: White * Sink: White Cookware & Utensils: * Large Pots: Light Blue, Pink * Small Bowls: White, Light Blue * Jars: Yellow, Light Blue * Whisk: Brown * Spoons/Ladles: Brown, Wood-toned * Cutting Board: Light Brown * Spice Rack: Beige/Tan Other Items: * Potted Plants: Green (foliage), Brown (pots) * Tin Containers: Pink, Beige/Tan * Bucket: Gray Note: The colors are somewhat muted and watercolor-like, so there's a slight variation in shades. Do you want me to describe any specific object in more detail, or perhaps focus on a particular area of the image? |
Scene Atmosphere
Input: An image of a sunset over the ocean
Text Input: "Describe the atmosphere of the scene."
This prompt would produce the following response: |
---|
Here's a description of the atmosphere in the image: Overall Impression: The scene evokes a feeling of profound peace, tranquility, and breathtaking beauty. It's a moment of serene stillness and wonder. Color & Light: * Warm Hues: The dominant colors are warm – oranges, yellows, and pinks – reflecting the setting sun. These colors create a sense of warmth and comfort. * Gradient Sky: The sky is a stunning gradient, transitioning from deep blues and purples at the top to vibrant oranges and pinks near the horizon. This creates a dramatic and captivating visual effect. * Golden Reflection: The sun's rays are brilliantly reflected on the water, creating a shimmering path that draws the eye into the scene. Atmospheric Conditions: * Cloudy Skies: There are scattered clouds, adding texture and depth to the sky. They appear fluffy and voluminous, catching the sunlight and creating a beautiful interplay of light and shadow. * Calm Water: The sea is remarkably calm, with gentle ripples reflecting the colors of the sky. This stillness contributes to the overall sense of peace. Mood & Feeling: * Serene: The scene is incredibly serene and calming. It invites a feeling of relaxation and contemplation. * Magical: The sunset's colors and the reflection on the water create a magical, almost dreamlike atmosphere. * Hopeful: Sunsets often symbolize the end of a day and the promise of a new beginning, lending a sense of hopefulness to the scene. Would you like me to focus on a specific aspect of the atmosphere, such as the light, the colors, or the feeling it evokes? |