The Gemini API provides access to Veo 2, Google's state-of-the-art video generation model. Veo is designed to help you build next-generation AI applications that transform user prompts and images into high quality video assets.
This guide will help you get started with Veo using the Gemini API.
About Veo
Veo is Google's most capable video generation model to date. It generates videos in a wide range of cinematic and visual styles, capturing prompt nuance to render intricate details consistently across frames.
To learn more and see example output, check out the Google DeepMind Veo overview.
Specifications
Modalities |
|
Request latency |
|
Variable length generation | 5-8 seconds |
Resolution | 720p |
Frame rate | 24fps |
Aspect ratio |
|
Input languages (text-to-video) | English |
Videos created by Veo are watermarked using SynthID, our tool for watermarking and identifying AI-generated content, and are passed through safety filters and memorization checking processes that help mitigate privacy, copyright and bias risks.
Before you begin
Before calling the Gemini API, ensure you have your SDK of choice installed, and a Gemini API key configured and ready to use.
To use Veo with the Google Gen AI SDKs, ensure that you have one of the following versions installed:
- Python v1.10.0 or later
- TypeScript and JavaScript v0.8.0 or later
- Go v1.0.0 or later
Generate videos
This section provides code examples for generating videos using text prompts and using images.
Generate from text
You can use the following code to generate videos with Veo:
Python
import time
from google import genai
from google.genai import types
client = genai.Client() # read API key from GOOGLE_API_KEY
operation = client.models.generate_videos(
model="veo-2.0-generate-001",
prompt="Panning wide shot of a calico kitten sleeping in the sunshine",
config=types.GenerateVideosConfig(
person_generation="dont_allow", # "dont_allow" or "allow_adult"
aspect_ratio="16:9", # "16:9" or "9:16"
),
)
while not operation.done:
time.sleep(20)
operation = client.operations.get(operation)
for n, generated_video in enumerate(operation.response.generated_videos):
client.files.download(file=generated_video.video)
generated_video.video.save(f"video{n}.mp4") # save the video
REST
# Use curl to send a POST request to the predictLongRunning endpoint
# The request body includes the prompt for video generation
curl "${BASE_URL}/models/veo-2.0-generate-001:predictLongRunning?key=${GOOGLE_API_KEY}" \
-H "Content-Type: application/json" \
-X "POST" \
-d '{
"instances": [{
"prompt": "Panning wide shot of a calico kitten sleeping in the sunshine"
}
],
"parameters": {
"aspectRatio": "16:9",
"personGeneration": "dont_allow",
}
}' | tee result.json | jq .name | sed 's/"//g' > op_name
This code takes about 2-3 minutes to run, though it may take longer if resources are constrained. Once it's done running, you should see a video that looks something like this:
If you see an error message instead of a video, this means that resources are constrained and your request couldn't be completed. In this case, run the code again.
Generated videos are stored on the server for 2 days, after which they are
removed. If you want to save a local copy of your generated video, you must run
result()
and save()
within 2 days of generation.
Generate from images
You can also generate videos using images. The following code generates an image using Imagen, then uses the generated image as the starting frame for the generated video.
First, generate an image using Imagen:
Python
prompt="Panning wide shot of a calico kitten sleeping in the sunshine",
imagen = client.models.generate_images(
model="imagen-3.0-generate-002",
prompt=prompt,
config=types.GenerateImagesConfig(
aspect_ratio="16:9",
number_of_images=1
)
)
images[0]
Then, generate a video using the resulting image as the first frame:
Python
operation = client.models.generate_videos(
model="veo-2.0-generate-001",
prompt=prompt,
image = images[0],
config=types.GenerateVideosConfig(
# person_generation only accepts "dont_allow" for image-to-video
aspect_ratio="16:9", # "16:9" or "9:16"
number_of_videos=2
),
)
# Wait for videos to generate
while not operation.done:
time.sleep(20)
operation = client.operations.get(operation)
for n, video in enumerate(operation.response.generated_videos):
fname = f'with_image_input{n}.mp4'
print(fname)
video.save(fname)
Veo model parameters
(Naming conventions vary by programming language.)
prompt
: The text prompt for the video. When present, theimage
parameter is optional.image
: The image to use as the first frame for the video. When present, theprompt
parameter is optional.negativePrompt
: Text string that describes anything you want to discourage the model from generatingaspectRatio
: Changes the aspect ratio of the generated video. Supported values are"16:9"
and"9:16"
. The default is"16:9"
.personGeneration
: Allow the model to generate videos of people. The following values are supported:- Text-to-video generation:
"dont_allow"
: Don't allow the inclusion of people or faces."allow_adult"
: Generate videos that include adults, but not children.
- Image-to-video generation:
"dont_allow"
: Default and only value for image-to-video generation.
- Text-to-video generation:
numberOfVideos
: Output videos requested, either1
or2
.durationSeconds
: Length of each output video in seconds, between5
and8
.enhance_prompt
: Enable or disable the prompt rewriter. Enabled by default.
Things to try
To get the most out of Veo, incorporate video-specific terminology into your prompts. Veo understands a wide range of terms related to:
- Shot composition: Specify the framing and number of subjects in the shot (e.g., "single shot," "two shot," "over-the-shoulder shot").
- Camera positioning and movement: Control the camera's location and movement using terms like "eye level," "high angle," "worms eye," "dolly shot," "zoom shot," "pan shot," and "tracking shot."
- Focus and lens effects: Use terms like "shallow focus," "deep focus," "soft focus," "macro lens," and "wide-angle lens" to achieve specific visual effects.
- Overall style and subject: Guide Veo's creative direction by specifying styles like "sci-fi," "romantic comedy," "action movie," or "animation." You can also describe the subjects and backgrounds you want, such as "cityscape," "nature," "vehicles," or "animals."
Veo prompt guide
This section of the Veo guide contains examples of videos you can create using Veo, and shows you how to modify prompts to produce distinct results.
Safety filters
Veo applies safety filters across Gemini to help ensure that generated videos and uploaded photos don't contain offensive content. Prompts that violate our terms and guidelines are blocked.
Prompt writing basics
Good prompts are descriptive and clear. To get your generated video as close as possible to what you want, start with identifying your core idea, and then refine your idea by adding keywords and modifiers.
The following elements should be included in your prompt:
- Subject: The object, person, animal, or scenery that you want in your video.
- Context: The background or context in which the subject is placed.
- Action: What the subject is doing (for example, walking, running, or turning their head).
- Style: This can be general or very specific. Consider using specific film style keywords, such as horror film, film noir, or animated styles like cartoon style.
- Camera motion: [Optional] What the camera is doing, such as aerial view, eye-level, top-down shot, or low-angle shot.
- Composition: [Optional] How the shot is framed, such as wide shot, close-up, or extreme close-up.
- Ambiance: [Optional] How the color and light contribute to the scene, such as blue tones, night, or warm tones.
More tips for writing prompts
The following tips help you write prompts that generate your videos:
- Use descriptive language: Use adjectives and adverbs to paint a clear picture for Veo.
- Provide context: If necessary, include background information to help your model understand what you want.
- Reference specific artistic styles: If you have a particular aesthetic in mind, reference specific artistic styles or art movements.
- Utilize prompt engineering tools: Consider exploring prompt engineering tools or resources to help you refine your prompts and achieve optimal results. For more information, visit Introduction to prompt design.
- Enhance the facial details in your personal and group images: Specify facial details as a focus of the photo like using the word portrait in the prompt.
Example prompts and output
This section presents several prompts, highlighting how descriptive details can elevate the outcome of each video.
Icicles
This video demonstrates how you can use the elements of prompt writing basics in your prompt.
Prompt | Generated output |
---|---|
Close up shot (composition) of melting icicles (subject) on a frozen rock wall (context) with cool blue tones (ambiance), zoomed in (camera motion) maintaining close-up detail of water drips (action). |
![]() |
Man on the phone
These videos demonstrate how you can revise your prompt with increasingly specific details to get Veo to refine the output to your liking.
Prompt | Generated output | Analysis |
---|---|---|
The camera dollies to show a close up of a desperate man in a green trench coat. He's making a call on a rotary-style wall phone with a green neon light. It looks like a movie scene. |
![]() |
This is the first generated video based on the prompt. |
A close-up cinematic shot follows a desperate man in a weathered green trench coat as he dials a rotary phone mounted on a gritty brick wall, bathed in the eerie glow of a green neon sign. The camera dollies in, revealing the tension in his jaw and the desperation etched on his face as he struggles to make the call. The shallow depth of field focuses on his furrowed brow and the black rotary phone, blurring the background into a sea of neon colors and indistinct shadows, creating a sense of urgency and isolation. |
![]() |
A more detailed prompt results in a video that is more focused with a richer environment. |
A video with smooth motion that dollies in on a desperate man in a green trench coat, using a vintage rotary phone against a wall bathed in an eerie green neon glow. The camera starts from a medium distance, slowly moving closer to the man's face, revealing his frantic expression and the sweat on his brow as he urgently dials the phone. The focus is on the man's hands, his fingers fumbling with the dial as he desperately tries to connect. The green neon light casts long shadows on the wall, adding to the tense atmosphere. The scene is framed to emphasize the isolation and desperation of the man, highlighting the stark contrast between the vibrant glow of the neon and the man's grim determination. |
![]() |
Adding more detail gives the subject a realistic expression and creates an intense and vibrant scene. |
Snow leopard
This example demonstrates the output Veo might generate for a simple prompt.
Prompt | Generated output |
---|---|
A cute creature with snow leopard-like fur is walking in winter forest, 3D cartoon style render. |
![]() |
Running snow leopard
This prompt has more detail and demonstrates generated output that might be closer to what you want in your video.
Prompt | Generated output |
---|---|
Create a short 3D animated scene in a joyful cartoon style. A cute creature with snow leopard-like fur, large expressive eyes, and a friendly, rounded form happily prances through a whimsical winter forest. The scene should feature rounded, snow-covered trees, gentle falling snowflakes, and warm sunlight filtering through the branches. The creature's bouncy movements and wide smile should convey pure delight. Aim for an upbeat, heartwarming tone with bright, cheerful colors and playful animation. |
![]() |
Examples by writing elements
These examples show you how to refine your prompts by each basic element.
Subject
This example shows you how to specify a subject description.
Subject description | Prompt | Generated output |
---|---|---|
The description can include a subject, or multiple subjects and actions. Here, our subject is "white concrete apartment building." | An architectural rendering of a white concrete apartment building with flowing organic shapes, seamlessly blending with lush greenery and futuristic elements |
![]() |
Context
This example shows you how to specify context.
Context | Prompt | Generated output |
---|---|---|
The background or context in which the subject will be placed is very important. Try placing your subject in a variety of backgrounds like on a busy street, or in outer space. | A satellite floating through outer space with the moon and some stars in the background. |
![]() |
Action
This example shows you how to specify action.
Action | Prompt | Generated output |
---|---|---|
What is the subject doing like walking, running, or turning their head. | A wide shot of a woman walking along the beach, looking content and relaxed towards the horizon at sunset. |
![]() |
Style
This example shows you how to specify style.
Style | Prompt | Generated output |
---|---|---|
You can add keywords to improve generation quality and steer it closer to intended style, such as shallow depth of field, movie still, minimalistic, surreal, vintage, futuristic, or double-exposure. | Film noir style, man and woman walk on the street, mystery, cinematic, black and white. |
![]() |
Camera motion
This example shows you how to specify camera motion.
Camera motion | Prompt | Generated output |
---|---|---|
Options for camera motion include POV shot, aerial view, tracking drone view, or tracking shot. | A POV shot from a vintage car driving in the rain, Canada at night, cinematic. |
![]() |
Composition
This example shows you how to specify composition.
Composition | Prompt | Generated output |
---|---|---|
How the shot is framed (wide shot, close-up, low angle). | Extreme close-up of a an eye with city reflected in it. |
![]() |
Create a video of a wide shot of surfer walking on a beach with a surfboard, beautiful sunset, cinematic. |
![]() |
Ambiance
This example shows you how to specify ambiance.
Ambiance | Prompt | Generated output |
---|---|---|
Color palettes play a vital role in photography, influencing the mood and conveying intended emotions. Try things like "muted orange warm tones," "natural light," "sunrise" or "sunset". For example, a warm, golden palette can infuse a romantic and atmospheric feel into a photograph. | A close-up of a girl holding adorable golden retriever puppy in the park, sunlight. |
![]() |
Cinematic close-up shot of a sad woman riding a bus in the rain, cool blue tones, sad mood. |
![]() |
Use reference images to generate videos
You can bring images to life by using Veo's image-to-video capability. You can use existing assets, or try Imagen to generate something new.
Prompt | Generated output |
---|---|
Bunny with a chocolate candy bar. |
![]() |
Bunny runs away. |
![]() |
Negative prompts
Negative prompts can be a powerful tool to help specify elements you don't want in the video. Describe what you want to discourage the model from generating after the phrase "Negative prompt". Follow these tips:
❌ Don't use instructive language or words like no or don't. For example, "No walls" or "don't show walls".
✅ Do describe what you don't want to see. For example, "wall, frame", which means that you don't want a wall or a frame in the video.
Prompt | Generated output |
---|---|
Generate a short, stylized animation of a large, solitary oak tree with leaves blowing vigorously in a strong wind. The tree should have a slightly exaggerated, whimsical form, with dynamic, flowing branches. The leaves should display a variety of autumn colors, swirling and dancing in the wind. The animation should use a warm, inviting color palette. |
![]() |
Generate a short, stylized animation of a large, solitary oak tree
with leaves blowing vigorously in a strong wind. The tree should have a slightly
exaggerated, whimsical form, with dynamic, flowing branches. The leaves should
display a variety of autumn colors, swirling and dancing in the wind. The
animation should use a warm, inviting color palette.
With negative prompt - urban background, man-made structures, dark, stormy, or threatening atmosphere. |
![]() |
Aspect ratios
Gemini Veo video generation supports the following two aspect ratios:
Aspect ratio | Description |
---|---|
Widescreen or 16:9 | The most common aspect ratio for televisions, monitors, and mobile phone screens (landscape). Use this when you want to capture more of the background, like in scenic landscapes. |
Portrait or 9:16 | Rotated widescreen. This aspect ratio has been popularized by short form video applications, such as Youtube shorts. Use this for portraits or tall objects with strong vertical orientations, such as
buildings, trees, waterfall, or buildings. |
Widescreen
This prompt is an example of the widescreen aspect ratio of 16:9.
Prompt | Generated output |
---|---|
Create a video with a tracking drone view of a man driving a red convertible car in Palm Springs, 1970s, warm sunlight, long shadows. |
![]() |
Portrait
This prompt is an example of the portrait aspect ratio of 9:16.
Prompt | Generated output |
---|---|
Create a video highlighting the smooth motion of a majestic Hawaiian waterfall within a lush rainforest. Focus on realistic water flow, detailed foliage, and natural lighting to convey tranquility. Capture the rushing water, misty atmosphere, and dappled sunlight filtering through the dense canopy. Use smooth, cinematic camera movements to showcase the waterfall and its surroundings. Aim for a peaceful, realistic tone, transporting the viewer to the serene beauty of the Hawaiian rainforest. |
![]() |
What's next
- Gain more experience generating AI videos with the Veo Colab.