本指南全面介绍了 Live API 提供的功能和配置。如需查看常见用例的概览和示例代码,请参阅 Live API 使用入门页面。
准备工作
- 熟悉核心概念:如果您尚未熟悉,请先阅读 Live API 使用入门 页面。本课将向您介绍 Live API 的基本原理、运作方式,以及不同模型及其对应的音频生成方法(原生音频或半级联)之间的区别。
- 在 AI Studio 中试用实时 API:在开始构建之前,不妨先在 Google AI Studio 中试用实时 API。如需在 Google AI Studio 中使用 Live API,请选择 Stream。
建立连接
以下示例展示了如何使用 API 密钥创建连接:
Python
import asyncio
from google import genai
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
print("Session started")
if __name__ == "__main__":
asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function main() {
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
console.debug(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
// Send content...
session.close();
}
main();
互动模式
以下部分提供了 Live API 中提供的不同输入和输出模式的示例和支持背景信息。
发送和接收短信
您可以通过以下方式发送和接收短信:
Python
import asyncio
from google import genai
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
message = "Hello, how are you?"
await session.send_client_content(
turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
)
async for response in session.receive():
if response.text is not None:
print(response.text, end="")
if __name__ == "__main__":
asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
const inputTurns = 'Hello how are you?';
session.sendClientContent({ turns: inputTurns });
const turns = await handleTurn();
for (const turn of turns) {
if (turn.text) {
console.debug('Received text: %s\n', turn.text);
}
else if (turn.data) {
console.debug('Received inline data: %s\n', turn.data);
}
}
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
增量内容更新
使用增量更新发送文本输入、建立会话上下文或恢复会话上下文。对于简短的情境,您可以发送精细互动来表示确切的事件顺序:
Python
turns = [
{"role": "user", "parts": [{"text": "What is the capital of France?"}]},
{"role": "model", "parts": [{"text": "Paris"}]},
]
await session.send_client_content(turns=turns, turn_complete=False)
turns = [{"role": "user", "parts": [{"text": "What is the capital of Germany?"}]}]
await session.send_client_content(turns=turns, turn_complete=True)
JavaScript
let inputTurns = [
{ "role": "user", "parts": [{ "text": "What is the capital of France?" }] },
{ "role": "model", "parts": [{ "text": "Paris" }] },
]
session.sendClientContent({ turns: inputTurns, turnComplete: false })
inputTurns = [{ "role": "user", "parts": [{ "text": "What is the capital of Germany?" }] }]
session.sendClientContent({ turns: inputTurns, turnComplete: true })
对于较长的上下文,建议提供单个消息摘要,以便释放上下文窗口以供后续互动。如需了解加载会话上下文的另一种方法,请参阅会话恢复。
发送和接收音频
开始使用指南介绍了最常见的音频示例:音频到音频。
以下是读取 WAV 文件、以正确格式发送该文件并接收文本输出的语音转文字示例:
Python
# Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
# Install helpers for converting files: pip install librosa soundfile
import asyncio
import io
from pathlib import Path
from google import genai
from google.genai import types
import soundfile as sf
import librosa
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
buffer = io.BytesIO()
y, sr = librosa.load("sample.wav", sr=16000)
sf.write(buffer, y, sr, format='RAW', subtype='PCM_16')
buffer.seek(0)
audio_bytes = buffer.read()
# If already in correct format, you can use this:
# audio_bytes = Path("sample.pcm").read_bytes()
await session.send_realtime_input(
audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
)
async for response in session.receive():
if response.text is not None:
print(response.text)
if __name__ == "__main__":
asyncio.run(main())
JavaScript
// Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
// Install helpers for converting files: npm install wavefile
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
// Send Audio Chunk
const fileBuffer = fs.readFileSync("sample.wav");
// Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono)
const wav = new WaveFile();
wav.fromBuffer(fileBuffer);
wav.toSampleRate(16000);
wav.toBitDepth("16");
const base64Audio = wav.toBase64();
// If already in correct format, you can use this:
// const fileBuffer = fs.readFileSync("sample.pcm");
// const base64Audio = Buffer.from(fileBuffer).toString('base64');
session.sendRealtimeInput(
{
audio: {
data: base64Audio,
mimeType: "audio/pcm;rate=16000"
}
}
);
const turns = await handleTurn();
for (const turn of turns) {
if (turn.text) {
console.debug('Received text: %s\n', turn.text);
}
else if (turn.data) {
console.debug('Received inline data: %s\n', turn.data);
}
}
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
以下是文本转换为音频的示例。您可以通过将 AUDIO
设置为响应模式来接收音频。以下示例将收到的数据保存为 WAV 文件:
Python
import asyncio
import wave
from google import genai
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["AUDIO"]}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
wf = wave.open("audio.wav", "wb")
wf.setnchannels(1)
wf.setsampwidth(2)
wf.setframerate(24000)
message = "Hello how are you?"
await session.send_client_content(
turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
)
async for response in session.receive():
if response.data is not None:
wf.writeframes(response.data)
# Un-comment this code to print audio data info
# if response.server_content.model_turn is not None:
# print(response.server_content.model_turn.parts[0].inline_data.mime_type)
wf.close()
if __name__ == "__main__":
asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.AUDIO] };
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
const inputTurns = 'Hello how are you?';
session.sendClientContent({ turns: inputTurns });
const turns = await handleTurn();
// Combine audio data strings and save as wave file
const combinedAudio = turns.reduce((acc, turn) => {
if (turn.data) {
const buffer = Buffer.from(turn.data, 'base64');
const intArray = new Int16Array(buffer.buffer, buffer.byteOffset, buffer.byteLength / Int16Array.BYTES_PER_ELEMENT);
return acc.concat(Array.from(intArray));
}
return acc;
}, []);
const audioBuffer = new Int16Array(combinedAudio);
const wf = new WaveFile();
wf.fromScratch(1, 24000, '16', audioBuffer);
fs.writeFileSync('output.wav', wf.toBuffer());
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
音频格式
Live API 中的音频数据始终是原始的、小端序的 16 位 PCM。音频输出始终使用 24kHz 的采样率。输入音频的原始采样率为 16kHz,但 Live API 会根据需要重新采样,以便发送任何采样率。如需传达输入音频的采样率,请将每个包含音频的 Blob 的 MIME 类型设置为 audio/pcm;rate=16000
等值。
音频转录
您可以在设置配置中发送 output_audio_transcription
,以启用模型音频输出的转写功能。转写语言是根据模型的回答推断出来的。
Python
import asyncio
from google import genai
from google.genai import types
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["AUDIO"],
"output_audio_transcription": {}
}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
message = "Hello? Gemini are you there?"
await session.send_client_content(
turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
)
async for response in session.receive():
if response.server_content.model_turn:
print("Model turn:", response.server_content.model_turn)
if response.server_content.output_transcription:
print("Transcript:", response.server_content.output_transcription.text)
if __name__ == "__main__":
asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = {
responseModalities: [Modality.AUDIO],
outputAudioTranscription: {}
};
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
const inputTurns = 'Hello how are you?';
session.sendClientContent({ turns: inputTurns });
const turns = await handleTurn();
for (const turn of turns) {
if (turn.serverContent && turn.serverContent.outputTranscription) {
console.debug('Received output transcription: %s\n', turn.serverContent.outputTranscription.text);
}
}
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
您可以在设置配置中发送 input_audio_transcription
,以启用音频输入的转写功能。
Python
import asyncio
from pathlib import Path
from google import genai
from google.genai import types
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {
"response_modalities": ["TEXT"],
"input_audio_transcription": {},
}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
audio_data = Path("16000.pcm").read_bytes()
await session.send_realtime_input(
audio=types.Blob(data=audio_data, mime_type='audio/pcm;rate=16000')
)
async for msg in session.receive():
if msg.server_content.input_transcription:
print('Transcript:', msg.server_content.input_transcription.text)
if __name__ == "__main__":
asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = {
responseModalities: [Modality.TEXT],
inputAudioTranscription: {}
};
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
// Send Audio Chunk
const fileBuffer = fs.readFileSync("16000.wav");
// Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono)
const wav = new WaveFile();
wav.fromBuffer(fileBuffer);
wav.toSampleRate(16000);
wav.toBitDepth("16");
const base64Audio = wav.toBase64();
// If already in correct format, you can use this:
// const fileBuffer = fs.readFileSync("sample.pcm");
// const base64Audio = Buffer.from(fileBuffer).toString('base64');
session.sendRealtimeInput(
{
audio: {
data: base64Audio,
mimeType: "audio/pcm;rate=16000"
}
}
);
const turns = await handleTurn();
for (const turn of turns) {
if (turn.serverContent && turn.serverContent.outputTranscription) {
console.log("Transcription")
console.log(turn.serverContent.outputTranscription.text);
}
}
for (const turn of turns) {
if (turn.text) {
console.debug('Received text: %s\n', turn.text);
}
else if (turn.data) {
console.debug('Received inline data: %s\n', turn.data);
}
else if (turn.serverContent && turn.serverContent.inputTranscription) {
console.debug('Received input transcription: %s\n', turn.serverContent.inputTranscription.text);
}
}
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
流式传输音频和视频
更改语音和语言
Live API 模型各自支持一组不同的声音。半级联接支持 Puck、Charon、Kore、Fenrir、Aoede、Leda、Orus 和 Zephyr。原生音频支持更长的列表(与 TTS 模型列表相同)。您可以在 AI Studio 中收听所有语音。
如需指定语音,请在会话配置中在 speechConfig
对象中设置语音名称:
Python
config = {
"response_modalities": ["AUDIO"],
"speech_config": {
"voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}}
},
}
JavaScript
const config = {
responseModalities: [Modality.AUDIO],
speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } } }
};
Live API 支持多种语言。
如需更改语言,请在会话配置中设置 speechConfig
对象中的语言代码:
Python
config = {
"response_modalities": ["AUDIO"],
"speech_config": {
"language_code": "de-DE"
}
}
JavaScript
const config = {
responseModalities: [Modality.AUDIO],
speechConfig: { languageCode: "de-DE" }
};
原生音频功能
以下功能仅适用于原生音频。如需详细了解原生音频,请参阅选择模型和音频生成方式。
如何使用原生音频输出
如需使用原生音频输出,请配置其中一个原生音频模型,并将 response_modalities
设置为 AUDIO
。
如需查看完整示例,请参阅发送和接收音频。
Python
model = "gemini-2.5-flash-preview-native-audio-dialog"
config = types.LiveConnectConfig(response_modalities=["AUDIO"])
async with client.aio.live.connect(model=model, config=config) as session:
# Send audio input and receive audio
JavaScript
const model = 'gemini-2.5-flash-preview-native-audio-dialog';
const config = { responseModalities: [Modality.AUDIO] };
async function main() {
const session = await ai.live.connect({
model: model,
config: config,
callbacks: ...,
});
// Send audio input and receive audio
session.close();
}
main();
共情对话
借助此功能,Gemini 可以根据输入内容的情绪表达和语气调整回答风格。
如需使用情感对话框,请在设置消息中将 API 版本设置为 v1alpha
并将 enable_affective_dialog
设置为 true
:
Python
client = genai.Client(http_options={"api_version": "v1alpha"})
config = types.LiveConnectConfig(
response_modalities=["AUDIO"],
enable_affective_dialog=True
)
JavaScript
const ai = new GoogleGenAI({ httpOptions: {"apiVersion": "v1alpha"} });
const config = {
responseModalities: [Modality.AUDIO],
enableAffectiveDialog: true
};
请注意,目前只有原生音频输出模型支持情感对话。
主动音频
启用此功能后,如果内容不相关,Gemini 可以主动决定不回复。
如需使用此 API,请将 API 版本设置为 v1alpha
,并在设置消息中配置 proactivity
字段,然后将 proactive_audio
设置为 true
:
Python
client = genai.Client(http_options={"api_version": "v1alpha"})
config = types.LiveConnectConfig(
response_modalities=["AUDIO"],
proactivity={'proactive_audio': True}
)
JavaScript
const ai = new GoogleGenAI({ httpOptions: {"apiVersion": "v1alpha"} });
const config = {
responseModalities: [Modality.AUDIO],
proactivity: { proactiveAudio: true }
}
请注意,目前只有原生音频输出模型支持主动音频。
带有思考的原生音频输出
原生音频输出支持思考功能,可通过单独的模型 gemini-2.5-flash-exp-native-audio-thinking-dialog
使用。
如需查看完整示例,请参阅发送和接收音频。
Python
model = "gemini-2.5-flash-exp-native-audio-thinking-dialog"
config = types.LiveConnectConfig(response_modalities=["AUDIO"])
async with client.aio.live.connect(model=model, config=config) as session:
# Send audio input and receive audio
JavaScript
const model = 'gemini-2.5-flash-exp-native-audio-thinking-dialog';
const config = { responseModalities: [Modality.AUDIO] };
async function main() {
const session = await ai.live.connect({
model: model,
config: config,
callbacks: ...,
});
// Send audio input and receive audio
session.close();
}
main();
语音活动检测 (VAD)
借助语音活动检测 (VAD),模型可以识别用户何时在说话。这对于创建自然对话至关重要,因为它允许用户随时中断模型。
当 VAD 检测到中断时,系统会取消并舍弃正在生成的音频。只有已发送到客户端的信息会保留在会话历史记录中。然后,服务器会发送 BidiGenerateContentServerContent
消息来报告中断。
然后,Gemini 服务器会舍弃所有待处理的函数调用,并发送包含已取消调用的 ID 的 BidiGenerateContentServerContent
消息。
Python
async for response in session.receive():
if response.server_content.interrupted is True:
# The generation was interrupted
# If realtime playback is implemented in your application,
# you should stop playing audio and clear queued playback here.
JavaScript
const turns = await handleTurn();
for (const turn of turns) {
if (turn.serverContent && turn.serverContent.interrupted) {
// The generation was interrupted
// If realtime playback is implemented in your application,
// you should stop playing audio and clear queued playback here.
}
}
自动 VAD
默认情况下,该模型会自动对连续的音频输入流执行 VAD。您可以使用设置配置的 realtimeInputConfig.automaticActivityDetection
字段配置 VAD。
当音频流暂停超过一秒时(例如,由于用户关闭了麦克风),应发送 audioStreamEnd
事件以刷新所有缓存的音频。客户端可以随时恢复发送音频数据。
Python
# example audio file to try:
# URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm"
# !wget -q $URL -O sample.pcm
import asyncio
from pathlib import Path
from google import genai
from google.genai import types
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
audio_bytes = Path("sample.pcm").read_bytes()
await session.send_realtime_input(
audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
)
# if stream gets paused, send:
# await session.send_realtime_input(audio_stream_end=True)
async for response in session.receive():
if response.text is not None:
print(response.text)
if __name__ == "__main__":
asyncio.run(main())
JavaScript
// example audio file to try:
// URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm"
// !wget -q $URL -O sample.pcm
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
// Send Audio Chunk
const fileBuffer = fs.readFileSync("sample.pcm");
const base64Audio = Buffer.from(fileBuffer).toString('base64');
session.sendRealtimeInput(
{
audio: {
data: base64Audio,
mimeType: "audio/pcm;rate=16000"
}
}
);
// if stream gets paused, send:
// session.sendRealtimeInput({ audioStreamEnd: true })
const turns = await handleTurn();
for (const turn of turns) {
if (turn.text) {
console.debug('Received text: %s\n', turn.text);
}
else if (turn.data) {
console.debug('Received inline data: %s\n', turn.data);
}
}
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
使用 send_realtime_input
时,该 API 会根据 VAD 自动响应音频。虽然 send_client_content
会按顺序将消息添加到模型上下文,但 send_realtime_input
会针对响应速度进行优化,但代价是确定性排序。
自动 VAD 配置
如需更好地控制 VAD activity,您可以配置以下参数。如需了解详情,请参阅 API 参考文档。
Python
from google.genai import types
config = {
"response_modalities": ["TEXT"],
"realtime_input_config": {
"automatic_activity_detection": {
"disabled": False, # default
"start_of_speech_sensitivity": types.StartSensitivity.START_SENSITIVITY_LOW,
"end_of_speech_sensitivity": types.EndSensitivity.END_SENSITIVITY_LOW,
"prefix_padding_ms": 20,
"silence_duration_ms": 100,
}
}
}
JavaScript
import { GoogleGenAI, Modality, StartSensitivity, EndSensitivity } from '@google/genai';
const config = {
responseModalities: [Modality.TEXT],
realtimeInputConfig: {
automaticActivityDetection: {
disabled: false, // default
startOfSpeechSensitivity: StartSensitivity.START_SENSITIVITY_LOW,
endOfSpeechSensitivity: EndSensitivity.END_SENSITIVITY_LOW,
prefixPaddingMs: 20,
silenceDurationMs: 100,
}
}
};
停用自动 VAD
或者,您也可以在设置消息中将 realtimeInputConfig.automaticActivityDetection.disabled
设置为 true
以停用自动 VAD。在此配置中,客户端负责检测用户语音并在适当的时间发送 activityStart
和 activityEnd
消息。在此配置中不会发送 audioStreamEnd
。而是会通过 activityEnd
消息标记任何流中断。
Python
config = {
"response_modalities": ["TEXT"],
"realtime_input_config": {"automatic_activity_detection": {"disabled": True}},
}
async with client.aio.live.connect(model=model, config=config) as session:
# ...
await session.send_realtime_input(activity_start=types.ActivityStart())
await session.send_realtime_input(
audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
)
await session.send_realtime_input(activity_end=types.ActivityEnd())
# ...
JavaScript
const config = {
responseModalities: [Modality.TEXT],
realtimeInputConfig: {
automaticActivityDetection: {
disabled: true,
}
}
};
session.sendRealtimeInput({ activityStart: {} })
session.sendRealtimeInput(
{
audio: {
data: base64Audio,
mimeType: "audio/pcm;rate=16000"
}
}
);
session.sendRealtimeInput({ activityEnd: {} })
token 计数
您可以在返回的服务器消息的 usageMetadata 字段中找到消耗的令牌总数。
Python
async for message in session.receive():
# The server will periodically send messages that include UsageMetadata.
if message.usage_metadata:
usage = message.usage_metadata
print(
f"Used {usage.total_token_count} tokens in total. Response token breakdown:"
)
for detail in usage.response_tokens_details:
match detail:
case types.ModalityTokenCount(modality=modality, token_count=count):
print(f"{modality}: {count}")
JavaScript
const turns = await handleTurn();
for (const turn of turns) {
if (turn.usageMetadata) {
console.debug('Used %s tokens in total. Response token breakdown:\n', turn.usageMetadata.totalTokenCount);
for (const detail of turn.usageMetadata.responseTokensDetails) {
console.debug('%s\n', detail);
}
}
}
媒体分辨率
您可以通过在会话配置中设置 mediaResolution
字段,为输入媒体指定媒体分辨率:
Python
from google.genai import types
config = {
"response_modalities": ["AUDIO"],
"media_resolution": types.MediaResolution.MEDIA_RESOLUTION_LOW,
}
JavaScript
import { GoogleGenAI, Modality, MediaResolution } from '@google/genai';
const config = {
responseModalities: [Modality.TEXT],
mediaResolution: MediaResolution.MEDIA_RESOLUTION_LOW,
};
限制
在规划项目时,请考虑 Live API 的以下限制。
回答方式
您只能在会话配置中为每个会话设置一种响应模式(TEXT
或 AUDIO
)。同时设置这两项会导致配置错误消息。这意味着,您可以将模型配置为以文本或音频进行回答,但在同一会话中不能同时使用这两种方式。
客户端身份验证
默认情况下,Live API 仅提供服务器到服务器身份验证。如果您使用客户端到服务器方法实现 Live API 应用,则需要使用短时性令牌来降低安全风险。
会话时长
仅音频的会话时长限制为 15 分钟,音频和视频的会话时长限制为 2 分钟。不过,您可以配置不同的会话管理技术,以无限延长会话时长。
上下文窗口
会话的上下文窗口限制为:
- 128,000 个 token(适用于原生音频输出模型)
- 32,000 个令牌(适用于其他实时 API 模型)
支持的语言
Live API 支持以下语言。
语言 | BCP-47 代码 | 语言 | BCP-47 代码 |
---|---|---|---|
德语(德国) | de-DE |
英语(澳大利亚)* | en-AU |
英语(英国)* | en-GB |
英语(印度) | en-IN |
英语(美国) | en-US |
西班牙语(美国) | es-US |
法语(法国) | fr-FR |
印地语(印度) | hi-IN |
葡萄牙语(巴西) | pt-BR |
阿拉伯语(通用) | ar-XA |
西班牙语(西班牙)* | es-ES |
法语(加拿大)* | fr-CA |
印度尼西亚语(印度尼西亚) | id-ID |
意大利语(意大利) | it-IT |
日语(日本) | ja-JP |
土耳其语(土耳其) | tr-TR |
越南语(越南) | vi-VN |
孟加拉语(印度) | bn-IN |
古吉拉特语(印度)* | gu-IN |
卡纳达语(印度)* | kn-IN |
马拉地语(印度) | mr-IN |
马拉雅拉姆语(印度)* | ml-IN |
泰米尔语(印度) | ta-IN |
泰卢固语(印度) | te-IN |
荷兰语(荷兰) | nl-NL |
韩语(韩国) | ko-KR |
中文普通话(中国)* | cmn-CN |
波兰语(波兰) | pl-PL |
俄语(俄罗斯) | ru-RU |
泰语(泰国) | th-TH |
标有星号 (*) 的语言不适用于原生音频。
后续步骤
- 如需了解有关有效使用 Live API 的基本信息,请参阅工具使用和会话管理指南。
- 在 Google AI Studio 中试用实时 API。
- 如需详细了解 Live API 模型,请参阅“模型”页面上的 Gemini 2.0 Flash Live 和 Gemini 2.5 Flash 原生音频。
- 如需尝试更多示例,请参阅 Live API 实战宝典、Live API 工具实战宝典和 Live API 使用入门脚本。