本指南涵蓋 Live API 的功能和設定,請參閱「開始使用 Live API」頁面,瞭解常見用途的概略說明和程式碼範例。
事前準備
- 熟悉核心概念:如果您尚未熟悉這些概念,請先參閱「開始使用 Live API 」頁面。本課程將介紹 Live API 的基本原則、運作方式,以及不同模型與其對應的音訊產生方法 (原生音訊或半階層) 之間的差異。
- 在 AI Studio 中試用 Live API:建議您在開始建構前,先在 Google AI Studio 中試用 Live API。如要在 Google AI Studio 中使用 Live API,請選取「串流」。
建立連線
以下範例說明如何使用 API 金鑰建立連線:
Python
import asyncio
from google import genai
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
print("Session started")
if __name__ == "__main__":
asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function main() {
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
console.debug(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
// Send content...
session.close();
}
main();
互動模式
以下各節將提供 Live API 中可用的不同輸入和輸出模式的範例和相關背景資訊。
收發文字
以下說明如何傳送及接收文字訊息:
Python
import asyncio
from google import genai
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
message = "Hello, how are you?"
await session.send_client_content(
turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
)
async for response in session.receive():
if response.text is not None:
print(response.text, end="")
if __name__ == "__main__":
asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
const inputTurns = 'Hello how are you?';
session.sendClientContent({ turns: inputTurns });
const turns = await handleTurn();
for (const turn of turns) {
if (turn.text) {
console.debug('Received text: %s\n', turn.text);
}
else if (turn.data) {
console.debug('Received inline data: %s\n', turn.data);
}
}
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
逐步更新內容
使用遞增更新功能傳送文字輸入內容、建立工作階段背景資訊或還原工作階段背景資訊。如果是短時間的內容,您可以傳送逐向互動,以代表確切的事件順序:
Python
turns = [
{"role": "user", "parts": [{"text": "What is the capital of France?"}]},
{"role": "model", "parts": [{"text": "Paris"}]},
]
await session.send_client_content(turns=turns, turn_complete=False)
turns = [{"role": "user", "parts": [{"text": "What is the capital of Germany?"}]}]
await session.send_client_content(turns=turns, turn_complete=True)
JavaScript
let inputTurns = [
{ "role": "user", "parts": [{ "text": "What is the capital of France?" }] },
{ "role": "model", "parts": [{ "text": "Paris" }] },
]
session.sendClientContent({ turns: inputTurns, turnComplete: false })
inputTurns = [{ "role": "user", "parts": [{ "text": "What is the capital of Germany?" }] }]
session.sendClientContent({ turns: inputTurns, turnComplete: true })
對於較長的背景資訊,建議您提供單一訊息摘要,以便釋出脈絡窗口,供後續互動使用。如要瞭解其他載入工作階段內容的方法,請參閱「繼續工作階段」。
傳送及接收音訊
入門指南會介紹最常見的音訊範例:音訊轉音訊。
以下是音訊轉文字的範例,可讀取 WAV 檔案、以正確格式傳送,並接收文字輸出內容:
Python
# Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
# Install helpers for converting files: pip install librosa soundfile
import asyncio
import io
from pathlib import Path
from google import genai
from google.genai import types
import soundfile as sf
import librosa
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
buffer = io.BytesIO()
y, sr = librosa.load("sample.wav", sr=16000)
sf.write(buffer, y, sr, format='RAW', subtype='PCM_16')
buffer.seek(0)
audio_bytes = buffer.read()
# If already in correct format, you can use this:
# audio_bytes = Path("sample.pcm").read_bytes()
await session.send_realtime_input(
audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
)
async for response in session.receive():
if response.text is not None:
print(response.text)
if __name__ == "__main__":
asyncio.run(main())
JavaScript
// Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
// Install helpers for converting files: npm install wavefile
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
// Send Audio Chunk
const fileBuffer = fs.readFileSync("sample.wav");
// Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono)
const wav = new WaveFile();
wav.fromBuffer(fileBuffer);
wav.toSampleRate(16000);
wav.toBitDepth("16");
const base64Audio = wav.toBase64();
// If already in correct format, you can use this:
// const fileBuffer = fs.readFileSync("sample.pcm");
// const base64Audio = Buffer.from(fileBuffer).toString('base64');
session.sendRealtimeInput(
{
audio: {
data: base64Audio,
mimeType: "audio/pcm;rate=16000"
}
}
);
const turns = await handleTurn();
for (const turn of turns) {
if (turn.text) {
console.debug('Received text: %s\n', turn.text);
}
else if (turn.data) {
console.debug('Received inline data: %s\n', turn.data);
}
}
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
以下是文字轉語音的範例。您可以將 AUDIO
設為回應模式,接收音訊。這個範例會將收到的資料儲存為 WAV 檔案:
Python
import asyncio
import wave
from google import genai
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["AUDIO"]}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
wf = wave.open("audio.wav", "wb")
wf.setnchannels(1)
wf.setsampwidth(2)
wf.setframerate(24000)
message = "Hello how are you?"
await session.send_client_content(
turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
)
async for response in session.receive():
if response.data is not None:
wf.writeframes(response.data)
# Un-comment this code to print audio data info
# if response.server_content.model_turn is not None:
# print(response.server_content.model_turn.parts[0].inline_data.mime_type)
wf.close()
if __name__ == "__main__":
asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.AUDIO] };
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
const inputTurns = 'Hello how are you?';
session.sendClientContent({ turns: inputTurns });
const turns = await handleTurn();
// Combine audio data strings and save as wave file
const combinedAudio = turns.reduce((acc, turn) => {
if (turn.data) {
const buffer = Buffer.from(turn.data, 'base64');
const intArray = new Int16Array(buffer.buffer, buffer.byteOffset, buffer.byteLength / Int16Array.BYTES_PER_ELEMENT);
return acc.concat(Array.from(intArray));
}
return acc;
}, []);
const audioBuffer = new Int16Array(combinedAudio);
const wf = new WaveFile();
wf.fromScratch(1, 24000, '16', audioBuffer);
fs.writeFileSync('output.wav', wf.toBuffer());
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
音訊格式
Live API 中的音訊資料一律為原始、小端序的 16 位元 PCM。音訊輸出一律會使用 24 kHz 的取樣率。輸入音訊的預設取樣率為 16kHz,但 Live API 會視需要重新取樣,因此可傳送任何取樣率。如要傳達輸入音訊的取樣率,請將每個含有音訊的 Blob 的 MIME 類型設為 audio/pcm;rate=16000
之類的值。
音訊轉錄
您可以在設定設定檔中傳送 output_audio_transcription
,啟用模型的音訊輸出轉錄功能。系統會根據模型的回覆推斷語音轉錄語言。
Python
import asyncio
from google import genai
from google.genai import types
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["AUDIO"],
"output_audio_transcription": {}
}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
message = "Hello? Gemini are you there?"
await session.send_client_content(
turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
)
async for response in session.receive():
if response.server_content.model_turn:
print("Model turn:", response.server_content.model_turn)
if response.server_content.output_transcription:
print("Transcript:", response.server_content.output_transcription.text)
if __name__ == "__main__":
asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = {
responseModalities: [Modality.AUDIO],
outputAudioTranscription: {}
};
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
const inputTurns = 'Hello how are you?';
session.sendClientContent({ turns: inputTurns });
const turns = await handleTurn();
for (const turn of turns) {
if (turn.serverContent && turn.serverContent.outputTranscription) {
console.debug('Received output transcription: %s\n', turn.serverContent.outputTranscription.text);
}
}
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
您可以在設定設定檔中傳送 input_audio_transcription
,啟用音訊輸入的轉錄功能。
Python
import asyncio
from pathlib import Path
from google import genai
from google.genai import types
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {
"response_modalities": ["TEXT"],
"input_audio_transcription": {},
}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
audio_data = Path("16000.pcm").read_bytes()
await session.send_realtime_input(
audio=types.Blob(data=audio_data, mime_type='audio/pcm;rate=16000')
)
async for msg in session.receive():
if msg.server_content.input_transcription:
print('Transcript:', msg.server_content.input_transcription.text)
if __name__ == "__main__":
asyncio.run(main())
JavaScript
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = {
responseModalities: [Modality.TEXT],
inputAudioTranscription: {}
};
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
// Send Audio Chunk
const fileBuffer = fs.readFileSync("16000.wav");
// Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono)
const wav = new WaveFile();
wav.fromBuffer(fileBuffer);
wav.toSampleRate(16000);
wav.toBitDepth("16");
const base64Audio = wav.toBase64();
// If already in correct format, you can use this:
// const fileBuffer = fs.readFileSync("sample.pcm");
// const base64Audio = Buffer.from(fileBuffer).toString('base64');
session.sendRealtimeInput(
{
audio: {
data: base64Audio,
mimeType: "audio/pcm;rate=16000"
}
}
);
const turns = await handleTurn();
for (const turn of turns) {
if (turn.serverContent && turn.serverContent.outputTranscription) {
console.log("Transcription")
console.log(turn.serverContent.outputTranscription.text);
}
}
for (const turn of turns) {
if (turn.text) {
console.debug('Received text: %s\n', turn.text);
}
else if (turn.data) {
console.debug('Received inline data: %s\n', turn.data);
}
else if (turn.serverContent && turn.serverContent.inputTranscription) {
console.debug('Received input transcription: %s\n', turn.serverContent.inputTranscription.text);
}
}
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
串流音訊和視訊
變更語音和語言
Live API 模型各自支援不同的語音集合。半級聯網支援 Puck、Charon、Kore、Fenrir、Aoede、Leda、Orus 和 Zephyr。原生音訊支援的清單更長 (與 TTS 模型清單相同)。您可以在 AI Studio 中聆聽所有語音。
如要指定語音,請在 speechConfig
物件中設定語音名稱,做為工作階段設定的一部分:
Python
config = {
"response_modalities": ["AUDIO"],
"speech_config": {
"voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}}
},
}
JavaScript
const config = {
responseModalities: [Modality.AUDIO],
speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } } }
};
generateContent
Live API 支援多種語言。
如要變更語言,請在 speechConfig
物件中設定語言代碼,做為工作階段設定的一部分:
Python
config = {
"response_modalities": ["AUDIO"],
"speech_config": {
"language_code": "de-DE"
}
}
JavaScript
const config = {
responseModalities: [Modality.AUDIO],
speechConfig: { languageCode: "de-DE" }
};
原生音訊功能
下列功能僅適用於原生音訊。如要進一步瞭解原生音訊,請參閱「選擇模型和音訊產生方式」一文。
如何使用原生音訊輸出
如要使用原生音訊輸出,請設定其中一個原生音訊模型,並將 response_modalities
設為 AUDIO
。
如需完整範例,請參閱「收發語音」。
Python
model = "gemini-2.5-flash-preview-native-audio-dialog"
config = types.LiveConnectConfig(response_modalities=["AUDIO"])
async with client.aio.live.connect(model=model, config=config) as session:
# Send audio input and receive audio
JavaScript
const model = 'gemini-2.5-flash-preview-native-audio-dialog';
const config = { responseModalities: [Modality.AUDIO] };
async function main() {
const session = await ai.live.connect({
model: model,
config: config,
callbacks: ...,
});
// Send audio input and receive audio
session.close();
}
main();
情緒感知對話
這項功能可讓 Gemini 根據輸入內容的措辭和語氣調整回覆風格。
如要使用情感對話,請將 API 版本設為 v1alpha
,並在設定訊息中將 enable_affective_dialog
設為 true
:
Python
client = genai.Client(http_options={"api_version": "v1alpha"})
config = types.LiveConnectConfig(
response_modalities=["AUDIO"],
enable_affective_dialog=True
)
JavaScript
const ai = new GoogleGenAI({ httpOptions: {"apiVersion": "v1alpha"} });
const config = {
responseModalities: [Modality.AUDIO],
enableAffectiveDialog: true
};
請注意,情緒對話目前僅支援原生音訊輸出模型。
主動音訊
啟用這項功能後,Gemini 可主動決定不回應不相關的內容。
如要使用此功能,請將 API 版本設為 v1alpha
,並在設定訊息中設定 proactivity
欄位,然後將 proactive_audio
設為 true
:
Python
client = genai.Client(http_options={"api_version": "v1alpha"})
config = types.LiveConnectConfig(
response_modalities=["AUDIO"],
proactivity={'proactive_audio': True}
)
JavaScript
const ai = new GoogleGenAI({ httpOptions: {"apiVersion": "v1alpha"} });
const config = {
responseModalities: [Modality.AUDIO],
proactivity: { proactiveAudio: true }
}
請注意,目前只有原生音訊輸出模型支援主動式音訊。
原生音訊輸出,含思考時間
原生音訊輸出功能支援思考功能,可透過單獨的模型 gemini-2.5-flash-exp-native-audio-thinking-dialog
使用。
如需完整範例,請參閱「收發語音」。
Python
model = "gemini-2.5-flash-exp-native-audio-thinking-dialog"
config = types.LiveConnectConfig(response_modalities=["AUDIO"])
async with client.aio.live.connect(model=model, config=config) as session:
# Send audio input and receive audio
JavaScript
const model = 'gemini-2.5-flash-exp-native-audio-thinking-dialog';
const config = { responseModalities: [Modality.AUDIO] };
async function main() {
const session = await ai.live.connect({
model: model,
config: config,
callbacks: ...,
});
// Send audio input and receive audio
session.close();
}
main();
語音活動偵測 (VAD)
語音活動偵測 (VAD) 可讓模型辨識使用者說話的時間。這對於建立自然對話至關重要,因為這樣一來,使用者就能隨時中斷模型。
VAD 偵測到中斷時,系統會取消並捨棄目前的生成作業。只有已傳送至用戶端的資訊會保留在工作階段記錄中。接著,伺服器會傳送 BidiGenerateContentServerContent
訊息來回報中斷情形。
接著,Gemini 伺服器會捨棄任何待處理的函式呼叫,並傳送 BidiGenerateContentServerContent
訊息,其中包含已取消呼叫的 ID。
Python
async for response in session.receive():
if response.server_content.interrupted is True:
# The generation was interrupted
# If realtime playback is implemented in your application,
# you should stop playing audio and clear queued playback here.
JavaScript
const turns = await handleTurn();
for (const turn of turns) {
if (turn.serverContent && turn.serverContent.interrupted) {
// The generation was interrupted
// If realtime playback is implemented in your application,
// you should stop playing audio and clear queued playback here.
}
}
自動語音偵測
根據預設,模型會自動對連續音訊輸入串流執行 VAD。您可以使用設定配置的 realtimeInputConfig.automaticActivityDetection
欄位設定 VAD。
如果音訊串流暫停超過一秒 (例如因為使用者關閉麥克風),就應傳送 audioStreamEnd
事件,以便清除任何快取音訊。用戶端隨時可以恢復傳送音訊資料。
Python
# example audio file to try:
# URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm"
# !wget -q $URL -O sample.pcm
import asyncio
from pathlib import Path
from google import genai
from google.genai import types
client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}
async def main():
async with client.aio.live.connect(model=model, config=config) as session:
audio_bytes = Path("sample.pcm").read_bytes()
await session.send_realtime_input(
audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
)
# if stream gets paused, send:
# await session.send_realtime_input(audio_stream_end=True)
async for response in session.receive():
if response.text is not None:
print(response.text)
if __name__ == "__main__":
asyncio.run(main())
JavaScript
// example audio file to try:
// URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm"
// !wget -q $URL -O sample.pcm
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };
async function live() {
const responseQueue = [];
async function waitMessage() {
let done = false;
let message = undefined;
while (!done) {
message = responseQueue.shift();
if (message) {
done = true;
} else {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return message;
}
async function handleTurn() {
const turns = [];
let done = false;
while (!done) {
const message = await waitMessage();
turns.push(message);
if (message.serverContent && message.serverContent.turnComplete) {
done = true;
}
}
return turns;
}
const session = await ai.live.connect({
model: model,
callbacks: {
onopen: function () {
console.debug('Opened');
},
onmessage: function (message) {
responseQueue.push(message);
},
onerror: function (e) {
console.debug('Error:', e.message);
},
onclose: function (e) {
console.debug('Close:', e.reason);
},
},
config: config,
});
// Send Audio Chunk
const fileBuffer = fs.readFileSync("sample.pcm");
const base64Audio = Buffer.from(fileBuffer).toString('base64');
session.sendRealtimeInput(
{
audio: {
data: base64Audio,
mimeType: "audio/pcm;rate=16000"
}
}
);
// if stream gets paused, send:
// session.sendRealtimeInput({ audioStreamEnd: true })
const turns = await handleTurn();
for (const turn of turns) {
if (turn.text) {
console.debug('Received text: %s\n', turn.text);
}
else if (turn.data) {
console.debug('Received inline data: %s\n', turn.data);
}
}
session.close();
}
async function main() {
await live().catch((e) => console.error('got error', e));
}
main();
有了 send_realtime_input
,API 就會根據 VAD 自動回應音訊。send_client_content
會依序將訊息新增至模型內容,而 send_realtime_input
則會犧牲確定性排序,以提升回應速度。
自動 VAD 設定
如要進一步控制 VAD 活動,您可以設定下列參數。詳情請參閱 API 參考資料。
Python
from google.genai import types
config = {
"response_modalities": ["TEXT"],
"realtime_input_config": {
"automatic_activity_detection": {
"disabled": False, # default
"start_of_speech_sensitivity": types.StartSensitivity.START_SENSITIVITY_LOW,
"end_of_speech_sensitivity": types.EndSensitivity.END_SENSITIVITY_LOW,
"prefix_padding_ms": 20,
"silence_duration_ms": 100,
}
}
}
JavaScript
import { GoogleGenAI, Modality, StartSensitivity, EndSensitivity } from '@google/genai';
const config = {
responseModalities: [Modality.TEXT],
realtimeInputConfig: {
automaticActivityDetection: {
disabled: false, // default
startOfSpeechSensitivity: StartSensitivity.START_SENSITIVITY_LOW,
endOfSpeechSensitivity: EndSensitivity.END_SENSITIVITY_LOW,
prefixPaddingMs: 20,
silenceDurationMs: 100,
}
}
};
停用自動 VAD
或者,您也可以在設定訊息中將 realtimeInputConfig.automaticActivityDetection.disabled
設為 true
,停用自動語音偵測功能。在這個設定中,用戶端負責偵測使用者的語音,並在適當時間傳送 activityStart
和 activityEnd
訊息。在這個設定中不會傳送 audioStreamEnd
。而是會在任何中斷的串流上標示 activityEnd
訊息。
Python
config = {
"response_modalities": ["TEXT"],
"realtime_input_config": {"automatic_activity_detection": {"disabled": True}},
}
async with client.aio.live.connect(model=model, config=config) as session:
# ...
await session.send_realtime_input(activity_start=types.ActivityStart())
await session.send_realtime_input(
audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
)
await session.send_realtime_input(activity_end=types.ActivityEnd())
# ...
JavaScript
const config = {
responseModalities: [Modality.TEXT],
realtimeInputConfig: {
automaticActivityDetection: {
disabled: true,
}
}
};
session.sendRealtimeInput({ activityStart: {} })
session.sendRealtimeInput(
{
audio: {
data: base64Audio,
mimeType: "audio/pcm;rate=16000"
}
}
);
session.sendRealtimeInput({ activityEnd: {} })
符記數
您可以在傳回的伺服器訊息的 usageMetadata 欄位中,找到已使用的符記總數。
Python
async for message in session.receive():
# The server will periodically send messages that include UsageMetadata.
if message.usage_metadata:
usage = message.usage_metadata
print(
f"Used {usage.total_token_count} tokens in total. Response token breakdown:"
)
for detail in usage.response_tokens_details:
match detail:
case types.ModalityTokenCount(modality=modality, token_count=count):
print(f"{modality}: {count}")
JavaScript
const turns = await handleTurn();
for (const turn of turns) {
if (turn.usageMetadata) {
console.debug('Used %s tokens in total. Response token breakdown:\n', turn.usageMetadata.totalTokenCount);
for (const detail of turn.usageMetadata.responseTokensDetails) {
console.debug('%s\n', detail);
}
}
}
媒體解析度
您可以將 mediaResolution
欄位設為工作階段設定的一部分,藉此指定輸入媒體的媒體解析度:
Python
from google.genai import types
config = {
"response_modalities": ["AUDIO"],
"media_resolution": types.MediaResolution.MEDIA_RESOLUTION_LOW,
}
JavaScript
import { GoogleGenAI, Modality, MediaResolution } from '@google/genai';
const config = {
responseModalities: [Modality.TEXT],
mediaResolution: MediaResolution.MEDIA_RESOLUTION_LOW,
};
限制
規劃專案時,請考量 Live API 的下列限制。
回應模式
您只能在工作階段設定中為每個工作階段設定一個回應模式 (TEXT
或 AUDIO
)。設定這兩項元素會導致設定錯誤訊息。也就是說,您可以設定模型以文字或音訊回應,但在同一個工作階段中,無法同時使用這兩種回應方式。
用戶端驗證
根據預設,Live API 只提供伺服器對伺服器驗證。如果您是使用用戶端對伺服器方法實作 Live API 應用程式,就必須使用暫時性權杖來降低安全性風險。
工作階段持續時間
僅音訊的會議時長上限為 15 分鐘,而音訊加視訊的會議時長上限為 2 分鐘。不過,您可以設定不同的工作階段管理技巧,讓工作階段時間無限延長。
脈絡窗口
工作階段的脈絡窗口限制為:
- 原生音訊輸出模型的 128k 符記
- 其他 Live API 型號的 32k 符記
支援的語言
Live API 支援下列語言。
語言 | BCP-47 代碼 | 語言 | BCP-47 代碼 |
---|---|---|---|
德文 (德國) | de-DE |
英文 (澳洲)* | en-AU |
英文 (英國)* | en-GB |
英文 (印度) | en-IN |
英文 (美國) | en-US |
西班牙文 (美國) | es-US |
法文 (法國) | fr-FR |
北印度文 (印度) | hi-IN |
葡萄牙文 (巴西) | pt-BR |
阿拉伯文 (一般) | ar-XA |
西班牙文 (西班牙)* | es-ES |
法文 (加拿大)* | fr-CA |
印尼文 (印尼) | id-ID |
義大利文 (義大利) | it-IT |
日文 (日本) | ja-JP |
土耳其文 (土耳其) | tr-TR |
越南文 (越南) | vi-VN |
孟加拉文 (印度) | bn-IN |
古吉拉特文 (印度)* | gu-IN |
卡納達文 (印度)* | kn-IN |
馬拉地文 (印度) | mr-IN |
馬拉雅拉姆文 (印度)* | ml-IN |
泰米爾文 (印度) | ta-IN |
泰盧固文 (印度) | te-IN |
荷蘭文 (荷蘭) | nl-NL |
韓文 (韓國) | ko-KR |
中文 (中國)* | cmn-CN |
波蘭文 (波蘭) | pl-PL |
俄文 (俄羅斯) | ru-RU |
泰文 (泰國) | th-TH |
標有星號 (*) 的語言不適用於原生音訊。
後續步驟
- 請參閱「使用工具」和「工作階段管理」指南,瞭解如何有效使用 Live API。
- 請在 Google AI Studio 中試用 Live API。
- 如要進一步瞭解 Live API 模型,請參閱「模型」頁面中的 Gemini 2.0 Flash Live 和 Gemini 2.5 Flash 原生音訊。
- 請參閱 Live API 食譜、Live API 工具食譜和 Live API 新手指南指令碼,瞭解更多範例。