Live API capabilities guide

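This is a comprehensive guide to the capabilities and configurations available with the Live API. For an overview and sample code for common use cases, see the Get started with Live API page.

Before you begin

Familiarize yourself with core concepts: If you haven't already done so, read the Get started with Live API page first. It introduces the fundamental principles of the Live API, how it works, and the differences between the available models and their audio generation methods (native audio or half-cascade).
Try the Live API in AI Studio: You may find it useful to try the Live API in Google AI Studio before you start building. To use the Live API in Google AI Studio, select Stream.

Establishing a connection

The following example shows how to create a connection with an API key: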

Python

import asyncio
from google import genai

client = genai.Client()

model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        print("Session started")

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };

async function main() {

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        console.debug(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  // Send content...

  session.close();
}

main();

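Interaction modalities

The following sections provide examples and supporting context for the different input and output modalities available in the Live API.

Sending and receiving text

Here's how you can send and receive text: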

Python

import asyncio
from google import genai

client = genai.Client()
model = "gemini-live-2.5-flash-preview"

config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        message = "Hello, how are you?"
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )

        async for response in session.receive():
            if response.text is not None:
                print(response.text, end="")

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };

async function live() {
  const responseQueue = [];

  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }

  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  const inputTurns = 'Hello how are you?';
  session.sendClientContent({ turns: inputTurns });

  const turns = await handleTurn();
  for (const turn of turns) {
    if (turn.text) {
      console.debug('Received text: %s\n', turn.text);
    }
    else if (turn.data) {
      console.debug('Received inline data: %s\n', turn.data);
    }
  }

  session.close();
}

async function main() {
  await live().catch((e) => console.error('got error', e));
}

main();

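Incremental content updates

Use incremental updates to send text input, establish session context, or restore session context. For short contexts, you can send turn-by-turn interactions to represent the exact sequence of events: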

Python

turns = [
    {"role": "user", "parts": [{"text": "What is the capital of France?"}]},
    {"role": "model", "parts": [{"text": "Paris"}]},
]

await session.send_client_content(turns=turns, turn_complete=False)

turns = [{"role": "user", "parts": [{"text": "What is the capital of Germany?"}]}]

await session.send_client_content(turns=turns, turn_complete=True)

JavaScript

let inputTurns = [
  { "role": "user", "parts": [{ "text": "What is the capital of France?" }] },
  { "role": "model", "parts": [{ "text": "Paris" }] },
]

session.sendClientContent({ turns: inputTurns, turnComplete: false })

inputTurns = [{ "role": "user", "parts": [{ "text": "What is the capital of Germany?" }] }]

session.sendClientContent({ turns: inputTurns, turnComplete: true })

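For longer contexts, it's recommended to provide a single message summary to free up the context window for subsequent interactions. See Session resumption for another method of loading session context.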
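The context-loading pattern above can be wrapped in a small helper. The following is a minimal sketch, not part of the SDK: `to_turns` and `summary_turns` are hypothetical names, and sending the summary as a user-role message is an assumption rather than documented behavior. It only builds turn dictionaries in the shape used by the examples above.

```python
def to_turns(history):
    """Convert (role, text) pairs into turn dictionaries."""
    turns = []
    for role, text in history:
        # The examples above only use the "user" and "model" roles.
        if role not in ("user", "model"):
            raise ValueError(f"unexpected role: {role!r}")
        turns.append({"role": role, "parts": [{"text": text}]})
    return turns


def summary_turns(summary, question):
    """Represent a long history as one summary message plus the new question.

    Assumption: the summary is sent as a user-role message.
    """
    return to_turns([
        ("user", f"Summary of the conversation so far: {summary}"),
        ("user", question),
    ])


history = [("user", "What is the capital of France?"), ("model", "Paris")]
print(to_turns(history))
```

The returned list could then be passed as `turns=` to `send_client_content`, with `turn_complete=False` while restoring context and `turn_complete=True` for the final user message.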

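Sending and receiving audio

The most common audio example, audio-to-audio, is covered in the Getting started guide.

Here's an audio-to-text example that reads a WAV file, sends it in the correct format, and receives text output: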

Python

# Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
# Install helpers for converting files: pip install librosa soundfile
import asyncio
import io
from pathlib import Path
from google import genai
from google.genai import types
import soundfile as sf
import librosa

client = genai.Client()
model = "gemini-live-2.5-flash-preview"

config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:

        buffer = io.BytesIO()
        y, sr = librosa.load("sample.wav", sr=16000)
        sf.write(buffer, y, sr, format='RAW', subtype='PCM_16')
        buffer.seek(0)
        audio_bytes = buffer.read()

        # If already in correct format, you can use this:
        # audio_bytes = Path("sample.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )

        async for response in session.receive():
            if response.text is not None:
                print(response.text)

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

// Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav
// Install helpers for converting files: npm install wavefile
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;

const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };

async function live() {
  const responseQueue = [];

  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }

  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  // Send Audio Chunk
  const fileBuffer = fs.readFileSync("sample.wav");

  // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono)
  const wav = new WaveFile();
  wav.fromBuffer(fileBuffer);
  wav.toSampleRate(16000);
  wav.toBitDepth("16");
  const base64Audio = wav.toBase64();

  // If already in correct format, you can use this:
  // const fileBuffer = fs.readFileSync("sample.pcm");
  // const base64Audio = Buffer.from(fileBuffer).toString('base64');

  session.sendRealtimeInput(
    {
      audio: {
        data: base64Audio,
        mimeType: "audio/pcm;rate=16000"
      }
    }

  );

  const turns = await handleTurn();
  for (const turn of turns) {
    if (turn.text) {
      console.debug('Received text: %s\n', turn.text);
    }
    else if (turn.data) {
      console.debug('Received inline data: %s\n', turn.data);
    }
  }

  session.close();
}

async function main() {
  await live().catch((e) => console.error('got error', e));
}

main();

Here is a text-to-audio example. You can receive audio by setting AUDIO as the response modality. This example saves the received data as a WAV file:

Python

import asyncio
import wave
from google import genai

client = genai.Client()
model = "gemini-live-2.5-flash-preview"

config = {"response_modalities": ["AUDIO"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        wf = wave.open("audio.wav", "wb")
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(24000)

        message = "Hello how are you?"
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )

        async for response in session.receive():
            if response.data is not None:
                wf.writeframes(response.data)

            # Un-comment this code to print audio data info
            # if response.server_content.model_turn is not None:
            #      print(response.server_content.model_turn.parts[0].inline_data.mime_type)

        wf.close()

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;

const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.AUDIO] };

async function live() {
  const responseQueue = [];

  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }

  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  const inputTurns = 'Hello how are you?';
  session.sendClientContent({ turns: inputTurns });

  const turns = await handleTurn();

  // Combine audio data strings and save as wave file
  const combinedAudio = turns.reduce((acc, turn) => {
    if (turn.data) {
      const buffer = Buffer.from(turn.data, 'base64');
      const intArray = new Int16Array(buffer.buffer, buffer.byteOffset, buffer.byteLength / Int16Array.BYTES_PER_ELEMENT);
      return acc.concat(Array.from(intArray));
    }
    return acc;
  }, []);

  const audioBuffer = new Int16Array(combinedAudio);

  const wf = new WaveFile();
  wf.fromScratch(1, 24000, '16', audioBuffer);
  fs.writeFileSync('output.wav', wf.toBuffer());

  session.close();
}

async function main() {
  await live().catch((e) => console.error('got error', e));
}

main();

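Audio formats

Audio data in the Live API is always raw, little-endian, 16-bit PCM. Audio output always uses a sample rate of 24kHz. Input audio is natively 16kHz, but the Live API resamples if needed, so any sample rate can be sent. To convey the sample rate of input audio, set the MIME type of each audio-containing Blob to a value like audio/pcm;rate=16000.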
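As an illustration of the format described above, the following sketch encodes floating-point samples as raw little-endian 16-bit PCM and builds the matching MIME type string. The helper names are hypothetical, and the sketch assumes mono audio with samples in the range -1.0 to 1.0.

```python
import struct


def float_to_pcm16(samples):
    """Encode floats in [-1.0, 1.0] as raw little-endian 16-bit PCM bytes."""
    # Clip first so out-of-range input cannot overflow the 16-bit range.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm_mime_type(sample_rate):
    """Build the MIME type that conveys the input sample rate."""
    return f"audio/pcm;rate={sample_rate}"


def pcm16_duration_seconds(data, sample_rate):
    """Duration of mono 16-bit PCM data (two bytes per sample)."""
    return len(data) / 2 / sample_rate


audio_bytes = float_to_pcm16([0.0, 0.5, -0.5])
print(len(audio_bytes), pcm_mime_type(16000))
```

The same arithmetic applies to output: one second of 24kHz output audio occupies 48,000 bytes.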

Audio transcriptions

You can enable transcription of the model's audio output by sending output_audio_transcription in the setup config. The transcription language is inferred from the model's response.

Python

import asyncio
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-live-2.5-flash-preview"

config = {"response_modalities": ["AUDIO"],
        "output_audio_transcription": {}
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        message = "Hello? Gemini are you there?"

        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": message}]}, turn_complete=True
        )

        async for response in session.receive():
            if response.server_content.model_turn:
                print("Model turn:", response.server_content.model_turn)
            if response.server_content.output_transcription:
                print("Transcript:", response.server_content.output_transcription.text)

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

import { GoogleGenAI, Modality } from '@google/genai';

const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';

const config = {
  responseModalities: [Modality.AUDIO],
  outputAudioTranscription: {}
};

async function live() {
  const responseQueue = [];

  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }

  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  const inputTurns = 'Hello how are you?';
  session.sendClientContent({ turns: inputTurns });

  const turns = await handleTurn();

  for (const turn of turns) {
    if (turn.serverContent && turn.serverContent.outputTranscription) {
      console.debug('Received output transcription: %s\n', turn.serverContent.outputTranscription.text);
    }
  }

  session.close();
}

async function main() {
  await live().catch((e) => console.error('got error', e));
}

main();

You can enable transcription of the audio input by sending input_audio_transcription in the setup config.

Python

import asyncio
from pathlib import Path
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-live-2.5-flash-preview"

config = {
    "response_modalities": ["TEXT"],
    "input_audio_transcription": {},
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        audio_data = Path("16000.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_data, mime_type='audio/pcm;rate=16000')
        )

        async for msg in session.receive():
            if msg.server_content.input_transcription:
                print('Transcript:', msg.server_content.input_transcription.text)

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";
import pkg from 'wavefile';
const { WaveFile } = pkg;

const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';

const config = {
  responseModalities: [Modality.TEXT],
  inputAudioTranscription: {}
};

async function live() {
  const responseQueue = [];

  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }

  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  // Send Audio Chunk
  const fileBuffer = fs.readFileSync("16000.wav");

  // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono)
  const wav = new WaveFile();
  wav.fromBuffer(fileBuffer);
  wav.toSampleRate(16000);
  wav.toBitDepth("16");
  const base64Audio = wav.toBase64();

  // If already in correct format, you can use this:
  // const fileBuffer = fs.readFileSync("sample.pcm");
  // const base64Audio = Buffer.from(fileBuffer).toString('base64');

  session.sendRealtimeInput(
    {
      audio: {
        data: base64Audio,
        mimeType: "audio/pcm;rate=16000"
      }
    }
  );

  const turns = await handleTurn();

  for (const turn of turns) {
    if (turn.serverContent && turn.serverContent.outputTranscription) {
      console.log("Transcription")
      console.log(turn.serverContent.outputTranscription.text);
    }
  }
  for (const turn of turns) {
    if (turn.text) {
      console.debug('Received text: %s\n', turn.text);
    }
    else if (turn.data) {
      console.debug('Received inline data: %s\n', turn.data);
    }
    else if (turn.serverContent && turn.serverContent.inputTranscription) {
      console.debug('Received input transcription: %s\n', turn.serverContent.inputTranscription.text);
    }
  }

  session.close();
}

async function main() {
  await live().catch((e) => console.error('got error', e));
}

main();

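Stream audio and video

Change voice and language

The Live API models each support a different set of voices. Half-cascade supports Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr. Native audio supports a much longer list (identical to the TTS model list). You can listen to all the voices in AI Studio.

To specify a voice, set the voice name within the speechConfig object as part of the session configuration: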

Python

config = {
    "response_modalities": ["AUDIO"],
    "speech_config": {
        "voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}}
    },
}

JavaScript

const config = {
  responseModalities: [Modality.AUDIO],
  speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } } }
};

The Live API supports multiple languages.

To change the language, set the language code within the speechConfig object as part of the session configuration:

Python

config = {
    "response_modalities": ["AUDIO"],
    "speech_config": {
        "language_code": "de-DE"
    }
}

JavaScript

const config = {
  responseModalities: [Modality.AUDIO],
  speechConfig: { languageCode: "de-DE" }
};
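The voice and language settings above can be assembled by a small helper. This is a sketch under stated assumptions: `build_speech_config` is a hypothetical name, the voice check only covers the half-cascade voices listed above, and setting a voice and a language code in the same speech_config is assumed to be valid because both are fields of that object.

```python
# Half-cascade voices as listed in this guide.
HALF_CASCADE_VOICES = {
    "Puck", "Charon", "Kore", "Fenrir", "Aoede", "Leda", "Orus", "Zephyr",
}


def build_speech_config(voice_name=None, language_code=None,
                        check_half_cascade=True):
    """Return a session config dict with an optional voice and language."""
    speech_config = {}
    if voice_name is not None:
        # Native audio models accept more voices, so the check is optional.
        if check_half_cascade and voice_name not in HALF_CASCADE_VOICES:
            raise ValueError(f"not a half-cascade voice: {voice_name!r}")
        speech_config["voice_config"] = {
            "prebuilt_voice_config": {"voice_name": voice_name}
        }
    if language_code is not None:
        speech_config["language_code"] = language_code
    return {"response_modalities": ["AUDIO"], "speech_config": speech_config}


print(build_speech_config(voice_name="Kore", language_code="de-DE"))
```

Pass `check_half_cascade=False` when targeting a native audio model, since its voice list is longer than the one validated here.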

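Native audio capabilities

The following capabilities are only available with native audio. You can learn more about native audio in Choose a model and audio generation.

How to use native audio output

To use native audio output, configure one of the native audio models and set response_modalities to AUDIO.

For a full example, see Sending and receiving audio.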

Python

model = "gemini-2.5-flash-native-audio-preview-09-2025"
config = types.LiveConnectConfig(response_modalities=["AUDIO"])

async with client.aio.live.connect(model=model, config=config) as session:
    # Send audio input and receive audio
    pass

JavaScript

const model = 'gemini-2.5-flash-native-audio-preview-09-2025';
const config = { responseModalities: [Modality.AUDIO] };

async function main() {

  const session = await ai.live.connect({
    model: model,
    config: config,
    callbacks: ...,
  });

  // Send audio input and receive audio

  session.close();
}

main();

Affective dialog

This feature lets Gemini adapt its response style to the expression and tone of the input.

To use affective dialog, set the API version to v1alpha and set enable_affective_dialog to true in the setup message:

Python

client = genai.Client(http_options={"api_version": "v1alpha"})

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    enable_affective_dialog=True
)

JavaScript

const ai = new GoogleGenAI({ httpOptions: {"apiVersion": "v1alpha"} });

const config = {
  responseModalities: [Modality.AUDIO],
  enableAffectiveDialog: true
};

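Affective dialog is currently only supported by the native audio output models.

Proactive audio

When this feature is enabled, Gemini can proactively decide not to respond if the content is not relevant.

To use it, set the API version to v1alpha, configure the proactivity field in the setup message, and set proactive_audio to true: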

Python

client = genai.Client(http_options={"api_version": "v1alpha"})

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    proactivity={'proactive_audio': True}
)

JavaScript

const ai = new GoogleGenAI({ httpOptions: {"apiVersion": "v1alpha"} });

const config = {
  responseModalities: [Modality.AUDIO],
  proactivity: { proactiveAudio: true }
}

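Proactive audio is currently only supported by the native audio output models.

Thinking

The latest native audio output model, gemini-2.5-flash-native-audio-preview-09-2025, supports thinking capabilities, with dynamic thinking enabled by default.

The thinkingBudget parameter guides the model on the number of thinking tokens to use when generating a response. You can disable thinking by setting thinkingBudget to 0. For more information on the model's thinkingBudget configuration details, see the thinking budgets documentation.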

Python

model = "gemini-2.5-flash-native-audio-preview-09-2025"

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    thinking_config=types.ThinkingConfig(
        thinking_budget=1024,
    )
)

async with client.aio.live.connect(model=model, config=config) as session:
    # Send audio input and receive audio
    pass

JavaScript

const model = 'gemini-2.5-flash-native-audio-preview-09-2025';
const config = {
  responseModalities: [Modality.AUDIO],
  thinkingConfig: {
    thinkingBudget: 1024,
  },
};

async function main() {

  const session = await ai.live.connect({
    model: model,
    config: config,
    callbacks: ...,
  });

  // Send audio input and receive audio

  session.close();
}

main();

Additionally, you can enable thought summaries by setting includeThoughts to true in your configuration. See thought summaries for more information:

Python

model = "gemini-2.5-flash-native-audio-preview-09-2025"

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    thinking_config=types.ThinkingConfig(
        thinking_budget=1024,
        include_thoughts=True
    )
)

JavaScript

const model = 'gemini-2.5-flash-native-audio-preview-09-2025';
const config = {
  responseModalities: [Modality.AUDIO],
  thinkingConfig: {
    thinkingBudget: 1024,
    includeThoughts: true,
  },
};

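Voice Activity Detection (VAD)

Voice Activity Detection (VAD) allows the model to recognize when a person is speaking. This is essential for creating natural conversations, because it lets a user interrupt the model at any time.

When VAD detects an interruption, the ongoing generation is canceled and discarded. Only the information already sent to the client is retained in the session history. The server then sends a BidiGenerateContentServerContent message to report the interruption.

The Gemini server also discards any pending function calls and sends a BidiGenerateContentServerContent message with the IDs of the canceled calls.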

Python

async for response in session.receive():
    # server_content can be absent on some messages, so check it first.
    if response.server_content and response.server_content.interrupted:
        # The generation was interrupted.
        # If realtime playback is implemented in your application,
        # stop playing audio and clear queued playback here.
        pass

JavaScript

const turns = await handleTurn();

for (const turn of turns) {
  if (turn.serverContent && turn.serverContent.interrupted) {
    // The generation was interrupted

    // If realtime playback is implemented in your application,
    // you should stop playing audio and clear queued playback here.
  }
}
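One way to act on the interrupted flag is to keep pending audio in a queue and clear it when an interruption is reported. The following is a minimal sketch: `PlaybackQueue` is a hypothetical class, and the messages are modeled as simple stand-in objects with `server_content` and `data` attributes that mirror the fields used in the examples above, not real SDK messages.

```python
from collections import deque
from types import SimpleNamespace


class PlaybackQueue:
    """Holds audio chunks waiting to be played (hypothetical helper)."""

    def __init__(self):
        self.chunks = deque()

    def handle(self, message):
        """Queue audio from a message, or drop queued audio on interruption."""
        content = getattr(message, "server_content", None)
        if content is not None and getattr(content, "interrupted", False):
            # The generation was canceled: discard audio not yet played.
            self.chunks.clear()
            return "interrupted"
        data = getattr(message, "data", None)
        if data:
            self.chunks.append(data)
            return "queued"
        return "ignored"


# Stand-in messages for illustration only.
queue = PlaybackQueue()
queue.handle(SimpleNamespace(server_content=None, data=b"\x00\x01"))
queue.handle(SimpleNamespace(
    server_content=SimpleNamespace(interrupted=True), data=None))
print(len(queue.chunks))
```

In a real application the player would also need to stop the chunk that is currently playing; this sketch only covers the queued audio.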

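Automatic VAD

By default, the model automatically performs VAD on a continuous audio input stream. VAD can be configured with the realtimeInputConfig.automaticActivityDetection field of the setup configuration.

When the audio stream is paused for a second or more (for example, because the user switched off the microphone), an audioStreamEnd event should be sent to flush any cached audio. The client can resume sending audio data at any time.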

Python

# example audio file to try:
# URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm"
# !wget -q $URL -O sample.pcm
import asyncio
from pathlib import Path
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-live-2.5-flash-preview"

config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        audio_bytes = Path("sample.pcm").read_bytes()

        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )

        # if stream gets paused, send:
        # await session.send_realtime_input(audio_stream_end=True)

        async for response in session.receive():
            if response.text is not None:
                print(response.text)

if __name__ == "__main__":
    asyncio.run(main())

JavaScript

// example audio file to try:
// URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm"
// !wget -q $URL -O sample.pcm
import { GoogleGenAI, Modality } from '@google/genai';
import * as fs from "node:fs";

const ai = new GoogleGenAI({});
const model = 'gemini-live-2.5-flash-preview';
const config = { responseModalities: [Modality.TEXT] };

async function live() {
  const responseQueue = [];

  async function waitMessage() {
    let done = false;
    let message = undefined;
    while (!done) {
      message = responseQueue.shift();
      if (message) {
        done = true;
      } else {
        await new Promise((resolve) => setTimeout(resolve, 100));
      }
    }
    return message;
  }

  async function handleTurn() {
    const turns = [];
    let done = false;
    while (!done) {
      const message = await waitMessage();
      turns.push(message);
      if (message.serverContent && message.serverContent.turnComplete) {
        done = true;
      }
    }
    return turns;
  }

  const session = await ai.live.connect({
    model: model,
    callbacks: {
      onopen: function () {
        console.debug('Opened');
      },
      onmessage: function (message) {
        responseQueue.push(message);
      },
      onerror: function (e) {
        console.debug('Error:', e.message);
      },
      onclose: function (e) {
        console.debug('Close:', e.reason);
      },
    },
    config: config,
  });

  // Send Audio Chunk
  const fileBuffer = fs.readFileSync("sample.pcm");
  const base64Audio = Buffer.from(fileBuffer).toString('base64');

  session.sendRealtimeInput(
    {
      audio: {
        data: base64Audio,
        mimeType: "audio/pcm;rate=16000"
      }
    }

  );

  // if stream gets paused, send:
  // session.sendRealtimeInput({ audioStreamEnd: true })

  const turns = await handleTurn();
  for (const turn of turns) {
    if (turn.text) {
      console.debug('Received text: %s\n', turn.text);
    }
    else if (turn.data) {
      console.debug('Received inline data: %s\n', turn.data);
    }
  }

  session.close();
}

async function main() {
  await live().catch((e) => console.error('got error', e));
}

main();
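The rule about sending audioStreamEnd after a pause can be tracked with a small timer. This is a sketch only: `StreamPauseTracker` is a hypothetical helper, the one-second threshold follows the guidance above, and the injectable clock exists so the logic can be exercised without waiting.

```python
import time


class StreamPauseTracker:
    """Decides when a paused audio stream should be flushed (hypothetical)."""

    def __init__(self, threshold_s=1.0, clock=time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock
        self.last_audio_at = None
        self.end_sent = False

    def audio_sent(self):
        """Call after every audio chunk is sent."""
        self.last_audio_at = self.clock()
        self.end_sent = False

    def should_send_stream_end(self):
        """Return True once per pause, when the pause reaches the threshold."""
        if self.last_audio_at is None or self.end_sent:
            return False
        if self.clock() - self.last_audio_at >= self.threshold_s:
            self.end_sent = True
            return True
        return False


tracker = StreamPauseTracker()
tracker.audio_sent()
print(tracker.should_send_stream_end())
```

When `should_send_stream_end()` returns True, the client would call `send_realtime_input(audio_stream_end=True)` as shown in the commented line of the example above.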

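With send_realtime_input, the API responds to audio automatically based on VAD. While send_client_content adds messages to the model context in order, send_realtime_input is optimized for responsiveness at the expense of deterministic ordering.

Automatic VAD configuration

For more control over VAD activity, you can configure the following parameters. See the API reference for more information.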

Python

from google.genai import types

config = {
    "response_modalities": ["TEXT"],
    "realtime_input_config": {
        "automatic_activity_detection": {
            "disabled": False, # default
            "start_of_speech_sensitivity": types.StartSensitivity.START_SENSITIVITY_LOW,
            "end_of_speech_sensitivity": types.EndSensitivity.END_SENSITIVITY_LOW,
            "prefix_padding_ms": 20,
            "silence_duration_ms": 100,
        }
    }
}

JavaScript

import { GoogleGenAI, Modality, StartSensitivity, EndSensitivity } from '@google/genai';

const config = {
  responseModalities: [Modality.TEXT],
  realtimeInputConfig: {
    automaticActivityDetection: {
      disabled: false, // default
      startOfSpeechSensitivity: StartSensitivity.START_SENSITIVITY_LOW,
      endOfSpeechSensitivity: EndSensitivity.END_SENSITIVITY_LOW,
      prefixPaddingMs: 20,
      silenceDurationMs: 100,
    }
  }
};

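Disable automatic VAD

Alternatively, automatic VAD can be disabled by setting realtimeInputConfig.automaticActivityDetection.disabled to true in the setup message. In this configuration, the client is responsible for detecting user speech and sending activityStart and activityEnd messages at the appropriate times. An audioStreamEnd isn't sent in this configuration. Instead, any interruption of the stream is marked by an activityEnd message.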

Python

config = {
    "response_modalities": ["TEXT"],
    "realtime_input_config": {"automatic_activity_detection": {"disabled": True}},
}

async with client.aio.live.connect(model=model, config=config) as session:
    # ...
    await session.send_realtime_input(activity_start=types.ActivityStart())
    await session.send_realtime_input(
        audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
    )
    await session.send_realtime_input(activity_end=types.ActivityEnd())
    # ...

JavaScript

const config = {
  responseModalities: [Modality.TEXT],
  realtimeInputConfig: {
    automaticActivityDetection: {
      disabled: true,
    }
  }
};

session.sendRealtimeInput({ activityStart: {} })

session.sendRealtimeInput(
  {
    audio: {
      data: base64Audio,
      mimeType: "audio/pcm;rate=16000"
    }
  }

);

session.sendRealtimeInput({ activityEnd: {} })
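With automatic VAD disabled, the client has to decide where speech starts and ends. The following sketch uses a simple energy threshold for that decision. It is not how the Live API detects speech, and the function names, the threshold of 500, and the two-chunk hangover are all illustrative assumptions. Input is assumed to be raw little-endian 16-bit mono PCM chunks.

```python
import struct


def rms(chunk):
    """Root-mean-square amplitude of a 16-bit little-endian PCM chunk."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return (sum(s * s for s in samples) / count) ** 0.5


def activity_events(chunks, threshold=500.0, hang_chunks=2):
    """Yield ("activity_start", None), ("audio", chunk), ("activity_end", None).

    Speech starts at the first chunk at or above the threshold and ends
    after `hang_chunks` consecutive quiet chunks (assumed values).
    """
    speaking = False
    quiet = 0
    for chunk in chunks:
        loud = rms(chunk) >= threshold
        if not speaking and loud:
            speaking = True
            quiet = 0
            yield ("activity_start", None)
        if speaking:
            yield ("audio", chunk)
            quiet = 0 if loud else quiet + 1
            if quiet >= hang_chunks:
                speaking = False
                yield ("activity_end", None)
    if speaking:
        # Close an utterance that is still open when the input ends.
        yield ("activity_end", None)


loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
silent = bytes(8)
for kind, _ in activity_events([silent, loud, loud, silent, silent]):
    print(kind)
```

Each yielded event would map to one `send_realtime_input` call: `activity_start`, `audio`, or `activity_end`, as in the examples above.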

Token count

You can find the total number of consumed tokens in the usageMetadata field of the returned server message.

Python

async for message in session.receive():
    # The server will periodically send messages that include UsageMetadata.
    if message.usage_metadata:
        usage = message.usage_metadata
        print(
            f"Used {usage.total_token_count} tokens in total. Response token breakdown:"
        )
        for detail in usage.response_tokens_details:
            match detail:
                case types.ModalityTokenCount(modality=modality, token_count=count):
                    print(f"{modality}: {count}")

JavaScript

const turns = await handleTurn();

for (const turn of turns) {
  if (turn.usageMetadata) {
    console.debug('Used %s tokens in total. Response token breakdown:\n', turn.usageMetadata.totalTokenCount);

    for (const detail of turn.usageMetadata.responseTokensDetails) {
      console.debug('%s\n', detail);
    }
  }
}

Media resolution

You can specify the media resolution for the input media by setting the mediaResolution field as part of the session configuration:

Python

from google.genai import types

config = {
    "response_modalities": ["AUDIO"],
    "media_resolution": types.MediaResolution.MEDIA_RESOLUTION_LOW,
}

JavaScript

import { GoogleGenAI, Modality, MediaResolution } from '@google/genai';

const config = {
    responseModalities: [Modality.TEXT],
    mediaResolution: MediaResolution.MEDIA_RESOLUTION_LOW,
};

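Limitations

Consider the following limitations of the Live API when you plan your project.

Response modalities

You can only set one response modality (TEXT or AUDIO) per session in the session configuration. Setting both results in a config error message. This means that you can configure the model to respond with either text or audio, but not both in the same session.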
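A client can check this rule before connecting instead of waiting for the config error. The following is a minimal sketch: `validate_response_modalities` is a hypothetical helper that operates on the dictionary-style config used in the Python examples, and how the service treats an empty list is not covered here.

```python
ALLOWED_MODALITIES = {"TEXT", "AUDIO"}


def validate_response_modalities(config):
    """Raise ValueError if the config names an unknown or second modality."""
    modalities = config.get("response_modalities", [])
    unknown = [m for m in modalities if m not in ALLOWED_MODALITIES]
    if unknown:
        raise ValueError(f"unknown response modalities: {unknown}")
    # Only one modality (TEXT or AUDIO) is allowed per session.
    if len(set(modalities)) > 1:
        raise ValueError("set either TEXT or AUDIO, not both")
    return config


config = validate_response_modalities({"response_modalities": ["TEXT"]})
print(config)
```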

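Client authentication

By default, the Live API only provides server-to-server authentication. If you implement your Live API application with a client-to-server approach, you need to use ephemeral tokens to mitigate security risks.

Session duration

Audio-only sessions are limited to 15 minutes, and audio plus video sessions are limited to 2 minutes. However, you can configure different session management techniques to extend the session duration without limit.

Context window

A session has a context window limit of:

128,000 tokens for native audio output models
32,000 tokens for other Live API models

Supported languages

The Live API supports the following languages.

Language	BCP-47 Code	Language	BCP-47 Code
German (Germany)	`de-DE`	English (Australia)*	`en-AU`
English (UK)*	`en-GB`	English (India)	`en-IN`
English (US)	`en-US`	Spanish (US)	`es-US`
French (France)	`fr-FR`	Hindi (India)	`hi-IN`
Portuguese (Brazil)	`pt-BR`	Arabic (Generic)	`ar-XA`
Spanish (Spain)*	`es-ES`	French (Canada)*	`fr-CA`
Indonesian (Indonesia)	`id-ID`	Italian (Italy)	`it-IT`
Japanese (Japan)	`ja-JP`	Turkish (Turkey)	`tr-TR`
Vietnamese (Vietnam)	`vi-VN`	Bengali (India)	`bn-IN`
Gujarati (India)*	`gu-IN`	Kannada (India)*	`kn-IN`
Marathi (India)	`mr-IN`	Malayalam (India)*	`ml-IN`
Tamil (India)	`ta-IN`	Telugu (India)	`te-IN`
Dutch (Netherlands)	`nl-NL`	Korean (South Korea)	`ko-KR`
Mandarin Chinese (China)*	`cmn-CN`	Polish (Poland)	`pl-PL`
Russian (Russia)	`ru-RU`	Thai (Thailand)	`th-TH`

Languages marked with an asterisk (*) are not available for native audio.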
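The table above can be turned into a lookup so that a language code is checked before it is placed in the session configuration. This is a sketch: the function and constant names are hypothetical, and the data is copied from the table in this guide, so it would need updating whenever the supported list changes.

```python
# BCP-47 codes from the table above.
SUPPORTED_LANGUAGE_CODES = {
    "de-DE", "en-AU", "en-GB", "en-IN", "en-US", "es-US", "fr-FR", "hi-IN",
    "pt-BR", "ar-XA", "es-ES", "fr-CA", "id-ID", "it-IT", "ja-JP", "tr-TR",
    "vi-VN", "bn-IN", "gu-IN", "kn-IN", "mr-IN", "ml-IN", "ta-IN", "te-IN",
    "nl-NL", "ko-KR", "cmn-CN", "pl-PL", "ru-RU", "th-TH",
}

# Codes marked with an asterisk: not available for native audio.
NOT_AVAILABLE_FOR_NATIVE_AUDIO = {
    "en-AU", "en-GB", "es-ES", "fr-CA", "gu-IN", "kn-IN", "ml-IN", "cmn-CN",
}


def is_language_supported(code, native_audio=False):
    """Return True if the code is listed, honoring the native audio caveat."""
    if code not in SUPPORTED_LANGUAGE_CODES:
        return False
    if native_audio and code in NOT_AVAILABLE_FOR_NATIVE_AUDIO:
        return False
    return True


print(is_language_supported("ko-KR", native_audio=True))
print(is_language_supported("en-GB", native_audio=True))
```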

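What's next

Read the Tool use and Session management guides for essential information on using the Live API effectively.
Try the Live API in Google AI Studio.
For more information about the Live API models, see Gemini 2.0 Flash Live and Gemini 2.5 Flash Native Audio on the Models page.
Try more examples in the Live API cookbook, the Live API Tools cookbook, and the Live API Get Started script.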