Gemini 3 is here. Read the developer guide to get started with our most advanced model yet.

Live Music API - WebSockets API reference

Lyria RealTime music generation uses a persistent, bidirectional, low-latency streaming connection using WebSockets. In this section, you'll find additional details regarding the WebSockets API.

Sessions

A WebSocket connection establishes a session to keep a real-time communication with the model. After a client initiates a new connection the session can exchange messages with the server to:

Send prompts and controls to steer music generation.
Send music playback controls.
Receive audio chunks.

WebSocket connection

To start a session, connect to this websocket endpoint:

wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateMusic

Session configuration

The initial message after connection sets the model to use during the session.

See the following example configuration. Note that the name casing in SDKs may vary. You can look up the Python SDK configuration options here.


{
  "model": string
}

Send messages

To exchange messages over the WebSocket connection, the client must send a JSON object over an open WebSocket connection. The JSON object must have exactly one of the fields from the following object set:


{
  "setup": BidiGenerateMusicSetup,
  "client_content": BidiGenerateMusicClientContent,
  "music_generation_config": BidiGenerateMusicGenerationConfig,
  "playback_control": BidiGenerateMusicPlaybackControl
}

Supported client messages

See the supported client messages in the following table:

Message	Description
`BidiGenerateMusicSetup`	Session configuration to be sent only in the first message
`BidiGenerateMusicClientContent`	Weighted prompts as the model input
`BidiGenerateMusicGenerationConfig`	Configuration for music generation
`BidiGenerateMusicPlaybackControl`	Playback control signals for model generation

Receive messages

To receive messages from the server, listen for the WebSocket 'message' event, and then parse the result according to the definition of the supported server messages.

See the following:

async def receive_audio(session):
  """Example background task to process incoming audio."""
  while True:
    async for message in session.receive():
      audio_data = message.server_content.audio_chunks[0].data
      # Process audio...
      await asyncio.sleep(10**-12)

async with (
  client.aio.live.music.connect(model='models/lyria-realtime-exp') as session,
  asyncio.TaskGroup() as tg,
):
  # Set up task to receive server messages.
  tg.create_task(receive_audio(session))

  # Send initial prompts and config
  await session.set_weighted_prompts(
    prompts=[
      types.WeightedPrompt(text='minimal techno', weight=1.0),
    ]
  )
  await session.set_music_generation_config(
    config=types.LiveMusicGenerationConfig(bpm=90, temperature=1.0)
  )

  # Start streaming music
  await session.play()

Server messages include exactly one of the other fields from the BidiGenerateMusicServerMessage message. (The messageType union is not expressed in JSON so the field will appear at the top-level of the message.)

Messages and events

AudioChunk

Representation of an audio chunk.

Fields
Union field `content`. `content` can be only one of the following:
`data`	`bytes` Raw bytes of the audio chunk.
`mimeType`	`string` The MIME type of the content of the audio chunk, such as "audio/wav".
`sourceMetadata`	`SourceMetadata` Output only. Prompts and config used for generating this audio chunk.

SourceMetadata

Metadata about the input source used for generating this audio chunk.

Fields

clientContent

BidiGenerateMusicClientContent

Weighted prompts for generating this audio chunk.

musicGenerationConfig

BidiGenerateMusicGenerationConfig

Music generation config for generating this audio chunk.

BidiGenerateMusicClientContent

User input to start or steer the music.

Fields

Fields
`weightedPrompts[]`	`WeightedPrompt` Required. Weighted prompts as the model input.

weightedPrompts[]

WeightedPrompt

Required. Weighted prompts as the model input.

BidiGenerateMusicClientMessage

Messages sent by the client in the BidiGenerateMusic call.

Fields
Union field `messageType`. `messageType` can be only one of the following:
`setup`	`BidiGenerateMusicSetup` Optional. Session configuration sent only in the first client message.
`clientContent`	`BidiGenerateMusicClientContent` Optional. Weighted prompts and music generation configs as the input of music generation.
`musicGenerationConfig`	`BidiGenerateMusicGenerationConfig` Optional. Configuration for music generation.
`playbackControl`	`BidiGenerateMusicPlaybackControl` Optional. Playback control signal for the music generation.

BidiGenerateMusicFilteredPrompt

Filtered prompt with reason.

Fields

Fields
`filteredReason`	`string` Output only. The reason why the prompt was filtered.
Union field `prompt`. The prompt that was filtered. `prompt` can be only one of the following:
`text`	`string` Optional. Text prompt.

filteredReason

string

Output only. The reason why the prompt was filtered.

Union field prompt. The prompt that was filtered. prompt can be only one of the following:

text

string

Optional. Text prompt.

BidiGenerateMusicGenerationConfig

Configuration for music generation.

Fields
`temperature`	`float` Optional. Controls the variance in audio generation. Higher values produce higher variance. Range is [0.0, 3.0]. Default is 1.1.
`topK`	`int32` Optional. Controls how the model selects tokens for output. Samples the topK tokens with the highest probabilities. Range is [1, 1000]. Default is 40.
`seed`	`int32` Optional. Seeds audio generation. If not set, the request uses a randomly generated seed.
`guidance`	`float` Optional. Controls how closely the model follows prompts. Higher guidance follows more closely, but will make transitions more abrupt. Range is [0.0, 6.0]. Default is 4.0.
`bpm`	`int32` Optional. Beats per minute. Range is [60, 200].
`density`	`float` Optional. Density of sounds. Range is [0.0, 1.0].
`brightness`	`float` Optional. Higher value produces brighter audio. Range is [0.0, 1.0].
`scale`	`Scale` Optional. Scale of the generated music.
`muteBass`	`bool` Optional. The audio output should not contain bass.
`muteDrums`	`bool` Optional. The audio output should not contain drums.
`onlyBassAndDrums`	`bool` Optional. The audio output should only contain bass and drums.
`musicGenerationMode`	`MusicGenerationMode` Optional. The mode of music generation. Default is QUALITY.

MusicGenerationMode

Enums
`MUSIC_GENERATION_MODE_UNSPECIFIED`	This value is unused.
`QUALITY`	This mode steers text prompts to regions of latent space with higher quality music.
`DIVERSITY`	This mode steers text prompts to regions of latent space with a larger diversity of music.
`VOCALIZATION`	This mode steers text prompts to regions of latent space more likely to generate vocal music.

Scale

Scale of the generated music.

Enums
`SCALE_UNSPECIFIED`	Default value. This value is unused.
`C_MAJOR_A_MINOR`	C major or A minor
`D_FLAT_MAJOR_B_FLAT_MINOR`	D flat major or B flat minor
`D_MAJOR_B_MINOR`	D major or B minor
`E_FLAT_MAJOR_C_MINOR`	E flat major or C minor
`E_MAJOR_D_FLAT_MINOR`	E major or D flat minor
`F_MAJOR_D_MINOR`	F major or D minor
`G_FLAT_MAJOR_E_FLAT_MINOR`	G flat major or E flat minor
`G_MAJOR_E_MINOR`	G major or E minor
`A_FLAT_MAJOR_F_MINOR`	A flat major or F minor
`A_MAJOR_G_FLAT_MINOR`	A major or G flat minor
`B_FLAT_MAJOR_G_MINOR`	B flat major or G minor
`B_MAJOR_A_FLAT_MINOR`	B major or A flat minor

BidiGenerateMusicPlaybackControl

Playback control for the music generation.

Enums
`PLAYBACK_CONTROL_UNSPECIFIED`	This value is unused.
`PLAY`	Start generating the music.
`PAUSE`	Hold the music generation. Use PLAY to resume from the current position.
`STOP`	Stop the music generation and reset the context (prompts retained). Use PLAY to restart the music generation.
`RESET_CONTEXT`	Reset the context (prompts retained) without stopping the music generation.

BidiGenerateMusicServerContent

Incremental server update generated by the model in response to client messages.

Content is generated as quickly as possible, and not in real time. Clients may choose to buffer and play it out in real time.

Fields

Fields
`audioChunks[]`	`AudioChunk` Output only. Audio chunks that the model has generated.

audioChunks[]

AudioChunk

Output only. Audio chunks that the model has generated.

BidiGenerateMusicServerMessage

Response message for the BidiGenerateMusic call.

Fields
Union field `messageType`. The type of the message. `messageType` can be only one of the following:
`setupComplete`	`BidiGenerateMusicSetupComplete` Output only. Sent in response to a `BidiGenerateMusicSetup` message from the client when setup is complete.
`serverContent`	`BidiGenerateMusicServerContent` Output only. Content generated by the model in response to client messages.
`filteredPrompt`	`BidiGenerateMusicFilteredPrompt` Output only. Filtered prompt with reason.
`warning`	`string` Output only. The warning message from the server. Warnings won't terminate the stream.

BidiGenerateMusicSetup

Message to be sent in the first (and only in the first) BidiGenerateMusicClientMessage.

Clients should wait for a BidiGenerateMusicSetupComplete message before sending any additional messages.

Fields

Fields
`model`	`string` Required. The model's resource name. This serves as an ID for the model to use. Format: `models/{model}`

model

string

Required. The model's resource name. This serves as an ID for the model to use.

Format: models/{model}

BidiGenerateMusicSetupComplete

This type has no fields.

Sent in response to a BidiGenerateMusicSetup message from the client.

WeightedPrompt

Weighted prompt as the model input.

Fields

Fields
`weight`	`float` Required. Weight of the prompt. The weight is used to control the relative importance of the prompt. Higher weights are more important than lower weights. Weights of all weighted_prompts in this BidiGenerateMusicClientContent must not be all 0. Weights of all weighted_prompts in this BidiGenerateMusicClientContent message will be normalized.
Union field `prompt`. `prompt` can be only one of the following:
`text`	`string` Text prompt.

weight

float

Required. Weight of the prompt.

The weight is used to control the relative importance of the prompt. Higher weights are more important than lower weights.

Weights of all weighted_prompts in this BidiGenerateMusicClientContent must not be all 0. Weights of all weighted_prompts in this BidiGenerateMusicClientContent message will be normalized.

Union field prompt.

prompt can be only one of the following:

text

string

Text prompt.

More information on types

For more information on the types used by the API, see the Python SDK.