Run Gemma with Kubernetes Engine

Google Kubernetes Engine (GKE) on Google Cloud provides a wide range of deployment options for running Gemma models with high performance and low latency using your preferred development frameworks. Check out the following deployment guides for Hugging Face, vLLM, and TensorRT-LLM on GPUs, and for TPU execution with JetStream, plus application and tuning guides:

Deploy and serve

Analyze data

Fine-tune