You can run Gemma models completely on-device with the MediaPipe LLM Inference API. The LLM Inference API acts as a wrapper for large language models, enabling you to run Gemma models on-device for common text-to-text generation tasks like information retrieval, email drafting, and document summarization.
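As a minimal sketch of what this looks like on Android: the snippet below configures the task with a Gemma model file already present on the device, then generates a response fully on-device. The model path, function name, and sampling values are illustrative placeholders, not values prescribed by this page; `LlmInference` and its options builder come from the `com.google.mediapipe:tasks-genai` package.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import com.google.mediapipe.tasks.genai.llminference.LlmInference.LlmInferenceOptions

// Hypothetical helper: drafts text from a prompt using an on-device Gemma model.
fun generateOnDevice(context: Context, prompt: String): String {
    // Placeholder path and sampling parameters; adjust for your model and use case.
    val options = LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-2b-it-gpu-int4.bin")
        .setMaxTokens(512)     // maximum combined input + output tokens
        .setTopK(40)           // sample from the 40 most likely tokens
        .setTemperature(0.8f)  // higher values produce more varied output
        .setRandomSeed(101)
        .build()

    // Model loading and generation both run on-device; no network calls are made.
    val llmInference = LlmInference.createFromOptions(context, options)
    return llmInference.generateResponse(prompt)
}
```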
Try the LLM Inference API with MediaPipe Studio, a web-based application for evaluating and customizing on-device models.
The LLM Inference API is available on the following platforms:

- Android
- iOS
- Web
To learn more, refer to the MediaPipe LLM Inference documentation.