NPU acceleration with LiteRT Next

LiteRT Next provides a unified interface for using Neural Processing Units (NPUs) without requiring you to navigate vendor-specific compilers, runtimes, or library dependencies individually. Using LiteRT Next for NPU acceleration avoids many vendor- and device-specific complications, boosts performance for real-time and large-model inference, and minimizes memory copies through zero-copy hardware buffer usage.
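As a rough sketch of what that unified interface looks like, the snippet below selects the NPU accelerator when compiling a model with the LiteRT Next Kotlin API. The model filename, tensor size, and helper function are placeholders, and the NPU option assumes the Early Access SDK and a supported device; treat this as illustrative rather than a definitive implementation.

```kotlin
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Sketch only: requires the LiteRT Next SDK (and NPU Early Access) on a
// supported Android device. "model.tflite" is a placeholder asset name.
fun runOnNpu(context: android.content.Context) {
    // Compile the model for the NPU. The same call with Accelerator.CPU or
    // Accelerator.GPU targets other backends, so no vendor-specific code is needed.
    val model = CompiledModel.create(
        context.assets,
        "model.tflite",
        CompiledModel.Options(Accelerator.NPU),
    )

    // Pre-allocated input/output buffers let the runtime use zero-copy
    // hardware buffers where the device supports them.
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()

    inputBuffers[0].writeFloat(FloatArray(256) { 0f })  // placeholder input
    model.run(inputBuffers, outputBuffers)
    val output: FloatArray = outputBuffers[0].readFloat()
}
```

Because the accelerator is just an option on `CompiledModel`, falling back to CPU or GPU on devices without NPU support is a one-line change.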

If you are already enrolled in the LiteRT NPU Early Access Program, sign in with the authorized account to view the NPU documentation. If you have not enrolled, sign up for the Early Access Program:

Sign up!

Get Started

To get started, see the NPU overview guide:

For example implementations of LiteRT Next with NPU support, refer to the following demo applications:

NPU Vendors

LiteRT Next supports NPU acceleration with the following vendors: