LiteRT benchmark tools measure and calculate statistics for the following important performance metrics:
- Initialization time
- Inference time of warmup state
- Inference time of steady state
- Memory usage during initialization time
- Overall memory usage
The CompiledModel benchmark tool is provided as a C++ binary,
`benchmark_model`. You can execute this tool from a shell command line on
Android, Linux, macOS, Windows, and embedded devices with GPU acceleration
enabled.
## Download prebuilt benchmark binaries
Download the nightly prebuilt command-line binaries from the links below:
## Build benchmark binary from source
Alternatively, you can build the benchmark binary from source using Bazel:
```sh
bazel build -c opt //litert/tools:benchmark_model
```
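Once the build finishes, the binary lands in Bazel's standard `bazel-bin` output tree, so you can run it directly from there. A minimal sketch, assuming the default output layout and an example model path:

```sh
# Run the freshly built binary from Bazel's output tree
# (standard bazel-bin layout; adjust the model path for your setup).
bazel-bin/litert/tools/benchmark_model --graph=your_model.tflite
```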
To build with the Android NDK toolchain, first set up the build environment by following this guide, or use the Docker image as described in this guide.
```sh
bazel build -c opt --config=android_arm64 \
  //litert/tools:benchmark_model
```
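The Android binary must run on the device itself; a common way to get it there is over `adb`. The sketch below is illustrative, not prescriptive: the staging directory `/data/local/tmp` and the model filename are assumptions, and any writable, executable location on the device works.

```sh
# Push the binary and a model to a writable directory on the device
# (paths are examples; substitute your own).
adb push bazel-bin/litert/tools/benchmark_model /data/local/tmp
adb shell chmod +x /data/local/tmp/benchmark_model
adb push your_model.tflite /data/local/tmp

# Run the benchmark on the device.
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/your_model.tflite \
  --num_threads=4
```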
## Run benchmark
To run benchmarks, execute the binary from the shell:

```sh
path/to/downloaded_or_built/benchmark_model \
  --graph=your_model.tflite \
  --num_threads=4
```
More parameter options can be found in the source code of `benchmark_model`.
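The binary may also document its flags at runtime. Assuming it follows common command-line flag conventions, `--help` prints the full list; this flag is an assumption, so consult the source if it is unavailable:

```sh
# Print the available flags (assumes standard --help support;
# check the benchmark_model source if this flag differs).
path/to/downloaded_or_built/benchmark_model --help
```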
## Benchmark GPU acceleration
These prebuilt binaries include the LiteRT GPU Accelerator. It supports the following backends:
- Android: OpenCL
- Linux: OpenCL and WebGPU (backed by Vulkan)
- macOS: Metal
- Windows: WebGPU (backed by Direct3D)
To use the GPU Accelerator, pass the flag `--use_gpu=true`.
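For example, to benchmark the same model with GPU acceleration enabled:

```sh
# Benchmark with the GPU accelerator enabled.
path/to/downloaded_or_built/benchmark_model \
  --graph=your_model.tflite \
  --use_gpu=true
```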
## Profile model ops
The `benchmark_model` binary also lets you profile model ops and get the
execution time of each operator. To do this, pass the flag
`--use_profiler=true` to `benchmark_model` during invocation.
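For example:

```sh
# Benchmark with the op profiler enabled to report per-operator timings.
path/to/downloaded_or_built/benchmark_model \
  --graph=your_model.tflite \
  --use_profiler=true
```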