The LiteRT Next APIs are available in C++ and offer Android developers greater control over memory allocation and low-level development than the Kotlin APIs.
For an example of a LiteRT Next application in C++, see the Asynchronous segmentation with C++ demo.
Get Started
Use the following steps to add LiteRT Next to your Android application.
Update the build configuration
Building a C++ application with LiteRT for GPU, NPU, and CPU acceleration using
Bazel involves defining a cc_binary rule to ensure all necessary components are
compiled, linked, and packaged. The following example setup allows your
application to dynamically choose or utilize GPU, NPU, and CPU accelerators.
Here are the key components in your Bazel build configuration:
- cc_binary Rule: This is the fundamental Bazel rule used to define your C++
  executable target (e.g., name = "your_application_name").
- srcs Attribute: Lists your application's C++ source files (e.g., main.cc and
  other .cc or .h files).
- data Attribute (Runtime Dependencies): This is crucial for packaging shared
  libraries and assets that your application loads at runtime.
  - LiteRT Core Runtime: The main LiteRT C API shared library (e.g.,
    //litert/c:litert_runtime_c_api_shared_lib).
  - Dispatch Libraries: Vendor-specific shared libraries that LiteRT uses to
    communicate with the hardware drivers (e.g.,
    //litert/vendors/qualcomm/dispatch:dispatch_api_so).
  - GPU Backend Libraries: The shared libraries for GPU acceleration (e.g.,
    @litert_gpu//:jni/arm64-v8a/libLiteRtGpuAccelerator.so).
  - NPU Backend Libraries: The specific shared libraries for NPU acceleration,
    such as Qualcomm's QNN HTP libraries (e.g.,
    @qairt//:lib/aarch64-android/libQnnHtp.so,
    @qairt//:lib/hexagon-v79/unsigned/libQnnHtpV79Skel.so).
  - Model Files & Assets: Your trained model files, test images, shaders, or
    any other data needed at runtime (e.g., :model_files, :shader_files).
- deps Attribute (Compile-time Dependencies): This lists the libraries your
  code needs to compile against.
  - LiteRT APIs & Utilities: Headers and static libraries for LiteRT
    components like tensor buffers (e.g., //litert/cc:litert_tensor_buffer).
  - Graphics Libraries (for GPU): Dependencies related to graphics APIs if the
    GPU accelerator uses them (e.g., gles_deps()).
- linkopts Attribute: Specifies options passed to the linker, which can include
  linking against system libraries (e.g., -landroid for Android builds, or GLES
  libraries with gles_linkopts()).
The following is an example of a cc_binary
rule:
cc_binary(
    name = "your_application",
    srcs = [
        "main.cc",
    ],
    data = [
        ...
        # litert c api shared library
        "//litert/c:litert_runtime_c_api_shared_lib",
        # GPU accelerator shared library
        "@litert_gpu//:jni/arm64-v8a/libLiteRtGpuAccelerator.so",
        # NPU accelerator shared library
        "//litert/vendors/qualcomm/dispatch:dispatch_api_so",
    ],
    linkopts = select({
        "@org_tensorflow//tensorflow:android": ["-landroid"],
        "//conditions:default": [],
    }) + gles_linkopts(),  # gles link options
    deps = [
        ...
        "//litert/cc:litert_tensor_buffer",  # litert cc library
        ...
    ] + gles_deps(),  # gles dependencies
)
Load the Model
After obtaining a LiteRT model, or converting a model into the .tflite format,
load the model by creating a Model object.
LITERT_ASSIGN_OR_RETURN(auto model, Model::CreateFromFile("mymodel.tflite"));
Create the environment
The Environment object provides a runtime environment that includes components
such as the path of the compiler plugin and GPU contexts. The Environment is
required when creating CompiledModel and TensorBuffer. The following code
creates an Environment for CPU and GPU execution without any options:
LITERT_ASSIGN_OR_RETURN(auto env, Environment::Create({}));
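To supply options, for example to have the GPU accelerator reuse an existing EGL display and context from your rendering code, pass them when creating the Environment. The following is a minimal sketch; user_egl_display and user_egl_context are assumed to be handles your application already owns (the same options appear in the example at the end of this page):
// Sketch: create an Environment that reuses an existing EGL display/context.
// user_egl_display and user_egl_context are assumed to come from your app.
const std::vector<Environment::Option> environment_options = {
    {OptionTag::EglDisplay, user_egl_display},
    {OptionTag::EglContext, user_egl_context}};
LITERT_ASSIGN_OR_RETURN(auto env,
    Environment::Create(absl::MakeConstSpan(environment_options)));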
Create the Compiled Model
Using the CompiledModel API, initialize the runtime with the newly created
Model object. You can specify the hardware acceleration at this point
(kLiteRtHwAcceleratorCpu or kLiteRtHwAcceleratorGpu):
LITERT_ASSIGN_OR_RETURN(auto compiled_model,
CompiledModel::Create(env, model, kLiteRtHwAcceleratorCpu));
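For GPU execution, pass kLiteRtHwAcceleratorGpu instead; the same call appears in the GPU example at the end of this page:
// Sketch: compile the same model for GPU execution instead of CPU.
LITERT_ASSIGN_OR_RETURN(auto compiled_model,
    CompiledModel::Create(env, model, kLiteRtHwAcceleratorGpu));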
Create Input and Output Buffers
Create the necessary data structures (buffers) to hold the input data that you will feed into the model for inference, and the output data that the model produces after running inference.
LITERT_ASSIGN_OR_RETURN(auto input_buffers, compiled_model.CreateInputBuffers());
LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model.CreateOutputBuffers());
If you are using CPU memory, fill the inputs by writing data directly into the first input buffer.
input_buffers[0].Write<float>(absl::MakeConstSpan(input_data, input_size));
Invoke the model
Providing the input and output buffers, run the Compiled Model with the model and hardware acceleration specified in previous steps.
compiled_model.Run(input_buffers, output_buffers);
Retrieve Outputs
Retrieve outputs by directly reading the model output from memory.
std::vector<float> data(output_data_size);
output_buffers[0].Read<float>(absl::MakeSpan(data));
// ... process output data
Key concepts and components
Refer to the following sections for information on key concepts and components of the LiteRT Next APIs.
Error Handling
LiteRT uses litert::Expected to either return values or propagate errors in a
similar way to absl::StatusOr or std::expected. You can manually check for the
error yourself.
For convenience, LiteRT provides the following macros:
- LITERT_ASSIGN_OR_RETURN(lhs, expr) assigns the result of expr to lhs if it
  doesn't produce an error, and otherwise returns the error. It will expand to
  something like the following snippet.

  auto maybe_model = Model::CreateFromFile("mymodel.tflite");
  if (!maybe_model) {
    return maybe_model.Error();
  }
  auto model = std::move(maybe_model.Value());

- LITERT_ASSIGN_OR_ABORT(lhs, expr) does the same as LITERT_ASSIGN_OR_RETURN
  but aborts the program in case of error.
- LITERT_RETURN_IF_ERROR(expr) returns expr if its evaluation produces an
  error.
- LITERT_ABORT_IF_ERROR(expr) does the same as LITERT_RETURN_IF_ERROR but
  aborts the program in case of error.
For more information on LiteRT macros, see litert_macros.h.
Compiled Model (CompiledModel)
The Compiled Model API (CompiledModel) is responsible for loading a model,
applying hardware acceleration, instantiating the runtime, creating input and
output buffers, and running inference.
The following simplified code snippet demonstrates how the Compiled Model API
takes a LiteRT model (.tflite) and the target hardware accelerator (GPU), and
creates a compiled model that is ready to run inference.
// Load model and initialize runtime
LITERT_ASSIGN_OR_RETURN(auto model, Model::CreateFromFile("mymodel.tflite"));
LITERT_ASSIGN_OR_RETURN(auto env, Environment::Create({}));
LITERT_ASSIGN_OR_RETURN(auto compiled_model,
CompiledModel::Create(env, model, kLiteRtHwAcceleratorCpu));
The following simplified code snippet demonstrates how the Compiled Model API takes input and output buffers and runs inference with the compiled model.
// Preallocate input/output buffers
LITERT_ASSIGN_OR_RETURN(auto input_buffers, compiled_model.CreateInputBuffers());
LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model.CreateOutputBuffers());
// Fill the first input
float input_values[] = { /* your data */ };
LITERT_RETURN_IF_ERROR(
input_buffers[0].Write<float>(absl::MakeConstSpan(input_values, /*size*/)));
// Invoke
LITERT_RETURN_IF_ERROR(compiled_model.Run(input_buffers, output_buffers));
// Read the output
std::vector<float> data(output_data_size);
LITERT_RETURN_IF_ERROR(
output_buffers[0].Read<float>(absl::MakeSpan(data)));
For a more complete view of how the CompiledModel
API is implemented, see the
source code for
litert_compiled_model.h.
Tensor Buffer (TensorBuffer)
LiteRT Next provides built-in support for I/O buffer interoperability, using
the Tensor Buffer API (TensorBuffer) to handle the flow of data into and out of
the compiled model. The Tensor Buffer API provides the ability to write
(Write<T>()), read (Read<T>()), and lock CPU memory.
For a more complete view of how the TensorBuffer
API is implemented, see the
source code for
litert_tensor_buffer.h.
Query model input/output requirements
The requirements for allocating a Tensor Buffer (TensorBuffer) are typically
specified by the hardware accelerator. Buffers for inputs and outputs can have
requirements regarding alignment, buffer strides, and memory type. You can use
helper functions like CreateInputBuffers to automatically handle these
requirements.
The following simplified code snippet demonstrates how you can retrieve the buffer requirements for input data:
LITERT_ASSIGN_OR_RETURN(auto reqs, compiled_model.GetInputBufferRequirements(signature_index, input_index));
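If you allocate buffers yourself, you can inspect the returned requirements before doing so. The following is a minimal sketch that assumes accessors such as SupportedTypes() and BufferSize() on TensorBufferRequirements; check the header referenced below for the exact signatures:
// Sketch: query the requirements reported by the accelerator (assumed accessors).
LITERT_ASSIGN_OR_RETURN(auto supported_types, reqs.SupportedTypes());
LITERT_ASSIGN_OR_RETURN(auto buffer_size, reqs.BufferSize());
// Allocate or wrap a buffer that satisfies these constraints.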
For a more complete view of how the TensorBufferRequirements API is
implemented, see the source code for litert_tensor_buffer_requirements.h.
Create Managed Tensor Buffers (TensorBuffers)
The following simplified code snippet demonstrates how to create Managed Tensor
Buffers, where the TensorBuffer
API allocates the respective buffers:
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_cpu,
TensorBuffer::CreateManaged(env, /*buffer_type=*/kLiteRtTensorBufferTypeHostMemory,
ranked_tensor_type, buffer_size));
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_gl, TensorBuffer::CreateManaged(env,
/*buffer_type=*/kLiteRtTensorBufferTypeGlBuffer, ranked_tensor_type, buffer_size));
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_ahwb, TensorBuffer::CreateManaged(env,
/*buffer_type=*/kLiteRtTensorBufferTypeAhwb, ranked_tensor_type, buffer_size));
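In the snippets above, ranked_tensor_type describes the tensor the buffer will hold. It can be obtained from the model; a minimal sketch, assuming the model has an input named "input_name0" (the same call appears in the example at the end of this page):
// Sketch: derive the tensor type for a buffer from the model's input.
LITERT_ASSIGN_OR_RETURN(auto ranked_tensor_type,
    model.GetInputTensorType("input_name0"));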
Create Tensor Buffers with zero-copy
To wrap an existing buffer as a Tensor Buffer (zero-copy), use the following code snippet:
// Create a TensorBuffer from host memory
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_from_host,
TensorBuffer::CreateFromHostMemory(env, ranked_tensor_type,
ptr_to_host_memory, buffer_size));
// Create a TensorBuffer from GlBuffer
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_from_gl,
TensorBuffer::CreateFromGlBuffer(env, ranked_tensor_type, gl_target, gl_id,
size_bytes, offset));
// Create a TensorBuffer from AHardware Buffer
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_from_ahwb,
TensorBuffer::CreateFromAhwb(env, ranked_tensor_type, ahardware_buffer, offset));
Reading and writing from Tensor Buffer
The following snippet demonstrates how you can write to an input buffer and read from an output buffer:
// Example of writing to the input buffer:
std::vector<float> input_tensor_data = {1, 2};
LITERT_ASSIGN_OR_RETURN(auto write_success,
    input_tensor_buffer.Write<float>(absl::MakeConstSpan(input_tensor_data)));
if (write_success) {
  /* Continue after successful write... */
}
// Example of reading from the output buffer:
std::vector<float> data(total_elements);
LITERT_ASSIGN_OR_RETURN(auto read_success,
    output_tensor_buffer.Read<float>(absl::MakeSpan(data)));
if (read_success) {
  /* Continue after successful read */
}
Advanced: Zero-copy buffer interop for specialized hardware buffer types
Certain buffer types, such as AHardwareBuffer, allow for interoperability with
other buffer types. For example, an OpenGL buffer can be created from an
AHardwareBuffer with zero-copy. The following code snippet shows an example:
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_ahwb,
TensorBuffer::CreateManaged(env, kLiteRtTensorBufferTypeAhwb,
ranked_tensor_type, buffer_size));
// Buffer interop: Get OpenGL buffer from AHWB,
// internally creating an OpenGL buffer backed by AHWB memory.
LITERT_ASSIGN_OR_RETURN(auto gl_buffer, tensor_buffer_ahwb.GetGlBuffer());
OpenCL buffers can also be created from AHardwareBuffer:
LITERT_ASSIGN_OR_RETURN(auto cl_buffer, tensor_buffer_ahwb.GetOpenClMemory());
On mobile devices that support interoperability between OpenCL and OpenGL, CL buffers can be created from GL buffers:
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_from_gl,
TensorBuffer::CreateFromGlBuffer(env, ranked_tensor_type, gl_target, gl_id,
size_bytes, offset));
// Creates an OpenCL buffer from the OpenGL buffer, zero-copy.
LITERT_ASSIGN_OR_RETURN(auto cl_buffer, tensor_buffer_from_gl.GetOpenClMemory());
Example implementations
Refer to the following implementations of LiteRT Next in C++.
Basic Inference (CPU)
The following is a condensed version of the code snippets from the Get Started section. It is the simplest implementation of inference with LiteRT Next.
// Load model and initialize runtime
LITERT_ASSIGN_OR_RETURN(auto model, Model::CreateFromFile("mymodel.tflite"));
LITERT_ASSIGN_OR_RETURN(auto env, Environment::Create({}));
LITERT_ASSIGN_OR_RETURN(auto compiled_model, CompiledModel::Create(env, model,
kLiteRtHwAcceleratorCpu));
// Preallocate input/output buffers
LITERT_ASSIGN_OR_RETURN(auto input_buffers, compiled_model.CreateInputBuffers());
LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model.CreateOutputBuffers());
// Fill the first input
float input_values[] = { /* your data */ };
input_buffers[0].Write<float>(absl::MakeConstSpan(input_values, /*size*/));
// Invoke
compiled_model.Run(input_buffers, output_buffers);
// Read the output
std::vector<float> data(output_data_size);
output_buffers[0].Read<float>(absl::MakeSpan(data));
Zero-Copy with GPU Buffers and Asynchronous Execution
The LiteRT Next Compiled Model API reduces the friction of inference pipelines,
especially when dealing with multiple hardware backends and zero-copy flows. The
following code snippet wraps an existing OpenGL buffer as an input tensor buffer
with the CreateFromGlBuffer method (zero-copy), then chains two GPU-compiled
models asynchronously, feeding the output buffers of the first model directly to
the second.
// Define a LiteRT environment to use an existing EGL display and context.
const std::vector<Environment::Option> environment_options = {
{OptionTag::EglDisplay, user_egl_display},
{OptionTag::EglContext, user_egl_context}};
LITERT_ASSIGN_OR_RETURN(auto env,
Environment::Create(absl::MakeConstSpan(environment_options)));
// Load model1 and initialize runtime.
LITERT_ASSIGN_OR_RETURN(auto model1, Model::CreateFromFile("model1.tflite"));
LITERT_ASSIGN_OR_RETURN(auto compiled_model1, CompiledModel::Create(env, model1, kLiteRtHwAcceleratorGpu));
// Prepare I/O buffers. opengl_buffer is provided externally by the producer.
LITERT_ASSIGN_OR_RETURN(auto tensor_type, model1.GetInputTensorType("input_name0"));
// Create an input TensorBuffer based on tensor_type that wraps the given OpenGL Buffer.
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_from_opengl,
litert::TensorBuffer::CreateFromGlBuffer(env, tensor_type, opengl_buffer));
// Create an input event and attach it to the input buffer. Internally, it creates
// and inserts a fence sync object into the current EGL command queue.
LITERT_ASSIGN_OR_RETURN(auto input_event, Event::CreateManaged(env, LiteRtEventTypeEglSyncFence));
tensor_buffer_from_opengl.SetEvent(std::move(input_event));
std::vector<TensorBuffer> input_buffers;
input_buffers.push_back(std::move(tensor_buffer_from_opengl));
// Create the output TensorBuffers of model1. They are also used as inputs of model2.
LITERT_ASSIGN_OR_RETURN(auto intermediate_buffers, compiled_model1.CreateOutputBuffers());
// Load model2 and initialize runtime.
LITERT_ASSIGN_OR_RETURN(auto model2, Model::CreateFromFile("model2.tflite"));
LITERT_ASSIGN_OR_RETURN(auto compiled_model2, CompiledModel::Create(env, model2, kLiteRtHwAcceleratorGpu));
LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model2.CreateOutputBuffers());
compiled_model1.RunAsync(input_buffers, intermediate_buffers);
compiled_model2.RunAsync(intermediate_buffers, output_buffers);