Run LiteRT Next on Android with C++

The LiteRT Next APIs are available in C++, and can offer Android developers greater control over memory allocation and low-level development than the Kotlin APIs.

For an example of a LiteRT Next application in C++, see the Asynchronous segmentation with C++ demo.

Get Started

Use the following steps to add LiteRT Next to your Android application.

Update the build configuration

Building a C++ application with LiteRT for GPU, NPU and CPU acceleration using Bazel involves defining a cc_binary rule to ensure all necessary components are compiled, linked, and packaged. The following example setup allows your application to dynamically choose or utilize GPU, NPU and CPU accelerators.

Here are the key components in your Bazel build configuration:

  • cc_binary Rule: This is the fundamental Bazel rule used to define your C++ executable target (e.g., name = "your_application_name").
  • srcs Attribute: Lists your application's C++ source files (e.g., main.cc, and other .cc or .h files).
  • data Attribute (Runtime Dependencies): This is crucial for packaging shared libraries and assets that your application loads at runtime.
    • LiteRT Core Runtime: The main LiteRT C API shared library (e.g., //litert/c:litert_runtime_c_api_shared_lib).
    • Dispatch Libraries: Vendor-specific shared libraries that LiteRT uses to communicate with the hardware drivers (e.g., //litert/vendors/qualcomm/dispatch:dispatch_api_so).
    • GPU Backend Libraries: The shared libraries for GPU acceleration (e.g., @litert_gpu//:jni/arm64-v8a/libLiteRtGpuAccelerator.so).
    • NPU Backend Libraries: The specific shared libraries for NPU acceleration, such as Qualcomm's QNN HTP libraries (e.g., @qairt//:lib/aarch64-android/libQnnHtp.so, @qairt//:lib/hexagon-v79/unsigned/libQnnHtpV79Skel.so).
    • Model Files & Assets: Your trained model files, test images, shaders, or any other data needed at runtime (e.g., :model_files, :shader_files).
  • deps Attribute (Compile-time Dependencies): This lists the libraries your code needs to compile against.
    • LiteRT APIs & Utilities: Headers and static libraries for LiteRT components like tensor buffers (e.g., //litert/cc:litert_tensor_buffer).
    • Graphics Libraries (for GPU): Dependencies related to graphics APIs if the GPU accelerator uses them (e.g., gles_deps()).
  • linkopts Attribute: Specifies options passed to the linker, which can include linking against system libraries (e.g., -landroid for Android builds, or GLES libraries with gles_linkopts()).

The following is an example of a cc_binary rule:

cc_binary(
    name = "your_application",
    srcs = [
        "main.cc",
    ],
    data = [
        ...
        # litert c api shared library
        "//litert/c:litert_runtime_c_api_shared_lib",
        # GPU accelerator shared library
        "@litert_gpu//:jni/arm64-v8a/libLiteRtGpuAccelerator.so",
        # NPU accelerator shared library
        "//litert/vendors/qualcomm/dispatch:dispatch_api_so",
    ],
    linkopts = select({
        "@org_tensorflow//tensorflow:android": ["-landroid"],
        "//conditions:default": [],
    }) + gles_linkopts(), # gles link options
    deps = [
        ...
        "//litert/cc:litert_tensor_buffer", # litert cc library
        ...
    ] + gles_deps(), # gles dependencies
)

Load the Model

After obtaining a LiteRT model, or converting a model into the .tflite format, load the model by creating a Model object.

LITERT_ASSIGN_OR_RETURN(auto model, Model::CreateFromFile("mymodel.tflite"));

Create the environment

The Environment object provides a runtime environment that includes components such as the path of the compiler plugin and GPU contexts. The Environment is required when creating CompiledModel and TensorBuffer. The following code creates an Environment for CPU and GPU execution without any options:

LITERT_ASSIGN_OR_RETURN(auto env, Environment::Create({}));
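
If LiteRT should share existing GPU state, you can pass options when creating the Environment. For example, the following creates an Environment that reuses an existing EGL display and context (user_egl_display and user_egl_context come from your own rendering code), as also used in the zero-copy example later on this page:

// Reuse an existing EGL display and context instead of letting LiteRT create its own.
const std::vector<Environment::Option> environment_options = {
   {OptionTag::EglDisplay, user_egl_display},
   {OptionTag::EglContext, user_egl_context}};
LITERT_ASSIGN_OR_RETURN(auto env,
   Environment::Create(absl::MakeConstSpan(environment_options)));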

Create the Compiled Model

Using the CompiledModel API, initialize the runtime with the newly created Model object. You can specify the hardware acceleration at this point (kLiteRtHwAcceleratorCpu or kLiteRtHwAcceleratorGpu):

LITERT_ASSIGN_OR_RETURN(auto compiled_model,
  CompiledModel::Create(env, model, kLiteRtHwAcceleratorCpu));
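
To target the GPU instead, pass kLiteRtHwAcceleratorGpu; the rest of the workflow is unchanged:

// Same call, requesting GPU acceleration instead of CPU.
LITERT_ASSIGN_OR_RETURN(auto compiled_model,
  CompiledModel::Create(env, model, kLiteRtHwAcceleratorGpu));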

Create Input and Output Buffers

Create the necessary data structures (buffers) to hold the input data that you will feed into the model for inference, and the output data that the model produces after running inference.

LITERT_ASSIGN_OR_RETURN(auto input_buffers, compiled_model.CreateInputBuffers());
LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model.CreateOutputBuffers());

If you are using CPU memory, fill the inputs by writing data directly into the first input buffer.

input_buffers[0].Write<float>(absl::MakeConstSpan(input_data, input_size));

Invoke the model

With the input and output buffers in place, run the Compiled Model using the model and hardware acceleration specified in the previous steps.

compiled_model.Run(input_buffers, output_buffers);

Retrieve Outputs

Retrieve outputs by directly reading the model output from memory.

std::vector<float> data(output_data_size);
output_buffers[0].Read<float>(absl::MakeSpan(data));
// ... process output data

Key concepts and components

Refer to the following sections for information on key concepts and components of the LiteRT Next APIs.

Error Handling

LiteRT uses litert::Expected to either return values or propagate errors, similar to absl::StatusOr or std::expected. You can check for the error manually.

For convenience, LiteRT provides the following macros:

  • LITERT_ASSIGN_OR_RETURN(lhs, expr) assigns the result of expr to lhs if it doesn't produce an error and otherwise returns the error.

    It will expand to something like the following snippet.

    auto maybe_model = Model::CreateFromFile("mymodel.tflite");
    if (!maybe_model) {
      return maybe_model.Error();
    }
    auto model = std::move(maybe_model.Value());
    
  • LITERT_ASSIGN_OR_ABORT(lhs, expr) does the same as LITERT_ASSIGN_OR_RETURN but aborts the program in case of error.

  • LITERT_RETURN_IF_ERROR(expr) returns expr if its evaluation produces an error.

  • LITERT_ABORT_IF_ERROR(expr) does the same as LITERT_RETURN_IF_ERROR but aborts the program in case of error.
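
These macros compose naturally inside a function that itself returns a litert::Expected, so each failing step propagates its error to the caller. The following is a minimal sketch; the RunOnce helper and its signature are illustrative, not part of the API:

litert::Expected<std::vector<float>> RunOnce(CompiledModel& compiled_model,
                                             absl::Span<const float> input,
                                             size_t output_size) {
  // Each step either succeeds or returns its error to the caller.
  LITERT_ASSIGN_OR_RETURN(auto input_buffers, compiled_model.CreateInputBuffers());
  LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model.CreateOutputBuffers());
  LITERT_RETURN_IF_ERROR(input_buffers[0].Write<float>(input));
  LITERT_RETURN_IF_ERROR(compiled_model.Run(input_buffers, output_buffers));
  std::vector<float> output(output_size);
  LITERT_RETURN_IF_ERROR(output_buffers[0].Read<float>(absl::MakeSpan(output)));
  return output;
}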

For more information on LiteRT macros, see litert_macros.h.

Compiled Model (CompiledModel)

The Compiled Model API (CompiledModel) is responsible for loading a model, applying hardware acceleration, instantiating the runtime, creating input and output buffers, and running inference.

The following simplified code snippet demonstrates how the Compiled Model API takes a LiteRT model (.tflite) and the target hardware accelerator (CPU in this example), and creates a compiled model that is ready to run inference.

// Load model and initialize runtime
LITERT_ASSIGN_OR_RETURN(auto model, Model::CreateFromFile("mymodel.tflite"));
LITERT_ASSIGN_OR_RETURN(auto env, Environment::Create({}));
LITERT_ASSIGN_OR_RETURN(auto compiled_model,
  CompiledModel::Create(env, model, kLiteRtHwAcceleratorCpu));

The following simplified code snippet demonstrates how the Compiled Model API takes input and output buffers and runs inference with the compiled model.

// Preallocate input/output buffers
LITERT_ASSIGN_OR_RETURN(auto input_buffers, compiled_model.CreateInputBuffers());
LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model.CreateOutputBuffers());

// Fill the first input
float input_values[] = { /* your data */ };
LITERT_RETURN_IF_ERROR(
  input_buffers[0].Write<float>(absl::MakeConstSpan(input_values, /*size*/)));

// Invoke
LITERT_RETURN_IF_ERROR(compiled_model.Run(input_buffers, output_buffers));

// Read the output
std::vector<float> data(output_data_size);
LITERT_RETURN_IF_ERROR(
  output_buffers[0].Read<float>(absl::MakeSpan(data)));

For a more complete view of how the CompiledModel API is implemented, see the source code for litert_compiled_model.h.

Tensor Buffer (TensorBuffer)

LiteRT Next provides built-in support for I/O buffer interoperability, using the Tensor Buffer API (TensorBuffer) to handle the flow of data into and out of the compiled model. The Tensor Buffer API provides the ability to write (Write<T>()) and read (Read<T>()) data, and to lock CPU memory.

For a more complete view of how the TensorBuffer API is implemented, see the source code for litert_tensor_buffer.h.

Query model input/output requirements

The requirements for allocating a Tensor Buffer (TensorBuffer) are typically specified by the hardware accelerator. Buffers for inputs and outputs can have requirements regarding alignment, buffer strides, and memory type. You can use helper functions like CreateInputBuffers to automatically handle these requirements.

The following simplified code snippet demonstrates how you can retrieve the buffer requirements for input data:

LITERT_ASSIGN_OR_RETURN(auto reqs, compiled_model.GetInputBufferRequirements(signature_index, input_index));

For a more complete view of how the TensorBufferRequirements API is implemented, see the source code for litert_tensor_buffer_requirements.h.

Create Managed Tensor Buffers (TensorBuffers)

The following simplified code snippet demonstrates how to create Managed Tensor Buffers, where the TensorBuffer API allocates the respective buffers:

LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_cpu, TensorBuffer::CreateManaged(env,
  /*buffer_type=*/kLiteRtTensorBufferTypeHostMemory, ranked_tensor_type, buffer_size));

LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_gl, TensorBuffer::CreateManaged(env,
  /*buffer_type=*/kLiteRtTensorBufferTypeGlBuffer, ranked_tensor_type, buffer_size));

LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_ahwb, TensorBuffer::CreateManaged(env,
  /*buffer_type=*/kLiteRtTensorBufferTypeAhwb, ranked_tensor_type, buffer_size));
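
A managed buffer behaves like any other TensorBuffer. For example, you can fill the host-memory buffer above with Write<T>(); input_data and input_size below are placeholders for your own data:

// Fill the managed host-memory buffer like any other TensorBuffer.
LITERT_RETURN_IF_ERROR(
  tensor_buffer_cpu.Write<float>(absl::MakeConstSpan(input_data, input_size)));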

Create Tensor Buffers with zero-copy

To wrap an existing buffer as a Tensor Buffer (zero-copy), use the following code snippet:

// Create a TensorBuffer from host memory
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_from_host,
  TensorBuffer::CreateFromHostMemory(env, ranked_tensor_type,
  ptr_to_host_memory, buffer_size));

// Create a TensorBuffer from GlBuffer
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_from_gl,
  TensorBuffer::CreateFromGlBuffer(env, ranked_tensor_type, gl_target, gl_id,
  size_bytes, offset));

// Create a TensorBuffer from an AHardwareBuffer
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_from_ahwb,
  TensorBuffer::CreateFromAhwb(env, ranked_tensor_type, ahardware_buffer, offset));
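
A wrapped buffer can then be passed to the compiled model in place of a buffer created with CreateInputBuffers. The following is a minimal sketch, assuming compiled_model and output_buffers were created as shown in the Get Started section:

// Use the zero-copy host-memory buffer as the model's first input.
std::vector<TensorBuffer> input_buffers;
input_buffers.push_back(std::move(tensor_buffer_from_host));
LITERT_RETURN_IF_ERROR(compiled_model.Run(input_buffers, output_buffers));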

Reading from and writing to a Tensor Buffer

The following snippet demonstrates how you can write to an input buffer and read from an output buffer:

// Example of writing to the input buffer:
std::vector<float> input_tensor_data = {1,2};
LITERT_ASSIGN_OR_RETURN(auto write_success,
  input_tensor_buffer.Write<float>(absl::MakeConstSpan(input_tensor_data)));
if(write_success){
  /* Continue after successful write... */
}

// Example of reading from the output buffer:
std::vector<float> data(total_elements);
LITERT_ASSIGN_OR_RETURN(auto read_success,
  output_tensor_buffer.Read<float>(absl::MakeSpan(data)));
if(read_success){
  /* Continue after successful read */
}

Advanced: Zero-copy buffer interop for specialized hardware buffer types

Certain buffer types, such as AHardwareBuffer, allow interoperability with other buffer types. For example, an OpenGL buffer can be created from an AHardwareBuffer with zero-copy. The following code snippet shows an example:

LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_ahwb,
  TensorBuffer::CreateManaged(env, kLiteRtTensorBufferTypeAhwb,
  ranked_tensor_type, buffer_size));
// Buffer interop: Get OpenGL buffer from AHWB,
// internally creating an OpenGL buffer backed by AHWB memory.
LITERT_ASSIGN_OR_RETURN(auto gl_buffer, tensor_buffer_ahwb.GetGlBuffer());

OpenCL buffers can also be created from AHardwareBuffer:

LITERT_ASSIGN_OR_RETURN(auto cl_buffer, tensor_buffer_ahwb.GetOpenClMemory());

On mobile devices that support interoperability between OpenCL and OpenGL, CL buffers can be created from GL buffers:

LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_from_gl,
  TensorBuffer::CreateFromGlBuffer(env, ranked_tensor_type, gl_target, gl_id,
  size_bytes, offset));

// Creates an OpenCL buffer from the OpenGL buffer, zero-copy.
LITERT_ASSIGN_OR_RETURN(auto cl_buffer, tensor_buffer_from_gl.GetOpenClMemory());

Example implementations

Refer to the following implementations of LiteRT Next in C++.

Basic Inference (CPU)

The following is a condensed version of the code snippets from the Get Started section. It is the simplest implementation of inference with LiteRT Next.

// Load model and initialize runtime
LITERT_ASSIGN_OR_RETURN(auto model, Model::CreateFromFile("mymodel.tflite"));
LITERT_ASSIGN_OR_RETURN(auto env, Environment::Create({}));
LITERT_ASSIGN_OR_RETURN(auto compiled_model, CompiledModel::Create(env, model,
  kLiteRtHwAcceleratorCpu));

// Preallocate input/output buffers
LITERT_ASSIGN_OR_RETURN(auto input_buffers, compiled_model.CreateInputBuffers());
LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model.CreateOutputBuffers());

// Fill the first input
float input_values[] = { /* your data */ };
input_buffers[0].Write<float>(absl::MakeConstSpan(input_values, /*size*/));

// Invoke
compiled_model.Run(input_buffers, output_buffers);

// Read the output
std::vector<float> data(output_data_size);
output_buffers[0].Read<float>(absl::MakeSpan(data));

Asynchronous Execution with Zero-Copy GPU Buffers

The LiteRT Next Compiled Model API reduces the friction of inference pipelines, especially when dealing with multiple hardware backends and zero-copy flows. The following code snippet wraps an existing OpenGL buffer as an input TensorBuffer with CreateFromGlBuffer (zero-copy), attaches a synchronization event to it, and chains two compiled models asynchronously on the GPU.

// Define a LiteRT environment that uses an existing EGL display and context.
const std::vector<Environment::Option> environment_options = {
   {OptionTag::EglDisplay, user_egl_display},
   {OptionTag::EglContext, user_egl_context}};
LITERT_ASSIGN_OR_RETURN(auto env,
   Environment::Create(absl::MakeConstSpan(environment_options)));

// Load model1 and initialize runtime.
LITERT_ASSIGN_OR_RETURN(auto model1, Model::CreateFromFile("model1.tflite"));
LITERT_ASSIGN_OR_RETURN(auto compiled_model1, CompiledModel::Create(env, model1, kLiteRtHwAcceleratorGpu));

// Prepare I/O buffers. opengl_buffer is provided externally by the producer.
LITERT_ASSIGN_OR_RETURN(auto tensor_type, model1.GetInputTensorType("input_name0"));
// Create an input TensorBuffer based on tensor_type that wraps the given OpenGL buffer.
LITERT_ASSIGN_OR_RETURN(auto tensor_buffer_from_opengl,
    litert::TensorBuffer::CreateFromGlBuffer(env, tensor_type, opengl_buffer));

// Create an input event and attach it to the input buffer. Internally, it creates
// and inserts a fence sync object into the current EGL command queue.
LITERT_ASSIGN_OR_RETURN(auto input_event, Event::CreateManaged(env, LiteRtEventTypeEglSyncFence));
tensor_buffer_from_opengl.SetEvent(std::move(input_event));

std::vector<TensorBuffer> input_buffers;
input_buffers.push_back(std::move(tensor_buffer_from_opengl));

// Create output TensorBuffers for model1. They are also used as the inputs of model2.
LITERT_ASSIGN_OR_RETURN(auto intermediate_buffers, compiled_model1.CreateOutputBuffers());

// Load model2 and initialize runtime.
LITERT_ASSIGN_OR_RETURN(auto model2, Model::CreateFromFile("model2.tflite"));
LITERT_ASSIGN_OR_RETURN(auto compiled_model2, CompiledModel::Create(env, model2, kLiteRtHwAcceleratorGpu));
LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model2.CreateOutputBuffers());

compiled_model1.RunAsync(input_buffers, intermediate_buffers);
compiled_model2.RunAsync(intermediate_buffers, output_buffers);