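Get started with LiteRT

This guide introduces the process of running a LiteRT (short for Lite Runtime) model on-device to make predictions based on input data. This is achieved with the LiteRT interpreter, which uses static graph ordering and a custom (less dynamic) memory allocator to keep load, initialization, and execution latency to a minimum.

LiteRT inference typically follows these steps:

Load the model: Load the .tflite model, which contains the model's execution graph, into memory.
Transform the data: Convert the input data into the format and dimensions the model expects. Raw input data generally does not match the input format expected by the model. For example, you might need to resize an image or change the image format to make it compatible with the model.
Run inference: Execute the LiteRT model to make predictions. This step uses the LiteRT API to execute the model and involves a few sub-steps, such as building the interpreter and allocating tensors.
Interpret the output: Interpret the output tensors in a way that is meaningful and useful in your application. For example, a model might return only a list of probabilities. It is up to you to map the probabilities to the relevant categories and format the output.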
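The transform and interpret steps are plain data handling and do not depend on LiteRT itself. The following is a minimal sketch, assuming an image model that expects pixel values scaled to [0, 1] and returns one probability per class. The function names, the label list, and the sample probabilities are hypothetical and only serve as illustration:

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0-255) to floats in [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]


def top_label(probabilities, labels):
    """Return the (label, probability) pair with the highest probability."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]


# Hypothetical labels and model output, for illustration only.
labels = ["cat", "dog", "bird"]
label, score = top_label([0.1, 0.7, 0.2], labels)
print(f"{label}: {score:.0%}")  # dog: 70%
```

A real model may expect a different scaling (for example [-1, 1]) or a quantized integer input, so check the model's documentation before choosing the transform.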

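This guide describes how to access the LiteRT interpreter and run inference using C++, Java, and Python.

Supported platforms

TensorFlow inference APIs are provided in multiple programming languages for the most common mobile and embedded platforms, such as Android, iOS, and Linux.

In most cases, the API design favors performance over ease of use. LiteRT is designed for fast inference on small devices, so the APIs avoid unnecessary copies at the expense of convenience.

Across all libraries, the LiteRT API lets you load models, feed inputs, and retrieve inference outputs.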

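Android platform

On Android, LiteRT inference can be performed using either the Java or the C++ API. The Java API is convenient and can be used directly within your Android Activity classes. The C++ API offers more flexibility and speed, but may require writing JNI wrappers to move data between the Java and C++ layers.

See the C++ and Java sections for details, or follow the Android quickstart.

iOS platform

On iOS, LiteRT is available in Swift and Objective-C iOS libraries. You can also use the C API directly in Objective-C code.

See the Swift, Objective-C, and C API sections for details, or follow the iOS quickstart.

Linux platform

On Linux, you can run inference using the LiteRT APIs available in C++.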

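Load and run a model

Loading and running a LiteRT model involves the following steps:

Load the model into memory.
Build an Interpreter based on the existing model.
Set the input tensor values.
Invoke inference.
Read the output tensor values.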
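The last three steps can be sketched in Python using the interpreter methods that appear in the Python section of this guide. This is a minimal sketch, assuming a model with one input and one output and an interpreter object that has already been built from a loaded model. `run_inference` is a hypothetical helper, not part of LiteRT:

```python
def run_inference(interpreter, input_data):
    """Run one inference on a single-input, single-output model.

    `interpreter` is assumed to be an already constructed LiteRT
    Interpreter (steps 1 and 2); this function covers steps 3 to 5.
    """
    # Tensors must be allocated before any input can be set.
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    interpreter.set_tensor(input_index, input_data)  # step 3: set inputs
    interpreter.invoke()                             # step 4: invoke inference
    return interpreter.get_tensor(output_index)      # step 5: read outputs
```

Passing the interpreter in as an argument keeps model loading separate from the per-request work, so the same interpreter can be reused for many inputs.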

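Android (Java)

The Java API for running inference with LiteRT is primarily designed for use with Android, so it is available as an Android library dependency: com.google.ai.edge.litert.

In Java, you use the Interpreter class to load a model and drive model inference. In many cases, this may be the only API you need.

You can initialize an Interpreter using a FlatBuffers (.tflite) file: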

public Interpreter(@NotNull File modelFile);

Or with a MappedByteBuffer:

public Interpreter(@NotNull MappedByteBuffer mappedByteBuffer);

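In both cases, you must provide a valid LiteRT model, or the API throws an IllegalArgumentException. If you use a MappedByteBuffer to initialize an Interpreter, the buffer must remain unchanged for the whole lifetime of the Interpreter.

The preferred way to run inference on a model is to use signatures, which are available for models converted with TensorFlow 2.5 or later: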

try (Interpreter interpreter = new Interpreter(file_of_tensorflowlite_model)) {
  Map<String, Object> inputs = new HashMap<>();
  inputs.put("input_1", input1);
  inputs.put("input_2", input2);
  Map<String, Object> outputs = new HashMap<>();
  outputs.put("output_1", output1);
  interpreter.runSignature(inputs, outputs, "mySignature");
}

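The runSignature method takes three arguments:

Inputs: A map from the input names in the signature to the input objects.
Outputs: A map from the output names in the signature to the output data.
Signature name (optional): The name of the signature. It can be left empty if the model has a single signature.

If the model has no defined signatures, you can run inference another way: call Interpreter.run(). For example: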

try (Interpreter interpreter = new Interpreter(file_of_a_tensorflowlite_model)) {
  interpreter.run(input, output);
}

The run() method takes only one input and returns only one output. If your model has multiple inputs or multiple outputs, use this instead:

interpreter.runForMultipleInputsOutputs(inputs, map_of_indices_to_outputs);

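In this case, each entry in inputs corresponds to an input tensor, and map_of_indices_to_outputs maps the indices of output tensors to the corresponding output data.

In both cases, the tensor indices should correspond to the values you gave to the LiteRT Converter. Note that the order of tensors in input must match the order given to the LiteRT Converter.

The Interpreter class also provides convenience functions for getting the index of any model input or output from an operation name: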

public int getInputIndex(String opName);
public int getOutputIndex(String opName);

If opName is not a valid operation in the model, an IllegalArgumentException is thrown.
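For comparison, the Python interpreter exposes similar information through get_input_details() and get_output_details(), which return one dictionary per tensor. The sketch below assumes each dictionary carries `name` and `index` keys; the helper name and the sample details are hypothetical, and the names are tensor names rather than the operation names used by the Java methods:

```python
def index_by_name(tensor_details):
    """Map tensor names to tensor indices."""
    return {detail["name"]: detail["index"] for detail in tensor_details}


# Made-up details in the shape returned by get_input_details().
details = [
    {"name": "input_1", "index": 0},
    {"name": "input_2", "index": 1},
]
lookup = index_by_name(details)
print(lookup["input_2"])  # 1
```

Looking up a name that is not in the model raises a KeyError here, which plays the same role as the IllegalArgumentException in the Java API.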

Also note that the Interpreter owns resources. To avoid memory leaks, the resources must be released after use:

interpreter.close();

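For an example project in Java, see the Android object detection example app.

Supported data types

To use LiteRT, the data types of the input and output tensors must be one of the following primitive types: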

float
int
long
byte

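String types are also supported, but they are encoded differently from the primitive types. In particular, the shape of a string tensor dictates the number and arrangement of strings in the tensor, and each element is itself a variable-length string. As a result, the (byte) size of the tensor cannot be computed from the shape and type alone, so strings cannot be provided as a single, flat ByteBuffer argument.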
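The difference can be made concrete with a little arithmetic. This sketch assumes the usual Java element sizes for the primitive types listed above; the helper name and the example shapes are hypothetical:

```python
import math

# Element sizes in bytes for the supported primitive types (Java sizes).
ELEMENT_SIZE = {"float": 4, "int": 4, "long": 8, "byte": 1}


def tensor_byte_size(shape, dtype):
    """Byte size of a fixed-width tensor: element count times element size."""
    return math.prod(shape) * ELEMENT_SIZE[dtype]


print(tensor_byte_size([1, 224, 224, 3], "float"))  # 602112

# Two string tensors with the same shape [2] carry different payload sizes,
# so no shape-and-type formula can give the size of a string tensor.
short_strings = [b"a", b"b"]
long_strings = [b"hello", b"world"]
print(sum(map(len, short_strings)), sum(map(len, long_strings)))  # 2 10
```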

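If other data types are used, including boxed types such as Integer and Float, an IllegalArgumentException is thrown.

Inputs

Each input should be an array or multi-dimensional array of the supported primitive types, or a raw ByteBuffer of the appropriate size. If the input is an array or multi-dimensional array, the associated input tensor is implicitly resized to the array's dimensions at inference time. If the input is a ByteBuffer, the caller should first manually resize the associated input tensor (via Interpreter.resizeInput()) before running inference.

When using a ByteBuffer, prefer direct byte buffers, because they allow the Interpreter to avoid unnecessary copies. If the ByteBuffer is a direct byte buffer, its order must be ByteOrder.nativeOrder(). Once a buffer has been passed to a model inference, it must remain unchanged until that inference has finished.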
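Native order means the byte order of the machine running the code, so the buffer can be read without any conversion. As a language-neutral illustration of that idea (not of the Java API itself), the following Python sketch packs floats into a flat buffer in native order using only the standard library. The helper name is hypothetical:

```python
import struct
import sys


def pack_floats_native(values):
    """Pack values as 32-bit floats in the platform's native byte order."""
    # "=" selects native byte order with standard sizes and no padding.
    return struct.pack(f"={len(values)}f", *values)


buffer = pack_floats_native([1.0, 2.0, 3.0])
# Prints this machine's byte order ("little" or "big") and the size, 12.
print(sys.byteorder, len(buffer))
```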

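Outputs

Each output should be an array or multi-dimensional array of the supported primitive types, or a ByteBuffer of the appropriate size. Note that some models have dynamic outputs, where the shape of the output tensors can vary depending on the input. There is no straightforward way to handle this with the existing Java inference API, but planned extensions will make it possible.

iOS (Swift)

The Swift API is available in the TensorFlowLiteSwift Pod from Cocoapods.

First, you need to import the TensorFlowLite module.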

import TensorFlowLite

// Getting model path
guard
  let modelPath = Bundle.main.path(forResource: "model", ofType: "tflite")
else {
  // Error handling...
}

do {
  // Initialize an interpreter with the model.
  let interpreter = try Interpreter(modelPath: modelPath)

  // Allocate memory for the model's input `Tensor`s.
  try interpreter.allocateTensors()

  let inputData: Data  // Should be initialized

  // input data preparation...

  // Copy the input data to the input `Tensor`.
  try interpreter.copy(inputData, toInputAt: 0)

  // Run inference by invoking the `Interpreter`.
  try interpreter.invoke()

  // Get the output `Tensor`
  let outputTensor = try interpreter.output(at: 0)

  // Copy output to `Data` to process the inference results.
  let outputSize = outputTensor.shape.dimensions.reduce(1, {x, y in x * y})
  let outputData =
        UnsafeMutableBufferPointer<Float32>.allocate(capacity: outputSize)
  _ = outputTensor.data.copyBytes(to: outputData)
} catch {
  // Error handling: the thrown value is available here as `error`.
}

iOS (Objective-C)

The Objective-C API is available in the LiteRTObjC Pod from Cocoapods.

First, you need to import the TensorFlowLiteObjC module.

@import TensorFlowLite;

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"model"
                                                      ofType:@"tflite"];
NSError *error;

// Initialize an interpreter with the model.
TFLInterpreter *interpreter = [[TFLInterpreter alloc] initWithModelPath:modelPath
                                                                  error:&error];
if (error != nil) { /* Error handling... */ }

// Allocate memory for the model's input `TFLTensor`s.
[interpreter allocateTensorsWithError:&error];
if (error != nil) { /* Error handling... */ }

NSMutableData *inputData;  // Should be initialized
// input data preparation...

// Get the input `TFLTensor`
TFLTensor *inputTensor = [interpreter inputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy the input data to the input `TFLTensor`.
[inputTensor copyData:inputData error:&error];
if (error != nil) { /* Error handling... */ }

// Run inference by invoking the `TFLInterpreter`.
[interpreter invokeWithError:&error];
if (error != nil) { /* Error handling... */ }

// Get the output `TFLTensor`
TFLTensor *outputTensor = [interpreter outputTensorAtIndex:0 error:&error];
if (error != nil) { /* Error handling... */ }

// Copy output to `NSData` to process the inference results.
NSData *outputData = [outputTensor dataWithError:&error];
if (error != nil) { /* Error handling... */ }

C API in Objective-C code

The Objective-C API does not support delegates. To use delegates with Objective-C code, you need to call the underlying C API directly.

#include "tensorflow/lite/c/c_api.h"

TfLiteModel* model = TfLiteModelCreateFromFile([modelPath UTF8String]);
TfLiteInterpreterOptions* options = TfLiteInterpreterOptionsCreate();

// Create the interpreter.
TfLiteInterpreter* interpreter = TfLiteInterpreterCreate(model, options);

// Allocate tensors and populate the input tensor data.
TfLiteInterpreterAllocateTensors(interpreter);
TfLiteTensor* input_tensor =
    TfLiteInterpreterGetInputTensor(interpreter, 0);
TfLiteTensorCopyFromBuffer(input_tensor, input.data(),
                           input.size() * sizeof(float));

// Execute inference.
TfLiteInterpreterInvoke(interpreter);

// Extract the output tensor data.
const TfLiteTensor* output_tensor =
    TfLiteInterpreterGetOutputTensor(interpreter, 0);
TfLiteTensorCopyToBuffer(output_tensor, output.data(),
                         output.size() * sizeof(float));

// Dispose of the model and interpreter objects.
TfLiteInterpreterDelete(interpreter);
TfLiteInterpreterOptionsDelete(options);
TfLiteModelDelete(model);

C++

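The C++ API for running inference with LiteRT is compatible with Android, iOS, and Linux platforms. The C++ API on iOS is only available when using bazel.

In C++, the model is stored in the FlatBufferModel class. It encapsulates a LiteRT model, and you can build it in a couple of different ways, depending on where the model is stored: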

class FlatBufferModel {
  // Build a model based on a file. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromFile(
      const char* filename,
      ErrorReporter* error_reporter);

  // Build a model based on a pre-loaded flatbuffer. The caller retains
  // ownership of the buffer and should keep it alive until the returned object
  // is destroyed. Return a nullptr in case of failure.
  static std::unique_ptr<FlatBufferModel> BuildFromBuffer(
      const char* buffer,
      size_t buffer_size,
      ErrorReporter* error_reporter);
};

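Now that you have the model as a FlatBufferModel object, you can execute it with an Interpreter. A single FlatBufferModel can be used simultaneously by more than one Interpreter.

The important parts of the Interpreter API are shown in the code snippet below. Note that:

Tensors are represented by integers, in order to avoid string comparisons (and any fixed dependency on string libraries).
An interpreter must not be accessed from concurrent threads.
Memory allocation for input and output tensors must be triggered by calling AllocateTensors() right after resizing tensors.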
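When several threads share one interpreter, one way to honor the no-concurrent-access rule is to guard the interpreter with a lock. The sketch below is written in Python for brevity and assumes the same constraint applies to the Python interpreter object passed in. `LockedInterpreter` is a hypothetical wrapper, not part of LiteRT, and it assumes tensors have already been allocated:

```python
import threading


class LockedInterpreter:
    """Serialize access to an interpreter shared between threads."""

    def __init__(self, interpreter):
        self._interpreter = interpreter
        self._lock = threading.Lock()

    def run(self, input_index, output_index, input_data):
        # Hold the lock for the whole set/invoke/get sequence so another
        # thread cannot overwrite the input or read a stale output.
        with self._lock:
            self._interpreter.set_tensor(input_index, input_data)
            self._interpreter.invoke()
            return self._interpreter.get_tensor(output_index)
```

An alternative design is to give each thread its own interpreter, which the guide permits because a single model can back more than one interpreter.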

The simplest usage of LiteRT with C++ looks like this:

// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

// Resize input tensors, if needed.
interpreter->AllocateTensors();

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

interpreter->Invoke();

float* output = interpreter->typed_output_tensor<float>(0);

For more example code, see minimal.cc and label_image.cc.

Python

The Python API for running inference uses the Interpreter to load a model and run inference.

Install the LiteRT package:

$ python3 -m pip install ai-edge-litert

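Import the LiteRT Interpreter:

from ai_edge_litert.interpreter import Interpreter
# Use a lowercase variable name so the Interpreter class is not overwritten.
# `args` is assumed to hold parsed command-line arguments with the model path.
interpreter = Interpreter(model_path=args.model.file)

The following examples show how to use the Python interpreter to load a FlatBuffers (.tflite) file and run inference.

The first example is recommended if you are converting from a SavedModel with a defined SignatureDef.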

import tensorflow as tf

class TestModel(tf.Module):
  def __init__(self):
    super(TestModel, self).__init__()

  @tf.function(input_signature=[tf.TensorSpec(shape=[1, 10], dtype=tf.float32)])
  def add(self, x):
    '''
    Simple method that accepts single input 'x' and returns 'x' + 4.
    '''
    # Name the output 'result' for convenience.
    return {'result' : x + 4}

SAVED_MODEL_PATH = 'content/saved_models/test_variable'
TFLITE_FILE_PATH = 'content/test_variable.tflite'

# Save the model
module = TestModel()
# You can omit the signatures argument and a default signature name will be
# created with name 'serving_default'.
tf.saved_model.save(
    module, SAVED_MODEL_PATH,
    signatures={'my_signature':module.add.get_concrete_function()})

# Convert the model using TFLiteConverter
converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_PATH)
tflite_model = converter.convert()
with open(TFLITE_FILE_PATH, 'wb') as f:
  f.write(tflite_model)

# Load the LiteRT model in LiteRT Interpreter
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(TFLITE_FILE_PATH)

# There is only 1 signature defined in the model,
# so it will return it by default.
# If there are multiple signatures then we can pass the name.
my_signature = interpreter.get_signature_runner()

# my_signature is callable with input as arguments.
output = my_signature(x=tf.constant([1.0], shape=(1,10), dtype=tf.float32))
# 'output' is dictionary with all outputs from the inference.
# In this case we have single output 'result'.
print(output['result'])
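When a model may carry more than one signature, it can help to check which ones exist before asking for a runner. The sketch below relies on the interpreter's get_signature_list() and get_signature_runner() methods; `get_runner` itself is a hypothetical helper, and the error messages are illustrative:

```python
def get_runner(interpreter, signature_key=None):
    """Return a signature runner, requiring a key when the model has several."""
    # get_signature_list() returns a dict keyed by signature name.
    signatures = interpreter.get_signature_list()
    if signature_key is None:
        if len(signatures) != 1:
            raise ValueError(
                f"model has {len(signatures)} signatures; "
                f"choose one of {sorted(signatures)}")
        return interpreter.get_signature_runner()
    if signature_key not in signatures:
        raise KeyError(f"unknown signature {signature_key!r}")
    return interpreter.get_signature_runner(signature_key)
```

With the model saved above, get_runner(interpreter) and get_runner(interpreter, 'my_signature') would be expected to return equivalent runners.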

The next example applies when the model does not have SignatureDefs defined.

import numpy as np
import tensorflow as tf

# Load the LiteRT model and allocate tensors.
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(TFLITE_FILE_PATH)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
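The comment about `tensor()` in the example above can be turned into a small sketch. It assumes that `interpreter.tensor(index)` returns a function which, when called, gives a writable view of the tensor buffer, and that the view should not be kept alive across `invoke()`. `fill_input_in_place` is a hypothetical helper:

```python
def fill_input_in_place(interpreter, values):
    """Write `values` straight into the first input tensor's buffer."""
    input_index = interpreter.get_input_details()[0]["index"]
    # tensor() returns a function; calling it yields a view of the buffer.
    view = interpreter.tensor(input_index)()
    # Slice assignment writes into the existing buffer instead of copying
    # a new array into the interpreter, as set_tensor() would.
    view[:] = values
    # Drop the view so nothing references the buffer when invoke() runs.
    del view
```

This trades the safety of set_tensor() for one fewer copy, in line with the performance-first design described earlier in this guide.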

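As an alternative to loading the model as a pre-converted .tflite file, you can combine your code with the LiteRT Compiler to convert your Keras model into the LiteRT format and then run inference: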

import numpy as np
import tensorflow as tf

img = tf.keras.Input(shape=(64, 64, 3), name="img")
const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
val = img + const
out = tf.identity(val, name="out")

# Convert to LiteRT format
converter = tf.lite.TFLiteConverter.from_keras_model(tf.keras.models.Model(inputs=[img], outputs=[out]))
tflite_model = converter.convert()

# Load the LiteRT model and allocate tensors.
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# Continue to get tensors and so forth, as shown above...

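For more Python sample code, see label_image.py.

Run inference with a dynamic shape model

To run a model with a dynamic input shape, resize the input shape before running inference. Otherwise, the None shape in TensorFlow models is replaced by a placeholder of 1 in LiteRT models.

The following examples show how to resize the input shape before running inference in different languages. All the examples assume that the input shape is defined as [1/None, 10] and needs to be resized to [3, 10].

C++ example: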

// Resize input tensors before allocate tensors
interpreter->ResizeInputTensor(/*tensor_index=*/0, std::vector<int>{3,10});
interpreter->AllocateTensors();

Python example:

# Load the LiteRT model in LiteRT Interpreter
from ai_edge_litert.interpreter import Interpreter
interpreter = Interpreter(model_path=TFLITE_FILE_PATH)

# Resize input shape for dynamic shape model and allocate tensor
interpreter.resize_tensor_input(interpreter.get_input_details()[0]['index'], [3, 10])
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
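Before resizing, it can be useful to confirm which dimensions the model actually declares as dynamic. The sketch below assumes the input details include a `shape_signature` entry in which -1 marks a dynamic dimension. `resize_first_input` is a hypothetical helper, and refusing to change fixed dimensions is a conservative choice made for this sketch rather than a rule of the API:

```python
def resize_first_input(interpreter, new_shape):
    """Resize input 0 to `new_shape` if only dynamic dims change, then allocate."""
    detail = interpreter.get_input_details()[0]
    signature = list(detail["shape_signature"])  # -1 marks a dynamic dimension
    if len(signature) != len(new_shape):
        raise ValueError(
            f"expected rank {len(signature)}, got {len(new_shape)}")
    for wanted, declared in zip(new_shape, signature):
        if declared != -1 and declared != wanted:
            raise ValueError(
                f"dimension fixed at {declared}, cannot use {wanted}")
    # Resize first, then allocate, in the same order as the example above.
    interpreter.resize_tensor_input(detail["index"], list(new_shape))
    interpreter.allocate_tensors()
```

For the [None, 10] input assumed in this section, resize_first_input(interpreter, [3, 10]) performs the same two calls as the Python example above, while a request for [3, 11] is rejected before reaching the interpreter.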