LiteRT-LM CLI

The Command Line Interface (CLI) lets you test models immediately—no code required.

Supported Platforms:

  • Linux
  • macOS
  • Windows (via WSL)
  • Raspberry Pi

Installation

Method 1: uv

Installs litert-lm as a system-wide binary. Requires uv.

uv tool install litert-lm-nightly

Method 2: pip

Standard installation within a virtual environment.

python3 -m venv .venv
source .venv/bin/activate
pip install litert-lm-nightly

Chat

Download a model from Hugging Face and run it:

litert-lm run \
  --from-huggingface-repo=google/gemma-3n-E2B-it-litert-lm \
  gemma-3n-E2B-it-int4 \
  --prompt="What is the capital of France?"

Function Calling / Tools

You can expose Python functions to the model as tools via a preset file. Create a preset.py:

import datetime

def get_current_time() -> str:
    """Returns the current date and time."""
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

system_instruction = "You are a helpful assistant with access to tools."
tools = [get_current_time]
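Any plain Python function with type hints and a docstring can be registered the same way. As an illustrative sketch (the `get_weekday` tool below is hypothetical, not part of LiteRT-LM; only the `system_instruction` and `tools` preset variables come from the format above):

```python
import datetime

def get_weekday(date_string: str) -> str:
    """Returns the weekday name for a date given as YYYY-MM-DD."""
    parsed = datetime.datetime.strptime(date_string, "%Y-%m-%d")
    return parsed.strftime("%A")

system_instruction = "You are a helpful assistant with access to tools."
tools = [get_weekday]
```

The docstring matters: it is what the model sees when deciding whether and how to call the tool.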

Run with preset:

litert-lm run \
  --from-huggingface-repo=google/gemma-3n-E2B-it-litert-lm \
  gemma-3n-E2B-it-int4 \
  --preset=preset.py

A sample prompt and the interactive output:

> what will the time be in two hours?
[tool_call] {"arguments": {}, "name": "get_current_time"}
[tool_response] {"name": "get_current_time", "response": "2026-03-25 21:54:07"}
The current time is 2026-03-25 21:54:07.

In two hours, it will be **2026-03-25 23:54:07**.
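The "in two hours" arithmetic the model performs on the tool's timestamp can be checked directly with the standard library:

```python
import datetime

# Parse the timestamp returned by the tool in the transcript above.
now = datetime.datetime.strptime("2026-03-25 21:54:07", "%Y-%m-%d %H:%M:%S")

# Add two hours, as the prompt asked.
later = now + datetime.timedelta(hours=2)
print(later.strftime("%Y-%m-%d %H:%M:%S"))  # → 2026-03-25 23:54:07
```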

What is Happening Here?

When you ask a question that requires external information (like the current time), the model recognizes that it needs to call a tool.

  1. Model Emits tool_call: The model outputs a JSON request to call the get_current_time function.
  2. CLI Executes Tool: The LiteRT-LM CLI intercepts this call and executes the corresponding Python function defined in your preset.py.
  3. CLI Sends tool_response: The CLI sends the result back to the model.
  4. Model Generates Final Answer: The model uses the tool response to compute and generate the final answer for the user.

This "Function Calling" loop happens automatically within the CLI, allowing you to augment local LLMs with Python capabilities without writing any complex orchestration code.
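The dispatch step of that loop can be sketched in a few lines. This is an illustrative mock, not the CLI's actual implementation: `handle_model_output`, the `TOOLS` registry, and the stubbed `get_current_time` are all hypothetical; only the `[tool_call]`/`tool_response` JSON shapes are taken from the transcript above.

```python
import json

def get_current_time() -> str:
    """Stand-in tool returning a fixed timestamp for the example."""
    return "2026-03-25 21:54:07"

# Registry mapping tool names (as the model emits them) to callables.
TOOLS = {"get_current_time": get_current_time}

def handle_model_output(output: str) -> str:
    """Execute a tool_call if present; otherwise pass the text through.

    A tool_call line like
        [tool_call] {"arguments": {}, "name": "get_current_time"}
    is parsed, the named function is invoked with its arguments, and a
    tool_response payload is built to feed back to the model.
    """
    if output.startswith("[tool_call]"):
        call = json.loads(output[len("[tool_call]"):])
        result = TOOLS[call["name"]](**call["arguments"])
        return json.dumps({"name": call["name"], "response": result})
    # No tool call: this is the model's final answer for the user.
    return output

print(handle_model_output('[tool_call] {"arguments": {}, "name": "get_current_time"}'))
```

In the real CLI this dispatch runs inside the generation loop, so the tool_response is appended to the conversation and the model keeps generating until it produces plain text.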

The same capabilities are available from the Python, C++, and Kotlin APIs.