Understand and count tokens

Gemini and other generative AI models process input and output at a granularity called a token.

For Gemini models, a token is equivalent to about 4 characters. 100 tokens equal roughly 60-80 English words.
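
The rule of thumb above can be turned into a rough offline estimate. The sketch below assumes about 4 characters per token and nothing more; `estimate_tokens` is a hypothetical helper, not part of the Gemini SDK, and the exact count for a given model comes from the API itself.

```python
import math

CHARS_PER_TOKEN = 4  # Rule of thumb from the text; not an exact tokenizer.


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of `text` from its character length."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(prompt))
```

An estimate like this is only useful for quick budgeting; real counts can differ noticeably, especially for non-English text.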

About tokens

Tokens can be single characters like z or whole words like cat. Long words are broken up into several tokens. The set of all tokens used by the model is called the vocabulary, and the process of splitting text into tokens is called tokenization.

When billing is enabled, the cost of a call to the Gemini API is determined in part by the number of input and output tokens, so knowing how to count tokens can be helpful.
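
To make the link between token counts and cost concrete, here is a minimal sketch of the arithmetic. `estimate_cost` is a hypothetical helper, and the per-million-token prices passed to it are placeholders chosen for illustration, not real Gemini API rates; it also covers only the token-based part of the cost.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the token-based cost of one call, in the currency of the prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices for illustration only; look up current rates before relying on this.
cost = estimate_cost(
    input_tokens=11,
    output_tokens=73,
    input_price_per_million=1.0,
    output_price_per_million=2.0,
)
print(f"{cost:.6f}")
```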

Try out counting tokens in a Colab

You can try out counting tokens by using a Colab.

Context windows

The models available through the Gemini API have context windows that are measured in tokens. The context window defines how much input you can provide and how much output the model can generate. You can determine the size of the context window by calling the getModels endpoint or by looking in the models documentation.

In the following example, you can see that the gemini-2.0-flash model has an input limit of about 1,000,000 tokens and an output limit of about 8,000 tokens, which means its context window is about 1,000,000 tokens.

from google import genai

client = genai.Client()
model_info = client.models.get(model="gemini-2.0-flash")
print(f"{model_info.input_token_limit=}")
print(f"{model_info.output_token_limit=}")
# ( e.g., input_token_limit=1048576, output_token_limit=8192 )
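
Once you know a model's input limit, checking whether a prompt fits is a simple comparison. The sketch below uses hypothetical helper names and takes plain integers, so it can be fed with a count from count_tokens and a limit from the model info; the 1,000,000 figure is the approximate limit quoted above, not an exact value.

```python
def fits_input_limit(prompt_tokens: int, input_token_limit: int) -> bool:
    """Return True when the prompt fits within the model's input limit."""
    return prompt_tokens <= input_token_limit


def remaining_input_tokens(prompt_tokens: int, input_token_limit: int) -> int:
    """Return how many input tokens are still available (never negative)."""
    return max(input_token_limit - prompt_tokens, 0)


approx_limit = 1_000_000  # Approximate input limit quoted in the text.
print(fits_input_limit(250_000, approx_limit))
print(remaining_input_tokens(250_000, approx_limit))
```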

Count tokens

All input to and output from the Gemini API is tokenized, including text, image files, and other non-text modalities.

You can count tokens in the following ways:

Count text tokens

from google import genai

client = genai.Client()
prompt = "The quick brown fox jumps over the lazy dog."

# Count tokens using the new client method.
total_tokens = client.models.count_tokens(
    model="gemini-2.0-flash", contents=prompt
)
print("total_tokens: ", total_tokens)
# ( e.g., total_tokens: 10 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=prompt
)

# The usage_metadata provides detailed token counts.
print(response.usage_metadata)
# ( e.g., prompt_token_count: 11, candidates_token_count: 73, total_token_count: 84 )

Count multi-turn (chat) tokens

from google import genai
from google.genai import types

client = genai.Client()

chat = client.chats.create(
    model="gemini-2.0-flash",
    history=[
        types.Content(
            role="user", parts=[types.Part(text="Hi my name is Bob")]
        ),
        types.Content(role="model", parts=[types.Part(text="Hi Bob!")]),
    ],
)
# Count tokens for the chat history.
print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=chat.get_history()
    )
)
# ( e.g., total_tokens: 10 )

response = chat.send_message(
    message="In one sentence, explain how a computer works to a young child."
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 25, candidates_token_count: 21, total_token_count: 46 )

# You can count tokens for the combined history and a new message.
extra = types.UserContent(
    parts=[
        types.Part(
            text="What is the meaning of life?",
        )
    ]
)
history = chat.get_history()
history.append(extra)
print(client.models.count_tokens(model="gemini-2.0-flash", contents=history))
# ( e.g., total_tokens: 56 )

Count multimodal tokens

All input to the Gemini API is tokenized, including text, image files, and other non-text modalities. Note the following high-level key points about tokenization of multimodal input during processing by the Gemini API:

With Gemini 2.0, image inputs with both dimensions <=384 pixels are counted as 258 tokens. Images larger in one or both dimensions are cropped and scaled as needed into tiles of 768x768 pixels, each counted as 258 tokens. Prior to Gemini 2.0, images used a fixed 258 tokens.
Video and audio files are converted to tokens at the following fixed rates: video at 263 tokens per second and audio at 32 tokens per second.
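
The fixed rates above lend themselves to a quick estimate before uploading anything. This is a rough sketch using hypothetical helper names; rounding up fractional seconds is an assumption, and the tiling of larger images is deliberately left out because the exact tile count depends on cropping and scaling that only the API can report.

```python
import math

# Fixed rates and thresholds stated in the text above.
VIDEO_TOKENS_PER_SECOND = 263
AUDIO_TOKENS_PER_SECOND = 32
SMALL_IMAGE_TOKENS = 258
SMALL_IMAGE_MAX_SIDE = 384  # pixels


def estimate_video_tokens(duration_seconds: float) -> int:
    """Estimate tokens for a video clip (rounding up is an assumption)."""
    return math.ceil(duration_seconds * VIDEO_TOKENS_PER_SECOND)


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate tokens for an audio clip (rounding up is an assumption)."""
    return math.ceil(duration_seconds * AUDIO_TOKENS_PER_SECOND)


def is_small_image(width: int, height: int) -> bool:
    """True when both dimensions are <= 384 px, so the image counts as 258 tokens."""
    return width <= SMALL_IMAGE_MAX_SIDE and height <= SMALL_IMAGE_MAX_SIDE


print(estimate_video_tokens(60))  # One minute of video.
print(estimate_audio_tokens(60))  # One minute of audio.
print(SMALL_IMAGE_TOKENS if is_small_image(320, 240) else "tiled: ask the API")
```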

Media resolutions

Gemini 3 Pro Preview introduces granular control over multimodal vision processing with the media_resolution parameter. The media_resolution parameter determines the maximum number of tokens allocated per input image or video frame. Higher resolutions improve the model's ability to read fine text or identify small details, but increase token usage and latency.

For more details about the parameter and how it can impact token calculations, see the media resolution guide.

Image files

Example that uses an uploaded image from the File API:

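from google import genai
import pathlib

client = genai.Client()
prompt = "Tell me about this image"
# Assumed location of the sample files; point this at your own media directory.
media = pathlib.Path("media")
your_image_file = client.files.upload(file=media / "organ.jpg")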

print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=[prompt, your_image_file]
    )
)
# ( e.g., total_tokens: 263 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=[prompt, your_image_file]
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 264, candidates_token_count: 80, total_token_count: 344 )

Example that provides the image as inline data:

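from google import genai
import pathlib
import PIL.Image

client = genai.Client()
prompt = "Tell me about this image"
# Assumed location of the sample files; point this at your own media directory.
media = pathlib.Path("media")
your_image_file = PIL.Image.open(media / "organ.jpg")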

# Count tokens for combined text and inline image.
print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=[prompt, your_image_file]
    )
)
# ( e.g., total_tokens: 263 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=[prompt, your_image_file]
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 264, candidates_token_count: 80, total_token_count: 344 )

Video or audio files

Audio and video are each converted to tokens at the following fixed rates:

Video: 263 tokens per second
Audio: 32 tokens per second

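from google import genai
import pathlib
import time

client = genai.Client()
prompt = "Tell me about this video"
# Assumed location of the sample files; point this at your own media directory.
media = pathlib.Path("media")
your_file = client.files.upload(file=media / "Big_Buck_Bunny.mp4")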

# Poll until the video file is completely processed (state becomes ACTIVE).
while not your_file.state or your_file.state.name != "ACTIVE":
    print("Processing video...")
    print("File state:", your_file.state)
    time.sleep(5)
    your_file = client.files.get(name=your_file.name)

print(
    client.models.count_tokens(
        model="gemini-2.0-flash", contents=[prompt, your_file]
    )
)
# ( e.g., total_tokens: 300 )

response = client.models.generate_content(
    model="gemini-2.0-flash", contents=[prompt, your_file]
)
print(response.usage_metadata)
# ( e.g., prompt_token_count: 301, candidates_token_count: 60, total_token_count: 361 )

System instructions and tools

System instructions and tools also count towards the total token count for the input.

If you use system instructions, the total_tokens count increases to reflect the addition of system_instruction.

If you use function calling, the total_tokens count increases to reflect the addition of tools.
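
One way to see this effect is to compare the prompt token count of the same request with and without a system instruction. The sketch below is an illustration under a few assumptions: `prompt_token_delta` is a hypothetical helper, it reads prompt_token_count from usage_metadata (as in the earlier examples) rather than calling count_tokens, and it passes the config as a plain dict. Note that it makes two generate_content calls, so it consumes tokens itself.

```python
def prompt_token_delta(client, model: str, prompt: str, system_instruction: str) -> int:
    """Return how many prompt tokens the system instruction adds to a request."""
    # Baseline request without any system instruction.
    plain = client.models.generate_content(model=model, contents=prompt)
    # Same request with the system instruction supplied through the config.
    with_instruction = client.models.generate_content(
        model=model,
        contents=prompt,
        config={"system_instruction": system_instruction},
    )
    return (
        with_instruction.usage_metadata.prompt_token_count
        - plain.usage_metadata.prompt_token_count
    )


# Example usage (requires the google-genai package and a configured API key):
# from google import genai
# delta = prompt_token_delta(
#     genai.Client(), "gemini-2.0-flash", "Hello!", "You are a helpful cat."
# )
# print(delta)
```

The same comparison could be made for tools by adding them to the config of the second request instead of the system instruction.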