Training Scripts
Since there is no chat template (obviously), do we need any preprocessing before passing the inputs, or can I simply use
input_prompt = """Summarize the following text:
{Text}"""
target = summary
and then pass the input and target to the processor, as we do for models like T5? Or do I need to process it further? Are there any caveats?
Hi @khatrimann
You are spot on: your intuition to treat this like a T5 model is correct. Since this is an encoder-decoder architecture, you do not need complex chat templates. Your proposed input format is perfectly fine.
Since T5Gemma 2 is multimodal, avoid AutoProcessor and use AutoTokenizer if you are doing text-only tasks. You can also include more context, as T5Gemma 2 supports context windows of up to 128K tokens.
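As a minimal sketch, the preprocessing really can stay that simple: fill the prompt template and hand both strings to the tokenizer. The template wording and the record field names ("text", "summary") here are placeholders, not an official T5Gemma 2 recipe:

```python
# Minimal preprocessing sketch for summarization with an encoder-decoder model.
# The prompt wording and field names ("text", "summary") are placeholders,
# not an official T5Gemma 2 recipe.

PROMPT_TEMPLATE = "Summarize the following text:\n{text}"

def build_example(record: dict) -> dict:
    """Turn a raw {"text": ..., "summary": ...} record into prompt/target strings."""
    return {
        "input_text": PROMPT_TEMPLATE.format(text=record["text"]),
        "target_text": record["summary"],
    }

# These strings would then go to AutoTokenizer in the standard Hugging Face
# seq2seq pattern, e.g.:
#   batch = tokenizer(example["input_text"],
#                     text_target=example["target_text"],
#                     truncation=True)

example = build_example({"text": "Cats sleep a lot.", "summary": "Cats sleep."})
print(example["input_text"])
```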
Thanks
Hi @pannaga10 , thank you so much for the clarification! That completely makes sense regarding the Encoder-Decoder architecture and using AutoTokenizer instead of AutoProcessor for text-only tasks to avoid the multimodal overhead and regex issues.
I have a follow-up question regarding Instruction Tuning for this second generation. I noticed that for the first generation of T5 Gemma, Google released the official instruct-tuned versions (e.g., google/t5gemma-2b-2b-prefixlm-it and google/t5gemma-2b-2b-ul2-it).
Are there any plans to release an official -it or chat-tuned version for the t5gemma-2 series anytime soon?
In the meantime, if we want to fine-tune this t5gemma-2-270m-270m base model into an instruction/chat model ourselves using Hugging Face's Seq2SeqTrainer or standard Trainer, could you provide a high-level script or code snippet on the best practices to format the inputs and labels?
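For the labels side, my current assumption (borrowed from the usual T5 recipe, nothing T5Gemma-2-specific) is that we pad the tokenized targets and replace padding positions with -100 so they are ignored by the loss:

```python
# Sketch of label masking for seq2seq training. The token ids and pad id below
# are made up for illustration; in practice they come from the tokenizer.

IGNORE_INDEX = -100  # ignored by PyTorch's cross-entropy loss

def mask_labels(label_ids: list[int], pad_token_id: int) -> list[int]:
    """Replace padding token ids with IGNORE_INDEX so they don't contribute to the loss."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in label_ids]

# Example with a hypothetical pad id of 0:
print(mask_labels([15, 27, 3, 0, 0], pad_token_id=0))  # → [15, 27, 3, -100, -100]
```

(I believe DataCollatorForSeq2Seq does this automatically, but it would be good to have that confirmed.)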
Specifically, since there is no standard chat_template, I'm curious about the optimal way to structure multi-turn conversations (User and Assistant roles) as input_ids and labels. Should we just concatenate them with distinct text prefixes before tokenization, like:
input_text = "User: Hello!\nAssistant: Hi there!\nUser: Write a poem about AI.\nAssistant:"
target_text = "[Poem content here...]"
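In code, the flattening I have in mind would be something like the following. The "User:"/"Assistant:" prefixes are my own guess, not an official T5Gemma 2 chat format:

```python
# Hypothetical multi-turn flattening for an encoder-decoder model: all turns up
# to the last user message become the encoder input, and the final assistant
# reply becomes the decoder target. The "User:"/"Assistant:" prefixes are an
# assumption, not an official T5Gemma 2 chat format.

def flatten_conversation(turns: list[tuple[str, str]]) -> tuple[str, str]:
    """turns is a list of (role, text) pairs ending with an assistant reply."""
    *history, (last_role, last_text) = turns
    assert last_role == "assistant", "expected the final turn to be the target"
    prefix = {"user": "User", "assistant": "Assistant"}
    input_text = "\n".join(f"{prefix[r]}: {t}" for r, t in history) + "\nAssistant:"
    return input_text, last_text

inp, tgt = flatten_conversation([
    ("user", "Hello!"),
    ("assistant", "Hi there!"),
    ("user", "Write a poem about AI."),
    ("assistant", "[Poem content here...]"),
])
print(inp)  # "User: Hello!\nAssistant: Hi there!\nUser: Write a poem about AI.\nAssistant:"
print(tgt)  # "[Poem content here...]"
```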
Any complete script outline or guidance on how Google approaches the Instruct Tuning (dataset formatting + hyperparams) for this specific T5Gemma-2 architecture would be incredibly helpful for the community!
Thank you!