Training Scripts
Since there is no chat template (obviously), do we need any preprocessing before passing the inputs, or can I simply use
input_prompt = """Summarize the following text:
{Text}"""
target = summary
and then pass the input and target to the processor, as we do for models like T5? Or do I need to process it further? Are there any caveats?
Hi @khatrimann
You are spot on: your intuition to treat this like a T5 model is correct. Since this is an encoder-decoder architecture, you do not need complex chat templates. Your proposed input format is perfectly fine.
Since T5Gemma 2 is multimodal, avoid AutoProcessor and use AutoTokenizer if you are doing text-only tasks. You can also include more context, as T5Gemma 2 supports context windows of up to 128K tokens.
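As a minimal sketch, the preprocessing really can stay that simple: fill the prompt template and hand both strings to the tokenizer. The template wording and the record field names ("text", "summary") here are placeholders, not an official T5Gemma 2 recipe:

```python
# Minimal preprocessing sketch for summarization with an encoder-decoder model.
# The prompt wording and field names ("text", "summary") are placeholders,
# not an official T5Gemma 2 recipe.

PROMPT_TEMPLATE = "Summarize the following text:\n{text}"

def build_example(record: dict) -> dict:
    """Turn a raw {"text": ..., "summary": ...} record into prompt/target strings."""
    return {
        "input_text": PROMPT_TEMPLATE.format(text=record["text"]),
        "target_text": record["summary"],
    }

# These strings would then go to AutoTokenizer in the standard Hugging Face
# seq2seq pattern, e.g.:
#   batch = tokenizer(example["input_text"],
#                     text_target=example["target_text"],
#                     truncation=True)

example = build_example({"text": "Cats sleep a lot.", "summary": "Cats sleep."})
print(example["input_text"])
```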
Thanks
Hi @pannaga10 , thank you so much for the clarification! That completely makes sense regarding the Encoder-Decoder architecture and using AutoTokenizer instead of AutoProcessor for text-only tasks to avoid the multimodal overhead and regex issues.
I have a follow-up question regarding Instruction Tuning for this second generation. I noticed that for the first generation of T5 Gemma, Google released the official instruct-tuned versions (e.g., google/t5gemma-2b-2b-prefixlm-it and google/t5gemma-2b-2b-ul2-it).
Are there any plans to release an official -it or chat-tuned version for the t5gemma-2 series anytime soon?
In the meantime, if we want to fine-tune this t5gemma-2-270m-270m base model into an instruction/chat model ourselves using Hugging Face's Seq2SeqTrainer or standard Trainer, could you provide a high-level script or code snippet on the best practices to format the inputs and labels?
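For the labels side, my current assumption (borrowed from the usual T5 recipe, nothing T5Gemma-2-specific) is that we pad the tokenized targets and replace padding positions with -100 so they are ignored by the loss:

```python
# Sketch of label masking for seq2seq training. The token ids and pad id below
# are made up for illustration; in practice they come from the tokenizer.

IGNORE_INDEX = -100  # ignored by PyTorch's cross-entropy loss

def mask_labels(label_ids: list[int], pad_token_id: int) -> list[int]:
    """Replace padding token ids with IGNORE_INDEX so they don't contribute to the loss."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in label_ids]

# Example with a hypothetical pad id of 0:
print(mask_labels([15, 27, 3, 0, 0], pad_token_id=0))  # → [15, 27, 3, -100, -100]
```

(I believe DataCollatorForSeq2Seq does this automatically, but it would be good to have that confirmed.)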
Specifically, since there is no standard chat_template, I'm curious about the optimal way to structure multi-turn conversations (User and Assistant roles) as input_ids and labels. Should we just concatenate them with distinct text prefixes before tokenization, like:
input_text = "User: Hello!\nAssistant: Hi there!\nUser: Write a poem about AI.\nAssistant:"
target_text = "[Poem content here...]"
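In code, the flattening I have in mind would be something like the following. The "User:"/"Assistant:" prefixes are my own guess, not an official T5Gemma 2 chat format:

```python
# Hypothetical multi-turn flattening for an encoder-decoder model: all turns up
# to the last user message become the encoder input, and the final assistant
# reply becomes the decoder target. The "User:"/"Assistant:" prefixes are an
# assumption, not an official T5Gemma 2 chat format.

def flatten_conversation(turns: list[tuple[str, str]]) -> tuple[str, str]:
    """turns is a list of (role, text) pairs ending with an assistant reply."""
    *history, (last_role, last_text) = turns
    assert last_role == "assistant", "expected the final turn to be the target"
    prefix = {"user": "User", "assistant": "Assistant"}
    input_text = "\n".join(f"{prefix[r]}: {t}" for r, t in history) + "\nAssistant:"
    return input_text, last_text

inp, tgt = flatten_conversation([
    ("user", "Hello!"),
    ("assistant", "Hi there!"),
    ("user", "Write a poem about AI."),
    ("assistant", "[Poem content here...]"),
])
print(inp)  # "User: Hello!\nAssistant: Hi there!\nUser: Write a poem about AI.\nAssistant:"
print(tgt)  # "[Poem content here...]"
```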
Any complete script outline or guidance on how Google approaches the Instruct Tuning (dataset formatting + hyperparams) for this specific T5Gemma-2 architecture would be incredibly helpful for the community!
Thank you!