T5 Finetuning Tips

moscow25 · August 12, 2020, 4:56pm

Sure thing @valhalla. I did not try too many settings, but LR 0.001 seems to work just fine for smaller finetuning batches. I'm running a global batch of 2*8 (2 per GPU) with a bit of gradient accumulation on top (4x, I believe), but tbh it's not really that sensitive as far as I can tell. The only gotcha is to turn off the extra Adafactor scaling options that came over from fairseq and are set to True by default for no good reason: scale_parameter=False, relative_step=False.
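
For anyone who wants to copy the setup, here is a rough sketch of what those settings look like with the `Adafactor` class that ships in `transformers`. The tiny `torch.nn.Linear` model and the random inputs are stand-ins I made up so the snippet runs without downloading T5; the accumulation loop is just one common way to do 4x accumulation, not necessarily what the run above used.

```python
import torch
from transformers.optimization import Adafactor

# Stand-in model (assumption) so the sketch runs without T5 weights.
model = torch.nn.Linear(8, 8)

optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,                # constant LR from the post
    scale_parameter=False,  # defaults to True
    relative_step=False,    # defaults to True; must be False when lr is set by hand
    warmup_init=False,      # already the default, shown for completeness
)

accumulation_steps = 4  # "4x, I believe"

for step in range(8):
    x = torch.randn(2, 8)  # 2 examples per device, random placeholder data
    # Divide so the accumulated gradient is an average over the micro-batches.
    loss = model(x).pow(2).mean() / accumulation_steps
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

With 2 examples per GPU, 8 GPUs, and 4 accumulation steps, each optimizer update would see 2 * 8 * 4 = 64 examples.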

To get bigger batches, I’m pretty sure we need to add some gradient checkpointing to the model. Trying that out next…
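
More recent releases of `transformers` expose gradient checkpointing directly on the model, so a minimal sketch of turning it on might look like the following. The tiny `T5Config` sizes are hypothetical values chosen only so this runs without downloading weights; for real fine-tuning you would load a checkpoint with `from_pretrained` instead.

```python
from transformers import T5Config, T5ForConditionalGeneration

# Tiny randomly initialised model (hypothetical sizes, for illustration only).
config = T5Config(
    vocab_size=128, d_model=32, d_kv=8, d_ff=64, num_layers=2, num_heads=4
)
model = T5ForConditionalGeneration(config)

# Recompute activations in the backward pass instead of storing them,
# trading extra compute for lower memory use.
model.gradient_checkpointing_enable()

# The decoder key/value cache is only useful for generation, so turn it off
# while training with checkpointing.
model.config.use_cache = False
model.train()
```

The memory saved this way is what leaves room for a larger per-GPU batch, at the cost of slower training steps.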
