Finetuning from Last Checkpoint
Checkpointing allows you to save your finetuning progress so you can pause it and then continue.
You must first edit the Trainer to add save_strategy and save_steps. The example below saves a checkpoint every 50 steps to the folder outputs.
```python
trainer = SFTTrainer(
    ....
    args = TrainingArguments(
        ....
        output_dir = "outputs",
        save_strategy = "steps",
        save_steps = 50,
    ),
)
```

Then in the trainer do:

```python
trainer_stats = trainer.train(resume_from_checkpoint = True)
```

This will start from the latest checkpoint and continue training.
Wandb Integration
```python
# Install library
!pip install wandb --upgrade

# Setting up Wandb
!wandb login <token>

import os
os.environ["WANDB_PROJECT"] = "<name>"
os.environ["WANDB_LOG_MODEL"] = "checkpoint"
```

Then in TrainingArguments() set report_to = "wandb" (and optionally logging_steps and run_name).
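A minimal TrainingArguments sketch for Wandb logging (the run_name and logging_steps values are placeholders you can adjust):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir = "outputs",   # plus your other arguments as before
    report_to = "wandb",      # send Trainer metrics to Weights & Biases
    logging_steps = 1,        # log every step (illustrative)
    run_name = "<run-name>",  # optional: display name for the run in the Wandb UI
)
```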
To train the model, do trainer.train(); to resume training, do trainer.train(resume_from_checkpoint = True).
How do I do Early Stopping?
If you want to stop or pause the finetuning / training run because the evaluation loss is not decreasing, you can use early stopping, which halts the training process automatically. Use transformers' EarlyStoppingCallback.
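Conceptually, early stopping just tracks the best evaluation loss seen so far and gives up after a fixed number of evaluations without improvement. A standalone sketch of that logic (illustrative only, not the transformers callback itself):

```python
def should_stop(eval_losses, patience=3):
    """Stop when the best loss has not improved for `patience` evaluations."""
    best = float("inf")
    since_best = 0
    for loss in eval_losses:
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return True
    return False
```

With a patience of 3, the run stops as soon as three consecutive evaluations fail to beat the best loss seen so far.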
As usual, set up your trainer and your evaluation dataset. The goal is to stop the training run if the eval_loss (the evaluation loss) has not improved after 3 evaluation steps or so.
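A sketch of that setup, assuming your model, tokenizer, and datasets are defined as elsewhere in your script (the eval_steps value is illustrative; load_best_model_at_end and metric_for_best_model are required for EarlyStoppingCallback to track eval_loss; on older transformers versions the argument is named evaluation_strategy):

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    # ... model, tokenizer, train_dataset as before ...
    eval_dataset = eval_dataset,  # your held-out evaluation split
    args = TrainingArguments(
        # ... your other arguments ...
        output_dir = "outputs",
        eval_strategy = "steps",               # evaluate every eval_steps
        eval_steps = 20,                       # illustrative value
        save_strategy = "steps",               # saving must align with evaluation
        save_steps = 20,
        load_best_model_at_end = True,         # required for early stopping
        metric_for_best_model = "eval_loss",   # metric the callback watches
        greater_is_better = False,             # lower eval_loss is better
    ),
)
```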
We then add the callback which can also be customized:
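For example, stopping after 3 evaluations without improvement (early_stopping_threshold sets the minimum change in the metric that counts as an improvement):

```python
from transformers import EarlyStoppingCallback

trainer.add_callback(
    EarlyStoppingCallback(
        early_stopping_patience = 3,     # evals with no improvement before stopping
        early_stopping_threshold = 0.0,  # minimum change in eval_loss that counts
    )
)
```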
Then train the model as usual via trainer.train().