๐ŸFinetuning from Last Checkpoint

Checkpointing allows you to save your finetuning progress so you can pause it and then continue.

First edit the Trainer to add save_strategy and save_steps. The example below saves a checkpoint every 50 steps to the folder outputs.

trainer = SFTTrainer(
    ....
    args = TrainingArguments(
        ....
        output_dir = "outputs",
        save_strategy = "steps",
        save_steps = 50,
    ),
)

Then start training with:

trainer_stats = trainer.train(resume_from_checkpoint = True)

This starts from the latest checkpoint in output_dir and continues training.
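Note that resume_from_checkpoint = True expects a checkpoint to already exist in output_dir, so the very first run will typically raise an error. The sketch below is one way to guard against that, assuming the Trainer's usual checkpoint-<step> folder naming. find_latest_checkpoint is a hypothetical helper written for this example, not part of any library.

```python
import os
import re

def find_latest_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint-<step> folder, or None."""
    if not os.path.isdir(output_dir):
        return None
    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for name in os.listdir(output_dir):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            # Compare step numbers as integers so 100 sorts after 50.
            found.append((int(match.group(1)), name))
    if not found:
        return None
    return os.path.join(output_dir, max(found)[1])

latest = find_latest_checkpoint("outputs")
# With a real trainer (not defined here):
# trainer_stats = trainer.train(resume_from_checkpoint = latest)
```

When latest is None the run starts from scratch; otherwise it resumes from that folder.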

Wandb Integration

# Install library
!pip install wandb --upgrade

# Setting up Wandb
!wandb login <token>

import os

os.environ["WANDB_PROJECT"] = "<name>"
os.environ["WANDB_LOG_MODEL"] = "checkpoint"

Then in TrainingArguments() set report_to = "wandb" so that training logs are sent to Wandb.
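A minimal sketch of the relevant arguments, written as a plain dict so it can be unpacked into TrainingArguments(). report_to, logging_steps, save_steps and run_name are existing TrainingArguments fields; the values and the name wandb_args are only illustrative assumptions.

```python
# Illustrative values only; adjust to your run.
wandb_args = dict(
    report_to = "wandb",   # send training logs to Wandb
    logging_steps = 1,     # log every step; raise this if it is too noisy
    save_steps = 100,      # how often a checkpoint is saved
    run_name = "<name>",   # optional placeholder for the run name
)

# With transformers installed (not imported here):
# args = TrainingArguments(output_dir = "outputs", **wandb_args)
```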

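To train the model, call trainer.train(). To resume training, pass a checkpoint to trainer.train() via resume_from_checkpoint, for example one downloaded from Wandb.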
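One way to resume from a checkpoint stored on Wandb is sketched below, assuming the checkpoint was logged as a model artifact. resume_from_wandb_artifact is a hypothetical helper, and the artifact name is a placeholder you need to look up in your own Wandb workspace.

```python
def resume_from_wandb_artifact(trainer, artifact_name):
    """Download a checkpoint artifact from Wandb and resume training from it."""
    # Imported lazily so this helper can be defined without wandb installed.
    import wandb

    run = wandb.init()
    artifact = run.use_artifact(artifact_name, type = "model")
    checkpoint_dir = artifact.download()  # local folder holding the checkpoint
    return trainer.train(resume_from_checkpoint = checkpoint_dir)

# With a real trainer (not defined here); the artifact name is a placeholder:
# resume_from_wandb_artifact(trainer, "<username>/<project>/<artifact-name>")
```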

โ“How do I do Early Stopping?

If the evaluation loss is no longer decreasing, you can use early stopping to end the finetuning / training run automatically. Use EarlyStoppingCallback.

As usual, set up your trainer and your evaluation dataset. The goal is to stop the training run if eval_loss (the evaluation loss) has not decreased after 3 evaluations or so.
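A rough sketch of training arguments that fit this setup, again as a plain dict to unpack into your TrainingArguments(). The step counts and folder name are assumptions for illustration. Older transformers versions call eval_strategy evaluation_strategy instead.

```python
# Illustrative values only; adjust to your dataset and budget.
early_stopping_args = dict(
    output_dir = "training_checkpoints",  # where checkpoints are saved
    save_strategy = "steps",              # save every save_steps steps
    save_steps = 10,
    save_total_limit = 3,                 # keep only 3 checkpoints on disk
    eval_strategy = "steps",              # evaluate every eval_steps steps
    eval_steps = 10,
    load_best_model_at_end = True,        # expected by EarlyStoppingCallback
    metric_for_best_model = "eval_loss",  # the metric to stop on
    greater_is_better = False,            # lower eval loss is better
)

# With transformers installed (not imported here):
# args = TrainingArguments(**early_stopping_args)
```

Saving and evaluating on the same schedule matters here, because load_best_model_at_end needs a saved checkpoint for each evaluation it might pick as best.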

We then add the callback with trainer.add_callback(). Its patience and threshold can also be customized.
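A minimal sketch of that step, assuming transformers is installed and trainer is the trainer you set up earlier. add_early_stopping is a hypothetical wrapper for this example; EarlyStoppingCallback and its two parameters come from transformers.

```python
def add_early_stopping(trainer, patience = 3, threshold = 0.0):
    """Attach an EarlyStoppingCallback to an existing trainer and return it."""
    # Imported lazily so this helper can be defined without transformers installed.
    from transformers import EarlyStoppingCallback

    callback = EarlyStoppingCallback(
        early_stopping_patience = patience,    # evaluations to wait without improvement
        early_stopping_threshold = threshold,  # minimum improvement that counts
    )
    trainer.add_callback(callback)
    return callback

# With a real trainer (not defined here):
# add_early_stopping(trainer, patience = 3)
```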

Then train the model as usual via trainer.train().
