🏁 Finetuning from Last Checkpoint
Checkpointing allows you to save your finetuning progress so you can pause it and then continue.
First, edit the Trainer to add a save_strategy and save_steps. The example below saves a checkpoint every 50 steps to the folder outputs.
trainer = SFTTrainer(
    ...,  # model, tokenizer, dataset, and other arguments as before
    args = TrainingArguments(
        ...,  # your other training arguments
        output_dir = "outputs",
        save_strategy = "steps",
        save_steps = 50,
    ),
)
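Each checkpoint is written to a numbered subfolder of output_dir, such as outputs/checkpoint-50. If disk space is a concern, here is a minimal sketch using the optional save_total_limit parameter of TrainingArguments (an optional addition, not required for resuming):

args = TrainingArguments(
    output_dir = "outputs",
    save_strategy = "steps",
    save_steps = 50,
    save_total_limit = 3,  # optional: keep only the 3 most recent checkpoints
)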
Then call train with resume_from_checkpoint:
trainer_stats = trainer.train(resume_from_checkpoint = True)
This resumes training from the latest checkpoint in outputs and continues from there.
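resume_from_checkpoint also accepts a path, so you can resume from a specific checkpoint instead of the latest one. A minimal sketch, assuming a checkpoint was saved at step 50:

# Resume from a specific checkpoint folder rather than the latest one
trainer_stats = trainer.train(resume_from_checkpoint = "outputs/checkpoint-50")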
Wandb Integration
# Install library
!pip install wandb --upgrade
# Setting up Wandb
!wandb login <token>
import os
os.environ["WANDB_PROJECT"] = "<name>"
os.environ["WANDB_LOG_MODEL"] = "checkpoint"
Then in TrainingArguments() set:
report_to = "wandb",
logging_steps = 1,   # Change if needed
save_steps = 100,    # Change if needed
run_name = "<name>", # (Optional)
To train the model, run trainer.train(). To resume training from a checkpoint logged to W&B:
import wandb

# Download the checkpoint artifact logged during the earlier run
run = wandb.init()
artifact = run.use_artifact('<username>/<Wandb-project-name>/<run-id>', type='model')
artifact_dir = artifact.download()

# Resume training from the downloaded checkpoint
trainer.train(resume_from_checkpoint=artifact_dir)
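artifact.download() returns the local folder containing the checkpoint files, so passing artifact_dir to resume_from_checkpoint continues training from where the logged checkpoint left off.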