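I have the following code in my training loop:

    from threading import Thread
    from safetensors.torch import save_file  # assumed import; the post only says "safetensors"

    if rank == 0:
        # Write the checkpoint from a daemon thread so the loop can continue.
        t = Thread(
            target=save_file,
            args=(model_sd, f"{cfg.model_dir}/model_{step + 1}.safetensors"),
            daemon=True,
        )
        t.start()
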
This saves the checkpoint to disk using safetensors. However, the save blocks the training loop, even though the thread should be running in the background.
When I switch the code to use `torch.save`, there's no issue. What should I do?
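
To put numbers on the stall, a small timing harness along these lines might help. It is only a sketch: `timed_background_call`, `main_loop_stall`, and `fake_save` are hypothetical helper names, not part of safetensors or PyTorch, and `fake_save` merely stands in for the real save call. It records how long launching the thread takes, how long the worker runs, and the longest pause the main thread sees while the worker is alive.

```python
import threading
import time


def timed_background_call(fn, *args):
    """Run fn(*args) in a daemon thread and record launch and worker durations."""
    timings = {}

    def worker():
        start = time.perf_counter()
        fn(*args)
        # Written by the worker; read it only after the thread has finished.
        timings["worker_s"] = time.perf_counter() - start

    t0 = time.perf_counter()
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    # Time spent in the main thread just to launch the worker.
    timings["launch_s"] = time.perf_counter() - t0
    return thread, timings


def main_loop_stall(thread, tick_s=0.001):
    """Return the longest gap between main-thread ticks while the worker is alive."""
    worst = 0.0
    last = time.perf_counter()
    while thread.is_alive():
        time.sleep(tick_s)  # stand-in for one short unit of main-loop work
        now = time.perf_counter()
        worst = max(worst, now - last)
        last = now
    return worst


if __name__ == "__main__":
    def fake_save(delay_s):
        # Hypothetical stand-in for the real save call; sleeping releases the GIL.
        time.sleep(delay_s)

    thread, timings = timed_background_call(fake_save, 0.2)
    stall = main_loop_stall(thread)
    thread.join()
    print(f"launch={timings['launch_s']:.4f}s "
          f"worker={timings['worker_s']:.4f}s worst_stall={stall:.4f}s")
```

Swapping `fake_save` for the actual `save_file` call (and then for `torch.save`) would show whether the worst stall approaches the worker's full duration. If it does, that would be consistent with the worker holding the interpreter lock for most of the save, though that is a hypothesis to check rather than a confirmed cause.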