stable-diffusion-finetune/scripts/slurm
resume_512                      old uncommitted stuff                             2022-07-23 09:42:18 +00:00
resume_512_improvedaesthetic    old uncommitted stuff                             2022-07-23 09:42:18 +00:00
resume_768_hr                   old uncommitted stuff                             2022-07-23 09:42:18 +00:00
v1_iahr_torch111                resume v1, disable requeue                        2022-07-23 07:54:48 +00:00
v1_improvedaesthetics           final v1 restart                                  2022-07-18 23:47:36 +00:00
v1_improvedaesthetics_torch111  test gpu scripts and other launchers and configs  2022-07-22 09:56:22 +00:00
v1_laionhr_torch111             test gpu scripts and other launchers and configs  2022-07-22 09:56:22 +00:00
v2_laionhr1024                  v2 on laionhr 1024                                2022-07-14 23:36:08 +00:00
v3_pretraining                  v3 resume                                         2022-07-22 21:07:29 +00:00
README.md                       ready to slurm                                    2022-07-06 22:52:52 +00:00

Example

Resume the f8 model at resolution 512 on Laion-HR:

sbatch scripts/slurm/resume_512/sbatch.sh

Reuse

To reuse this as a template, copy sbatch.sh and launcher.sh to a location of your own. In sbatch.sh, adjust the job name and the node count:

#SBATCH --job-name=stable-diffusion-512cont
#SBATCH --nodes=24

Also point the last line at your copy of launcher.sh:

srun bash /fsx/stable-diffusion/stable-diffusion/scripts/slurm/resume_512/launcher.sh

In launcher.sh, adjust CONFIG and EXTRA. Before a full run, consider a test run with the debug flags uncommented and a reduced number of nodes.
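The copy-and-adjust steps for sbatch.sh can be scripted. The following is a minimal sketch, not something shipped in this repository: make_job_template is a hypothetical helper, and it assumes GNU sed (for `sed -i`), that sbatch.sh contains the `#SBATCH --job-name=`, `#SBATCH --nodes=` and `srun bash` lines shown above, and that the new job name contains no `|` or `&` characters. CONFIG and EXTRA in launcher.sh still have to be edited by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the two template scripts into a new directory and
# patch the job name, node count and launcher path in the copied sbatch.sh.
# Usage: make_job_template <template_dir> <new_dir> <job_name> <nodes>
make_job_template() {
  local src="$1" dst="$2" job_name="$3" nodes="$4"

  mkdir -p "$dst"
  cp "$src/sbatch.sh" "$src/launcher.sh" "$dst/"

  # Absolute path to the copied launcher, used in the srun line.
  local launcher
  launcher="$(cd "$dst" && pwd)/launcher.sh"

  # Rewrite only the three lines the README says to adjust (GNU sed assumed).
  sed -i \
    -e "s|^#SBATCH --job-name=.*|#SBATCH --job-name=${job_name}|" \
    -e "s|^#SBATCH --nodes=.*|#SBATCH --nodes=${nodes}|" \
    -e "s|^srun bash .*|srun bash ${launcher}|" \
    "$dst/sbatch.sh"
}
```

For example, `make_job_template scripts/slurm/resume_512 scripts/slurm/my_run my-run 2` would produce a two-node copy that can be submitted with `sbatch scripts/slurm/my_run/sbatch.sh` (my_run and my-run are placeholder names). For a quick test run it may be enough to leave the file untouched, since options given on the sbatch command line take precedence over `#SBATCH` directives in the script, for instance `sbatch --nodes=2 scripts/slurm/my_run/sbatch.sh`.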