# Training configurations

This document provides guidelines for selecting appropriate training options for various scenarios, as well as an extensive list of recommended configurations.

#### Example

In the remainder of this document, we summarize each configuration as follows:

| Config | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :--------------------------- | :--------------: | :--------------: | :------------: | :-- |
| StyleGAN3-T | 18.47 | 12.29 | 4.3 | `--cfg=stylegan3-t --gpus=8 --batch=32 --gamma=8.2 --mirror=1` |

This corresponds to the following command line:

```.bash
# Train StyleGAN3-T for AFHQv2 using 8 GPUs.
python train.py --outdir=~/training-runs --cfg=stylegan3-t --data=~/datasets/afhqv2-512x512.zip \
    --gpus=8 --batch=32 --gamma=8.2 --mirror=1
```

Explanation of the columns:
- **Config**: StyleGAN3-T (translation equiv.), StyleGAN3-R (translation and rotation equiv.), or StyleGAN2. Reflects the value of `--cfg`.
- **s/kimg**: Raw training speed, measured separately on Tesla V100 and A100 using our recommended Docker image. The number indicates how many seconds, on average, it takes to process 1000 images from the training set. It tends to vary slightly over the course of training, typically by no more than ±20%.
- **GPU mem**: Maximum GPU memory usage observed during training, reported in gigabytes per GPU. The above example uses 8 GPUs, which means that the total GPU memory usage is around 34.4 GB.
- **Options**: Command line options for `train.py`, excluding `--outdir` and `--data`.

#### Total training time

In addition to the raw s/kimg number, the training time also depends on the `--kimg` and `--metrics` options. `--kimg` controls the total training duration, measured in thousands of training images, and is set to 25000 by default. This is long enough to reach convergence in typical cases, but in practice the results should already look quite reasonable around 5000 kimg. `--metrics` determines which quality metrics are computed periodically during training. The default is `fid50k_full`, which increases the training time slightly, typically by no more than 5%. The automatic computation can be disabled by specifying `--metrics=none`.

In the above example, the total training time on V100 is approximately 18.47 s/kimg × 25000 kimg × 1.05 ≈ 485,000 seconds ≈ 5 days and 14 hours. Disabling metric computation (`--metrics=none`) reduces this to approximately 5 days and 8 hours.

## General guidelines

The most important hyperparameter that needs to be tuned on a per-dataset basis is the R₁ regularization weight, `--gamma`, which must be specified explicitly for `train.py`. As a rule of thumb, the value of `--gamma` scales quadratically with respect to the training set resolution: doubling the resolution (e.g., 256x256 → 512x512) means that `--gamma` should be multiplied by 4 (e.g., 2 → 8). The optimal value is usually the same for `--cfg=stylegan3-t` and `--cfg=stylegan3-r`, but considerably lower for `--cfg=stylegan2`.

In practice, we recommend selecting the value of `--gamma` as follows:
- Find the closest match for your specific case in this document (config, resolution, and GPU count).
- Try training with the same `--gamma` first.
- Then, try increasing the value by 2x and 4x, and also decreasing it by 2x and 4x.
- Pick the value that yields the lowest FID.

The results may also be improved by adjusting `--mirror` and `--aug`, depending on the training data. Specifying `--mirror=1` augments the dataset with random *x*-flips, which effectively doubles the number of images. This is generally beneficial with datasets that are horizontally symmetric (e.g., FFHQ), but it can be harmful if the images contain noticeable asymmetric features (e.g., text or letters). Specifying `--aug=noaug` disables adaptive discriminator augmentation (ADA), which may improve the results slightly if the training set is large enough (at least 100k images when accounting for *x*-flips). With small datasets (fewer than 30k images), it is generally a good idea to leave the augmentations enabled.

It is possible to speed up the training by decreasing network capacity, e.g., `--cbase=16384`. This typically leads to lower quality results, but the difference is less pronounced with low-resolution datasets (e.g., 256x256).
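The arithmetic behind the training-time estimate and the `--gamma` rule of thumb is easy to script when planning a run. The following is a minimal sketch, not part of this codebase: the function names `estimate_training_time` and `gamma_sweep` are hypothetical, `awk` is assumed to be available for floating-point math, and the default 1.05 factor is the upper-bound metrics overhead quoted above rather than a measured value.

```.bash
# Hypothetical planning helpers (not part of train.py); awk does the float math.

# Estimate wall-clock training time from an s/kimg value taken from the tables.
#   $1: s/kimg
#   $2: --kimg (default 25000)
#   $3: metrics overhead factor (default 1.05 for fid50k_full; 1.0 for --metrics=none)
estimate_training_time() {
  awk -v s="$1" -v k="${2:-25000}" -v f="${3:-1.05}" 'BEGIN {
    t = s * k * f                     # total seconds
    d = int(t / 86400)                # whole days
    h = int((t - d * 86400) / 3600)   # remaining whole hours
    printf "%d days and %d hours\n", d, h
  }'
}

# Print candidate --gamma values for a new resolution: scale a reference value
# quadratically with resolution, then apply the 1/4x, 1/2x, 1x, 2x, 4x sweep.
#   $1: reference --gamma, $2: reference resolution, $3: target resolution
gamma_sweep() {
  awk -v g="$1" -v r0="$2" -v r1="$3" 'BEGIN {
    base = g * (r1 / r0) ^ 2
    n = split("0.25 0.5 1 2 4", m, " ")
    for (i = 1; i <= n; i++) printf "%g\n", base * m[i]
  }'
}

estimate_training_time 18.47              # 5 days and 14 hours
estimate_training_time 18.47 25000 1.0    # 5 days and 8 hours (--metrics=none)
gamma_sweep 2 256 512                     # 2, 4, 8, 16, 32 (one per line)
```

The quadratic rescaling only picks the center of the grid; which of the five candidates gives the lowest FID still has to be determined by training.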
#### Scaling to a different number of GPUs

You can select the number of GPUs by changing the value of `--gpus`; this does not affect the convergence curves or training dynamics in any way. By default, the total batch size (`--batch`) is divided evenly among the GPUs, which means that decreasing the number of GPUs yields higher per-GPU memory usage. To avoid running out of memory, you can decrease the per-GPU batch size by specifying `--batch-gpu`, which performs the same computation in multiple passes using gradient accumulation.

By default, `train.py` exports network snapshots once every 200 kimg, i.e., the product of `--snap=50` and `--tick=4`. When using few GPUs (e.g., 1–2), this means that it may take a very long time for the first snapshot to appear. We recommend increasing the snapshot frequency in such cases by specifying `--snap=20`, `--snap=10`, or `--snap=5`.

Note that the configurations listed in this document have been specifically tuned for 8 GPUs. The safest way to scale them to different GPU counts is to adjust `--gpus`, `--batch-gpu`, and `--snap` as described above, but it may be possible to reach faster convergence by adjusting some of the other hyperparameters as well. Adjusting the total batch size (`--batch`), however, requires some experimentation; decreasing `--batch` usually necessitates increasing regularization (`--gamma`) and/or decreasing the learning rates (most importantly `--dlr`).

#### Transfer learning

Transfer learning makes it possible to reach very good results very quickly, especially when the training set is small and/or the images resemble the ones produced by a pre-trained model. To enable transfer learning, you can point `--resume` to one of the pre-trained models that we provide for [StyleGAN3](https://ngc.nvidia.com/catalog/models/nvidia:research:stylegan3) and [StyleGAN2](https://ngc.nvidia.com/catalog/models/nvidia:research:stylegan2).
For example:

```.bash
# Fine-tune StyleGAN3-R for MetFaces-U using 8 GPUs, starting from the pre-trained FFHQ-U pickle.
python train.py --outdir=~/training-runs --cfg=stylegan3-r --data=~/datasets/metfacesu-1024x1024.zip \
    --gpus=8 --batch=32 --gamma=6.6 --mirror=1 --kimg=5000 --snap=5 \
    --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhqu-1024x1024.pkl
```

The pre-trained model should be selected to match the specified config, resolution, and architecture-related hyperparameters (e.g., `--cbase`, `--map-depth`, and `--mbstd-group`). You can check this by looking at the `fakes_init.png` exported by `train.py` at the beginning of training; if the configuration is correct, the images should look reasonable.

With transfer learning, the results may be improved slightly by adjusting `--freezed`, in addition to the above guidelines for `--gamma`, `--mirror`, and `--aug`. In our experience, `--freezed=10` and `--freezed=13` tend to work reasonably well.

## Recommended configurations

This section lists recommended settings for StyleGAN3-T and StyleGAN3-R for different resolutions and GPU counts, selected according to the above guidelines. These are intended to provide a good starting point when experimenting with a new dataset. Please note that many of the options (e.g., `--gamma`, `--mirror`, and `--aug`) are still worth adjusting on a case-by-case basis.

#### 128x128 resolution

| Config | GPUs | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :--------------------------- | :----------: | :--------------: | :--------------: | :------------: | :-- |
| StyleGAN3-T | 1 | 73.68 | 27.20 | 7.2 | `--cfg=stylegan3-t --gpus=1 --batch=32 --gamma=0.5 --batch-gpu=16 --snap=10` |
| StyleGAN3-T | 2 | 37.30 | 13.74 | 7.1 | `--cfg=stylegan3-t --gpus=2 --batch=32 --gamma=0.5 --snap=20` |
| StyleGAN3-T | 4 | 20.66 | 7.52 | 4.1 | `--cfg=stylegan3-t --gpus=4 --batch=32 --gamma=0.5` |
| StyleGAN3-T | 8 | 11.31 | 4.40 | 2.6 | `--cfg=stylegan3-t --gpus=8 --batch=32 --gamma=0.5` |
| StyleGAN3-R | 1 | 58.44 | 34.23 | 8.3 | `--cfg=stylegan3-r --gpus=1 --batch=32 --gamma=0.5 --batch-gpu=16 --snap=10` |
| StyleGAN3-R | 2 | 29.92 | 17.29 | 8.2 | `--cfg=stylegan3-r --gpus=2 --batch=32 --gamma=0.5 --snap=20` |
| StyleGAN3-R | 4 | 15.49 | 9.53 | 4.5 | `--cfg=stylegan3-r --gpus=4 --batch=32 --gamma=0.5` |
| StyleGAN3-R | 8 | 8.43 | 5.69 | 2.7 | `--cfg=stylegan3-r --gpus=8 --batch=32 --gamma=0.5` |

#### 256x256 resolution

| Config | GPUs | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :--------------------------- | :----------: | :--------------: | :--------------: | :------------: | :-- |
| StyleGAN3-T | 1 | 89.15 | 49.81 | 9.5 | `--cfg=stylegan3-t --gpus=1 --batch=32 --gamma=2 --batch-gpu=16 --snap=10` |
| StyleGAN3-T | 2 | 45.45 | 25.05 | 9.3 | `--cfg=stylegan3-t --gpus=2 --batch=32 --gamma=2 --snap=20` |
| StyleGAN3-T | 4 | 23.94 | 13.26 | 5.2 | `--cfg=stylegan3-t --gpus=4 --batch=32 --gamma=2` |
| StyleGAN3-T | 8 | 13.04 | 7.32 | 3.1 | `--cfg=stylegan3-t --gpus=8 --batch=32 --gamma=2` |
| StyleGAN3-R | 1 | 87.37 | 56.73 | 6.7 | `--cfg=stylegan3-r --gpus=1 --batch=32 --gamma=2 --batch-gpu=8 --snap=10` |
| StyleGAN3-R | 2 | 44.12 | 28.60 | 6.7 | `--cfg=stylegan3-r --gpus=2 --batch=32 --gamma=2 --batch-gpu=8 --snap=20` |
| StyleGAN3-R | 4 | 22.42 | 14.39 | 6.6 | `--cfg=stylegan3-r --gpus=4 --batch=32 --gamma=2` |
| StyleGAN3-R | 8 | 11.88 | 8.03 | 3.7 | `--cfg=stylegan3-r --gpus=8 --batch=32 --gamma=2` |

#### 512x512 resolution

| Config | GPUs | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :--------------------------- | :----------: | :---------------: | :---------------: | :------------: | :-- |
| StyleGAN3-T | 1 | 137.33 | 90.25 | 7.8 | `--cfg=stylegan3-t --gpus=1 --batch=32 --gamma=8 --batch-gpu=8 --snap=10` |
| StyleGAN3-T | 2 | 69.65 | 45.42 | 7.7 | `--cfg=stylegan3-t --gpus=2 --batch=32 --gamma=8 --batch-gpu=8 --snap=20` |
| StyleGAN3-T | 4 | 34.88 | 22.81 | 7.6 | `--cfg=stylegan3-t --gpus=4 --batch=32 --gamma=8` |
| StyleGAN3-T | 8 | 18.47 | 12.29 | 4.3 | `--cfg=stylegan3-t --gpus=8 --batch=32 --gamma=8` |
| StyleGAN3-R | 1 | 158.91 | 110.13 | 6.0 | `--cfg=stylegan3-r --gpus=1 --batch=32 --gamma=8 --batch-gpu=4 --snap=10` |
| StyleGAN3-R | 2 | 79.96 | 55.18 | 6.0 | `--cfg=stylegan3-r --gpus=2 --batch=32 --gamma=8 --batch-gpu=4 --snap=20` |
| StyleGAN3-R | 4 | 40.86 | 27.99 | 5.9 | `--cfg=stylegan3-r --gpus=4 --batch=32 --gamma=8 --batch-gpu=4` |
| StyleGAN3-R | 8 | 20.44 | 14.04 | 5.9 | `--cfg=stylegan3-r --gpus=8 --batch=32 --gamma=8` |

#### 1024x1024 resolution

| Config | GPUs | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :--------------------------- | :----------: | :---------------: | :---------------: | :-------------: | :-- |
| StyleGAN3-T | 1 | 221.85 | 156.91 | 7.0 | `--cfg=stylegan3-t --gpus=1 --batch=32 --gamma=32 --batch-gpu=4 --snap=5` |
| StyleGAN3-T | 2 | 113.44 | 79.16 | 6.8 | `--cfg=stylegan3-t --gpus=2 --batch=32 --gamma=32 --batch-gpu=4 --snap=10` |
| StyleGAN3-T | 4 | 57.04 | 39.62 | 6.7 | `--cfg=stylegan3-t --gpus=4 --batch=32 --gamma=32 --batch-gpu=4 --snap=20` |
| StyleGAN3-T | 8 | 28.71 | 20.01 | 6.6 | `--cfg=stylegan3-t --gpus=8 --batch=32 --gamma=32` |
| StyleGAN3-R | 1 | 263.44 | 184.81 | 10.2 | `--cfg=stylegan3-r --gpus=1 --batch=32 --gamma=32 --batch-gpu=4 --snap=5` |
| StyleGAN3-R | 2 | 134.22 | 92.58 | 10.1 | `--cfg=stylegan3-r --gpus=2 --batch=32 --gamma=32 --batch-gpu=4 --snap=10` |
| StyleGAN3-R | 4 | 67.33 | 46.53 | 10.0 | `--cfg=stylegan3-r --gpus=4 --batch=32 --gamma=32 --batch-gpu=4 --snap=20` |
| StyleGAN3-R | 8 | 34.12 | 23.42 | 9.9 | `--cfg=stylegan3-r --gpus=8 --batch=32 --gamma=32` |

## Configurations used in the StyleGAN3 paper

This section lists the exact settings that we used in the "Alias-Free Generative Adversarial Networks" paper.

#### FFHQ-U and FFHQ at 1024x1024 resolution

| Config | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :--------------------------- | :--------------: | :--------------: | :------------: | :-- |
| StyleGAN2 | 17.55 | 14.57 | 6.2 | `--cfg=stylegan2 --gpus=8 --batch=32 --gamma=10 --mirror=1 --aug=noaug` |
| StyleGAN3-T | 28.71 | 20.01 | 6.6 | `--cfg=stylegan3-t --gpus=8 --batch=32 --gamma=32.8 --mirror=1 --aug=noaug` |
| StyleGAN3-R | 34.12 | 23.42 | 9.9 | `--cfg=stylegan3-r --gpus=8 --batch=32 --gamma=32.8 --mirror=1 --aug=noaug` |

#### MetFaces-U at 1024x1024 resolution

| Config | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :--------------------------- | :--------------: | :--------------: | :-------------: | :-- |
| StyleGAN2 | 18.74 | 11.80 | 7.4 | `--cfg=stylegan2 --gpus=8 --batch=32 --gamma=10 --mirror=1 --kimg=5000 --snap=10 --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhqu-1024x1024.pkl` |
| StyleGAN3-T | 29.84 | 21.06 | 7.7 | `--cfg=stylegan3-t --gpus=8 --batch=32 --gamma=16.4 --mirror=1 --kimg=5000 --snap=10 --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-t-ffhqu-1024x1024.pkl` |
| StyleGAN3-R | 35.10 | 24.32 | 10.9 | `--cfg=stylegan3-r --gpus=8 --batch=32 --gamma=6.6 --mirror=1 --kimg=5000 --snap=10 --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhqu-1024x1024.pkl` |

#### MetFaces at 1024x1024 resolution

| Config | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :--------------------------- | :--------------: | :--------------: | :-------------: | :-- |
| StyleGAN2 | 18.74 | 11.80 | 7.4 | `--cfg=stylegan2 --gpus=8 --batch=32 --gamma=5 --mirror=1 --kimg=5000 --snap=10 --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhq-1024x1024.pkl` |
| StyleGAN3-T | 29.84 | 21.06 | 7.7 | `--cfg=stylegan3-t --gpus=8 --batch=32 --gamma=6.6 --mirror=1 --kimg=5000 --snap=10 --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-t-ffhq-1024x1024.pkl` |
| StyleGAN3-R | 35.10 | 24.32 | 10.9 | `--cfg=stylegan3-r --gpus=8 --batch=32 --gamma=3.3 --mirror=1 --kimg=5000 --snap=10 --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhq-1024x1024.pkl` |

#### AFHQv2 at 512x512 resolution

| Config | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :--------------------------- | :--------------: | :--------------: | :------------: | :-- |
| StyleGAN2 | 10.90 | 6.60 | 3.9 | `--cfg=stylegan2 --gpus=8 --batch=32 --gamma=5 --mirror=1` |
| StyleGAN3-T | 18.47 | 12.29 | 4.3 | `--cfg=stylegan3-t --gpus=8 --batch=32 --gamma=8.2 --mirror=1` |
| StyleGAN3-R | 20.44 | 14.04 | 5.9 | `--cfg=stylegan3-r --gpus=8 --batch=32 --gamma=16.4 --mirror=1` |

#### FFHQ-U ablations at 256x256 resolution

| Config | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :--------------------------- | :-------------: | :-------------: | :------------: | :-- |
| StyleGAN2 | 3.61 | 2.19 | 2.7 | `--cfg=stylegan2 --gpus=8 --batch=64 --gamma=1 --mirror=1 --aug=noaug --cbase=16384 --glr=0.0025 --dlr=0.0025 --mbstd-group=8` |
| StyleGAN3-T | 7.40 | 3.74 | 3.5 | `--cfg=stylegan3-t --gpus=8 --batch=64 --gamma=1 --mirror=1 --aug=noaug --cbase=16384 --dlr=0.0025` |
| StyleGAN3-R | 6.71 | 4.81 | 4.2 | `--cfg=stylegan3-r --gpus=8 --batch=64 --gamma=1 --mirror=1 --aug=noaug --cbase=16384 --dlr=0.0025` |

## Old StyleGAN2-ADA configurations

This section lists command lines that can be used to match the configurations provided by our previous [StyleGAN2-ADA](https://github.com/NVlabs/stylegan2-ada-pytorch) codebase. The first table corresponds to `--cfg=auto` (default) for different resolutions and GPU counts, while the second table lists the remaining alternatives.

#### Default configuration

| Res. | GPUs | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :---------------------- | :----------: | :---------------: | :--------------: | :------------: | :-- |
| 128² | 1 | 12.51 | 6.79 | 6.2 | `--cfg=stylegan2 --gpus=1 --batch=32 --gamma=0.1024 --map-depth=2 --glr=0.0025 --dlr=0.0025 --cbase=16384` |
| 128² | 2 | 6.43 | 3.45 | 6.2 | `--cfg=stylegan2 --gpus=2 --batch=64 --gamma=0.0512 --map-depth=2 --glr=0.0025 --dlr=0.0025 --cbase=16384` |
| 128² | 4 | 3.82 | 2.23 | 3.5 | `--cfg=stylegan2 --gpus=4 --batch=64 --gamma=0.0512 --map-depth=2 --glr=0.0025 --dlr=0.0025 --cbase=16384` |
| 256² | 1 | 20.84 | 12.53 | 4.5 | `--cfg=stylegan2 --gpus=1 --batch=16 --gamma=0.8192 --map-depth=2 --glr=0.0025 --dlr=0.0025 --cbase=16384` |
| 256² | 2 | 10.93 | 6.36 | 4.5 | `--cfg=stylegan2 --gpus=2 --batch=32 --gamma=0.4096 --map-depth=2 --glr=0.0025 --dlr=0.0025 --cbase=16384` |
| 256² | 4 | 5.39 | 3.20 | 4.5 | `--cfg=stylegan2 --gpus=4 --batch=64 --gamma=0.2048 --map-depth=2 --glr=0.0025 --dlr=0.0025 --cbase=16384` |
| 256² | 8 | 3.89 | 2.38 | 2.6 | `--cfg=stylegan2 --gpus=8 --batch=64 --gamma=0.2048 --map-depth=2 --glr=0.0025 --dlr=0.0025 --cbase=16384` |
| 512² | 1 | 71.59 | 41.06 | 6.8 | `--cfg=stylegan2 --gpus=1 --batch=8 --gamma=6.5536 --map-depth=2 --glr=0.0025 --dlr=0.0025` |
| 512² | 2 | 36.79 | 20.83 | 6.8 | `--cfg=stylegan2 --gpus=2 --batch=16 --gamma=3.2768 --map-depth=2 --glr=0.0025 --dlr=0.0025` |
| 512² | 4 | 18.12 | 10.45 | 6.7 | `--cfg=stylegan2 --gpus=4 --batch=32 --gamma=1.6384 --map-depth=2 --glr=0.0025 --dlr=0.0025` |
| 512² | 8 | 9.09 | 5.24 | 6.8 | `--cfg=stylegan2 --gpus=8 --batch=64 --gamma=0.8192 --map-depth=2 --glr=0.0025 --dlr=0.0025` |
| 1024² | 1 | 141.83 | 90.39 | 7.2 | `--cfg=stylegan2 --gpus=1 --batch=4 --gamma=52.4288 --map-depth=2` |
| 1024² | 2 | 73.13 | 46.04 | 7.2 | `--cfg=stylegan2 --gpus=2 --batch=8 --gamma=26.2144 --map-depth=2` |
| 1024² | 4 | 36.95 | 23.15 | 7.0 | `--cfg=stylegan2 --gpus=4 --batch=16 --gamma=13.1072 --map-depth=2` |
| 1024² | 8 | 18.47 | 11.66 | 7.3 | `--cfg=stylegan2 --gpus=8 --batch=32 --gamma=6.5536 --map-depth=2` |

#### Repro configurations

| Name | s/kimg (V100) | s/kimg (A100) | GPU mem (GB) | Options |
| :---------------------- | :--------------: | :--------------: | :------------: | :-- |
| `stylegan2` | 17.55 | 14.57 | 6.2 | `--cfg=stylegan2 --gpus=8 --batch=32 --gamma=10` |
| `paper256` | 4.01 | 2.47 | 2.7 | `--cfg=stylegan2 --gpus=8 --batch=64 --gamma=1 --cbase=16384 --glr=0.0025 --dlr=0.0025 --mbstd-group=8` |
| `paper512` | 9.11 | 5.28 | 6.7 | `--cfg=stylegan2 --gpus=8 --batch=64 --gamma=0.5 --glr=0.0025 --dlr=0.0025 --mbstd-group=8` |
| `paper1024` | 18.56 | 11.75 | 6.9 | `--cfg=stylegan2 --gpus=8 --batch=32 --gamma=2` |
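The `--cfg=auto` rows in the Default configuration table follow a regular pattern. The batch size grows with the GPU count up to a cap, `--gamma` equals 0.0002 × resolution² / batch, `--cbase=16384` is added below 512², and `--glr`/`--dlr` are set to 0.0025 below 1024². The sketch below regenerates those options from a resolution and GPU count. The helper name `auto_options` is hypothetical, and the rules are reconstructed from the table values (they appear to mirror the `auto` heuristics of the StyleGAN2-ADA `train.py`), so treat it as a convenience rather than a reference implementation.

```.bash
# Hypothetical helper: rebuild the StyleGAN2-ADA-style "auto" options for a
# square resolution ($1) and GPU count ($2). Rules inferred from the table above.
auto_options() {
  local res=$1 gpus=$2
  # Per-GPU batch: 4096/res images, capped at 32.
  local per_gpu=$(( 4096 / res < 32 ? 4096 / res : 32 ))
  # Total batch: capped at 64, but never smaller than one image per GPU.
  local batch=$(( gpus * per_gpu ))
  if (( batch > 64 )); then batch=64; fi
  if (( batch < gpus )); then batch=$gpus; fi
  # R1 weight: 0.0002 * res^2 / batch (awk handles the floating point).
  local gamma
  gamma=$(awk -v r="$res" -v b="$batch" 'BEGIN { printf "%g", 0.0002 * r * r / b }')
  local opts="--cfg=stylegan2 --gpus=$gpus --batch=$batch --gamma=$gamma --map-depth=2"
  # Below 1024^2: higher learning rates; below 512^2: smaller --cbase.
  if (( res < 1024 )); then opts+=" --glr=0.0025 --dlr=0.0025"; fi
  if (( res < 512 )); then opts+=" --cbase=16384"; fi
  printf '%s\n' "$opts"
}

auto_options 256 4
# --cfg=stylegan2 --gpus=4 --batch=64 --gamma=0.2048 --map-depth=2 --glr=0.0025 --dlr=0.0025 --cbase=16384
```

Because `--gamma` is divided by the total batch size, the same resolution gets a different `--gamma` whenever the GPU count changes the batch, which is why the rows for a given resolution differ.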