Adpter finetune #528

xiezipeng-ML · 2024-01-05T03:28:58Z

显存占用情况

llama 7b

1. full finetune

1n8g fp16 1dp 8tp 1pp batch_size=1

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off | 00000000:3D:00.0 Off |                    0 |
| N/A   32C    P0              71W / 250W |  26725MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off | 00000000:3E:00.0 Off |                    0 |
| N/A   32C    P0              66W / 250W |  24318MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCIE-40GB          Off | 00000000:40:00.0 Off |                    0 |
| N/A   31C    P0              66W / 250W |  24318MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCIE-40GB          Off | 00000000:41:00.0 Off |                    0 |
| N/A   31C    P0              65W / 250W |  24296MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-PCIE-40GB          Off | 00000000:B1:00.0 Off |                    0 |
| N/A   32C    P0              68W / 250W |  24318MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-PCIE-40GB          Off | 00000000:B2:00.0 Off |                    0 |
| N/A   32C    P0              66W / 250W |  24296MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-PCIE-40GB          Off | 00000000:B4:00.0 Off |                    0 |
| N/A   33C    P0              66W / 250W |  24320MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-PCIE-40GB          Off | 00000000:B5:00.0 Off |                    0 |
| N/A   33C    P0              65W / 250W |  24288MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

1n8g fp16 1dp 1tp 8pp batch_size=1

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off | 00000000:3D:00.0 Off |                    0 |
| N/A   34C    P0              65W / 250W |  21791MiB / 40960MiB |     78%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off | 00000000:3E:00.0 Off |                    0 |
| N/A   33C    P0              57W / 250W |  15434MiB / 40960MiB |     32%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCIE-40GB          Off | 00000000:40:00.0 Off |                    0 |
| N/A   35C    P0              61W / 250W |  15486MiB / 40960MiB |     70%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCIE-40GB          Off | 00000000:41:00.0 Off |                    0 |
| N/A   34C    P0              75W / 250W |  15438MiB / 40960MiB |     58%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-PCIE-40GB          Off | 00000000:B1:00.0 Off |                    0 |
| N/A   35C    P0              70W / 250W |  18790MiB / 40960MiB |     42%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-PCIE-40GB          Off | 00000000:B2:00.0 Off |                    0 |
| N/A   35C    P0              51W / 250W |  18838MiB / 40960MiB |     37%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-PCIE-40GB          Off | 00000000:B4:00.0 Off |                    0 |
| N/A   36C    P0              52W / 250W |  18838MiB / 40960MiB |     51%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-PCIE-40GB          Off | 00000000:B5:00.0 Off |                    0 |
| N/A   32C    P0              54W / 250W |   8508MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

2. adapter finetune

1n8g fp16 1dp 8tp 1pp batch_size=1

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off | 00000000:3D:00.0 Off |                    0 |
| N/A   32C    P0              86W / 250W |  13187MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off | 00000000:3E:00.0 Off |                    0 |
| N/A   32C    P0              73W / 250W |  10222MiB / 40960MiB |     97%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCIE-40GB          Off | 00000000:40:00.0 Off |                    0 |
| N/A   31C    P0              76W / 250W |  10222MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCIE-40GB          Off | 00000000:41:00.0 Off |                    0 |
| N/A   31C    P0              81W / 250W |  10134MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-PCIE-40GB          Off | 00000000:B1:00.0 Off |                    0 |
| N/A   32C    P0              79W / 250W |  10198MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-PCIE-40GB          Off | 00000000:B2:00.0 Off |                    0 |
| N/A   32C    P0              75W / 250W |  10158MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-PCIE-40GB          Off | 00000000:B4:00.0 Off |                    0 |
| N/A   33C    P0              77W / 250W |  10158MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-PCIE-40GB          Off | 00000000:B5:00.0 Off |                    0 |
| N/A   33C    P0              77W / 250W |  10198MiB / 40960MiB |     96%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

1n8g fp16 1dp 1tp 8pp batch_size=1

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off | 00000000:3D:00.0 Off |                    0 |
| N/A   32C    P0              64W / 250W |   9431MiB / 40960MiB |     91%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off | 00000000:3E:00.0 Off |                    0 |
| N/A   31C    P0              34W / 250W |   7376MiB / 40960MiB |     10%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCIE-40GB          Off | 00000000:40:00.0 Off |                    0 |
| N/A   30C    P0              35W / 250W |   7428MiB / 40960MiB |     14%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCIE-40GB          Off | 00000000:41:00.0 Off |                    0 |
| N/A   30C    P0              44W / 250W |   7380MiB / 40960MiB |     37%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-PCIE-40GB          Off | 00000000:B1:00.0 Off |                    0 |
| N/A   33C    P0              86W / 250W |   8872MiB / 40960MiB |     50%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-PCIE-40GB          Off | 00000000:B2:00.0 Off |                    0 |
| N/A   33C    P0              44W / 250W |   8920MiB / 40960MiB |     17%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-PCIE-40GB          Off | 00000000:B4:00.0 Off |                    0 |
| N/A   34C    P0              74W / 250W |   8920MiB / 40960MiB |     27%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-PCIE-40GB          Off | 00000000:B5:00.0 Off |                    0 |
| N/A   31C    P0              53W / 250W |   3482MiB / 40960MiB |     46%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

levi131 · 2024-01-09T11:40:37Z

（1）应该用的都是混合精度训练吧。（2）llama 7b模型并行每张卡上模型大概0.875B，占用显存24G / 0.875 = 27.4，比这个（https://zhuanlan.zhihu.com/p/624740065）知乎上分析的模型参数、后向传递计算得到的梯度、优化器状态三部分20倍，前向计算过程中产生的中间激活在bs=1时0.75倍略大一些，不过还算合理。（3）流水并行的时候显存占用更小一些，记得分离编译也是更适合流水并行。（4）adapter finetune这里显存降低了一半，不太清楚这个是偶发现象还是有特定规律的。

xiezipeng-ML · 2024-01-09T14:11:51Z

（1）应该用的都是混合精度训练吧。（2）llama 7b模型并行每张卡上模型大概0.875B，占用显存24G / 0.875 = 27.4，比这个（https://zhuanlan.zhihu.com/p/624740065）知乎上分析的模型参数、后向传递计算得到的梯度、优化器状态三部分20倍，前向计算过程中产生的中间激活在bs=1时0.75倍略大一些，不过还算合理。（3）流水并行的时候显存占用更小一些，记得分离编译也是更适合流水并行。（4）adapter finetune这里显存降低了一半，不太清楚这个是偶发现象还是有特定规律的。

（1）都是混合精度。
（2）知乎这个看起来不包含数据的显存。
（3）纯pipeline并行显存占用更少。
（4）adapter这里是有规律的：

part1 这里只有部分layer包含梯度
part2 这里是包含梯度的tensor

loxs123 · 2024-01-12T06:46:01Z

projects/Llama/utils/prepare_alpaca.py

@@ -18,18 +19,17 @@


 def prepare(
-    destination_path: Path = Path("/alpaca_data"),
-    checkpoint_dir: Path = Path("/Llama-2-7b-hf"),
+    destination_path: Path = Path("/data/home/xiezipeng/datasets/alpaca_data"),


这个地方是不是用相对路径好一些

xiezipeng-ML added 3 commits January 4, 2024 22:12

support adapter finetune

d3b9a48

refine

df37b00

reformat

a96cf21

FxxxxU enabled auto-merge (squash) January 5, 2024 05:22

xiezipeng-ML added 2 commits January 9, 2024 13:14

refine

66af63c

refine

4c79622

xiezipeng-ML requested a review from oneflow-ci-bot January 9, 2024 14:31

reformat

449401e

xiezipeng-ML requested review from oneflow-ci-bot and removed request for oneflow-ci-bot January 10, 2024 03:19

refine

4172adc

xiezipeng-ML requested review from oneflow-ci-bot and removed request for oneflow-ci-bot January 11, 2024 06:35

reformat

25c46d5

xiezipeng-ML requested review from oneflow-ci-bot and removed request for oneflow-ci-bot January 11, 2024 06:41

Merge branch 'main' into adpter_finetune

f4c64c2

xiezipeng-ML requested review from oneflow-ci-bot and removed request for oneflow-ci-bot January 12, 2024 04:18

loxs123 self-requested a review January 12, 2024 06:35

update path

d4a0073

loxs123 approved these changes Jan 15, 2024

View reviewed changes

xiezipeng-ML requested review from loxs123 and oneflow-ci-bot and removed request for oneflow-ci-bot January 15, 2024 12:55

FxxxxU merged commit 1185ad9 into main Jan 15, 2024
2 checks passed

FxxxxU deleted the adpter_finetune branch January 15, 2024 12:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adpter finetune #528

Adpter finetune #528

xiezipeng-ML commented Jan 5, 2024 •

edited

Loading

levi131 commented Jan 9, 2024 •

edited

Loading

xiezipeng-ML commented Jan 9, 2024

loxs123 Jan 12, 2024

Adpter finetune #528

Adpter finetune #528

Conversation

xiezipeng-ML commented Jan 5, 2024 • edited Loading

显存占用情况

1. full finetune

2. adapter finetune

levi131 commented Jan 9, 2024 • edited Loading

xiezipeng-ML commented Jan 9, 2024

loxs123 Jan 12, 2024

Choose a reason for hiding this comment

xiezipeng-ML commented Jan 5, 2024 •

edited

Loading

levi131 commented Jan 9, 2024 •

edited

Loading