Adpter finetune #528

Merged
FxxxxU merged 10 commits into main from adpter_finetune
Jan 15, 2024

xiezipeng-ML
Contributor

@xiezipeng-ML xiezipeng-ML commented Jan 5, 2024

GPU memory usage

llama 7b

1. full finetune

  • 1n8g fp16 1dp 8tp 1pp batch_size=1
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off | 00000000:3D:00.0 Off |                    0 |
| N/A   32C    P0              71W / 250W |  26725MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off | 00000000:3E:00.0 Off |                    0 |
| N/A   32C    P0              66W / 250W |  24318MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCIE-40GB          Off | 00000000:40:00.0 Off |                    0 |
| N/A   31C    P0              66W / 250W |  24318MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCIE-40GB          Off | 00000000:41:00.0 Off |                    0 |
| N/A   31C    P0              65W / 250W |  24296MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-PCIE-40GB          Off | 00000000:B1:00.0 Off |                    0 |
| N/A   32C    P0              68W / 250W |  24318MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-PCIE-40GB          Off | 00000000:B2:00.0 Off |                    0 |
| N/A   32C    P0              66W / 250W |  24296MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-PCIE-40GB          Off | 00000000:B4:00.0 Off |                    0 |
| N/A   33C    P0              66W / 250W |  24320MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-PCIE-40GB          Off | 00000000:B5:00.0 Off |                    0 |
| N/A   33C    P0              65W / 250W |  24288MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
  • 1n8g fp16 1dp 1tp 8pp batch_size=1
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off | 00000000:3D:00.0 Off |                    0 |
| N/A   34C    P0              65W / 250W |  21791MiB / 40960MiB |     78%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off | 00000000:3E:00.0 Off |                    0 |
| N/A   33C    P0              57W / 250W |  15434MiB / 40960MiB |     32%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCIE-40GB          Off | 00000000:40:00.0 Off |                    0 |
| N/A   35C    P0              61W / 250W |  15486MiB / 40960MiB |     70%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCIE-40GB          Off | 00000000:41:00.0 Off |                    0 |
| N/A   34C    P0              75W / 250W |  15438MiB / 40960MiB |     58%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-PCIE-40GB          Off | 00000000:B1:00.0 Off |                    0 |
| N/A   35C    P0              70W / 250W |  18790MiB / 40960MiB |     42%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-PCIE-40GB          Off | 00000000:B2:00.0 Off |                    0 |
| N/A   35C    P0              51W / 250W |  18838MiB / 40960MiB |     37%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-PCIE-40GB          Off | 00000000:B4:00.0 Off |                    0 |
| N/A   36C    P0              52W / 250W |  18838MiB / 40960MiB |     51%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-PCIE-40GB          Off | 00000000:B5:00.0 Off |                    0 |
| N/A   32C    P0              54W / 250W |   8508MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

2. adapter finetune

  • 1n8g fp16 1dp 8tp 1pp batch_size=1
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off | 00000000:3D:00.0 Off |                    0 |
| N/A   32C    P0              86W / 250W |  13187MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off | 00000000:3E:00.0 Off |                    0 |
| N/A   32C    P0              73W / 250W |  10222MiB / 40960MiB |     97%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCIE-40GB          Off | 00000000:40:00.0 Off |                    0 |
| N/A   31C    P0              76W / 250W |  10222MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCIE-40GB          Off | 00000000:41:00.0 Off |                    0 |
| N/A   31C    P0              81W / 250W |  10134MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-PCIE-40GB          Off | 00000000:B1:00.0 Off |                    0 |
| N/A   32C    P0              79W / 250W |  10198MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-PCIE-40GB          Off | 00000000:B2:00.0 Off |                    0 |
| N/A   32C    P0              75W / 250W |  10158MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-PCIE-40GB          Off | 00000000:B4:00.0 Off |                    0 |
| N/A   33C    P0              77W / 250W |  10158MiB / 40960MiB |     99%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-PCIE-40GB          Off | 00000000:B5:00.0 Off |                    0 |
| N/A   33C    P0              77W / 250W |  10198MiB / 40960MiB |     96%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
  • 1n8g fp16 1dp 1tp 8pp batch_size=1
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off | 00000000:3D:00.0 Off |                    0 |
| N/A   32C    P0              64W / 250W |   9431MiB / 40960MiB |     91%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off | 00000000:3E:00.0 Off |                    0 |
| N/A   31C    P0              34W / 250W |   7376MiB / 40960MiB |     10%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCIE-40GB          Off | 00000000:40:00.0 Off |                    0 |
| N/A   30C    P0              35W / 250W |   7428MiB / 40960MiB |     14%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCIE-40GB          Off | 00000000:41:00.0 Off |                    0 |
| N/A   30C    P0              44W / 250W |   7380MiB / 40960MiB |     37%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-PCIE-40GB          Off | 00000000:B1:00.0 Off |                    0 |
| N/A   33C    P0              86W / 250W |   8872MiB / 40960MiB |     50%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-PCIE-40GB          Off | 00000000:B2:00.0 Off |                    0 |
| N/A   33C    P0              44W / 250W |   8920MiB / 40960MiB |     17%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-PCIE-40GB          Off | 00000000:B4:00.0 Off |                    0 |
| N/A   34C    P0              74W / 250W |   8920MiB / 40960MiB |     27%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-PCIE-40GB          Off | 00000000:B5:00.0 Off |                    0 |
| N/A   31C    P0              53W / 250W |   3482MiB / 40960MiB |     46%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
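
To make the four dumps above easier to compare, here is a minimal sketch that totals the per-GPU memory for each configuration. The numbers are transcribed from the Memory-Usage column of the nvidia-smi output; the dictionary layout and the `summarize` helper are illustrative names, not part of this PR.

```python
# Per-GPU memory usage in MiB, transcribed from the nvidia-smi dumps above.
# Keys are (finetune mode, parallel layout); the naming is an assumption.
usage_mib = {
    ("full", "8tp"): [26725, 24318, 24318, 24296, 24318, 24296, 24320, 24288],
    ("full", "8pp"): [21791, 15434, 15486, 15438, 18790, 18838, 18838, 8508],
    ("adapter", "8tp"): [13187, 10222, 10222, 10134, 10198, 10158, 10158, 10198],
    ("adapter", "8pp"): [9431, 7376, 7428, 7380, 8872, 8920, 8920, 3482],
}


def summarize(per_gpu):
    """Return total, peak, and mean memory (MiB) for one configuration."""
    return {
        "total": sum(per_gpu),
        "peak": max(per_gpu),
        "mean": sum(per_gpu) / len(per_gpu),
    }


def adapter_to_full_ratio(layout):
    """Ratio of total adapter-finetune memory to total full-finetune memory."""
    return sum(usage_mib[("adapter", layout)]) / sum(usage_mib[("full", layout)])


for (mode, layout), per_gpu in usage_mib.items():
    s = summarize(per_gpu)
    print(f"{mode:8s} {layout}: total={s['total']} MiB, "
          f"peak={s['peak']} MiB, mean={s['mean']:.0f} MiB")

for layout in ("8tp", "8pp"):
    print(f"adapter/full ratio ({layout}): {adapter_to_full_ratio(layout):.2f}")
```

On these numbers, adapter finetune uses roughly 0.43 of the full-finetune total under 8-way tensor parallelism and roughly 0.46 under 8-way pipeline parallelism.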

@FxxxxU FxxxxU enabled auto-merge (squash) January 5, 2024 05:22
@levi131

levi131 commented Jan 9, 2024

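(1) All of these runs use mixed-precision training, right? (2) With model parallelism, each GPU holds about 0.875B of the llama 7b parameters, so memory per parameter is 24G / 0.875 = 27.4. That is somewhat higher than the analysis in this Zhihu article (https://zhuanlan.zhihu.com/p/624740065), which gives 20x for the three parts (model parameters, gradients from the backward pass, and optimizer states) plus 0.75x for the intermediate activations produced in the forward pass at bs=1, but it is still reasonable. (3) Memory usage is lower with pipeline parallelism; as I recall, separate compilation is also better suited to pipeline parallelism. (4) Adapter finetune cuts memory usage roughly in half here. I am not sure whether this is a coincidence or follows a specific pattern.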
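
The arithmetic in point (2) can be sketched as follows. This is only a rough check: it assumes "24G" means 24e9 bytes and "0.875B" means 0.875e9 parameters, so the quotient is bytes per parameter. The 20x and 0.75x figures are the ones quoted above from the Zhihu article; the variable names are illustrative.

```python
# Rough check of the per-parameter memory estimate from the comment above.
# Assumption: "24G" ~ 24e9 bytes and 7B parameters split evenly over 8 GPUs.
total_params = 7e9
num_gpus = 8
observed_bytes_per_gpu = 24e9

params_per_gpu = total_params / num_gpus  # 0.875e9 parameters per GPU
observed_bytes_per_param = observed_bytes_per_gpu / params_per_gpu

# Figures quoted from the Zhihu analysis:
state_bytes_per_param = 20.0       # parameters + gradients + optimizer states
activation_bytes_per_param = 0.75  # intermediate activations at bs=1
expected_bytes_per_param = state_bytes_per_param + activation_bytes_per_param

gap = observed_bytes_per_param - expected_bytes_per_param

print(f"observed: {observed_bytes_per_param:.1f} bytes/param")
print(f"expected: {expected_bytes_per_param:.2f} bytes/param")
print(f"gap:      {gap:.2f} bytes/param")
```

The gap of a few bytes per parameter is the "somewhat higher" part of the comparison; the estimate does not account for input data or framework overhead.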

@xiezipeng-ML
Contributor Author

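(1) Yes, all runs use mixed precision.
(2) The Zhihu analysis does not appear to include the memory taken up by the data.
(3) Pure pipeline parallelism uses less memory.
(4) The adapter result does follow a pattern:

  • part1: only some of the layers have gradients here
  • part2: these are the tensors that have gradients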
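
Point (4) can be illustrated with a back-of-the-envelope sketch: if only some parameters have gradients, then gradient and optimizer-state memory is paid only for those. The 20 bytes per trainable parameter is the figure quoted earlier in this thread; the 2 bytes per frozen fp16 parameter, the 1.2M trainable-parameter count, and the `state_bytes` helper are all assumptions for illustration, not measurements from this PR.

```python
def state_bytes(total_params, trainable_params,
                frozen_bytes_per_param=2, trainable_bytes_per_param=20):
    """Estimate memory for weights, gradients, and optimizer states.

    Assumption: frozen parameters are stored only as fp16 weights (2 bytes),
    while trainable parameters carry the full 20 bytes of weights, gradients,
    and optimizer states. Activations and framework overhead are not counted.
    """
    frozen_params = total_params - trainable_params
    return (frozen_params * frozen_bytes_per_param
            + trainable_params * trainable_bytes_per_param)


TOTAL_PARAMS = 7_000_000_000   # llama 7b
ADAPTER_PARAMS = 1_200_000     # hypothetical adapter size, for illustration

full_finetune = state_bytes(TOTAL_PARAMS, TOTAL_PARAMS)
adapter_finetune = state_bytes(TOTAL_PARAMS, ADAPTER_PARAMS)

print(f"full finetune:    {full_finetune / 1e9:.1f} GB")
print(f"adapter finetune: {adapter_finetune / 1e9:.1f} GB")
```

The measured reduction in the dumps above is smaller than this state-only estimate suggests, presumably because activations, buffers, and framework overhead are the same in both modes and are not modelled here.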

@xiezipeng-ML xiezipeng-ML requested review from oneflow-ci-bot and removed request for oneflow-ci-bot January 10, 2024 03:19
@xiezipeng-ML xiezipeng-ML requested review from oneflow-ci-bot and removed request for oneflow-ci-bot January 11, 2024 06:35
@xiezipeng-ML xiezipeng-ML requested review from oneflow-ci-bot and removed request for oneflow-ci-bot January 11, 2024 06:41
@xiezipeng-ML xiezipeng-ML requested review from oneflow-ci-bot and removed request for oneflow-ci-bot January 12, 2024 04:18
@loxs123 loxs123 self-requested a review January 12, 2024 06:35
@@ -18,18 +19,17 @@


def prepare(
destination_path: Path = Path("/alpaca_data"),
checkpoint_dir: Path = Path("/Llama-2-7b-hf"),
destination_path: Path = Path("/data/home/xiezipeng/datasets/alpaca_data"),
Contributor

Wouldn't a relative path be better here?
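
One way to read this suggestion is to keep the default relative and resolve it at call time, so no user-specific home directory ends up in the repository. A minimal sketch, assuming a hypothetical `data/alpaca_data` default and a hypothetical `resolve_data_dir` helper (neither is taken from this PR):

```python
from pathlib import Path

# Hypothetical relative default; resolved against the current working directory.
DEFAULT_DATA_DIR = Path("data/alpaca_data")


def resolve_data_dir(destination_path: Path = DEFAULT_DATA_DIR) -> Path:
    """Turn a possibly relative path into an absolute one without requiring it to exist."""
    return destination_path.resolve()


print(resolve_data_dir())
```

Callers on other machines can still pass an absolute path explicitly, for example `resolve_data_dir(Path("/some/other/location"))`.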

@xiezipeng-ML xiezipeng-ML requested review from loxs123 and oneflow-ci-bot and removed request for oneflow-ci-bot January 15, 2024 12:55
@FxxxxU FxxxxU merged commit 1185ad9 into main Jan 15, 2024
2 checks passed
@FxxxxU FxxxxU deleted the adpter_finetune branch January 15, 2024 12:55
4 participants