Adpter finetune #528
Conversation
(1) Both runs presumably use mixed-precision training? (2) With model parallelism, llama 7b places roughly 0.875B parameters on each card, so 24 GB / 0.875 ≈ 27.4 bytes per parameter. That is slightly above the estimate in this Zhihu analysis (https://zhuanlan.zhihu.com/p/624740065), which puts model parameters, backward-pass gradients, and optimizer states at about 20x the parameter count in bytes, plus about 0.75x for the intermediate activations of the forward pass at bs=1 — a bit higher, but still reasonable. (3) Pipeline parallelism uses less memory; as I recall, separate compilation is also a better fit for pipeline parallelism. (4) Adapter finetune cuts memory roughly in half here; I am not sure whether that is incidental or follows a specific pattern.
(1) Yes, both use mixed precision.
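As a quick sanity check of the arithmetic above, here is a hedged sketch; the 8-way model-parallel split and the 20x / 0.75x factors from the linked Zhihu post are taken from the comment, not measured here:

params_total_b = 7.0            # llama 7b, in billions of parameters
model_parallel = 8              # assumed degree: 7 / 8 = 0.875B parameters per card
params_per_card_b = params_total_b / model_parallel

observed_gb = 24.0              # reported memory per card
# GB / Gparams = bytes per parameter
print(observed_gb / params_per_card_b)   # ~27.4 bytes/param

# Expected under mixed precision with Adam, per the Zhihu breakdown:
# weights + gradients + optimizer states ~ 20 bytes/param,
# bs=1 forward activations ~ 0.75 bytes/param.
print(20.0 + 0.75)                        # 20.75, so 27.4 is somewhat higher but in the same range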
@@ -18,18 +19,17 @@
 def prepare(
-    destination_path: Path = Path("/alpaca_data"),
+    destination_path: Path = Path("/data/home/xiezipeng/datasets/alpaca_data"),
     checkpoint_dir: Path = Path("/Llama-2-7b-hf"),
Would a relative path be better here?
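If the default were made relative, as suggested, the signature might look like the following sketch (directory names are illustrative only, not taken from the PR):

from pathlib import Path

def prepare(
    destination_path: Path = Path("data/alpaca_data"),         # relative, illustrative
    checkpoint_dir: Path = Path("checkpoints/Llama-2-7b-hf"),  # relative, illustrative
) -> None:
    ...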
GPU memory usage
llama 7b
1. full finetune
2. adapter finetune
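The roughly halved memory under adapter finetune is what one would expect if only the adapter parameters carry gradients and optimizer states while the base llama weights stay frozen. A minimal sketch of that setup, using PyTorch-style APIs as an assumption about the framework (names are illustrative, not the PR's actual code):

import torch

def freeze_base_weights(model: torch.nn.Module) -> None:
    # Only parameters whose name marks them as adapter weights stay trainable,
    # so gradients and Adam states are allocated for a small fraction of the model.
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name

# Usage sketch: the optimizer only sees the trainable (adapter) parameters.
# freeze_base_weights(model)
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )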