Accelerating inference for quantized models #2602
Unanswered
zhuchen1109
asked this question in Q&A

I'm running inference with the internvl2-26b int4 model on the PyTorch backend, and I've found that vLLM's awq_marlin kernel is faster. Which settings in lmdeploy can speed up inference? Is turbomind better for quantized models?

Replies: 1 comment
- You can try turbomind.
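
For reference, a minimal sketch of what switching to the TurboMind backend could look like, assuming LMDeploy's `pipeline` API with `TurbomindEngineConfig(model_format='awq')`; the model path, image URL, and config values below are illustrative, not a verified setup for internvl2-26b.

```python
# Sketch: serve an AWQ int4 InternVL2 checkpoint on LMDeploy's TurboMind backend
# instead of the PyTorch backend. Paths and values are placeholders.
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# Hypothetical model path; substitute the int4/AWQ checkpoint you actually use.
model_path = 'OpenGVLab/InternVL2-26B-AWQ'

backend_config = TurbomindEngineConfig(
    model_format='awq',          # tell TurboMind the weights are AWQ int4
    cache_max_entry_count=0.8,   # fraction of free GPU memory for the KV cache
    session_len=8192,            # max context length; adjust to your workload
)

pipe = pipeline(model_path, backend_config=backend_config)

# Example multimodal query; the image URL is only a placeholder.
image = load_image('https://example.com/sample.jpg')
print(pipe(('Describe this image.', image)).text)
```

Whether this beats vLLM's awq_marlin for your workload would need to be measured; the main lever in LMDeploy is choosing the TurboMind backend for AWQ-quantized weights rather than the PyTorch one.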