Enabling prefix cache for multimodal model inference #2823
zhuchen1109 asked this question in Q&A
I'm using the internvl-8b model. My system prompt is very long, so I'd like to enable prefix caching to speed up inference. However, enabling prefix cache currently causes a problem: the image tokens in the prompt are just padding placeholders, so they are very likely to be (incorrectly) matched against the cache. My question is: if I modify the code to guarantee that the image portion is never matched, would prefix cache then work correctly for my use case?
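The modification described above could be sketched as follows. This is a minimal, hypothetical illustration, not the actual internvl/serving-engine code: the token id `IMAGE_PAD_ID` and the functions `matchable_prefix_len` / `match_prefix` are assumptions introduced here for clarity. The idea is to truncate prefix matching at the first image padding token, so only the text-only prefix (e.g. the long system prompt) is ever reused from the cache.

```python
# Hypothetical sketch: restrict prefix-cache matching to the text-only
# prefix that precedes the first image placeholder token.

# Placeholder id for image padding; the real value depends on the
# tokenizer/model and is an assumption here.
IMAGE_PAD_ID = 92546


def matchable_prefix_len(token_ids, image_pad_id=IMAGE_PAD_ID):
    """Length of the prefix that may safely participate in prefix-cache
    matching: everything up to (not including) the first image pad token."""
    for i, tok in enumerate(token_ids):
        if tok == image_pad_id:
            return i
    return len(token_ids)


def match_prefix(cached_ids, new_ids):
    """Longest common prefix of two token sequences, truncated so that
    image padding tokens are never matched."""
    limit = min(matchable_prefix_len(cached_ids),
                matchable_prefix_len(new_ids))
    n = 0
    while n < limit and cached_ids[n] == new_ids[n]:
        n += 1
    return n


# Example: the shared text prefix [1, 2, 3] is matched, but the image
# padding (and anything after it) is excluded from the match.
shared = match_prefix([1, 2, 3, IMAGE_PAD_ID, 5],
                      [1, 2, 3, IMAGE_PAD_ID, 6])
print(shared)
```

With a guard like this, the long system prompt would still hit the cache while the image region is always recomputed, which is the behavior the question is asking about; the real fix would need to be applied inside the engine's block/prefix matching logic rather than as a standalone helper.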