Quantize Bias for Conv/Gemm on Quantized Model #22889
base: main
Conversation
Force-pushed from 60f58bc to 4860244
// Bias is quantized to int32.
ONNX_NAMESPACE::TypeProto int32_type_proto;
Does this data type work in general, or could it be EP specific? I.e., if the EP uses the quantized op, is it universally expected that the bias input is int32 when the data type is, say, 8-bit int for the quantized Conv/Gemm?
In https://github.com/onnx/onnx/blob/main/docs/Operators.md#QLinearConv, T4 (for bias) is tensor(int32) only. The same holds for the QGemm schema, so I guess this works in general?
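For what it's worth, the constraint can be checked programmatically. This is a minimal standalone sketch, not code from this PR, assuming the onnx schema headers are available; it looks up "QLinearConv" in the default ONNX domain and prints the types allowed for T4 (expected output: tensor(int32)).

#include <iostream>
#include "onnx/defs/schema.h"

int main() {
  // Look up the latest registered QLinearConv schema.
  const auto* schema = ONNX_NAMESPACE::OpSchemaRegistry::Schema("QLinearConv");
  if (schema == nullptr) return 1;
  // Walk the type constraints and print the allowed types for T4 (bias).
  for (const auto& constraint : schema->typeConstraintParams()) {
    if (constraint.type_param_str == "T4") {
      for (const auto& type_str : constraint.allowed_type_strs) {
        std::cout << type_str << std::endl;  // expected: tensor(int32)
      }
    }
  }
  return 0;
}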
// Bias DQ node produces output to Conv/Gemm node's input_2, with scale = input_scale_0 * input_scale_1, zp = 0.
NodeArg& bias_dq_node_arg =
    graph.GetOrCreateNodeArg(graph.GenerateNodeArgName(node.Name() + "_bias_dq"), &bias_dq_type);
Node& dp_node = graph.AddNode(graph.GenerateNodeName(node.Name() + "_bias_dq"), QDQ::DQOpName, "Bias DQ node",
dq_node?
Some quantized models leave the bias of Conv/Gemm nodes in float rather than quantizing it. This PR creates a sub-graph that quantizes the bias for such Conv/Gemm nodes with scale = scale_input_0 * scale_input_1 and zp = 0. We only do this when the bias is an initializer, so that ConstantFolding will fold the sub-graph into a real quantized int32 bias initializer during the next round of graph optimization.
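As a rough illustration of what the folded sub-graph ends up computing, here is a minimal standalone sketch; QuantizeBias is a hypothetical helper, not code from this PR. The float bias is divided by input_scale * weight_scale, rounded, and stored as int32, with the zero point fixed at 0.

#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of this PR: quantize a float bias to int32
// using scale = input_scale * weight_scale and zero point 0, which is the
// result the Q/DQ sub-graph folds down to after ConstantFolding.
std::vector<int32_t> QuantizeBias(const std::vector<float>& bias,
                                  float input_scale, float weight_scale) {
  const float bias_scale = input_scale * weight_scale;
  std::vector<int32_t> quantized(bias.size());
  for (size_t i = 0; i < bias.size(); ++i) {
    quantized[i] = static_cast<int32_t>(std::lround(bias[i] / bias_scale));
  }
  return quantized;
}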