
[Bug] On Ascend 910B, using the lmdeploy image with the qwen2-vl-7b model, inference fails with: call aclnnBatchMatMul failed #2769

Open
fusmile0101 opened this issue Nov 18, 2024 · 0 comments
Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks the corresponding environment info and a minimal reproducible demo, it will be hard for us to reproduce and resolve it, which reduces the likelihood of your receiving feedback.

Describe the bug

RuntimeError: call aclnnBatchMatMul failed, detail:EZ1001: 2024-11-18-06:32:38.253.100 Input tensor's shape[[3,64,67]] should be same with output's shape[[1,64,67]].
TraceBack (most recent call last):
Params check failed.

Reproduction

from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig

if __name__ == "__main__":
    # Build a PyTorch-engine pipeline on the Ascend backend (tp=1, eager mode).
    pipe = pipeline("/opt/lmdeploy/models/Qwen2-VL-7B-Instruct",
                    backend_config=PytorchEngineConfig(tp=1,
                                                       device_type="ascend",
                                                       eager_mode=True))
    # Three text prompts submitted as one batch.
    question = ["Shanghai is", "Please introduce China", "How are you?"]
    response = pipe(question)
    print(response)
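
For reference, a minimal CPU-only sketch of the shape constraint the error points at; the concrete sizes (a leading dimension of 3, assumed here to be the three mrope sections, 64 rotary frequencies, 67 positions) are reconstructed from the error message and the traceback below, not taken from the actual tensors inside lmdeploy:

import torch

# Qwen2-VL's mrope keeps one position-id row per section (temporal, height,
# width), so the position ids carry a leading dimension of 3 (assumed shapes).
seq_len, half_dim = 67, 64
position_ids = torch.arange(seq_len).float().repeat(3, 1)   # (3, 67)
inv_freq = torch.randn(half_dim)                             # (64,)

inv_freq_expanded = inv_freq[None, :, None]                  # (1, 64, 1)
position_ids_expanded = position_ids[:, None, :]             # (3, 1, 67)

# torch.matmul broadcasts the batch dimension, so this yields (3, 64, 67):
print((inv_freq_expanded @ position_ids_expanded).shape)

# torch.bmm does not broadcast the batch dimension (1 vs 3); the Ascend
# aclnnBatchMatMul params check rejects the same kind of mismatch:
try:
    torch.bmm(inv_freq_expanded, position_ids_expanded)
except RuntimeError as err:
    print(err)

If that is indeed the failing path, expanding inv_freq_expanded to the same batch size as position_ids_expanded before the bmm (or using a broadcasting matmul) would satisfy the check, but that is only a guess based on the traceback.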

Environment

Huawei Ascend 910B (Atlas 800T A2)
Ascend docker runtime --6.0.RC2 linux-aarch64
Ascend cann toolkit --8.0.RC2 linux-aarch64
Ascend cann kernels-910b 8.0.RC2 

## Python environment
Package                   Version     Editable project location
------------------------- ----------- -------------------------
absl-py                   2.1.0
accelerate                1.1.1
addict                    2.4.0
aiohappyeyeballs          2.4.3
aiohttp                   3.11.2
aiosignal                 1.3.1
annotated-types           0.7.0
anyio                     4.6.2.post1
ascendebug                0.1.0
async-timeout             5.0.1
attr                      0.3.2
attrs                     24.2.0
auto-tune                 0.1.0
av                        13.1.0
certifi                   2024.8.30
cffi                      1.17.1
charset-normalizer        3.4.0
click                     8.1.7
cloudpickle               3.1.0
cmake                     3.31.0.1
dataflow                  0.0.1
datasets                  3.1.0
decorator                 5.1.1
dill                      0.3.8
diskcache                 5.6.3
distro                    1.9.0
dlinfer-ascend            0.1.1.post2
einops                    0.8.0
exceptiongroup            1.2.2
fastapi                   0.115.5
filelock                  3.16.1
fire                      0.7.0
frozenlist                1.5.0
fsspec                    2024.9.0
h11                       0.14.0
hccl                      0.1.0
hccl-parser               0.1
httpcore                  1.0.7
httpx                     0.27.2
huggingface-hub           0.26.2
idna                      3.10
interegular               0.3.3
Jinja2                    3.1.4
jiter                     0.7.1
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
lark                      1.2.2
llm-datadist              0.0.1
llm-engine                0.0.1
llvmlite                  0.43.0
lmdeploy                  0.6.2       /opt/lmdeploy
markdown-it-py            3.0.0
MarkupSafe                3.0.2
mdurl                     0.1.2
ml_dtypes                 0.5.0
mmengine-lite             0.10.5
mpmath                    1.3.0
msadvisor                 1.0.0
multidict                 6.1.0
multiprocess              0.70.16
nest-asyncio              1.6.0
networkx                  3.4.2
ninja                     1.11.1.1
numba                     0.60.0
numpy                     1.24.0
op-compile-tool           0.1.0
op-gen                    0.1
op-test-frame             0.1
opc-tool                  0.1.0
openai                    1.54.4
outlines                  0.0.46
packaging                 24.2
pandas                    2.2.3
pathlib2                  2.3.7.post1
peft                      0.11.1
pillow                    11.0.0
pip                       24.3.1
platformdirs              4.3.6
propcache                 0.2.0
protobuf                  5.28.3
psutil                    6.1.0
pyairports                2.1.1
pyarrow                   18.0.0
pycountry                 24.6.1
pycparser                 2.22
pydantic                  2.9.2
pydantic_core             2.23.4
Pygments                  2.18.0
pynvml                    11.5.3
python-dateutil           2.9.0.post0
pytz                      2024.2
PyYAML                    6.0.2
qwen-vl-utils             0.0.8
referencing               0.35.1
regex                     2024.11.6
requests                  2.32.3
rich                      13.9.4
rpds-py                   0.21.0
safetensors               0.4.5
schedule-search           0.0.1
scikit-build              0.18.0
scipy                     1.14.1
sentencepiece             0.2.0
setuptools                69.5.1
shortuuid                 1.0.13
six                       1.16.0
sniffio                   1.3.1
starlette                 0.41.2
sympy                     1.13.3
te                        0.4.0
termcolor                 2.5.0
tiktoken                  0.8.0
timm                      1.0.11
tokenizers                0.20.3
tomli                     2.1.0
torch                     2.3.1
torch-npu                 2.3.1
torchvision               0.18.1
tornado                   6.4.1
tqdm                      4.67.0
transformers              4.46.2
typing_extensions         4.12.2
tzdata                    2024.2
urllib3                   2.2.3
uvicorn                   0.32.0
wheel                     0.43.0
xxhash                    3.5.0
yapf                      0.43.0
yarl                      1.17.1

Error traceback

/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:301: ImportWarning:
    *************************************************************************************************************
    The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now..
    The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now..
    The backend in torch.distributed.init_process_group set to hccl now..
    The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
    The device parameters have been replaced with npu in the function below:
    torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Generator, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
    *************************************************************************************************************

  warnings.warn(msg, ImportWarning)
/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:260: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu.
  warnings.warn(msg, RuntimeWarning)
[W compiler_depend.ts:615] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
2024-11-18 06:32:32,177 - lmdeploy - WARNING - __init__.py:165 - LMDeploy requires transformers version: [4.33.0 ~ 4.44.1], but found version: 4.46.2
/opt/lmdeploy/lmdeploy/serve/utils.py:22: DeprecationWarning: There is no current event loop
  event_loop = asyncio.get_event_loop()
/opt/lmdeploy/lmdeploy/serve/async_engine.py:504: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(f'GenerationConfig: {gen_config}')
2024-11-18 06:32:38,120 - lmdeploy - WARNING - async_engine.py:504 - GenerationConfig: GenerationConfig(n=1, max_new_tokens=512, do_sample=False, top_p=1.0, top_k=50, min_p=0.0, temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=None, stop_words=None, bad_words=None, stop_token_ids=[151645], bad_token_ids=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None, response_format=None, logits_processors=None)
2024-11-18 06:32:38,120 - lmdeploy - WARNING - async_engine.py:505 - Since v0.6.0, lmdeploy add `do_sample` in GenerationConfig. It defaults to False, meaning greedy decoding. Please set `do_sample=True` if sampling  decoding is needed
/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/utils/storage.py:38: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  if self.device.type != 'cpu':
2024-11-18 06:32:38,257 - lmdeploy - ERROR - request.py:21 - Engine loop failed with error: call aclnnBatchMatMul failed, detail:EZ1001: 2024-11-18-06:32:38.253.100 Input tensor's shape[[3,64,67]] should be same with output's shape[[1,64,67]].
        TraceBack (most recent call last):
        Params check failed.

[ERROR] 2024-11-18-06:32:38 (PID:5936, Device:0, RankID:-1) ERR01005 OPS internal error
Traceback (most recent call last):
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/request.py", line 17, in _raise_exception_on_finish
    task.result()
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 963, in async_loop
    await self._async_loop()
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 957, in _async_loop
    await __step()
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 945, in __step
    raise e
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 939, in __step
    raise out
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 873, in _async_loop_background
    await self._async_step_background(
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 755, in _async_step_background
    output = await self._async_model_forward(
  File "/opt/lmdeploy/lmdeploy/utils.py", line 241, in __tmp
    return (await func(*args, **kwargs))
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 646, in _async_model_forward
    ret = await __forward(inputs)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 624, in __forward
    return await self.model_agent.async_forward(
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 303, in async_forward
    output = self._forward_impl(inputs,
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 270, in _forward_impl
    output = model_forward(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 153, in model_forward
    output = model(**input_dict)
  File "/opt/lmdeploy/lmdeploy/pytorch/backends/graph_runner.py", line 25, in __call__
    return self.model(**kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_vl.py", line 383, in forward
    hidden_states = self.model(
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_vl.py", line 312, in forward
    cos, sin = _apply_mrope_selection(hidden_states,
  File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_vl.py", line 28, in _apply_mrope_selection
    cos, sin = rotary_emb_func(hidden_states, _mrope_position_ids)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/backends/dlinfer/rotary_embedding.py", line 64, in forward
    return _rotary_embedding_fwd(position_ids,
  File "/opt/lmdeploy/lmdeploy/pytorch/backends/dlinfer/rotary_embedding.py", line 30, in _rotary_embedding_fwd
    tmp = torch.bmm(inv_freq_expanded, position_ids_expanded)
RuntimeError: call aclnnBatchMatMul failed, detail:EZ1001: 2024-11-18-06:32:38.253.100 Input tensor's shape[[3,64,67]] should be same with output's shape[[1,64,67]].
        TraceBack (most recent call last):
        Params check failed.

[ERROR] 2024-11-18-06:32:38 (PID:5936, Device:0, RankID:-1) ERR01005 OPS internal error