Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
RuntimeError: call aclnnBatchMatMul failed, detail:EZ1001: 2024-11-18-06:32:38.253.100 Input tensor's shape[[3,64,67]] should be same with output's shape[[1,64,67]].
TraceBack (most recent call last):
Params check failed.
Reproduction
from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig

if __name__ == "__main__":
    # Load Qwen2-VL on a single Ascend device in eager mode.
    pipe = pipeline("/opt/lmdeploy/models/Qwen2-VL-7B-Instruct",
                    backend_config=PytorchEngineConfig(tp=1, device_type="ascend", eager_mode=True))
    # Three text-only prompts sent as one batch.
    question = ["Shanghai is", "Please introduce China", "How are you?"]
    response = pipe(question)
    print(response)
Error traceback
/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:301: ImportWarning:
*************************************************************************************************************
The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now..
The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now..
The backend in torch.distributed.init_process_group set to hccl now..
The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
The device parameters have been replaced with npu in the functionbelow:
torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Generator, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
*************************************************************************************************************
warnings.warn(msg, ImportWarning)
/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/contrib/transfer_to_npu.py:260: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu.
warnings.warn(msg, RuntimeWarning)
[W compiler_depend.ts:615] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
<frozen importlib._bootstrap>:914: ImportWarning: TEMetaPathFinder.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:671: ImportWarning: TBEMetaPathLoader.exec_module() not found; falling back to load_module()
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
2024-11-18 06:32:32,177 - lmdeploy - WARNING - __init__.py:165 - LMDeploy requires transformers version: [4.33.0 ~ 4.44.1], but found version: 4.46.2
/opt/lmdeploy/lmdeploy/serve/utils.py:22: DeprecationWarning: There is no current event loop
event_loop = asyncio.get_event_loop()
/opt/lmdeploy/lmdeploy/serve/async_engine.py:504: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(f'GenerationConfig: {gen_config}')
2024-11-18 06:32:38,120 - lmdeploy - WARNING - async_engine.py:504 - GenerationConfig: GenerationConfig(n=1, max_new_tokens=512, do_sample=False, top_p=1.0, top_k=50, min_p=0.0, temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=None, stop_words=None, bad_words=None, stop_token_ids=[151645], bad_token_ids=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None, response_format=None, logits_processors=None)
2024-11-18 06:32:38,120 - lmdeploy - WARNING - async_engine.py:505 - Since v0.6.0, lmdeploy add `do_sample`in GenerationConfig. It defaults to False, meaning greedy decoding. Please set`do_sample=True`if sampling decoding is needed
/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/utils/storage.py:38: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
if self.device.type != 'cpu':
2024-11-18 06:32:38,257 - lmdeploy - ERROR - request.py:21 - Engine loop failed with error: call aclnnBatchMatMul failed, detail:EZ1001: 2024-11-18-06:32:38.253.100 Input tensor's shape[[3,64,67]] should be same with output's shape[[1,64,67]].
TraceBack (most recent call last):
Params check failed.
[ERROR] 2024-11-18-06:32:38 (PID:5936, Device:0, RankID:-1) ERR01005 OPS internal error
Traceback (most recent call last):
File "/opt/lmdeploy/lmdeploy/pytorch/engine/request.py", line 17, in _raise_exception_on_finish
task.result()
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 963, in async_loop
await self._async_loop()
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 957, in _async_loop
await __step()
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 945, in __step
raise e
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 939, in __step
raise out
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 873, in _async_loop_background
await self._async_step_background(
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 755, in _async_step_background
output = await self._async_model_forward(
File "/opt/lmdeploy/lmdeploy/utils.py", line 241, in __tmp
return (await func(*args, **kwargs))
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 646, in _async_model_forward
ret = await __forward(inputs)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 624, in __forward
return await self.model_agent.async_forward(
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 303, in async_forward
output = self._forward_impl(inputs,
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 270, in _forward_impl
output = model_forward(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 153, in model_forward
output = model(**input_dict)
File "/opt/lmdeploy/lmdeploy/pytorch/backends/graph_runner.py", line 25, in __call__
return self.model(**kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_vl.py", line 383, in forward
hidden_states = self.model(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_vl.py", line 312, in forward
cos, sin = _apply_mrope_selection(hidden_states,
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_vl.py", line 28, in _apply_mrope_selection
cos, sin = rotary_emb_func(hidden_states, _mrope_position_ids)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/backends/dlinfer/rotary_embedding.py", line 64, in forward
return _rotary_embedding_fwd(position_ids,
File "/opt/lmdeploy/lmdeploy/pytorch/backends/dlinfer/rotary_embedding.py", line 30, in _rotary_embedding_fwd
tmp = torch.bmm(inv_freq_expanded, position_ids_expanded)
RuntimeError: call aclnnBatchMatMul failed, detail:EZ1001: 2024-11-18-06:32:38.253.100 Input tensor's shape[[3,64,67]] should be same with output's shape[[1,64,67]].
TraceBack (most recent call last):
Params check failed.
[ERROR] 2024-11-18-06:32:38 (PID:5936, Device:0, RankID:-1) ERR01005 OPS internal error
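A note on the shapes in the error, offered as a hedged reading rather than a confirmed diagnosis. The failing call is `torch.bmm(inv_freq_expanded, position_ids_expanded)`, and `torch.bmm` does not broadcast: both operands must be 3-D with the same batch size. The message compares `[3,64,67]` with `[1,64,67]`, which would be consistent with one operand having batch size 1 and the other batch size 3. The sketch below only does shape bookkeeping in plain Python (no torch, no NPU). The operand shapes `(1, 64, 1)` and `(3, 1, 67)` are assumptions inferred from the error message, not taken from the lmdeploy or dlinfer source, and `bmm_shape` / `matmul_shape` are hypothetical helper names.

```python
# Shape-only sketch; no torch or NPU required.
# ASSUMPTION: the operand shapes used below are inferred from the error
# message ([3,64,67] vs [1,64,67]), not read from lmdeploy/dlinfer source.


def bmm_shape(a, b):
    """Result shape under the strict bmm rule: 3-D operands, equal batch size."""
    if len(a) != 3 or len(b) != 3:
        raise ValueError(f"bmm expects 3-D operands, got {a} and {b}")
    if a[0] != b[0]:
        raise ValueError(f"batch size mismatch: {a[0]} vs {b[0]}")
    if a[2] != b[1]:
        raise ValueError(f"inner dimension mismatch: {a[2]} vs {b[1]}")
    return (a[0], a[1], b[2])


def matmul_shape(a, b):
    """Result shape for 3-D operands when a size-1 batch dimension may broadcast."""
    if len(a) != 3 or len(b) != 3:
        raise ValueError(f"expected 3-D operands, got {a} and {b}")
    if a[2] != b[1]:
        raise ValueError(f"inner dimension mismatch: {a[2]} vs {b[1]}")
    if a[0] != b[0] and 1 not in (a[0], b[0]):
        raise ValueError(f"batch sizes cannot broadcast: {a[0]} vs {b[0]}")
    return (max(a[0], b[0]), a[1], b[2])


# Hypothetical operand shapes: one batch entry with 64 frequencies,
# and 3 position-id rows (as with mrope) over 67 tokens.
inv_freq_expanded = (1, 64, 1)
position_ids_expanded = (3, 1, 67)

try:
    bmm_shape(inv_freq_expanded, position_ids_expanded)
except ValueError as err:
    print("strict bmm rejects the operands:", err)

# A broadcasting product would yield the [3, 64, 67] shape named in the error.
print(matmul_shape(inv_freq_expanded, position_ids_expanded))  # (3, 64, 67)

# Expanding the batch dimension first also satisfies the strict rule.
print(bmm_shape((3, 64, 1), position_ids_expanded))  # (3, 64, 67)
```

If this reading is right, the batch dimensions of the two operands would need to match before the `bmm` call, or a broadcasting product would be needed instead. Whether that is the appropriate fix for the Ascend backend is for the maintainers to confirm.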
Environment
Huawei Ascend (Atlas 800T A2) 910B
Ascend docker runtime --6.0.RC2 linux-aarch64
Ascend cann toolkit --8.0.RC2 linux-aarch64
Ascend cann kernels-910b 8.0.RC2

## Python environment
Package                   Version     Editable project location
------------------------- ----------- -------------------------
absl-py                   2.1.0
accelerate                1.1.1
addict                    2.4.0
aiohappyeyeballs          2.4.3
aiohttp                   3.11.2
aiosignal                 1.3.1
annotated-types           0.7.0
anyio                     4.6.2.post1
ascendebug                0.1.0
async-timeout             5.0.1
attr                      0.3.2
attrs                     24.2.0
auto-tune                 0.1.0
av                        13.1.0
certifi                   2024.8.30
cffi                      1.17.1
charset-normalizer        3.4.0
click                     8.1.7
cloudpickle               3.1.0
cmake                     3.31.0.1
dataflow                  0.0.1
datasets                  3.1.0
decorator                 5.1.1
dill                      0.3.8
diskcache                 5.6.3
distro                    1.9.0
dlinfer-ascend            0.1.1.post2
einops                    0.8.0
exceptiongroup            1.2.2
fastapi                   0.115.5
filelock                  3.16.1
fire                      0.7.0
frozenlist                1.5.0
fsspec                    2024.9.0
h11                       0.14.0
hccl                      0.1.0
hccl-parser               0.1
httpcore                  1.0.7
httpx                     0.27.2
huggingface-hub           0.26.2
idna                      3.10
interegular               0.3.3
Jinja2                    3.1.4
jiter                     0.7.1
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
lark                      1.2.2
llm-datadist              0.0.1
llm-engine                0.0.1
llvmlite                  0.43.0
lmdeploy                  0.6.2       /opt/lmdeploy
markdown-it-py            3.0.0
MarkupSafe                3.0.2
mdurl                     0.1.2
ml_dtypes                 0.5.0
mmengine-lite             0.10.5
mpmath                    1.3.0
msadvisor                 1.0.0
multidict                 6.1.0
multiprocess              0.70.16
nest-asyncio              1.6.0
networkx                  3.4.2
ninja                     1.11.1.1
numba                     0.60.0
numpy                     1.24.0
op-compile-tool           0.1.0
op-gen                    0.1
op-test-frame             0.1
opc-tool                  0.1.0
openai                    1.54.4
outlines                  0.0.46
packaging                 24.2
pandas                    2.2.3
pathlib2                  2.3.7.post1
peft                      0.11.1
pillow                    11.0.0
pip                       24.3.1
platformdirs              4.3.6
propcache                 0.2.0
protobuf                  5.28.3
psutil                    6.1.0
pyairports                2.1.1
pyarrow                   18.0.0
pycountry                 24.6.1
pycparser                 2.22
pydantic                  2.9.2
pydantic_core             2.23.4
Pygments                  2.18.0
pynvml                    11.5.3
python-dateutil           2.9.0.post0
pytz                      2024.2
PyYAML                    6.0.2
qwen-vl-utils             0.0.8
referencing               0.35.1
regex                     2024.11.6
requests                  2.32.3
rich                      13.9.4
rpds-py                   0.21.0
safetensors               0.4.5
schedule-search           0.0.1
scikit-build              0.18.0
scipy                     1.14.1
sentencepiece             0.2.0
setuptools                69.5.1
shortuuid                 1.0.13
six                       1.16.0
sniffio                   1.3.1
starlette                 0.41.2
sympy                     1.13.3
te                        0.4.0
termcolor                 2.5.0
tiktoken                  0.8.0
timm                      1.0.11
tokenizers                0.20.3
tomli                     2.1.0
torch                     2.3.1
torch-npu                 2.3.1
torchvision               0.18.1
tornado                   6.4.1
tqdm                      4.67.0
transformers              4.46.2
typing_extensions         4.12.2
tzdata                    2024.2
urllib3                   2.2.3
uvicorn                   0.32.0
wheel                     0.43.0
xxhash                    3.5.0
yapf                      0.43.0
yarl                      1.17.1