
get "invalid device context" or "segmentation fault" problems in aarch64 machine #35

Open
DennisYoung96 opened this issue Aug 18, 2023 · 4 comments

Comments

@DennisYoung96

Thanks for your excellent code and open-source spirit.

Environment Info
gpu-manager version: built on master
vcuda version: built on master
nvidia driver: 470.199.02 or 470.42.01 or 460.106.00
cpu: aarch64
gpu: Tesla T4

Details
For the past few days I have been getting "invalid device context" or "segmentation fault" errors on my aarch64 machine.
Whenever an app initializes, it reports that 5 functions were not found:
image
When I run the CUDA samples:
image
When I run a PyTorch demo:
image
When I switch to the 460.x driver, it reports a segmentation fault instead.

However, it works if I give the whole GPU to a single pod (vcore=100).
Finally, the same setup works fine on an x86 machine (same 470.x driver and T4 GPU).

So, are there any differences between the aarch64 and x86 drivers?
Can anyone offer advice on this?
I need your help.
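
One way to narrow down the "functions not found" messages might be to check which driver entry points `libcuda.so.1` actually exports on the aarch64 and x86 machines and compare the results. The following is only a minimal sketch: the helper name `missing_symbols` is hypothetical, and the symbol names in `candidates` are placeholder CUDA driver API functions, not the five names from the screenshot, so they should be replaced with the names from the startup log.

```python
import ctypes


def missing_symbols(lib_path, names):
    """Return the entries of `names` that the shared library does not export.

    Passing lib_path=None inspects the symbols already loaded into the
    current process instead of opening a separate library.
    """
    # Raises OSError if the library cannot be found or loaded.
    lib = ctypes.CDLL(lib_path)
    # ctypes resolves attributes through dlsym; a failed lookup raises
    # AttributeError, which hasattr() turns into False.
    return [name for name in names if not hasattr(lib, name)]


if __name__ == "__main__":
    # Placeholder names (assumption): substitute the five functions
    # reported as "not found" when the app starts.
    candidates = ["cuInit", "cuDeviceGet", "cuCtxCreate_v2", "cuMemAlloc_v2"]
    try:
        print(missing_symbols("libcuda.so.1", candidates))
    except OSError as exc:
        print(f"could not load libcuda.so.1: {exc}")
```

Running this on both machines with the same driver version would show whether the aarch64 driver build really lacks symbols that the x86 build exports. It does not say anything about why the context becomes invalid.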

@DennisYoung96
Author

@mYmNeo @hzliangbin please help

@hiahia121

I'm hitting the same problem. Can anyone help?

@ranxuxin001

I read an article saying that the virtualization driver differs between x86 and aarch64; x86 uses cgroupfs.

@ranxuxin001

By the way, are you using an NVIDIA Tesla T4 with the Turing architecture?
