Arm64 AES Support #21
To understand what's going on with the mimicking of the x86 versions: The A64 instruction […]
We could be noticeably more efficient if we could fuse together consecutive invocations of […]
As for […]
The single failing test is:
rand(AESNI1x(x, ctr, aesni_key), UInt128) ≡ 0x60f4c27fe48fe1b8c5f4568a585b0dc0
Given that every other test is passing, I'm really not sure what's up.
And if you're wondering how efficient this now is:
@code_native debuginfo=:none dump_module=false rand(AESNI4x(), UInt32)
now produces
.section __TEXT,__text,regular,pure_instructions
ldr x8, [x20, #16]
ldr x8, [x8, #16]
ldr xzr, [x8]
ldr x8, [x0, #40]
cmp x8, #4
b.ne L164
mov x8, #0
ldr q0, [x0, #16]
adrp x9, #0 ; getproperty;
ldr x9, [x9, #264]
ldr q1, [x9]
add.2d v0, v0, v1
str q0, [x0, #16]
ldr x9, [x0, #32]
ldp q1, q2, [x9, #32]
ldp q3, q4, [x9, #64]
ldp q5, q6, [x9, #96]
ldp q7, q16, [x9, #128]
ldr q17, [x9, #160]
ldp q19, q18, [x9]
aese.16b v0, v19
aesmc.16b v0, v0
aese.16b v0, v18
aesmc.16b v0, v0
aese.16b v0, v1
aesmc.16b v0, v0
aese.16b v0, v2
aesmc.16b v0, v0
aese.16b v0, v3
aesmc.16b v0, v0
aese.16b v0, v4
aesmc.16b v0, v0
aese.16b v0, v5
aesmc.16b v0, v0
aese.16b v0, v6
aesmc.16b v0, v0
aese.16b v0, v7
aesmc.16b v0, v0
aese.16b v0, v16
eor.16b v0, v17, v0
str q0, [x0]
L164:
add x9, x8, #1
str x9, [x0, #40]
ldr w0, [x0, x8, lsl #2]
ret
Hard to get much better!
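The listing above runs nine aese/aesmc pairs, then a final aese followed by an eor with the last round key. A64's AESE XORs the round key before SubBytes and ShiftRows, whereas x86's AESENC XORs it after MixColumns, so a whole AES-128 encryption can be regrouped to need only one explicit XOR. The sketch below checks that regrouping. It is an illustration only: the function names are hypothetical, `subshift` and `mixcols` are toy stand-ins rather than real AES steps, and none of it is taken from this PR's code.

```julia
# Toy stand-ins for the AES round steps. These are NOT real AES; the regrouping
# argument only needs them to be deterministic functions of the state.
subshift(s::UInt128) = bitrotate(s, 13) ⊻ (s >> 7)            # stands in for SubBytes + ShiftRows
mixcols(s::UInt128) = bitrotate(s, 29) + 0x9e3779b97f4a7c15   # stands in for MixColumns

# x86 semantics: transform the state, then XOR the round key.
aesenc(s, k) = mixcols(subshift(s)) ⊻ k
aesenclast(s, k) = subshift(s) ⊻ k

# A64 semantics: XOR the round key first, then SubBytes + ShiftRows.
# MixColumns is a separate instruction.
aese(s, k) = subshift(s ⊻ k)
aesmc(s) = mixcols(s)

# x86-style schedule: initial whitening XOR, nine full rounds, one last round.
function encrypt_x86_style(s::UInt128, keys)
    s ⊻= keys[1]
    for k in keys[2:end-1]
        s = aesenc(s, k)
    end
    return aesenclast(s, keys[end])
end

# A64-style schedule, matching the shape of the listing: nine aese/aesmc pairs,
# a final aese, and a single trailing XOR with the last round key.
function encrypt_a64_fused(s::UInt128, keys)
    for k in keys[1:end-2]
        s = aesmc(aese(s, k))
    end
    return aese(s, keys[end-1]) ⊻ keys[end]
end

# One-for-one emulation of a single aesenc on A64: a zero key for aese,
# then an extra XOR that the fused schedule avoids.
aesenc_emulated(s, k) = aesmc(aese(s, zero(UInt128))) ⊻ k

# Arbitrary example inputs (eleven round keys, as in AES-128).
keys = [UInt128(i) * 0x0123456789abcdef0123456789abcdef for i in 1:11]
state = 0x00112233445566778899aabbccddeeff
```

With these definitions, `encrypt_x86_style(state, keys) == encrypt_a64_fused(state, keys)` holds for any choice of the two stand-in functions, which is why the fused form saves one XOR per round compared with emulating each x86 round separately.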
Looks great! Thanks so much. It is weird that there's only one test failing. Maybe there's actually a bug in the API? I am still too busy at the moment to look into that.
Double-checked the x86 version; even under Rosetta 2, all tests pass. So the failure is a genuine issue for this effort specifically.
There was a weird internal endianness issue.
@sunoru Finally figured out what was going on; the test was indeed a legitimate failure, pointing out a subtle incompatibility between how Arm and x86 treat the effective endianness of elements in a vector register. I resolved it in a way that's slightly clunky but compiles down to approximately nothing. All tests now pass! 🎉
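The comment above does not spell out the exact mismatch, so the following is only a generic sketch of how one 128-bit value reads differently depending on byte and lane order. It assumes a little-endian host. The variable names are hypothetical, and this is not the fix applied in the PR.

```julia
# Hypothetical illustration only; this is not the fix applied in the PR.
x = 0x000102030405060708090a0b0c0d0e0f   # a UInt128 literal

# Bytes as laid out in memory on a little-endian host: least significant first.
bytes = collect(reinterpret(UInt8, [x]))

# The same 128 bits viewed as four 32-bit lanes, lowest lane first.
lanes = collect(reinterpret(UInt32, [x]))

# Reversing the byte order of the whole 128-bit value.
swapped = bswap(x)
```

A known-answer test such as the failing `UInt128` comparison is sensitive to exactly this kind of ordering, while tests that only consume the output lane by lane may not be.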
Doesn't affect runtime behavior
Wow, looks great! Thanks so much for your work.
This “works” but doesn’t pass some of the tests yet, and I'm not sure why.
@sunoru Fixes #20 and fixes #18, once it actually passes.
It also lays the groundwork for easier incorporation of further architectures with AES acceleration, like RISC-V, if we wish to pursue that in the future.