
Arm64 AES Support #21

Merged: 12 commits, Feb 20, 2024

Conversation

@pthariensflame (Contributor) commented Dec 26, 2023

This “works” but doesn’t pass some of the tests yet, and I'm not sure why.

@sunoru Fixes #20 and fixes #18, once it actually passes.

It also lays the groundwork for easier incorporation of further architectures with AES acceleration, like RISC-V, if we wish to pursue that in the future.

@pthariensflame (Contributor, Author) commented Dec 26, 2023

To understand what's going on with the mimicking of the x86 versions:

The A64 instruction AESE performs ShiftRows(SubBytes(xor(a, b))), and the A64 instruction AESMC performs MixColumns(a), while the x86 instruction AESENC performs xor(MixColumns(ShiftRows(SubBytes(a))), b). This is obviously a mismatch, so what we do is supply 0 for b, nullifying the effect of A64's inner xor, and then manually perform the final xor that x86 wants; we also have to compose AESE and AESMC in the middle.
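The identity being relied on here can be checked with a pure-Python model of one AES round (an illustrative sketch only; these function names are not the package's API):

```python
# Pure-Python model of one AES round, checking the identity:
#   AESENC(a, b) == AESMC(AESE(a, 0)) ^ b
# The state is 16 bytes in column-major order, as in FIPS 197.

def gf_mul(a, b):
    """Multiply in GF(2^8) with the AES reduction polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return r

def _sbox_entry(x):
    # S-box = multiplicative inverse in GF(2^8) followed by the affine map.
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    out = 0
    for i in range(8):
        bit = ((inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8))
               ^ (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        out |= bit << i
    return out

SBOX = [_sbox_entry(x) for x in range(256)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def sub_bytes(s):
    return [SBOX[x] for x in s]

def shift_rows(s):
    # Row r of the column-major state rotates left by r positions.
    return [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]

def mix_columns(s):
    out = []
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        out += [gf_mul(col[r], 2) ^ gf_mul(col[(r + 1) % 4], 3)
                ^ col[(r + 2) % 4] ^ col[(r + 3) % 4] for r in range(4)]
    return out

def aese(a, b):    # A64: ShiftRows(SubBytes(a ^ b))
    return shift_rows(sub_bytes(xor(a, b)))

def aesmc(a):      # A64: MixColumns(a)
    return mix_columns(a)

def aesenc(a, b):  # x86: MixColumns(ShiftRows(SubBytes(a))) ^ b
    return xor(mix_columns(shift_rows(sub_bytes(a))), b)

state = list(range(16))
key = list(range(100, 116))
assert aesenc(state, key) == xor(aesmc(aese(state, [0] * 16)), key)
```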

We could be noticeably more efficient if we could fuse consecutive invocations of _aes_enc, which would eliminate both the zero materialization and the manual xor, since A64 performs the xor at the start of AESE anyway, so why waste it? Sadly, I couldn't think of a way to do this that preserved the existing API. I did figure something out as _aes_enc_full, but only AESNI can use it, not ARS, the way things are currently written.
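A symbolic sketch of the saving (names here are illustrative, not the package's API): AES-128 has 10 rounds and 11 round keys, and a fused chain folds each round key into AESE's built-in xor, so only one trailing EOR survives:

```python
# Illustrative instruction-count sketch, not the package's code.
# AES-128: 10 rounds, 11 round keys. In the fused A64 chain, AESE absorbs
# each key's xor; round 10 skips MixColumns; only the last key needs a
# standalone EOR. Per-call emulation of x86 AESENC would instead spend a
# zero operand plus an extra EOR on every round.

def fused_schedule(round_keys):
    ops = []
    for k in round_keys[:-2]:        # rounds 1-9: AESE/AESMC pairs
        ops.append(f"aese v0, {k}")
        ops.append("aesmc v0, v0")
    ops.append(f"aese v0, {round_keys[-2]}")   # round 10: no MixColumns
    ops.append(f"eor v0, {round_keys[-1]}, v0")  # final key: one xor
    return ops

ops = fused_schedule([f"v{i}" for i in range(1, 12)])
assert len(ops) == 20  # unfused emulation adds an eor (and a zero) per round
```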

As for _aes_key_gen_assist, the insight behind it (full credit to SSE2NEON, who had it first and from whom I blatantly borrowed it) is that the x86 instruction AESKEYGENASSIST performs xor(SubBytes(a), Dup128(UInt64(r))), so we invoke AESE and then manually undo the ShiftRows; the latter is the job of _aes_key_gen_shuffle_helper.
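The "undo" shuffle is just the inverse of ShiftRows viewed as a byte permutation; a small self-contained sketch (illustrative, not the package's actual code):

```python
# ShiftRows as a byte permutation on the column-major AES state, and the
# inverse shuffle that undoes it -- the role _aes_key_gen_shuffle_helper
# plays after AESE. Illustrative sketch, not the package's actual code.

# Destination byte 4*c + r takes its value from source byte 4*((c+r)%4) + r.
SHIFT_ROWS = [4 * ((c + r) % 4) + r for c in range(4) for r in range(4)]

# Inverting the permutation gives the shuffle mask that cancels it.
UNDO_SHIFT_ROWS = [SHIFT_ROWS.index(i) for i in range(16)]

def permute(state, mask):
    return [state[m] for m in mask]

state = list(range(16))
assert permute(permute(state, SHIFT_ROWS), UNDO_SHIFT_ROWS) == state
```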

@pthariensflame (Contributor, Author) commented:

The single failing test is:

rand(AESNI1x(x, ctr, aesni_key), UInt128) == 0x60f4c27fe48fe1b8c5f4568a585b0dc0

Given that every other test is passing, I'm really not sure what's up.

@pthariensflame changed the title from "Arm64 AES Support [WIP]" to "Arm64 AES Support [WIP?]" on Dec 26, 2023
@pthariensflame pthariensflame mentioned this pull request Dec 26, 2023
@pthariensflame (Contributor, Author) commented Dec 26, 2023

And if you're wondering how efficient this now is:

@code_native debuginfo=:none dump_module=false rand(AESNI4x(), UInt32)

now produces

	.section	__TEXT,__text,regular,pure_instructions
	ldr	x8, [x20, #16]
	ldr	x8, [x8, #16]
	ldr	xzr, [x8]
	ldr	x8, [x0, #40]
	cmp	x8, #4
	b.ne	L164
	mov	x8, #0
	ldr	q0, [x0, #16]
	adrp	x9, #0                          ; getproperty;
	ldr	x9, [x9, #264]
	ldr	q1, [x9]
	add.2d	v0, v0, v1
	str	q0, [x0, #16]
	ldr	x9, [x0, #32]
	ldp	q1, q2, [x9, #32]
	ldp	q3, q4, [x9, #64]
	ldp	q5, q6, [x9, #96]
	ldp	q7, q16, [x9, #128]
	ldr	q17, [x9, #160]
	ldp	q19, q18, [x9]
	aese.16b	v0, v19
	aesmc.16b	v0, v0
	aese.16b	v0, v18
	aesmc.16b	v0, v0
	aese.16b	v0, v1
	aesmc.16b	v0, v0
	aese.16b	v0, v2
	aesmc.16b	v0, v0
	aese.16b	v0, v3
	aesmc.16b	v0, v0
	aese.16b	v0, v4
	aesmc.16b	v0, v0
	aese.16b	v0, v5
	aesmc.16b	v0, v0
	aese.16b	v0, v6
	aesmc.16b	v0, v0
	aese.16b	v0, v7
	aesmc.16b	v0, v0
	aese.16b	v0, v16
	eor.16b	v0, v17, v0
	str	q0, [x0]
L164:
	add	x9, x8, #1
	str	x9, [x0, #40]
	ldr	w0, [x0, x8, lsl #2]
	ret

Hard to get much better!

@sunoru (Member) commented Dec 27, 2023

Looks great! Thanks so much.

It is weird that there's only one test failing. Maybe there's actually a bug in the API? I'm still too busy at the moment to look into it.

@pthariensflame (Contributor, Author) commented:

Double-checked the x86 version; even under Rosetta 2, all tests pass. So the failure is a genuine issue for this effort specifically.

@pthariensflame (Contributor, Author) commented:

@sunoru Finally figured out what was going on; the test was indeed a legitimate failure, pointing out a subtle incompatibility between how Arm and x86 treat the effective endianness of elements in a vector register. I resolved it in a way that's slightly clunky but compiles down to approximately nothing. All tests now pass! 🎉
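As a generic illustration of the kind of mismatch involved (not the actual fix in this PR): the same 16 state bytes yield different UInt128 values depending on the order in which the two 64-bit lanes are reassembled:

```python
# Generic illustration only, not the PR's actual fix: the same 16 bytes
# of AES state produce different 128-bit integers depending on which
# 64-bit lane is treated as the high half when reassembling.
state = bytes(range(16))
lo = int.from_bytes(state[:8], "little")   # lane 0
hi = int.from_bytes(state[8:], "little")   # lane 1

one_order = (hi << 64) | lo     # lane 1 as the high half
other_order = (lo << 64) | hi   # lane 0 as the high half

assert one_order != other_order
# Only one order matches a whole-vector little-endian read of the bytes:
assert one_order == int.from_bytes(state, "little")
```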

@pthariensflame changed the title from "Arm64 AES Support [WIP?]" to "Arm64 AES Support" on Feb 18, 2024
@sunoru (Member) commented Feb 20, 2024

Wow looks great! Thanks so much for your work.

@sunoru sunoru merged commit 0045de9 into JuliaRandom:master Feb 20, 2024
0 of 15 checks passed
@pthariensflame pthariensflame deleted the arm64-support branch February 20, 2024 19:51