AES instructions

Started by XeonCore, April 26, 2015, 10:38:51 PM

XeonCore

Hi,

My question is: which command-line x86 (or x64) assembler is everyone using that supports AES-NI (aesenc, aesenclast, etc.)? My Google-fu has failed me!

I have a large Windows project on the go (using JWASM), which has an encryption element.
Currently it uses AES look-up tables, and I would like to add AES-NI for an extra boost, but I have discovered that JWASM doesn't support these instructions.
I am not sure I can face switching over 8 thousand lines of code to another assembler at this late stage!

Any thoughts would be appreciated.

Thanks

qWord

As a quick fix you could set up some macros to create the AES instructions. The idea is to assemble a supported instruction that has the same layout as the AES instruction and then overwrite its prefix and opcode bytes, e.g.:
IFNDEF AESDEC

MOD_PMULLD macro _name,_alias,b0,b1,b2,b3
_name macro arg1,arg2
LOCAL lbl1,lbl2
lbl1:
PMULLD arg1,arg2
lbl2:
org lbl1
db b0,b1,b2,b3
org lbl2
endm
_alias TEXTEQU <&_name>
endm

MOD_ROUNDPD  macro _name,_alias,b0,b1,b2,b3
_name macro arg1,arg2,imm8
LOCAL lbl1,lbl2
lbl1:
ROUNDPD arg1,arg2,imm8
lbl2:
org lbl1
db b0,b1,b2,b3
org lbl2
endm
_alias TEXTEQU <&_name>
endm

MOD_PMULLD AESDEC,aesdec,66h,0Fh,38h,0DEh
MOD_PMULLD AESDECLAST,aesdeclast,66h,0Fh,38h,0DFh
MOD_PMULLD AESENC,aesenc,66h,0Fh,38h,0DCh
MOD_PMULLD AESENCLAST,aesenclast,66h,0Fh,38h,0DDh
MOD_PMULLD AESIMC,aesimc,66h,0Fh,38h,0DBh
MOD_ROUNDPD AESKEYGENASSIST,aeskeygenassist,66h,0Fh,3Ah,0DFh

ENDIF

These macros work for x86 with XMM registers. For x64 they only work (AFAICS) as long as no REX prefix is required, i.e. as long as none of xmm8-xmm15 or r8-r15 is used.
(Tested with JWasm v2.12pre.)
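
To see why the macro trick breaks once a REX prefix is involved, here is a small Python sketch (an illustration, not part of the original post) that mimics the `org`/`db` patch on raw bytes. The helper name `patch_first4` is made up, and the two PMULLD encodings are written out by hand rather than produced by an assembler, so treat them as assumptions to check against your own listing file.

```python
# Sketch: mimic "assemble PMULLD, then overwrite its first four bytes".
# patch_first4 is a hypothetical helper; the encodings are hand-written.

AESENC_PREFIX = bytes([0x66, 0x0F, 0x38, 0xDC])  # bytes passed to MOD_PMULLD for AESENC

def patch_first4(encoded: bytes, new4: bytes) -> bytes:
    """Overwrite the first four bytes, like 'org lbl1' followed by 'db b0,b1,b2,b3'."""
    assert len(new4) == 4
    return new4 + encoded[4:]

# pmulld xmm0, xmm1 -> 66 0F 38 40 C1 (no REX prefix needed)
pmulld_xmm0_xmm1 = bytes([0x66, 0x0F, 0x38, 0x40, 0xC1])

# pmulld xmm8, xmm9 in 64-bit mode -> 66 45 0F 38 40 C1
# (REX.R and REX.B set; REX sits between the 66h prefix and the 0Fh escape)
pmulld_xmm8_xmm9 = bytes([0x66, 0x45, 0x0F, 0x38, 0x40, 0xC1])

ok = patch_first4(pmulld_xmm0_xmm1, AESENC_PREFIX)
bad = patch_first4(pmulld_xmm8_xmm9, AESENC_PREFIX)

print(ok.hex(" "))   # 66 0f 38 dc c1    = aesenc xmm0, xmm1
print(bad.hex(" "))  # 66 0f 38 dc 40 c1 = REX lost, old opcode byte 40h now sits in the ModRM slot
```

In the second case the four patched bytes wipe out the REX prefix and leave the old PMULLD opcode byte where the ModRM byte should be, so the result is no longer `aesenc xmm8, xmm9`.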

Otherwise you might ask habran to implement those instructions in his JWASM fork.
MREAL macros - when you need floating point arithmetic while assembling!

wjr

I just finished adding support for those instructions in GoAsm 0.60. There is no need to switch the whole project over: if only a few new procedures use them, you could assemble those as a smaller project with GoAsm, then link the resulting OBJ file with the larger project.

XeonCore

I decided, at least temporarily, to just put the opcode bytes where the instructions should be.
It looks a bit messy, but my rounds are now 2.6x faster! And it works with JWASM.
I ended up reading the round keys from memory, as loading them into XMM1..7 beforehand was not significantly faster.

The actual rounds (unfinished).

.code
EncRounds proc ; Returns in XMM0
pxor XMM0, xmmword ptr [RoundKey] ; XMM0 = AES State

db 066h, 0Fh, 038h, 0DCh, 05h ; AES 256 Round 1
dd offset [RoundKey + 16] ;  = aesenc XMM0, xmmword ptr [RoundKey + 16]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 32]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 48]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 64]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 80]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 96]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 112]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 128]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 144]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 160]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 176]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 192]

db 066h, 0Fh, 038h, 0DCh, 05h
dd offset [RoundKey + 208]

db 066h, 0Fh, 038h, 0DDh, 05h ; AES 256 Round 14
dd offset [RoundKey + 224] ;  = aesenclast XMM0, xmmword ptr [RoundKey + 224]
ret
EncRounds endp
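
The fourteen near-identical byte sequences above follow a simple pattern, so they could be generated rather than typed. The Python sketch below is only an illustration, not XeonCore's tooling. It assumes a 32-bit build (ModRM 05h = absolute disp32 operand with XMM0 as destination), round keys stored at 16-byte intervals, and the label name `RoundKey` from the listing; the function name `enc_round_lines` is made up.

```python
def enc_round_lines(label="RoundKey", rounds=14):
    """Return db/dd source lines for rounds 1..rounds of an AES encryption.

    Rounds 1..rounds-1 use AESENC (opcode byte 0DCh); the final round
    uses AESENCLAST (0DDh). Round n reads its key from label + 16*n.
    The initial 'pxor XMM0, [label]' whitening step is not emitted here.
    """
    lines = []
    for rnd in range(1, rounds + 1):
        opcode = "0DDh" if rnd == rounds else "0DCh"
        lines.append(f"db 066h, 0Fh, 038h, {opcode}, 05h ; round {rnd}")
        lines.append(f"dd offset [{label} + {16 * rnd}]")
    return lines

if __name__ == "__main__":
    print("\n".join(enc_round_lines()))
```

With the default of 14 rounds (AES-256) this produces 13 `aesenc` entries followed by one `aesenclast` entry, with offsets running from 16 to 224, matching the hand-written listing.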


---

For reference, after a bit of fiddling with OllyDbg:

Encrypting
db 066h, 0Fh, 038h, 0DCh, 0C1h ; = aesenc XMM0, XMM1
db 066h, 0Fh, 038h, 0DCh, 0C2h ; = aesenc XMM0, XMM2
db 066h, 0Fh, 038h, 0DCh, 0C3h ; = aesenc XMM0, XMM3
db 066h, 0Fh, 038h, 0DCh, 0C4h ; = aesenc XMM0, XMM4
db 066h, 0Fh, 038h, 0DCh, 0C5h ; = aesenc XMM0, XMM5
db 066h, 0Fh, 038h, 0DCh, 0C6h ; = aesenc XMM0, XMM6
db 066h, 0Fh, 038h, 0DCh, 0C7h ; = aesenc XMM0, XMM7

db 066h, 0Fh, 038h, 0DCh, 05h  ; = aesenc XMM0, xmmword ptr [RoundKey]
dd offset [RoundKey]

db 066h, 0Fh, 038h, 0DDh, 0C1h ; = aesenclast XMM0, XMM1
db 066h, 0Fh, 038h, 0DDh, 0C2h ; = aesenclast XMM0, XMM2
db 066h, 0Fh, 038h, 0DDh, 0C3h ; = aesenclast XMM0, XMM3
db 066h, 0Fh, 038h, 0DDh, 0C4h ; = aesenclast XMM0, XMM4
db 066h, 0Fh, 038h, 0DDh, 0C5h ; = aesenclast XMM0, XMM5
db 066h, 0Fh, 038h, 0DDh, 0C6h ; = aesenclast XMM0, XMM6
db 066h, 0Fh, 038h, 0DDh, 0C7h ; = aesenclast XMM0, XMM7

db 066h, 0Fh, 038h, 0DDh, 05h ; = aesenclast XMM0, xmmword ptr [RoundKey]
dd offset [RoundKey]

Decrypting
db 066h, 0Fh, 038h, 0DEh, 0C1h ; = aesdec XMM0, XMM1
db 066h, 0Fh, 038h, 0DEh, 0C2h ; = aesdec XMM0, XMM2
db 066h, 0Fh, 038h, 0DEh, 0C3h ; = aesdec XMM0, XMM3
db 066h, 0Fh, 038h, 0DEh, 0C4h ; = aesdec XMM0, XMM4
db 066h, 0Fh, 038h, 0DEh, 0C5h ; = aesdec XMM0, XMM5
db 066h, 0Fh, 038h, 0DEh, 0C6h ; = aesdec XMM0, XMM6
db 066h, 0Fh, 038h, 0DEh, 0C7h ; = aesdec XMM0, XMM7

db 066h, 0Fh, 038h, 0DEh, 05h ; = aesdec XMM0, xmmword ptr [RoundKey]
dd offset [RoundKey]

db 066h, 0Fh, 038h, 0DFh, 0C1h ; = aesdeclast XMM0, XMM1
db 066h, 0Fh, 038h, 0DFh, 0C2h ; = aesdeclast XMM0, XMM2
db 066h, 0Fh, 038h, 0DFh, 0C3h ; = aesdeclast XMM0, XMM3
db 066h, 0Fh, 038h, 0DFh, 0C4h ; = aesdeclast XMM0, XMM4
db 066h, 0Fh, 038h, 0DFh, 0C5h ; = aesdeclast XMM0, XMM5
db 066h, 0Fh, 038h, 0DFh, 0C6h ; = aesdeclast XMM0, XMM6
db 066h, 0Fh, 038h, 0DFh, 0C7h ; = aesdeclast XMM0, XMM7

db 066h, 0Fh, 038h, 0DFh, 05h  ; = aesdeclast XMM0, xmmword ptr [RoundKey]
dd offset [RoundKey]
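
All the register-to-register entries in this table differ only in the last byte, the ModRM byte, which works out to 0C0h + 8*dst + src for XMM0..XMM7. The Python sketch below (an illustration; the function and table names are hypothetical) computes the bytes so that destinations other than XMM0 can be derived as well. It deliberately covers only XMM0..XMM7, i.e. encodings without a REX prefix.

```python
# Third opcode byte of each 66 0F 38 xx AES-NI instruction.
AES_OPCODES = {
    "aesimc": 0xDB,
    "aesenc": 0xDC,
    "aesenclast": 0xDD,
    "aesdec": 0xDE,
    "aesdeclast": 0xDF,
}

def encode_aes_rr(mnemonic, dst, src):
    """Encode '<mnemonic> xmm<dst>, xmm<src>' for registers 0..7."""
    if not (0 <= dst <= 7 and 0 <= src <= 7):
        raise ValueError("only XMM0..XMM7 are supported (no REX prefix)")
    modrm = 0xC0 | (dst << 3) | src  # mod=11b, reg=dst, rm=src
    return bytes([0x66, 0x0F, 0x38, AES_OPCODES[mnemonic], modrm])

def as_db(encoded):
    """Format bytes as a MASM-style db line."""
    return "db " + ", ".join(f"0{b:02X}h" for b in encoded)

print(as_db(encode_aes_rr("aesenc", 0, 1)))      # db 066h, 00Fh, 038h, 0DCh, 0C1h
print(as_db(encode_aes_rr("aesdeclast", 1, 2)))  # db 066h, 00Fh, 038h, 0DFh, 0CAh
```

The first line reproduces the `aesenc XMM0, XMM1` entry from the table; the second shows a destination other than XMM0.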


Thanks for the help.

hutch--

I can't help you with code for the Watcom forks, but if Japheth's JWASM will not do what you want, try Habran's version, HJWASM; it should be a lot more up to date. As both support Microsoft COFF format object modules, there is no reason not to use more than one tool, since a COFF linker will still link them successfully.

morgot

I found this topic via Google and decided to write.
hutch, you're right: UASM can generate code using the AES instructions.
Sorry for the bad English