
Intel SHA - Instruction Set Extensions

Started by LiaoMi, January 21, 2018, 03:33:36 AM


LiaoMi

Hello everybody,

UASM does not support some extension instructions, for example sha256rnds2 (from the SHA instruction set extensions), while AES seems to work fine :lol:

Documentation package:

SHA

SHA Docs - https://software.intel.com/sites/default/files/article/402097/intel-sha-extensions-white-paper.pdf
ASM Source (Intel® SHA Extensions Implementations) - https://software.intel.com/sites/default/files/article/402126/intel-sha-extensions_1.zip

The AES material may also be interesting for tests:

Intel AESNI Sample Library - Assembler & C Source code (intel-aesni-sample-library-v1.2.zip) - https://web.archive.org/web/20170713153528/https://software.intel.com/sites/default/files/article/181731/intel-aesni-sample-library-v1.2.zip
AES-NI white paper (Intel® Developer Zone) - https://software.intel.com/sites/default/files/article/165683/aes-wp-2012-09-22-v01.pdf

Intel® Architecture Instruction Set Extensions Programming Reference
https://web.archive.org/web/20130929035331if_/http://download-software.intel.com/sites/default/files/319433-015.pdf

It would be great to add the SHA instruction extensions to the supported instruction set. As Intel's assembler source code shows, yasm already assembles them.

Best regards, LiaoMi

habran

Will look at that ASAP; however, take the Australian Open into consideration ;)
Cod-Father

habran

done 8):

00007ff6c08d1807 0F 38 CC CA                      sha256msg1 xmm1, xmm2 
00007ff6c08d180b 0F 38 CD CA                      sha256msg2 xmm1, xmm2 
00007ff6c08d180f 0F 38 CC 09                      sha256msg1 xmm1, xmmword ptr [rcx] 
00007ff6c08d1813 0F 38 CD 09                      sha256msg2 xmm1, xmmword ptr [rcx] 
00007ff6c08d1817 0F 3A CC CA 0C                   sha1rnds4 xmm1, xmm2, 0xc 
00007ff6c08d181c 0F 3A CC 09 0C                   sha1rnds4 xmm1, xmmword ptr [rcx], 0xc 
00007ff6c08d1821 0F 38 C8 CA                      sha1nexte xmm1, xmm2 
00007ff6c08d1825 0F 38 C8 09                      sha1nexte xmm1, xmmword ptr [rcx] 
00007ff6c08d1829 0F 38 C9 CA                      sha1msg1 xmm1, xmm2 
00007ff6c08d182d 0F 38 C9 09                      sha1msg1 xmm1, xmmword ptr [rcx] 
00007ff6c08d1831 0F 38 CA CA                      sha1msg2 xmm1, xmm2 
00007ff6c08d1835 0F 38 CA 09                      sha1msg2 xmm1, xmmword ptr [rcx] 
00007ff6c08d1839 0F 38 CB CA                      sha256rnds2 xmm1, xmm2 
00007ff6c08d183d 0F 38 CB 09                      sha256rnds2 xmm1, xmmword ptr [rcx]

It will be in the next release.
Cod-Father

johnsa

Hi,

This will be included in 2.46.8, which should be up tonight or tomorrow, along with the DEREF fix for com->Release() and support for typedef'ed PROC return types.

LiaoMi

Hi, habran & johnsa,

Thanks for the work, this is great news!

The CLMUL instruction set is also not fully supported - https://en.wikipedia.org/wiki/CLMUL_instruction_set (the forms below are written in yasm syntax):

pclmulqdq xmm1, xmm2, 5
pclmulqdq xmm1, [rax], byte 5
pclmulqdq xmm1, dqword [rax], 5
vpclmulqdq xmm1, xmm2, 0x10
vpclmulqdq xmm1, dqword [rbx], 0x10
vpclmulqdq xmm0, xmm1, xmm2, 0x10
vpclmulqdq xmm0, xmm1, dqword [rbx], 0x10

pclmullqlqdq xmm1, xmm2
pclmullqlqdq xmm1, [rax]
pclmullqlqdq xmm1, dqword [rax]
vpclmullqlqdq xmm1, xmm2
vpclmullqlqdq xmm1, dqword[rbx]
vpclmullqlqdq xmm0, xmm1, xmm2
vpclmullqlqdq xmm0, xmm1, dqword[rbx]

pclmulhqlqdq xmm1, xmm2
pclmulhqlqdq xmm1, [rax]
pclmulhqlqdq xmm1, dqword [rax]
vpclmulhqlqdq xmm1, xmm2
vpclmulhqlqdq xmm1, dqword[rbx]
vpclmulhqlqdq xmm0, xmm1, xmm2
vpclmulhqlqdq xmm0, xmm1, dqword[rbx]

pclmullqhqdq xmm1, xmm2
pclmullqhqdq xmm1, [rax]
pclmullqhqdq xmm1, dqword [rax]
vpclmullqhqdq xmm1, xmm2
vpclmullqhqdq xmm1, dqword[rbx]
vpclmullqhqdq xmm0, xmm1, xmm2
vpclmullqhqdq xmm0, xmm1, dqword[rbx]

pclmulhqhqdq xmm1, xmm2
pclmulhqhqdq xmm1, [rax]
pclmulhqhqdq xmm1, dqword [rax]
vpclmulhqhqdq xmm1, xmm2
vpclmulhqhqdq xmm1, dqword[rbx]
vpclmulhqhqdq xmm0, xmm1, xmm2
vpclmulhqhqdq xmm0, xmm1, dqword[rbx]


RDSEED and RDRAND instruction sets (Edited: these work completely; the problem was my mistake)

rdrand cx
rdrand ecx
rdrand rcx


Sample x86-64 (Linux) asm code that checks for and uses the RDRAND instruction:

; using NASM syntax (x86-64 Linux)
; build (file name is just an example):
;   nasm -f elf64 rdrand.asm && ld -o rdrand rdrand.o

section .data
msg     db "0x00000000",10

section .text
global _start
_start:
        mov     eax,1
        cpuid
        bt      ecx,30          ; CPUID.01H:ECX bit 30 = RDRAND support
        mov     edi,1           ; exit code: failure
        jnc     .exit

        ; rdrand sets CF=0 if no random number
        ; was available. Intel documentation
        ; recommends 10 retries in a tight loop
        mov     ecx,11
.loop1:
        sub     ecx,1
        jz      .exit           ; exit code is set already
        rdrand  eax
        jnc     .loop1

        ; convert the number to ASCII
        mov     edi,msg+9       ; position of the last hex digit
        mov     ecx,8
.loop2:
        mov     edx,eax
        and     edx,0Fh
        ; add 7 to nibbles of 0xA and above
        ; to align with ASCII code for 'A'
        ; ('A' - '0') - 10 = 7
        xor     r9d,r9d
        lea     r8d,[r9+7]      ; r8=7
        cmp     dl,9
        cmova   r9,r8
        add     edx,r9d
        add     [rdi],dl
        shr     eax,4
        sub     edi,1
        sub     ecx,1
        jnz     .loop2

        mov     eax,1           ; SYS_WRITE
        mov     edi,eax         ; stdout=SYS_WRITE=1
        mov     esi,msg
        mov     edx,11
        syscall

        xor     edi,edi         ; exit code zero: success
.exit:
        mov     eax,60          ; SYS_EXIT
        syscall


Here is a document that can be useful as a reference: Intel Advanced Vector Extensions Programming Reference - https://software.intel.com/file/36945 (save the file as PDF)

johnsa

We'll check CLMUL for completeness and add the missing forms too :)

I'm busy adding the regression tests for both sets now anyway.

LiaoMi

This applies only to Intel processors. For AMD, yasm also supports other sets, such as the XOP (eXtended Operations), FMA4 and TBM (Trailing Bit Manipulation) instruction sets. None of these three is supported on Intel processors, and AMD also seems to be dropping TBM.

Quote: It is uncertain whether future Intel processors will support FMA4, due to Intel's announced change to FMA3.
This suggests that these sets don't make much sense. Agner's CPU blog - Stop the instruction set war - http://www.agner.org/optimize/blog/read.php?i=25

Quote: The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed their plans from FMA3 to FMA4 while Intel changed their plans from FMA4 to FMA3 almost at the same time.

Does anybody have experience with an AMD processor? Can ml64.exe understand the XOP, FMA3 (Intel and AMD) and TBM sets?!

habran

The CLMUL instructions were already implemented:

00007ff770711807 66 0F 3A 44 08 05                pclmulqdq xmm1, xmmword ptr [rax], 0x5 
00007ff77071180d 66 0F 3A 44 CA 05                pclmulqdq xmm1, xmm2, 0x5 
00007ff770711813 C4 E3 69 44 CB 05                vpclmulqdq xmm1, xmm2, xmm3, 0x5 
00007ff770711819 C4 E3 69 44 08 05                vpclmulqdq xmm1, xmm2, xmmword ptr [rax], 0x5
Cod-Father

habran

We have added the pseudo-ops; with them, no imm8 operand is needed, like this:

PCLMULLQLQDQ xmm1, xmm2
PCLMULHQLQDQ xmm1, xmm2
PCLMULLQHQDQ xmm1, xmm2
PCLMULHQHQDQ xmm1, xmm2
VPCLMULLQLQDQ xmm1, xmm2, xmm3
VPCLMULHQLQDQ xmm1, xmm2, xmm3
VPCLMULLQHQDQ xmm1, xmm2, xmm3
VPCLMULHQHQDQ xmm1, xmm2, xmm3

The SDE debugger doesn't recognise the pseudo-ops (I have MSVS 2013; maybe the MSVS 2017 SDE does), so the listing shows the underlying instructions with their imm8 values:

00007ff64a421807 66 0F 3A 44 CA 00                pclmulqdq            xmm1, xmm2, 0x0 
00007ff64a42180d 66 0F 3A 44 CA 01                pclmulqdq            xmm1, xmm2, 0x1 
00007ff64a421813 66 0F 3A 44 CA 10                pclmulqdq            xmm1, xmm2, 0x10 
00007ff64a421819 66 0F 3A 44 CA 11                pclmulqdq            xmm1, xmm2, 0x11 
00007ff64a42181f C4 E3 69 44 CB 00                vpclmulqdq            xmm1, xmm2, xmm3, 0x0 
00007ff64a421825 C4 E3 69 44 CB 01                vpclmulqdq            xmm1, xmm2, xmm3, 0x1 
00007ff64a42182b C4 E3 69 44 CB 10                vpclmulqdq            xmm1, xmm2, xmm3, 0x10 
00007ff64a421831 C4 E3 69 44 CB 11                vpclmulqdq            xmm1, xmm2, xmm3, 0x11
Cod-Father