Hello everybody,
UASM does not support the SHA instruction set extensions, for example sha256rnds2. With AES everything seems to be okay :lol:
Documentation package:
SHA
SHA Docs - https://software.intel.com/sites/default/files/article/402097/intel-sha-extensions-white-paper.pdf
ASM Source (Intel® SHA Extensions Implementations) - https://software.intel.com/sites/default/files/article/402126/intel-sha-extensions_1.zip
AES may be interesting for tests
Intel AESNI Sample Library - Assembler & C Source code (intel-aesni-sample-library-v1.2.zip) - https://web.archive.org/web/20170713153528/https://software.intel.com/sites/default/files/article/181731/intel-aesni-sample-library-v1.2.zip
AES-NI white paper - Intel® Developer Zone - https://software.intel.com/sites/default/files/article/165683/aes-wp-2012-09-22-v01.pdf
Intel® Architecture Instruction Set Extensions Programming Reference
https://web.archive.org/web/20130929035331if_/http://download-software.intel.com/sites/default/files/319433-015.pdf
It would be cool to add the SHA instruction extensions to the supported set. As you can see from Intel's assembler source code, yasm already assembles these instructions.
Best regards, LiaoMi
Will look at that ASAP; however, take the Australian Open into consideration ;)
done 8):
00007ff6c08d1807 0F 38 CC CA sha256msg1 xmm1, xmm2
00007ff6c08d180b 0F 38 CD CA sha256msg2 xmm1, xmm2
00007ff6c08d180f 0F 38 CC 09 sha256msg1 xmm1, xmmword ptr [rcx]
00007ff6c08d1813 0F 38 CD 09 sha256msg2 xmm1, xmmword ptr [rcx]
00007ff6c08d1817 0F 3A CC CA 0C sha1rnds4 xmm1, xmm2, 0xc
00007ff6c08d181c 0F 3A CC 09 0C sha1rnds4 xmm1, xmmword ptr [rcx], 0xc
00007ff6c08d1821 0F 38 C8 CA sha1nexte xmm1, xmm2
00007ff6c08d1825 0F 38 C8 09 sha1nexte xmm1, xmmword ptr [rcx]
00007ff6c08d1829 0F 38 C9 CA sha1msg1 xmm1, xmm2
00007ff6c08d182d 0F 38 C9 09 sha1msg1 xmm1, xmmword ptr [rcx]
00007ff6c08d1831 0F 38 CA CA sha1msg2 xmm1, xmm2
00007ff6c08d1835 0F 38 CA 09 sha1msg2 xmm1, xmmword ptr [rcx]
00007ff6c08d1839 0F 38 CB CA sha256rnds2 xmm1, xmm2
00007ff6c08d183d 0F 38 CB 09 sha256rnds2 xmm1, xmmword ptr [rcx]
Will be in the next release.
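For anyone checking the listing above by hand: each line is the opcode bytes followed by one ModRM byte (and an imm8 for sha1rnds4). The following is only a minimal Python sketch, assuming the standard x86 ModRM layout (mod in bits 7-6, reg in bits 5-3, rm in bits 2-0); the helper name `modrm` is made up for illustration and is not part of UASM.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModRM fields into one byte (hypothetical helper)."""
    assert 0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7
    return (mod << 6) | (reg << 3) | rm

# Register numbers as used in the ModRM fields.
XMM1, XMM2 = 1, 2
RCX = 1

# sha256msg1 xmm1, xmm2 : mod=11b selects the register-direct form
reg_form = bytes([0x0F, 0x38, 0xCC, modrm(0b11, XMM1, XMM2)])

# sha256msg1 xmm1, xmmword ptr [rcx] : mod=00b, rm=rcx, no displacement
mem_form = bytes([0x0F, 0x38, 0xCC, modrm(0b00, XMM1, RCX)])

print(reg_form.hex(" ").upper())
print(mem_form.hex(" ").upper())
```

The two byte strings can be compared against the sha256msg1 lines of the listing; swapping the third byte for C8..CD covers the other two-operand SHA instructions.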
Hi,
This will be included in 2.46.8 which should be up tonight or tomorrow along with the DEREF fix for com->Release() as well as support for typedef'ed PROC return types.
Hi, habran & johnsa,
thanks for the work, this is great news!
The CLMUL instruction set is also not fully supported - https://en.wikipedia.org/wiki/CLMUL_instruction_set
pclmulqdq xmm1, xmm2, 5
pclmulqdq xmm1, [rax], byte 5
pclmulqdq xmm1, dqword [rax], 5
vpclmulqdq xmm1, xmm2, 0x10
vpclmulqdq xmm1, dqword [rbx], 0x10
vpclmulqdq xmm0, xmm1, xmm2, 0x10
vpclmulqdq xmm0, xmm1, dqword [rbx], 0x10
pclmullqlqdq xmm1, xmm2
pclmullqlqdq xmm1, [rax]
pclmullqlqdq xmm1, dqword [rax]
vpclmullqlqdq xmm1, xmm2
vpclmullqlqdq xmm1, dqword[rbx]
vpclmullqlqdq xmm0, xmm1, xmm2
vpclmullqlqdq xmm0, xmm1, dqword[rbx]
pclmulhqlqdq xmm1, xmm2
pclmulhqlqdq xmm1, [rax]
pclmulhqlqdq xmm1, dqword [rax]
vpclmulhqlqdq xmm1, xmm2
vpclmulhqlqdq xmm1, dqword[rbx]
vpclmulhqlqdq xmm0, xmm1, xmm2
vpclmulhqlqdq xmm0, xmm1, dqword[rbx]
pclmullqhqdq xmm1, xmm2
pclmullqhqdq xmm1, [rax]
pclmullqhqdq xmm1, dqword [rax]
vpclmullqhqdq xmm1, xmm2
vpclmullqhqdq xmm1, dqword[rbx]
vpclmullqhqdq xmm0, xmm1, xmm2
vpclmullqhqdq xmm0, xmm1, dqword[rbx]
pclmulhqhqdq xmm1, xmm2
pclmulhqhqdq xmm1, [rax]
pclmulhqhqdq xmm1, dqword [rax]
vpclmulhqhqdq xmm1, xmm2
vpclmulhqhqdq xmm1, dqword[rbx]
vpclmulhqhqdq xmm0, xmm1, xmm2
vpclmulhqhqdq xmm0, xmm1, dqword[rbx]
RDSEED and RDRAND instructions (Edit: these work completely, it was my mistake)
rdrand cx
rdrand ecx
rdrand rcx
Sample x86-64 asm code that checks for RDRAND support and then uses the instruction:
; using NASM syntax
section .data
msg db "0x00000000",10
section .text
global _start
_start:
mov eax,1
cpuid
bt ecx,30
mov edi,1 ; exit code: failure
jnc .exit
; rdrand sets CF=0 if no random number
; was available. Intel documentation
; recommends 10 retries in a tight loop
mov ecx,11
.loop1:
sub ecx, 1
jz .exit ; exit code is set already
rdrand eax
jnc .loop1
; convert the number to ASCII
mov edi,msg+9
mov ecx,8
.loop2:
mov edx,eax
and edx,0Fh
; add 7 to nibbles of 0xA and above
; to align with ASCII code for 'A'
; ('A' - '0') - 10 = 7
xor r9d, r9d
lea r8d, [r9+7] ; r8=7
cmp dl,9
cmova r9,r8
add edx,r9d
add [rdi],dl
shr eax,4
sub edi, 1
sub ecx, 1
jnz .loop2
mov eax,1 ; SYS_WRITE
mov edi,eax ; stdout=SYS_WRITE=1
mov esi,msg
mov edx,11
syscall
xor edi,edi ; exit code zero: success
.exit:
mov eax,60 ; SYS_EXIT
syscall
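The "add 7" step in the conversion loop above is easy to misread, so here is the same arithmetic as a small Python sketch. It is not part of the original sample; the function names `nibble_to_ascii` and `to_hex` are made up for illustration, and the sketch assumes a 32-bit input value, as in the asm code.

```python
def nibble_to_ascii(n: int) -> int:
    """Map a 4-bit value to the ASCII code of its hex digit."""
    code = ord("0") + n
    if n > 9:
        # Skip the gap between '9' and 'A': ('A' - '0') - 10 = 7
        code += ord("A") - ord("0") - 10
    return code

def to_hex(value: int) -> str:
    """Format a 32-bit value as 0xXXXXXXXX, lowest nibble first like the asm loop."""
    digits = []
    for _ in range(8):
        digits.append(chr(nibble_to_ascii(value & 0xF)))
        value >>= 4
    # The asm loop fills the buffer from the end; reversing gives the same order.
    return "0x" + "".join(reversed(digits))

print(to_hex(0xDEADBEEF))
```

The asm version adds the nibble (plus 7 where needed) onto a buffer pre-filled with '0' characters, which gives the same result as computing `ord("0") + n` here.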
Here is a document that can be useful as a reference: Intel Advanced Vector Extensions Programming Reference - https://software.intel.com/file/36945 (save the file as PDF)
We'll check on the CLMUL completeness and add them too :)
I'm busy adding the regression tests for both sets now anyway.
This applies only to Intel processors. For AMD, yasm supports other sets too: XOP (eXtended Operations), FMA4 and TBM (Trailing Bit Manipulation). None of these three is supported by Intel, and AMD also wants to drop TBM.
Quote: It is uncertain whether future Intel processors will support FMA4, due to Intel's announced change to FMA3.
This means that these sets don't make sense. Agner's CPU blog - Stop the instruction set war - http://www.agner.org/optimize/blog/read.php?i=25
Quote: The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed their plans from FMA3 to FMA4 while Intel changed their plans from FMA4 to FMA3 almost at the same time.
Does anybody have any experience with an AMD processor? Can ml64.exe understand the XOP, FMA3 (Intel), FMA3 (AMD) and TBM sets?
CLMUL instructions were implemented already:
00007ff770711807 66 0F 3A 44 08 05 pclmulqdq xmm1, xmmword ptr [rax], 0x5
00007ff77071180d 66 0F 3A 44 CA 05 pclmulqdq xmm1, xmm2, 0x5
00007ff770711813 C4 E3 69 44 CB 05 vpclmulqdq xmm1, xmm2, xmm3, 0x5
00007ff770711819 C4 E3 69 44 08 05 vpclmulqdq xmm1, xmm2, xmmword ptr [rax], 0x5
We have added the pseudo-ops; when using them you don't need the imm8 operand, like this:
PCLMULLQLQDQ xmm1, xmm2
PCLMULHQLQDQ xmm1, xmm2
PCLMULLQHQDQ xmm1, xmm2
PCLMULHQHQDQ xmm1, xmm2
VPCLMULLQLQDQ xmm1, xmm2,xmm3
VPCLMULHQLQDQ xmm1, xmm2,xmm3
VPCLMULLQHQDQ xmm1, xmm2,xmm3
VPCLMULHQHQDQ xmm1, xmm2,xmm3
The SDE debugger doesn't recognise the pseudo-ops, but I have MSVS 2013;
maybe the MSVS 2017 SDE does:
00007ff64a421807 66 0F 3A 44 CA 00 pclmulqdq xmm1, xmm2, 0x0
00007ff64a42180d 66 0F 3A 44 CA 01 pclmulqdq xmm1, xmm2, 0x1
00007ff64a421813 66 0F 3A 44 CA 10 pclmulqdq xmm1, xmm2, 0x10
00007ff64a421819 66 0F 3A 44 CA 11 pclmulqdq xmm1, xmm2, 0x11
00007ff64a42181f C4 E3 69 44 CB 00 vpclmulqdq xmm1, xmm2, xmm3, 0x0
00007ff64a421825 C4 E3 69 44 CB 01 vpclmulqdq xmm1, xmm2, xmm3, 0x1
00007ff64a42182b C4 E3 69 44 CB 10 vpclmulqdq xmm1, xmm2, xmm3, 0x10
00007ff64a421831 C4 E3 69 44 CB 11 vpclmulqdq xmm1, xmm2, xmm3, 0x11
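The listing above shows which immediate each pseudo-op turns into. As a rough Python sketch of that mapping (not UASM's actual implementation; the function name `clmul_imm8` is made up here): in the Intel definition of PCLMULQDQ, imm8 bit 0 selects the low or high quadword of the first source operand and bit 4 selects the quadword of the second, and the two L/H letters in the mnemonic appear in that same order.

```python
def clmul_imm8(first: str, second: str) -> int:
    """Return the imm8 for a PCLMUL pseudo-op.

    first/second are 'L' or 'H', in the order they appear in the mnemonic.
    """
    if first not in "LH" or second not in "LH" or len(first + second) != 2:
        raise ValueError("selectors must be 'L' or 'H'")
    bit0 = 1 if first == "H" else 0    # quadword of the first source operand
    bit4 = 1 if second == "H" else 0   # quadword of the second source operand
    return bit0 | (bit4 << 4)

for name in ("PCLMULLQLQDQ", "PCLMULHQLQDQ", "PCLMULLQHQDQ", "PCLMULHQHQDQ"):
    # Characters 6 and 8 of the mnemonic are the two L/H selectors.
    first, second = name[6], name[8]
    print(f"{name} -> imm8 = 0x{clmul_imm8(first, second):02X}")
```

The VEX forms (VPCLMULLQLQDQ and friends) use the same immediates; only the prefix bytes differ.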