UAsm, AsmC, MASM version 10 and lower:
int 3
movbe eax, [edi+5]
db 0Fh, 38h, 0F0h, 47h, 05h ; movbe eax, [edi+5]
mov eax, [edi+5]
Produces error A2008: syntax error : movbe
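For anyone wondering how the db line maps onto the instruction, here is a rough C sketch (not from the original post; the type and function names are made up for illustration). 0Fh 38h 0F0h is the opcode for MOVBE r32, m32, 47h is the ModRM byte, and 05h is the 8-bit displacement. The helper only splits a ModRM byte into its three fields.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields packed into a ModRM byte. */
typedef struct {
    unsigned mod; /* bits 7..6: addressing mode (1 = base + disp8) */
    unsigned reg; /* bits 5..3: register operand                   */
    unsigned rm;  /* bits 2..0: base register of the memory operand */
} modrm_fields;

/* Split a ModRM byte into mod / reg / rm. */
modrm_fields split_modrm(uint8_t b)
{
    modrm_fields f;
    f.mod = (b >> 6) & 3u;
    f.reg = (b >> 3) & 7u;
    f.rm  = b & 7u;
    return f;
}
```

For 47h this gives mod = 1 (a disp8 follows), reg = 0 (EAX) and rm = 7 (EDI in 32-bit addressing), which is exactly movbe eax, [edi+5].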
In case somebody wants to time it and has a Haswell+ CPU: exe + source attached. I can't give you results because my CPU is too old :(
Don't be surprised if it crashes with an exception - it just means your hardware sucks like mine :greensml:
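If you would rather check before running than catch the exception: MOVBE support is reported in CPUID leaf 1, ECX bit 22. A minimal C sketch, assuming GCC or Clang for the `<cpuid.h>` part; the function names are mine and not part of the attached source.

```c
#include <stdint.h>

/* CPUID leaf 1 reports MOVBE support in ECX bit 22. */
#define CPUID1_ECX_MOVBE (1u << 22)

/* Returns 1 if the given CPUID.01H:ECX value advertises MOVBE, else 0. */
int ecx_has_movbe(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_MOVBE) != 0;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* GCC/Clang on x86 only (assumption): query the CPU we are running on. */
int cpu_has_movbe(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0; /* leaf 1 not available */
    return ecx_has_movbe(ecx);
}
#endif
```

The bit test is kept separate from the CPUID call so it can be checked on any machine, including ones without MOVBE.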
Strangely enough, MOVBE is almost absent from the forum. It gets mentioned in the "Comprehensive x86 instruction set for MSVS2010?" thread (http://masm32.com/board/index.php?topic=219.msg948#msg948), but that one is more about the Pope and his role in assembly programming 8)
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (SSE4)
35 cycles for 100 * mov+bswap
20 cycles for 100 * movbe
33 cycles for 100 * mov+bswap
20 cycles for 100 * movbe
33 cycles for 100 * mov+bswap
22 cycles for 100 * movbe
33 cycles for 100 * mov+bswap
25 cycles for 100 * movbe
35 cycles for 100 * mov+bswap
31 cycles for 100 * movbe
9 bytes for mov+bswap
9 bytes for movbe
--- ok ---
Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz (SSE4)
21 cycles for 100 * mov+bswap
10 cycles for 100 * movbe
25 cycles for 100 * mov+bswap
10 cycles for 100 * movbe
23 cycles for 100 * mov+bswap
66 cycles for 100 * movbe
40 cycles for 100 * mov+bswap
24 cycles for 100 * movbe
76 cycles for 100 * mov+bswap
69 cycles for 100 * movbe
9 bytes for mov+bswap
9 bytes for movbe
--- ok ---
This works on my Haswell E/EP.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
USING r12
LOCAL var :QWORD
SaveRegs
mov r12, 1133557799BBDDFFh
conout hex$(r12),lf
movbe var, r12
conout hex$(var),lf
waitkey
RestoreRegs
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
OUTPUT
1133557799BBDDFF
FFDDBB9977553311
Press any key to continue...
Hi :P,
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)
22 cycles for 100 * mov+bswap
11 cycles for 100 * movbe
23 cycles for 100 * mov+bswap
37 cycles for 100 * movbe
23 cycles for 100 * mov+bswap
8 cycles for 100 * movbe
21 cycles for 100 * mov+bswap
16 cycles for 100 * movbe
35 cycles for 100 * mov+bswap
18 cycles for 100 * movbe
9 bytes for mov+bswap
9 bytes for movbe
--- ok ---
MOVBE is now implemented in UASM and I hope John will include it in this release 8)
Now included, packages coming shortly :)
Quote from: habran on November 17, 2018, 08:00:26 PM
MOVBE is now implemented in UASM and I hope John will include it in this release 8)
A winning team :t
It can be quite useful, here is an example:
.data
szReplace db 'Replaced',0
.code
mov rdx,qword ptr szReplace
movbe rax,qword ptr szReplace
output:
RAX = 5265706C61636564
RDX = 646563616C706552
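To make the two register values above easier to follow, here is a plain C model of both loads. This is only a sketch of the semantics, not how the instruction is implemented, and the function names are hypothetical.

```c
#include <stdint.h>

/* Model of a plain 64-bit MOV from memory on x86: the byte at the lowest
   address ends up in the lowest 8 bits of the result (little-endian). */
uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* Model of MOVBE r64, m64: the byte at the lowest address ends up in the
   highest 8 bits of the result (big-endian). */
uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | p[i];
    return v;
}
```

Fed the eight bytes of 'Replaced', load_le64 returns 646563616C706552h (the RDX value) and load_be64 returns 5265706C61636564h (the RAX value). With the big-endian load the first character sits in the top byte, so comparing two such values as unsigned integers orders them the same way a byte-wise string compare would.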
Yes indeed! If I ever encounter an application with an innermost time-critical loop where bswap is too slow, I will use it as a justification to get the ok for new hardware from my wife. Any ideas?
.if (hardware == old)
bswap eax
.else
movbe rax, qword ptr wife
.endif
;)
From memory, there is an SSE instruction where you use a mask to determine the altered order of the data (PSHUFB, I think, which would make it SSSE3 rather than SSE4), and as far as I recall it was a lot faster than BSWAP.
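The mask-driven instruction described here is presumably PSHUFB (that is my reading, the post does not name it). A scalar C model of what its 128-bit form does, together with a mask that byte-reverses both 64-bit halves in one go. The names are made up, and this models the semantics only; it says nothing about speed.

```c
#include <stdint.h>

/* Scalar model of 128-bit PSHUFB: result byte i is src[mask[i] & 15],
   or 0 when bit 7 of mask[i] is set. A temporary buffer is used so that
   dst may alias src. */
void pshufb_model(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; ++i)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    for (int i = 0; i < 16; ++i)
        dst[i] = tmp[i];
}

/* Hypothetical mask: reverses the bytes inside each 64-bit half,
   i.e. the effect of two 64-bit BSWAPs on one 16-byte block. */
const uint8_t bswap64x2_mask[16] = {
    7, 6, 5, 4, 3, 2, 1, 0,
    15, 14, 13, 12, 11, 10, 9, 8
};
```

Calling pshufb_model(dst, src, bswap64x2_mask) leaves dst[0] = src[7], dst[7] = src[0], dst[8] = src[15] and dst[15] = src[8]. Other masks give other byte orders, which is the flexibility BSWAP and MOVBE do not have.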
> I will use it as a justification to get the ok for new hardware from my wife. Any ideas?
Best of luck with that one, it sounds like an attempt to wring blood out of a stone.
Quote from: jj2007 on November 17, 2018, 10:00:25 PM
Any ideas?
Yes, tell the wife that you need this:
https://www.pccomponentes.com/pccom-workstation-iii-intel-i7-7800x-32gb-ssd500-m2-2tb-gtx1060
To test AVX-512 instructions as well.
Intel(R) Atom(TM) CPU Z3735F @ 1.33GHz (SSE4)
95 cycles for 100 * mov+bswap
85 cycles for 100 * movbe
98 cycles for 100 * mov+bswap
85 cycles for 100 * movbe
95 cycles for 100 * mov+bswap
88 cycles for 100 * movbe
98 cycles for 100 * mov+bswap
86 cycles for 100 * movbe
116 cycles for 100 * mov+bswap
83 cycles for 100 * movbe
9 bytes for mov+bswap
9 bytes for movbe
--- ok ---
Results on my laptop with a shitty Silvermont CPU (I think those were the first with MOVBE apart from Bonnell