MOVBE missing

Started by jj2007, November 17, 2018, 02:13:45 PM


jj2007

UAsm, AsmC, MASM version 10 and lower:
int 3
movbe eax, [edi+5]
db 0Fh, 38h, 0F0h, 47h, 05h     ; movbe eax, [edi+5]
mov eax, [edi+5]


Produces error A2008: syntax error : movbe

In case somebody wants to time it and has a Haswell+ CPU: exe + source attached. I can't give you results because my CPU is too old :(
Don't be surprised if it crashes with an exception - it just means your hardware sucks like mine :greensml:

Strangely enough, MOVBE is almost absent in the forum. It gets mentioned in the Comprehensive x86 instruction set for MSVS2010? thread, but that one is more about the Pope and his role in assembly programming 8)
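
For anyone who would rather avoid the exception than watch it happen: MOVBE support is reported in CPUID leaf 1, ECX bit 22. Below is a minimal sketch in C (not part of the attached test, and the function name `cpu_has_movbe` is made up for illustration). It assumes GCC or Clang on x86, whose `<cpuid.h>` provides `__get_cpuid`; on any other compiler or architecture it simply reports "no".

```c
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Hypothetical helper: returns 1 if the CPU reports MOVBE support
   (CPUID leaf 1, ECX bit 22), 0 if it does not or if CPUID cannot be
   queried in this build. */
int cpu_has_movbe(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                 /* leaf 1 not available */
    return (int)((ecx >> 22) & 1u);
#else
    return 0;                     /* assumption: treat unknown platforms as "no MOVBE" */
#endif
}
```

A caller would test `cpu_has_movbe()` once at startup and fall back to the mov+bswap path when it returns 0.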

fearless

Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (SSE4)

35      cycles for 100 * mov+bswap
20      cycles for 100 * movbe

33      cycles for 100 * mov+bswap
20      cycles for 100 * movbe

33      cycles for 100 * mov+bswap
22      cycles for 100 * movbe

33      cycles for 100 * mov+bswap
25      cycles for 100 * movbe

35      cycles for 100 * mov+bswap
31      cycles for 100 * movbe

9       bytes for mov+bswap
9       bytes for movbe


--- ok ---

mabdelouahab

Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz (SSE4)

21      cycles for 100 * mov+bswap
10      cycles for 100 * movbe

25      cycles for 100 * mov+bswap
10      cycles for 100 * movbe

23      cycles for 100 * mov+bswap
66      cycles for 100 * movbe

40      cycles for 100 * mov+bswap
24      cycles for 100 * movbe

76      cycles for 100 * mov+bswap
69      cycles for 100 * movbe

9       bytes for mov+bswap
9       bytes for movbe


--- ok ---

hutch--

This works on my Haswell E/EP.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    USING r12
    LOCAL var   :QWORD

    SaveRegs

    mov r12, 1133557799BBDDFFh

    conout hex$(r12),lf

    movbe var, r12

    conout hex$(var),lf

    waitkey
    RestoreRegs
    .exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end


OUTPUT


1133557799BBDDFF
FFDDBB9977553311
Press any key to continue...

LiaoMi

Hi  :P,

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)

22      cycles for 100 * mov+bswap
11      cycles for 100 * movbe

23      cycles for 100 * mov+bswap
37      cycles for 100 * movbe

23      cycles for 100 * mov+bswap
8       cycles for 100 * movbe

21      cycles for 100 * mov+bswap
16      cycles for 100 * movbe

35      cycles for 100 * mov+bswap
18      cycles for 100 * movbe

9       bytes for mov+bswap
9       bytes for movbe


--- ok ---

habran

MOVBE is now implemented in UASM and I hope John will include it in this release 8)
Cod-Father

johnsa

Now included, packages coming shortly :)

jj2007

Quote from: habran on November 17, 2018, 08:00:26 PM
MOVBE is now implemented in UASM and I hope John will include it in this release 8)

A winning team :t

habran

It can be quite useful, here is an example:

.data
szReplace db 'Replaced',0

.code

   mov   rdx, qword ptr szReplace
   movbe rax, qword ptr szReplace

Output:

RAX = 5265706C61636564
RDX = 646563616C706552
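
For readers whose CPU cannot run MOVBE, the two loads above can be modelled in portable C. This is only a sketch of the semantics, not of the instruction's timing, and the names `load_le64` and `load_be64` are invented for illustration.

```c
#include <stdint.h>

/* Model of "mov reg, qword ptr [mem]" on x86:
   the byte at the lowest address ends up in the low 8 bits. */
uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

/* Model of "movbe reg, qword ptr [mem]":
   the byte at the lowest address ends up in the high 8 bits. */
uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | p[i];
    return v;
}
```

Fed the eight bytes of 'Replaced', `load_le64` yields the RDX value shown above and `load_be64` yields the RAX value, which is why the string reads left to right in the register after MOVBE.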


jj2007

Yes indeed! If I ever encounter an application with an innermost time-critical loop where bswap is too slow, I will use it as a justification to get the ok for new hardware from my wife. Any ideas?

habran

.if (hardware == old)
    bswap eax
.else
    movbe rax, qword ptr wife
.endif
;)

hutch--

From memory, there is an SSE4 instruction where you use a mask to determine the altered order of the data, and it was a lot faster than BSWAP.

> I will use it as a justification to get the ok for new hardware from my wife. Any ideas?

Best of luck with that one, it sounds like an attempt to wring blood out of a stone.
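
The mask-driven instruction mentioned above is probably PSHUFB, which belongs to SSSE3 rather than SSE4 (that identification is an assumption, since the post does not name it). The sketch below is a scalar C model of what PSHUFB does to one 16-byte vector, together with a mask that reverses the bytes of four dwords at once. The names `shuffle_bytes16` and `BSWAP32X4_MASK` are invented for illustration, and the model says nothing about speed.

```c
#include <stdint.h>

/* Scalar model of PSHUFB on one 16-byte vector: each mask byte selects
   a source byte by its low 4 bits, or produces 0 if its top bit is set. */
void shuffle_bytes16(uint8_t dst[16], const uint8_t src[16],
                     const uint8_t mask[16])
{
    uint8_t tmp[16];              /* temporary so dst may alias src */
    for (int i = 0; i < 16; ++i)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
    for (int i = 0; i < 16; ++i)
        dst[i] = tmp[i];
}

/* Mask that reverses the byte order inside each of the four dwords,
   i.e. the effect of four BSWAPs in a single shuffle. */
const uint8_t BSWAP32X4_MASK[16] = {
    3, 2, 1, 0,   7, 6, 5, 4,   11, 10, 9, 8,   15, 14, 13, 12
};
```

Changing the mask changes the reordering, so the same shuffle covers 16-bit, 32-bit and 64-bit byte swaps.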

aw27

Quote from: jj2007 on November 17, 2018, 10:00:25 PM
Any ideas?
Yes, tell the wife that you need this:
https://www.pccomponentes.com/pccom-workstation-iii-intel-i7-7800x-32gb-ssd500-m2-2tb-gtx1060
To test AVX-512 instructions as well.

GabrielRavier


Intel(R) Atom(TM) CPU  Z3735F @ 1.33GHz (SSE4)

95      cycles for 100 * mov+bswap
85      cycles for 100 * movbe

98      cycles for 100 * mov+bswap
85      cycles for 100 * movbe

95      cycles for 100 * mov+bswap
88      cycles for 100 * movbe

98      cycles for 100 * mov+bswap
86      cycles for 100 * movbe

116     cycles for 100 * mov+bswap
83      cycles for 100 * movbe

9       bytes for mov+bswap
9       bytes for movbe


--- ok ---



Results on my laptop with a shitty Silvermont CPU (I think those were the first with MOVBE, apart from Bonnell).
My github profile
https://github.com/GabrielRavier