"Hello masm32", not a BOT, new member

Started by LordAdef, January 22, 2017, 09:42:24 AM

mineiro

Hello LordAdef,
your question is about "little endian" vs. "big endian" byte order.

mov [eax], bx                        ; send the 2 bytes (grouped) to dest array

In the line above, if BX == 0102h, the byte order changes when you store the value from the register to memory, because you are dealing with a word: memory will hold 02h at the lower address and 01h at the next one.
Just check it: after storing a word, dword or qword in memory, read only one byte from that address instead of the whole value and you will see the point.
mov dword ptr [eax], 12345678h   ; a double word, a group of 4 bytes stored at the address pointed to by EAX
So, in memory it will look like this:
bytes at [eax], lowest address first == 78h 56h 34h 12h

Off topic: Oh yes, LordAdef, I'm from Minas Gerais, and yes, this slogan is about Raulzito. Good to see brothers here trying the Latin of computers.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

FORTRANS

Hi,

   X86 is little endian, which means that when you store a register to
memory, its apparent byte order becomes reversed.  For the BX
register, you can use the old DOS DEBUG to see how that
happens.  Or there are endless tutorials and discussions on
endianness fun out and about on the web.

HTH,

Steve N.

P.S.  While typing I was scooped (someone beat me to the answer).

SRN

hutch--

A couple of things here. Steve is right about the byte order of a 32 bit register: if you think of 4 bytes labeled 0123, in memory they are stored as 3210, and this is a characteristic of x86 hardware. It catches people who are learning because byte data like text is stored left to right but numbers are stored in reverse order. In 32 bit you can access the two lowest bytes with AL and AH, but for a long time Intel have advised against using the high byte, and in 64 bit you cannot access it at all.

If you can manage it, do all of your BYTE register reads and writes in the low byte registers AL/BL/CL etc. It takes a bit more organisation so you don't run out of registers, but it is a lesson I learnt writing 64 bit algos: the ones I wrote properly in Win32 converted easily to Win64, whereas the odd one or two were pigs that had to be rewritten because you could not access the high byte directly. You can still access any of the upper 3 bytes of a DWORD indirectly by using rotates or shifts, but it is slower, as shifts and rotates are not fast instructions.

LordAdef

Bollocks!!!

And I knew about endian order!! That proves that one can only learn assembly by practicing... I was so focused on the algorithm that I missed that...

Thanks Mineiro, Steve and Hutch!

Mineiro, my wife asked me why I am learning Assembly. I said I want to become a Painter, not a Photoshop editor (no offense intended). Well, that's how I feel about asm.

Hutch, any specific reason why Intel suggested not using the high byte?

jj2007

Quote from: hutch-- on February 15, 2017, 02:01:22 AM
In 32 bit you can access the two lowest bytes with AL and AH but for a long time Intel have advised against using the high byte and in 64 bit you cannot access it at all.

Well, ah is not completely inaccessible (but you probably meant something else):

include \Masm32\MasmBasic\Res\JBasic.inc      ; part of MasmBasic

Init            ; OPT_64 1      ; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format")
  mov ah, 123
  ; movzx rax, ah      ; not possible
  movzx eax, ah      ; workaround; same result as movzx rax, ah but shorter
  Print Str$("If ah is 123, then rax is now %lli\n\n", rax)
  mov rax, 1234567890123456789
  Inkey Str$("rax can be a really big number: %lli\n", rax)
EndOfCode


Output:
This code was assembled with HJWasm32 in 64-bit format
If ah is 123, then rax is now 123

rax can be a really big number: 1234567890123456789

mineiro

Quote from: LordAdef on February 15, 2017, 03:47:33 AM
Mineiro, my wife asked me why I am learning Assembly. I said I want to become a Painter, not a Photoshop editor (no offense intended). Well, that's how I feel about asm.
:eusa_clap:
I share the same opinion: people who learn assembly are people who don't simply accept things but like to understand the magic behind the curtain.
You mentioned Photoshop. Well, we can use Photoshop as a hexadecimal editor, except that instead of hexadecimal numbers we see colors, so in theory we could program in assembly language using Photoshop; it's hard, I confess, but not impossible. The inverse can be done too: we can use an assembler to create a .gif, .bmp, ..., and again it's hard but not impossible.
Since you're a musician, it's the same thing there, except that in Audacity, for example, instead of hexadecimal numbers we see sinusoidal waves.

hutch--

> Hutch, any specific reason why Intel suggested not using the high byte?

The reasoning from Intel at the time was that it was a slower operation because of how the register was loaded, and I think it goes back to the PIV era. There have been 2 major series of Intel hardware since, the Core2 series and the i3/5/7 series, which may vary or may no longer be bothered by it, but for 64 bit operations the high byte register is not available. JJ has shown how to access AH with a 32 bit operation, but for a 64 bit operation there is no opcode that will do it.

jj2007

Quote from: hutch-- on February 15, 2017, 10:58:58 AM
it was a slower operation because of how the register was loaded and I think it goes back to the PIV era.

Here is a little testbed:
align_64
TestA_s:
NameA equ mov al ; assign a descriptive name here
TestA proc
  mov ebx, AlgoLoops-1 ; loop e.g. 100x
  align 4
  .Repeat
    mov al, byte ptr somestring
    mov cl, al
    inc al
    movzx eax, al
    dec ebx
  .Until Sign?
  ret
TestA endp
TestA_endp:

align_64
TestB_s:
NameB equ mov ah ; assign a descriptive name here
TestB proc
  mov ebx, AlgoLoops-1 ; loop e.g. 100x
  align 4
  .Repeat
    mov ah, byte ptr somestring
    mov ch, ah
    inc ah
    movzx eax, ah
    dec ebx
  .Until Sign?
  ret
TestB endp
TestB_endp:


Results: Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

127     cycles for 100 * mov al
53      cycles for 100 * mov ah

130     cycles for 100 * mov al
55      cycles for 100 * mov ah

126     cycles for 100 * mov al
52      cycles for 100 * mov ah

129     cycles for 100 * mov al
55      cycles for 100 * mov ah

127     cycles for 100 * mov al
55      cycles for 100 * mov ah

12      bytes for mov al
13      bytes for mov ah


The ah stuff is definitely one byte longer.

hutch--

I tried to access the source code, but it is unreadable RTF. What I would test is turning the two loops around, because at the moment the code you posted is indicating that AH is faster than AL, which does not make sense.

Posting examples in an unreadable format makes testing your algorithms unviable, which is unfortunate because it renders them useless.

jj2007

Quote from: hutch-- on February 15, 2017, 03:05:56 PM
I tried to access the source code but it is unreadable RTF.

RTF has been readable for almost 30 years now. Wordpad, for example, can read it; also MS Word, RichMasm, LibreOffice, ...

Quote
What I would test is turning the two loops around because at the moment the code you posted is indicating that AH is faster than AL which does not make sense.

Given that there is a REPEAT 5 ... ENDM around the code examples, there is obviously no need to exchange the order of the loops.

Anyway, for those who are not able to read RTF, attached is a plain text version with the loops "turned around". I agree, of course, that it "does not make sense" that ah is faster than al; maybe you can code something where it is the other way round. Btw, there is a switch in line 3 of the source, useMB=0: the second attachment contains the exe without any trace of MasmBasic (4096 bytes only). Maybe the AH register behaves better without the influence of that library.

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

57      cycles for 100 * mov ah
129     cycles for 100 * mov al

56      cycles for 100 * mov ah
122     cycles for 100 * mov al

56      cycles for 100 * mov ah
129     cycles for 100 * mov al

56      cycles for 100 * mov ah
129     cycles for 100 * mov al

56      cycles for 100 * mov ah
129     cycles for 100 * mov al

13      bytes for mov ah
12      bytes for mov al

hutch--

 :biggrin:

> RTF has been readable for almost 30 years now. Wordpad, for example, can read it; also MS Word, RichMasm, LibreOffice, ...

Trouble is that assemblers and compilers can't read it. Locking source code into a deviant format tied to one exclusive editor makes the files unbuildable with anything else. Unless M$ have updated Word and Wordpad recently, they will not assemble MASM source code. Without a viable method to test the algos you post, there is no way of knowing what they do or how they are written.

With MASM, the code can be built with a batch file that does not require an editor at all, and it can be tested by anyone who has a normal ASCII text editor: Notepad and a whole host of others.

jj2007

Quote from: hutch-- on February 16, 2017, 02:59:02 AM
there is no way of knowing what they do or how they are written.

You could open rich text (*.asc) files in Wordpad to see how they are written. If you don't trust my executables, you can press Ctrl A, Ctrl C, then switch to a poor text editor of your choice and build it there.

Never mind, from now on I'll try to add the poor text versions, too.

LordAdef

Curious to know what's happening with al and ah in this code. It's an odd result indeed.

hutch--

Here is a simple benchmark testing the load time of AL and AH, done on my 3.3 GHz 6-core Haswell.

This is the result (elapsed milliseconds from GetTickCount for each loop):

688 load AL
1015 load AH
703 load AL
985 load AH
718 load AL
1000 load AH
735 load AL
984 load AH
703 load AL
1032 load AH
671 load AL
969 load AH
735 load AL
984 load AH
687 load AL
1000 load AH
Press any key to continue ...


This is the code.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    .data?
      value dd ?

    .data
      item dd 0

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    push ebx
    push esi
    push edi

    mov edi, 8

  lpstart:

  ; -----------------------------------------------------------

    mov esi, 1024*1024*1024     ; 2^30 iterations, about 1.07 billion
    mov dl, 0

    invoke GetTickCount
    push eax

  @@:
    mov al, dl                  ; load AL
    add dl, 1
    cmp dl, 255
    jne nxt
    mov dl, 0
  nxt:
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print str$(eax)," load AL",13,10

  ; -----------------------------------------------------------

    mov esi, 1024*1024*1024     ; 2^30 iterations, about 1.07 billion
    mov dl, 0

    invoke GetTickCount
    push eax

  @@:
    mov ah, dl                  ; load AH
    add dl, 1
    cmp dl, 255
    jne nxt1
    mov dl, 0
  nxt1:
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx

    print str$(eax)," load AH",13,10

  ; -----------------------------------------------------------

    sub edi, 1
    jnz lpstart

    pop edi
    pop esi
    pop ebx

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start

jj2007

Interesting :t

I found my example a bit more relevant for practical purposes, but no problem, you found a case where mov ah is slow, congrats :icon14:

Btw how do the timings change with an align 4 before the two loops?