"Hello masm32", not a BOT, new member

mineiro · February 15, 2017, 01:01:24 AM

Hello LordAdef,
Your question is about "little endian" versus "big endian" byte order.

mov [eax], bx ; send the 2 bytes (grouped) to dest array

In the line above, if bx == 0102h, the byte order changes when you store the value from the register to memory, because you are dealing with a word: the low byte is written first, so memory holds the bytes 02h 01h.
To check this, store a word, dword or qword to memory, then read back only one byte from that address instead of the whole value, and you will see the point.
mov dword ptr [eax], 12345678h ; a doubleword, a group of 4 bytes stored at the address pointed to by eax
So in memory that will look like
[eax] pointed address contents == 78h 56h 34h 12h
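
The byte order described above can be checked without an assembler. What follows is a minimal Python sketch, not MASM: it uses `struct.pack` with the `<` (little-endian) format to model what the two stores leave in memory. The helper name `store_le` is made up for this illustration.

```python
import struct

def store_le(value: int, size: int) -> bytes:
    """Return the bytes a little-endian store of `value` leaves in memory.

    `size` is the operand width in bytes: 2 = word, 4 = dword, 8 = qword.
    Hypothetical helper for illustration; it only models the x86 store.
    """
    fmt = {2: "<H", 4: "<I", 8: "<Q"}[size]
    return struct.pack(fmt, value)

word = store_le(0x0102, 2)        # models: mov [eax], bx   with bx = 0102h
dword = store_le(0x12345678, 4)   # models: mov dword ptr [eax], 12345678h

# Lowest address first, as a memory dump would show it.
print(word.hex(" "))    # 02 01
print(dword.hex(" "))   # 78 56 34 12

# Reading a single byte from the start address returns the LOW byte.
print(hex(dword[0]))    # 0x78
```

Reading the value back with the same width reverses the bytes again, which is why the swap only becomes visible when the access size changes.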

offtopic: Oh yes LordAdef, I'm from Minas Gerais, and yes, this slogan is about Rauzlito. Good to see brothers here trying the Latin language of computers.

FORTRANS · February 15, 2017, 01:07:57 AM

Hi,

X86 is little endian, which means when you store a register to
memory, its apparent byte order becomes reversed. For the BX
register, you can use the old DOS DEBUG to see what/how that
happens. Or there are endless tutorials and discussions on
endianness fun out and about the web.

HTH,

Steve N.

P.S. While typing I was scooped (someone beat me to the answer).

SRN

hutch-- · February 15, 2017, 02:01:22 AM

A couple of things here. Steve is right about the byte order of a 32 bit register: if you think of 4 bytes labeled 0123, in memory they are stored as 3210, and this is a characteristic of x86 hardware. It catches people who are learning because byte data like text is stored left to right but numbers are stored in reverse order. In 32 bit you can access the two lowest bytes with AL and AH, but for a long time Intel have advised against using the high byte, and in 64 bit you cannot access it at all.

If you can manage it, do all of your BYTE register reads and writes in the low byte registers AL/BL/CL etc. It takes a bit more organisation so you don't run out of registers, but it is a lesson I have learnt writing 64 bit algos: the ones I wrote properly in Win32 converted easily to Win64, whereas the odd one or two were pigs that had to be rewritten because you could not access the high byte directly. You can still indirectly access any of the upper 3 bytes of a DWORD by using rotates or shifts, but it is slower as shifts and rotates are not fast instructions.
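
The indirect access through shifts and rotates can be sketched as follows. This is a Python model of the bit arithmetic only, under the assumption that a 32-bit register is represented as a plain integer; `ror32` and `byte_via_shift` are hypothetical names, and the sketch says nothing about instruction timing.

```python
MASK32 = 0xFFFFFFFF

def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits (models `ror eax, count`)."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK32

def byte_via_shift(value: int, index: int) -> int:
    """Byte `index` (0 = lowest) of a 32-bit value: shift right, keep the low byte.

    Models `shr eax, 8*index` followed by reading AL.
    """
    return (value >> (8 * index)) & 0xFF

eax = 0x12345678
print(hex(byte_via_shift(eax, 0)))   # 0x78  (AL)
print(hex(byte_via_shift(eax, 1)))   # 0x56  (what AH holds)
print(hex(byte_via_shift(eax, 3)))   # 0x12  (top byte)

# A rotate keeps all four bytes in the register; after `ror eax, 16`
# the former upper word sits in the low word, so AL reads byte 2.
rotated = ror32(eax, 16)
print(hex(rotated))                  # 0x56781234
print(hex(rotated & 0xFF))           # 0x34
```

A shift discards the bits it moves out, while a rotate can be undone by rotating back, which is why rotates are the usual choice when the register contents must survive.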

LordAdef · February 15, 2017, 03:47:33 AM

Bollocks!!!

And I knew about Endian order!! That proves that one can only learn Assembly by practicing.... I was so focused on the algorithm that I missed that...

Thanks Mineiro, Steve and Hutch!

Mineiro, my wife asked me why I am learning Assembly. I said I want to become a Painter, not a Photoshop editor (no offense intended). Well, that's how I feel about asm.

Hutch, any specific reason why Intel suggested not using the high byte?

jj2007 · February 15, 2017, 04:01:58 AM

Quote from: hutch-- on February 15, 2017, 02:01:22 AM
In 32 bit you can access the two lowest bytes with AL and AH but for a long time Intel have advised against using the high byte and in 64 bit you cannot access it at all.

Well, ah is not completely inaccessible (but you probably meant something else):

include \Masm32\MasmBasic\Res\JBasic.inc ; part of MasmBasic

Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
PrintLine Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format")
mov ah, 123
; movzx rax, ah ; not possible
movzx eax, ah ; workaround; same result as movzx rax, ah but shorter
Print Str$("If ah is 123, then rax is now %lli\n\n", rax)
mov rax, 1234567890123456789
Inkey Str$("rax can be a really big number: %lli\n", rax)
EndOfCode

Output:

This code was assembled with HJWasm32 in 64-bit format
If ah is 123, then rax is now 123

rax can be a really big number: 1234567890123456789
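
A note on why `movzx eax, ah` gives the same result as the unencodable `movzx rax, ah`: in 64-bit mode, writing a 32-bit register clears the upper 32 bits of the full 64-bit register, and an instruction that needs a REX prefix cannot address AH, BH, CH or DH. The following Python sketch models only that register arithmetic; the function names are hypothetical and the starting value of `rax` is arbitrary.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def get_ah(rax: int) -> int:
    """Bits 8..15 of the register, i.e. the AH byte."""
    return (rax >> 8) & 0xFF

def set_ah(rax: int, value: int) -> int:
    """Models `mov ah, value`: only bits 8..15 change."""
    return (rax & ~0xFF00 & MASK64) | ((value & 0xFF) << 8)

def movzx_eax_ah(rax: int) -> int:
    """Models `movzx eax, ah` in 64-bit mode.

    The 32-bit result is AH zero-extended; writing EAX also clears
    bits 32..63 of RAX, so the whole register ends up equal to AH.
    """
    return get_ah(rax)

rax = 0x1122334455667788          # arbitrary starting contents
rax = set_ah(rax, 123)            # mov ah, 123
print(hex(rax))                   # 0x1122334455667b88
rax = movzx_eax_ah(rax)           # movzx eax, ah
print(rax)                        # 123
```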

mineiro · February 15, 2017, 05:50:39 AM

Quote from: LordAdef on February 15, 2017, 03:47:33 AM
Mineiro, my wife asked me why I am learning Assembly. I said I want to become a Painter, not a Photoshop editor (no offense intended). Well, that's how I feel about asm.

I share the same opinion: people who learn assembly are people who don't simply accept things but like to understand the magic behind the curtain.
You mentioned Photoshop. Well, we can use Photoshop as a hexadecimal editor, where instead of hexadecimal numbers we see colors, so in theory we can program in assembly language using Photoshop. It's hard, I confess, but not impossible. The inverse can be done too: we can use an assembler to create a .gif, .bmp, and so on. It's hard but not impossible.
With you being a musician it is the same thing, but instead of seeing hexadecimal numbers, in Audacity for example, we see sinusoidal waves.

hutch-- · February 15, 2017, 10:58:58 AM

> Hutch, any specific reason why Intel suggested not using the high byte?

The reasoning from Intel at the time was that it was a slower operation because of how the register was loaded, and I think it goes back to the PIV era. There have been 2 major series of Intel hardware since, the Core2 series and the i3/5/7 series, which may vary or may no longer be bothered by it, but for 64 bit operations the high byte register is not available. JJ has shown how to access AH with a 32 bit operation, but for a 64 bit operation there is no opcode that will do it.

jj2007 · February 15, 2017, 12:37:35 PM

Quote from: hutch-- on February 15, 2017, 10:58:58 AM
it was a slower operation because of how the register was loaded and I think it goes back to the PIV era.

Here is a little testbed:

align_64
TestA_s:
NameA equ mov al	; assign a descriptive name here
TestA proc
  mov ebx, AlgoLoops-1	; loop e.g. 100x
  align 4
  .Repeat
	mov al, byte ptr somestring
	mov cl, al
	inc al
	movzx eax, al
	dec ebx
  .Until Sign?
  ret
TestA endp
TestA_endp:

align_64
TestB_s:
NameB equ mov ah	; assign a descriptive name here
TestB proc
  mov ebx, AlgoLoops-1	; loop e.g. 100x
  align 4
  .Repeat
	mov ah, byte ptr somestring
	mov ch, ah
	inc ah
	movzx eax, ah
	dec ebx
  .Until Sign?
  ret
TestB endp
TestB_endp:

Results:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

127     cycles for 100 * mov al
53      cycles for 100 * mov ah

130     cycles for 100 * mov al
55      cycles for 100 * mov ah

126     cycles for 100 * mov al
52      cycles for 100 * mov ah

129     cycles for 100 * mov al
55      cycles for 100 * mov ah

127     cycles for 100 * mov al
55      cycles for 100 * mov ah

12      bytes for mov al
13      bytes for mov ah

The ah stuff is definitely one byte longer.

hutch-- · February 15, 2017, 03:05:56 PM

I tried to access the source code but it is unreadable RTF. What I would test is turning the two loops around because at the moment the code you posted is indicating that AH is faster than AL which does not make sense.

Posting examples in an unreadable format makes testing your algorithms unviable which is unfortunate because it renders them useless.

jj2007 · February 15, 2017, 06:03:14 PM

Quote from: hutch-- on February 15, 2017, 03:05:56 PM
I tried to access the source code but it is unreadable RTF.

RTF has been readable for almost 30 years now. Wordpad, for example, can read it; also MS Word, RichMasm, LibreOffice, ...

Quote from: hutch-- on February 15, 2017, 03:05:56 PM
What I would test is turning the two loops around because at the moment the code you posted is indicating that AH is faster than AL which does not make sense.

Given that there is a REPEAT 5 ... ENDM around the code examples, there is obviously no need to exchange the order of the loops.

Anyway, for those who are not able to read RTF, I have attached a plain text version with the loops "turned around". I agree, of course, that it "does not make sense" that ah is faster than al. Maybe you can code something where it is the other way round. Btw there is a switch in line 3 of the source: useMB=0. The second attachment contains the exe without any trace of MasmBasic (4096 bytes only); maybe the AH register behaves better without the influence of that library.

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

57      cycles for 100 * mov ah
129     cycles for 100 * mov al

56      cycles for 100 * mov ah
122     cycles for 100 * mov al

56      cycles for 100 * mov ah
129     cycles for 100 * mov al

56      cycles for 100 * mov ah
129     cycles for 100 * mov al

56      cycles for 100 * mov ah
129     cycles for 100 * mov al

13      bytes for mov ah
12      bytes for mov al

hutch-- · February 16, 2017, 02:59:02 AM

> RTF has been readable for almost 30 years now. Wordpad, for example, can read it; also MS Word, RichMasm, LibreOffice, ...

Trouble is that assemblers and compilers can't read it. Locking in a deviant code format to an exclusive editor makes the files unbuildable with anything else. Unless M$ have updated Word and Wordpad recently, they will not assemble MASM source code. Without a viable method to test the algos you post, there is no way of knowing what they do or how they are written.

With MASM it can be built with a batch file that does not require an editor at all and that can be tested by anyone who has a normal ascii text editor, Notepad and a whole host of others.

jj2007 · February 16, 2017, 03:36:50 AM

Quote from: hutch-- on February 16, 2017, 02:59:02 AM
there is no way of knowing what they do or how they are written.

You could open rich text (*.asc) files in Wordpad to see how they are written. If you don't trust my executables, you can press Ctrl A, Ctrl C, then switch to a poor text editor of your choice and build it there.

Never mind, from now on I'll try to add the poor text versions, too.

LordAdef · February 16, 2017, 05:28:53 AM

Curious to know what's happening with al and ah in this code. It's an odd result indeed.

hutch-- · February 16, 2017, 05:58:03 AM

Here is a simple benchmark testing the load time of AL and AH. Done on my 3.3 gig 6 core HASWELL.

This is the result (elapsed milliseconds, as measured with GetTickCount).

688 load AL
1015 load AH
703 load AL
985 load AH
718 load AL
1000 load AH
735 load AL
984 load AH
703 load AL
1032 load AH
671 load AL
969 load AH
735 load AL
984 load AH
687 load AL
1000 load AH
Press any key to continue ...

This is the code.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *

.data?
value dd ?

.data
item dd 0

.code

start:

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

call main
inkey
exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

push ebx
push esi
push edi

mov edi, 8

lpstart:

; -----------------------------------------------------------

mov esi, 1024*1024*1024 ; 2^30 iterations, about 1.07 billion
mov dl, 0

invoke GetTickCount
push eax

@@:
mov al, dl ; load AL
add dl, 1
cmp dl, 255
jne nxt
mov dl, 0
nxt:
sub esi, 1
jnz @B

invoke GetTickCount
pop ecx
sub eax, ecx

print str$(eax)," load AL",13,10

; -----------------------------------------------------------

mov esi, 1024*1024*1024 ; 2^30 iterations, about 1.07 billion
mov dl, 0

invoke GetTickCount
push eax

@@:
mov ah, dl ; load AH
add dl, 1
cmp dl, 255
jne nxt1
mov dl, 0
nxt1:
sub esi, 1
jnz @B

invoke GetTickCount
pop ecx
sub eax, ecx

print str$(eax)," load AH",13,10

; -----------------------------------------------------------

sub edi, 1
jnz lpstart

pop edi
pop esi
pop ebx

ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
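
To put the millisecond figures above into per-iteration terms, here is a small Python sketch of the arithmetic. It rests on two assumptions that are not stated in the post: that the core actually ran at its nominal 3.3 GHz during the test (no turbo or throttling), and that GetTickCount's coarse resolution (typically 10 to 16 ms) is negligible against runs of 700 to 1000 ms. The function name `cycles_per_iteration` is made up for this illustration.

```python
ITERATIONS = 1024 * 1024 * 1024      # loop count from the source: 2**30
CLOCK_HZ = 3.3e9                     # assumption: core runs at its nominal 3.3 GHz

# Elapsed GetTickCount values (milliseconds) copied from the results above.
AL_MS = [688, 703, 718, 735, 703, 671, 735, 687]
AH_MS = [1015, 985, 1000, 984, 1032, 969, 984, 1000]

def cycles_per_iteration(ms_values):
    """Average elapsed time converted to clock cycles per loop iteration."""
    mean_seconds = sum(ms_values) / len(ms_values) / 1000.0
    return mean_seconds * CLOCK_HZ / ITERATIONS

al = cycles_per_iteration(AL_MS)
ah = cycles_per_iteration(AH_MS)
print(f"AL loop: {al:.2f} cycles/iteration")
print(f"AH loop: {ah:.2f} cycles/iteration")
print(f"difference: {ah - al:.2f} cycles/iteration")
```

Under those assumptions the AL loop comes out at roughly 2.2 cycles per iteration and the AH loop at roughly 3.1, so on this machine the AH version costs a little under one extra cycle per pass through the loop.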

jj2007 · February 16, 2017, 09:33:02 AM

Interesting :t

I found my example a bit more relevant for practical purposes, but no problem, you found a case where mov ah is slow, congrats :icon14:

Btw how do the timings change with an align 4 before the two loops?

The MASM Forum
