I need help understanding how the MOVZX instruction works

RedSkeleton007 · July 07, 2015, 07:20:20 AM

As some of you might already know, there are 3 different kinds of general purpose registers: 8-bit (reg8), 16-bit (reg16), and 32-bit (reg32). In my textbook, they are displayed in a table, in the exact order as follows:

reg8: AH, AL, BH, BL, CH, CL, DH, DL

reg16: AX, BX, CX, DX, SI, DI, SP, BP

reg32: EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP

Are those listed in that exact order on purpose, or was the book just randomly listing them that way?

With that all said, the following examples use registers for all operands, showing all the size variations:

mov bx, 0A69Bh
movzx eax, bx ;EAX = 0000A69Bh
movzx edx, bl ;EDX = 0000009Bh <--Why didn't we just use the next register EBX? Is EBX not empty? And what happened to A6?
movzx cx, bl ;CX = 009Bh

jj2007 · July 07, 2015, 07:30:25 AM

movzx zero-extends a byte or word into a larger register, usually a full dword:

movzx ecx, ax ->contents of ax will be in cx now, higher bytes of ecx are zeroed
movzx edx, bl ->contents of bl will be in dl now, the three highest bytes of edx are zeroed
movzx eax, byte ptr SomeGlobalDword ->contents of lowest byte will be in al now, higher bytes are zeroed

movsx ecx, al ->contents of al will be in cl now, the three highest bytes of ecx are all 00h or all FFh (i.e. the value stays non-negative or negative), depending on the sign bit of al

Nice exercise for the deb macro btw.
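
For anyone who wants to play with the difference between zero-extension and sign-extension outside the assembler, here is a minimal Python sketch. It is only a model of the bit arithmetic, not MASM code, and the helper names `movzx` and `movsx` are just illustrative labels borrowed from the instructions.

```python
def movzx(value, src_bits):
    """Zero-extend: keep the low src_bits bits, every higher bit becomes 0."""
    return value & ((1 << src_bits) - 1)


def movsx(value, src_bits, dst_bits=32):
    """Sign-extend: copy the source's top bit into every higher destination bit."""
    value &= (1 << src_bits) - 1
    if value & (1 << (src_bits - 1)):  # sign bit set, so the source is negative
        value |= ((1 << dst_bits) - 1) & ~((1 << src_bits) - 1)
    return value


al = 0x9B  # top bit set: -101 when read as a signed byte
print(f"movzx ecx, al  -> {movzx(al, 8):08X}")    # 0000009B
print(f"movsx ecx, al  -> {movsx(al, 8):08X}")    # FFFFFF9B
print(f"movsx ecx, 7Fh -> {movsx(0x7F, 8):08X}")  # 0000007F
```

The only difference between the two is what goes into the bits above the source: always zeroes for movzx, copies of the sign bit for movsx.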

rrr314159 · July 07, 2015, 10:35:41 AM

Quote from: RedSkeleton007
Are those listed in that exact order on purpose

- The list is "on purpose" - that's the normal order, although I have seen others. It makes some sense; for instance, ax can be considered the "first" register, historically and functionally. Note that the first 4 are in alphabetical order, of course. But it's mainly just conventional.

K_F · July 07, 2015, 04:36:03 PM

The order is not that important, but maybe it encourages you to use the registers from left to right.
You might find that the order represents the usage rate, but most of the registers are interchangeable ...

... except ESP and EBP - you don't want to fiddle with these two too much, as incorrect fiddling causes crashes, if not explosions.

Some registers are implicitly used in certain instructions - Intel Instruction Set manual is your friend.

Example of Register sizes:-
This is a legacy structure from the early 8088 processors and kept for compatibility purposes and other issues.

EAX -> Full 32 bit register
AX -> The lower 16 bits of EAX (there is no upper 16 bit register)
AH -> The upper 8 bits of AX
AL -> The lower 8 bits of AX
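
As a rough illustration of that layout, the following Python sketch splits one 32-bit value into the four views. It is a model only; the function name `views` and the sample value are made up for the example.

```python
def views(eax):
    """Return the EAX, AX, AH and AL views of one 32-bit value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF  # lower 16 bits of EAX
    return {"EAX": eax, "AX": ax, "AH": ax >> 8, "AL": ax & 0xFF}


for name, value in views(0x12345678).items():
    print(f"{name} = {value:X}")
# EAX = 12345678
# AX = 5678
# AH = 56
# AL = 78
```

Note that the upper half (1234h here) shows up in none of the smaller views, which matches the remark that there is no upper 16-bit register.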

rrr314159 · July 08, 2015, 01:51:14 AM

This is not very important, but talking about register order, it's interesting to note that the 8 registers are encoded in Intel instructions in 3 bits, 0 .. 7, and the order is different. The "real" order, if you want to call it that, is eax, ecx, edx, ebx, esp, ebp, esi, edi. Thus eax is encoded as 0, edi as 7.

Sometimes, in a book or tutorial, I've seen the first four listed in that order - a, c, d, b; but the last 4 always in the order your book gives (esi, edi, esp, ebp). The a, c, d, b order makes some sense functionally: eax is the "most volatile" and ebx the "least volatile" of those 4. As K_F says, each register has some special role in a few instructions (especially esp), and it's worthwhile learning what those are.
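
To make the 3-bit encoding a little more concrete, here is a small Python sketch. It relies on two well-known one-byte opcode forms, PUSH r32 (50h + register code) and MOV r32, imm32 (B8h + register code); the function names are my own, and this is an illustration rather than a real assembler.

```python
# Index in this list = the 3-bit register code used in instruction encodings.
REG32_CODES = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def reg_code(name):
    """Return the 3-bit code (0..7) of a 32-bit register."""
    return REG32_CODES.index(name.lower())


def encode_push(name):
    """PUSH r32 is the single byte 50h + register code."""
    return bytes([0x50 + reg_code(name)])


def encode_mov_imm32(name, imm):
    """MOV r32, imm32 is B8h + register code, then the dword, little-endian."""
    return bytes([0xB8 + reg_code(name)]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")


print(encode_push("ebx").hex().upper())          # 53
print(encode_mov_imm32("ecx", 1).hex().upper())  # B901000000
```

So ebx, fourth in the "real" order, gets code 3, and push ebx comes out as 53h.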

Tedd · July 08, 2015, 02:11:00 AM

Quote from: RedSkeleton007 on July 07, 2015, 07:20:20 AM
As some of you might already know, there are 3 different kinds of general purpose registers: 8-bit (reg8), 16-bit (reg16), and 32-bit (reg32). In my textbook, they are displayed in a table, in the exact order as follows:

reg8: AH, AL, BH, BL, CH, CL, DH, DL
reg16: AX, BX, CX, DX, SI, DI, SP, BP
reg32: EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP

Keep in mind that the reg16s are the lower 16 bits of the reg32s (with the matching names), and the reg8s are the upper (H) and lower (L) 8 bits of the reg16s (also, therefore, part of the reg32s.) So EAX can be seen as: EAX which is 32 bits, or as AX which is 16 bits (the upper 16 bits are ignored), or as AL which is 8 bits (the upper 24 bits are ignored), or as AH which is also 8 bits (the upper 16 bits and the lower 8 bits are ignored.) Note, then, that changing the value of AL will also change the values of AX and EAX because they are all parts of the same thing. {AL = register 'A' Lower part; AH = register 'A' Higher part; AX = register 'A' eXtended part; EAX = Extended register 'A' eXtended part.}
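
The "changing AL also changes AX and EAX" point can be sketched in Python as follows. This is only a model of the aliasing, and the `write_*` helper names are hypothetical.

```python
def write_al(eax, byte):
    """Replace bits 0..7 of EAX, leave the other 24 bits alone."""
    return (eax & 0xFFFFFF00) | (byte & 0xFF)


def write_ah(eax, byte):
    """Replace bits 8..15 of EAX, leave everything else alone."""
    return (eax & 0xFFFF00FF) | ((byte & 0xFF) << 8)


def write_ax(eax, word):
    """Replace bits 0..15 of EAX, leave the upper 16 bits alone."""
    return (eax & 0xFFFF0000) | (word & 0xFFFF)


eax = 0x11223344
eax = write_al(eax, 0xFF)
print(f"{eax:08X}")  # 112233FF
eax = write_ah(eax, 0x00)
print(f"{eax:08X}")  # 112200FF
eax = write_ax(eax, 0xA69B)
print(f"{eax:08X}")  # 1122A69B
```

In each step only the named part changes; the rest of EAX keeps whatever it held before.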

Quote
Are those listed in that exact order on purpose, or was the book just randomly listing them that way?

They're alphabetic(ish).
Registers A B C D are the general purpose registers - use them for anything, in any order, as you decide.
SI and DI were intended to be used as index registers, e.g. for indexing into arrays.
SP is the stack pointer which gives the address of the top of the stack.
BP is the base-pointer, meant for holding the base of the stack-frame (used for referencing local variables and function parameters.)

Quote
mov bx, 0A69Bh
movzx eax, bx ;EAX = 0000A69Bh
movzx edx, bl ;EDX = 0000009Bh <--Why didn't we just use the next register EBX? Is EBX not empty? And what happened to A6?
movzx cx, bl ;CX = 009Bh

The first line says BX = A69B, therefore: EBX = ????A69B; BH = A6; BL = 9B.
The second line copies BX into EAX; but BX is 16 bits and EAX is 32 bits - what about the extra 16 bits in EAX? MOVZX produces zero-extension, which fills the rest with zeroes. So the lower 16 bits of EAX become A69B, and the upper 16 bits are zeroed: EAX = 0000A69B.
The third line copies BL into EDX. So that's 8 bits into 32 bits, and zero the rest. So the lower 8 bits become 9B (the value of BL), and the upper bits are zeroed: EDX = 0000009B.
And the fourth line copies BL into CX. 8 bits into 16 bits. CX = 009B.

Hopefully, now you should be able to answer your own question: Why didn't we just use the next register EBX? Is EBX not empty? And what happened to A6?
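
If it helps, the four lines can be traced with a tiny Python model of the registers. This is a sketch under assumptions: the `VIEWS` table, the helper names, and the DEADBEEFh starting values (standing in for the unknown "????" contents) are all invented for the illustration.

```python
# name -> (full 32-bit register, width in bits, bit offset inside it)
VIEWS = {
    "eax": ("eax", 32, 0), "ebx": ("ebx", 32, 0),
    "ecx": ("ecx", 32, 0), "edx": ("edx", 32, 0),
    "bx": ("ebx", 16, 0), "cx": ("ecx", 16, 0),
    "bh": ("ebx", 8, 8), "bl": ("ebx", 8, 0),
}


def read(regs, name):
    full, width, shift = VIEWS[name]
    return (regs[full] >> shift) & ((1 << width) - 1)


def write(regs, name, value):
    full, width, shift = VIEWS[name]
    mask = ((1 << width) - 1) << shift
    regs[full] = (regs[full] & ~mask & 0xFFFFFFFF) | ((value << shift) & mask)


def movzx(regs, dst, src):
    # The source value is already non-negative, so writing it into the wider
    # destination view fills the extra bits with zeroes.
    write(regs, dst, read(regs, src))


# Unknown leftovers in every register before the snippet runs.
regs = {"eax": 0xDEADBEEF, "ebx": 0xDEADBEEF, "ecx": 0xDEADBEEF, "edx": 0xDEADBEEF}

write(regs, "bx", 0xA69B)  # mov bx, 0A69Bh
movzx(regs, "eax", "bx")   # movzx eax, bx
movzx(regs, "edx", "bl")   # movzx edx, bl
movzx(regs, "cx", "bl")    # movzx cx, bl

for name in ("eax", "ebx", "ecx", "edx"):
    print(f"{name} = {regs[name]:08X}")
# eax = 0000A69B
# ebx = DEADA69B
# ecx = DEAD009B
# edx = 0000009B
```

EBX still holds A69B in its lower half after all four lines, so BH is still A6: the source of a movzx is only read, never changed. The upper half of ECX keeps its old contents because the last instruction only writes CX.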

MichaelW · July 08, 2015, 03:48:37 PM

Quote from: rrr314159 on July 08, 2015, 01:51:14 AM
The "real" order, if you want to call it that, is eax, ecx, edx, ebx, esp, ebp, esi, edi. Thus eax is encoded as 0, edi as 7.

Which matches the EAX, ECX, EDX, EBX, ESP (original value), EBP, ESI, and EDI order that the Intel documentation shows for PUSHAD.
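
A minimal Python sketch of that push order, assuming a flat 32-bit stack modelled as a dict from address to dword (the function name and the sample register values are made up):

```python
PUSHAD_ORDER = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def pushad(regs, stack):
    """Push all eight registers; the ESP slot receives the value ESP had before."""
    original_esp = regs["esp"]
    for name in PUSHAD_ORDER:
        value = original_esp if name == "esp" else regs[name]
        regs["esp"] -= 4
        stack[regs["esp"]] = value


regs = {"eax": 1, "ecx": 2, "edx": 3, "ebx": 4,
        "esp": 0x1000, "ebp": 6, "esi": 7, "edi": 8}
stack = {}
pushad(regs, stack)

print(hex(regs["esp"]))                # 0xfe0
print(stack[regs["esp"]])              # 8, EDI was pushed last
print(stack[regs["esp"] + 28])         # 1, EAX was pushed first
print(hex(stack[regs["esp"] + 12]))    # 0x1000, the original ESP
```

Since EAX goes first it ends up at the highest address, and EDI sits at the new top of the stack.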

AssemblyChallenge · July 09, 2015, 02:59:30 AM

As they all explained very well how it works, here is some more general info:

In assembly, MOV requires both operands to be the same size (i.e. 32 with 32, 8 with 8, etc.). Yet there are cases when you need to transfer from a small register into a bigger one, as in the examples. Say you want to move AL's value into ECX:

You cannot simply do this:

   mov ecx, al ; assembly will fail because the operand sizes differ

To achieve it with MOV alone, you need an intermediate step, which makes this simple operation longer:

   xor ecx, ecx ; clear all of ecx (assuming you don't need its old contents)
   mov cl, al     ; transfer value: now ecx = al, zero-extended

Fortunately, Intel's engineers thought about this beforehand and created the marvelous MOVZX:

  movzx ecx, al ; simpler, faster =)
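
To see why the xor matters, here is a small Python model of the three variants. It is a sketch only; the function names and sample values are made up.

```python
def mov_cl(ecx, al):
    """mov cl, al: only the low byte of ECX is replaced."""
    return (ecx & 0xFFFFFF00) | (al & 0xFF)


def movzx_ecx_al(al):
    """movzx ecx, al: low byte copied, upper 24 bits forced to zero."""
    return al & 0xFF


old_ecx, al = 0x12345678, 0x9B

print(f"{mov_cl(old_ecx, al):08X}")  # 1234569B  mov cl, al alone: stale upper bits
print(f"{mov_cl(0, al):08X}")        # 0000009B  xor ecx, ecx first, then mov cl, al
print(f"{movzx_ecx_al(al):08X}")     # 0000009B  movzx ecx, al in one instruction
```

Without clearing ECX first, whatever was in its upper 24 bits survives, so ECX would not equal AL.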

RedSkeleton007 · July 10, 2015, 06:19:39 AM

Quote from: Tedd on July 08, 2015, 02:11:00 AM
The first line says BX = A69B, therefore: EBX = ????A69B; BH = A6; BL = 9B.
The second line copies BX into EAX; but BX is 16 bits and EAX is 32 bits - what about the extra 16 bits in EAX? MOVZX produces zero-extension, which fills the rest with zeroes. So the lower 16 bits of EAX become A69B, and the upper 16 bits are zeroed: EAX = 0000A69B.
The third line copies BL into EDX. So that's 8 bits into 32 bits, and zero the rest. So the lower 8 bits become 9B (the value of BL), and the upper bits are zeroed: EDX = 0000009B.
And the fourth line copies BL into CX. 8 bits into 16 bits. CX = 009B.

Thanks Tedd. That really helps me a lot actually ;) Now I understand how the bit transferring works. However, one thing I still don't understand is the registers used in lines 3 and 4:

The following examples use registers for all operands, showing all the size variations:

mov bx, 0A69Bh
movzx eax, bx ;EAX = 0000A69Bh
movzx edx, bl ;EDX = 0000009Bh <--since edx is the I/O register, I would think we would use it to output the value in bl,
movzx cx, bl ;CX = 009Bh <--yet we're moving it to the 16-bit portion of the ecx register (the loop counter register) instead.

Why then, would we want to use the EDX and ECX registers (especially the ecx register, since we're not even looping with this set of instructions)? Also, how does all this bit shifting (8-bit and 16-bit chopping and butchering) work in MASM without messing up the roles of the 32-bit general purpose registers?

hutch-- · July 10, 2015, 12:52:15 PM

Red,

As of 32-bit processors you are not limited in which register you use for what purpose. ESP and EBP are special purpose if you use a stack frame, but you can use the rest any way you like. In the 16-bit era CX was often used as the counter, but in general-purpose code you can routinely use any of the other registers. The exceptions are the old string instructions, MOVS, SCAS and STOS, which use specific registers to perform their tasks, generally ESI, EDI and ECX. 32-bit instructions have fewer restrictions than the older 16-bit hardware allowed, and if you get into 64-bit, you have a lot more registers to work with.
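
As an example of such implicit register use, here is a rough Python model of REP MOVSB, which takes its source address from ESI, its destination from EDI and its count from ECX. It assumes the direction flag is clear (addresses count upwards) and treats memory as one flat bytearray; the names are illustrative.

```python
def rep_movsb(memory, regs):
    """Copy ECX bytes from [ESI] to [EDI], updating all three registers (DF = 0)."""
    while regs["ecx"] != 0:
        memory[regs["edi"]] = memory[regs["esi"]]
        regs["esi"] += 1
        regs["edi"] += 1
        regs["ecx"] -= 1


memory = bytearray(b"MASM....")
regs = {"esi": 0, "edi": 4, "ecx": 4}
rep_movsb(memory, regs)

print(memory)  # bytearray(b'MASMMASM')
print(regs)    # {'esi': 4, 'edi': 8, 'ecx': 0}
```

The instruction has no explicit operands at all, which is why you cannot swap in other registers for it.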

jj2007 · July 10, 2015, 09:56:42 PM

A few lines for the fans of Olly. The instruction sizes correspond to the specific purposes of these registers.

	mov eax, [ebp+120]
	mov eax, [esp+120]
	mov eax, [ebp+8*edx+120]
	mov eax, [esp+8*edx+120]
	mov eax, [ebp+8*edx]
	mov eax, [esp+8*edx]
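
For those without a debugger at hand, the sizes can be estimated with a small Python encoder. This is a sketch of the standard ModRM/SIB rules for the `mov eax, [mem]` form (opcode 8Bh) in 32-bit mode only; the function name is made up, and an assembler is free to pick a different but equivalent encoding.

```python
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE = {1: 0, 2: 1, 4: 2, 8: 3}


def encode_mov_eax_mem(base, index=None, scale=1, disp=0):
    """Encode mov eax, [base + scale*index + disp] as opcode 8Bh + ModRM (+ SIB, disp)."""
    if index == "esp":
        raise ValueError("esp cannot be used as an index register")
    # rm = 100b means "a SIB byte follows", so esp as base always needs a SIB.
    needs_sib = index is not None or base == "esp"
    # mod = 00 with base code 101b means "disp32, no base", so ebp always
    # needs an explicit displacement, even if it is zero.
    if disp == 0 and base != "ebp":
        mod, disp_bytes = 0, b""
    elif -128 <= disp <= 127:
        mod, disp_bytes = 1, disp.to_bytes(1, "little", signed=True)
    else:
        mod, disp_bytes = 2, disp.to_bytes(4, "little", signed=True)
    rm = 4 if needs_sib else REG[base]
    out = bytes([0x8B, (mod << 6) | (REG["eax"] << 3) | rm])
    if needs_sib:
        index_code = 4 if index is None else REG[index]  # 100b = no index
        out += bytes([(SCALE[scale] << 6) | (index_code << 3) | REG[base]])
    return out + disp_bytes


cases = [
    ("[ebp+120]", dict(base="ebp", disp=120)),
    ("[esp+120]", dict(base="esp", disp=120)),
    ("[ebp+8*edx+120]", dict(base="ebp", index="edx", scale=8, disp=120)),
    ("[esp+8*edx+120]", dict(base="esp", index="edx", scale=8, disp=120)),
    ("[ebp+8*edx]", dict(base="ebp", index="edx", scale=8)),
    ("[esp+8*edx]", dict(base="esp", index="edx", scale=8)),
]
for text, args in cases:
    code = encode_mov_eax_mem(**args)
    print(f"mov eax, {text:<16} {code.hex().upper():<10} {len(code)} bytes")
```

With these rules the six lines come out as 3, 4, 4, 4, 4 and 3 bytes: esp as a base costs an extra SIB byte, while ebp without a displacement costs an extra zero displacement byte.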

Tedd · July 11, 2015, 02:23:45 AM

Quote from: RedSkeleton007 on July 10, 2015, 06:19:39 AM
mov bx, 0A69Bh
movzx eax, bx ;EAX = 0000A69Bh
movzx edx, bl ;EDX = 0000009Bh <--since edx is the I/O register, I would think we would use it to output the value in bl,
movzx cx, bl ;CX = 009Bh <--yet we're moving it to the 16-bit portion of the ecx register (the loop counter register) instead.

Why then, would we want to use the EDX and ECX registers (especially the ecx register, since we're not even looping with this set of instructions)? Also, how does all this bit shifting (8-bit and 16-bit chopping and butchering) work in MASM without messing up the roles of the 32-bit general purpose registers?

Quote from: Tedd on July 08, 2015, 02:11:00 AM
Registers A B C D are the general purpose registers - use them for anything, in any order, as you decide.

There are not nearly enough registers to reserve each one for a specific purpose. Consider the 'defined roles' to be suggestions - being consistent helps make code easier to understand, but they are still suggestions and can't be enforced, nor should they (there were certain instruction limitations with 16-bit, but these have largely been removed for 32-bit.)
The general purpose registers, as the name implies, can be used for whatever purpose you require at the time, and another purpose a different time - there are no fixed roles (other than esp and eip). Some instructions do make use of specific registers, but outside of these you can use any register for any purpose (though not all instructions support usage of all registers.)

Go crazy - use EAX as a counter, and EDX as an accumulator, and ECX as a pointer; because you can.

jj2007 · July 11, 2015, 03:03:59 AM

You could even declare esi and edi as your "stack pointers", and use stosd and lodsd instead of push and pop. Definitely clumsy, but there is no law that forbids doing so.

The MASM Forum
