Need some clarification on basic MASM concepts

Started by masterori, February 07, 2016, 04:50:39 PM


masterori

I'm reviewing for my midterm right now, and I'm a bit confused with some basic concepts, so I was hoping someone can clarify for me?

1. please see code example from book

    .data
    sum  dword  0
    .code
    mov  eax,5
    add  eax,6
    mov  sum,eax

Why did the book use the "dword" directive? The sum total is 11, maybe I haven't clearly understood how size works, but shouldn't "byte" or "word" be sufficient? Since 11 can fit into 8 bits: 00001011

2. Are spaces counted in terms of a space taken in memory? E.g

    text  byte  "Hello there",0

Is the number of bytes 12 or 11?

3. If I want to store the ASCII value of an unsigned binary in the runtime stack, can I just push the binary surrounded by single quotes?
E.g: The value 0b1001011001101011 is 38507, so if I want to push the ASCII of 3,8,5,0,7 onto the runtime stack, can I just divide 1001011001101011 by 10000, 1000, 100, 10, 1 and then push the quotient onto stack? Or will I have to convert that unsigned binary into a decimal value first?

jj2007

Quote from: masterori on February 07, 2016, 04:50:39 PM
1. Why did the book use the "dword" directive? The sum total is 11, maybe I haven't clearly understood how size works, but shouldn't "byte" or "word" be sufficient? Since 11 can fit into 8 bits: 00001011

It can fit into 8 bits, but the "natural" size for x86 code is 32 bits, so by default, one would use dword. Costs a bit more memory, but often it is faster, and there is more than enough memory around ;-)
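To illustrate the size point: a byte variable would also work, as long as the register you store from has the same size. A minimal sketch, assuming the MASM32 SDK sits in its default \masm32 folder (the name sum8 is made up for this example):

```masm
include \masm32\include\masm32rt.inc   ; assumes the MASM32 SDK default path

.data
sum8  byte   0            ; hypothetical 8-bit variable
sum   dword  0            ; the book's 32-bit variable

.code
start:
    mov  al,5
    add  al,6
    mov  sum8,al          ; 8-bit register into 8-bit variable
    ; mov sum8,eax        ; would not assemble: operand sizes differ

    mov  eax,5
    add  eax,6
    mov  sum,eax          ; 32-bit register into 32-bit variable

    exit
end start
```

Both versions store 11; the dword version simply matches the 32-bit registers the rest of the code is likely to use.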

Quote
2. Are spaces counted in terms of a space taken in memory?

Yes, the processor couldn't care less if you've stored " ", "X" or a nullbyte. Your string takes 12 bytes: 11 characters (the space included) plus the terminating 0.
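If in doubt, you can let the assembler count for you. A small sketch, assuming the MASM32 SDK default path; the variable is called hello here (my own name, chosen only to avoid any possible clash with SDK macro names):

```masm
include \masm32\include\masm32rt.inc   ; assumes the MASM32 SDK default path

.data
hello  byte  "Hello there",0

.code
start:
    mov   eax,sizeof hello    ; assemble-time constant: bytes in the definition
    print str$(eax),13,10     ; 11 characters + terminating 0
    exit
end start
```

The space is just another byte (20h), so it is counted like every other character.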

Quote
3. If I want to store the ASCII value of an unsigned binary in the runtime stack, can I just push the binary surrounded by single quotes?
E.g: The value 0b1001011001101011 is 38507, so if I want to push the ASCII of 3,8,5,0,7 onto the runtime stack, can I just divide 1001011001101011 by 10000, 1000, 100, 10, 1 and then push the quotient onto stack? Or will I have to convert that unsigned binary into a decimal value first?

That dividing method won't work - no need to do such acrobatics. Instead, you can use
push 1001011001101011b
push 1001011001101011y
push 38507
push 966Bh
push "–k"


It's all the same value, just expressed in a different syntax.

You should test what you are doing using either a debugger or the deb macro. See this thread for details.

For example, with the deb macro, you would typically do this:
  push 11111111 ; delimit your stack entries
  push 1001011001101011b
  push 1001011001101011y
  push 30000+8000+500+7
  push 38507
  push 966Bh
  push "–k"
  push 22222222 ; delimit your stack entries
  deb 4, "Values on the stack", stack[0], stack[4], stack[8], stack[12], stack[16], stack[20], stack[24], stack[28]


and see this output:
Values on the stack
stack[0]        22222222
stack[4]        38507
stack[8]        38507
stack[12]       38507
stack[16]       38507
stack[20]       38507
stack[24]       38507
stack[28]       11111111

dedndave

it is simpler to push the binary value and convert it to ascii decimal when required

however, strings can be pushed onto the stack
pushing an ascii decimal value is problematic, though
consider
0000 0000 0000 0000 0000 0000 0000 0001  "1" - 1 byte string + null terminator
0111 1111 1111 1111 1111 1111 1111 1111  "2147483647" - 10 byte string + null terminator


and, for 32-bit code, the stack should always be 4-aligned
so, regardless of the length of the string, you will probably want to push 3 dwords

masterori

But what if I want to put the individual ASCII of the 38507 in the stack? Like so:

stack[0] = ; ascii of 3
stack[4] = ; ascii of 8
...

That's the reason I was asking if it would be okay to divide that unsigned binary by 10000,1000, etc and then somehow get the ascii value of the quotient in eax.

If I just push the entire binary value (0b1001011001101011) onto the stack, then it wouldn't have the ascii of each individual digit, or any ascii char at all right?

dedndave

division is naturally one of the slower operations to perform
and, while it is possible to perform division on ASCII decimal strings,
it is faster to perform division on binary values, then convert the results to ASCII

this is the general concept for most math with computers
use human native ASCII decimal for input and output,
but perform the intermediate math steps in the CPU native binary format

using the stack for storage is another concept, altogether
best to learn one, then the other
trying to learn both at once is just making it harder on yourself

qWord

Quote from: masterori on February 08, 2016, 12:44:46 PM
But what if I want to put the individual ASCII of the 38507 in the stack? Like so:

stack[0] = ; ascii of 3
stack[4] = ; ascii of 8
...

You can get that by repeated division by 10 until the quotient gets zero:
.const
ten DWORD 10
.code

mov eax,12456

xor ecx,ecx
@1: xor edx,edx
div ten
add edx,'0'
push edx
add ecx,1
cmp eax,0
jnz @1
; ECX = count digits
; ASCII-digits in [ESP] ... [ESP+ECX*4-4]

; output example
@2: pop edx
push ecx
fn crt_putchar,edx
pop ecx
sub ecx,1
jnz @2

masterori

I guess I'm not being clear on my question. I was wondering, if I push like say '0010' vs 0010, will it put '51' on the stack? If so for which - '0010' or 0010?

Another question as I was going through some book examples, it's regarding the 'imul' instruction

mov al, 48
mov bl, 4
imul bl            ; AX = 00c0h, Overflow = 1 because AH is not a sign extension of AL



mov ax,48
mov bx, 4
imul bx          ; DX:AX = 000000c0h, Overflow = 0 because DX is sign extension of AX


Can someone explain why the first example is not a sign extension? The book says "Because AH is not a sign extension of AL", but I'm not sure why...

hutch--

AL and AH are respectively the LOW byte and the HIGH byte of the AX register, which is the LOW word of the EAX register. The difference between signed and unsigned is not how it's stored in memory, it is how you evaluate it AFTER the value is in a register or memory. JG is a signed instruction, JA is unsigned. After a CMP you use a conditional jump to branch according to its value.
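Applied to the book's first IMUL example, this is roughly what happens. A sketch under the assumption that the MASM32 SDK is in its default folder; the labels are my own:

```masm
include \masm32\include\masm32rt.inc   ; assumes the MASM32 SDK default path

.code
start:
    mov   al,48
    mov   bl,4
    imul  bl              ; AX = 00C0h (192 decimal)
    ; AL = C0h has its high bit set, so read as a signed byte it is -64.
    ; Sign-extending AL would give AX = FFC0h, but the true product is 00C0h.
    ; AH therefore carries real information, and IMUL sets OF = CF = 1.
    jo    needs_ax        ; taken here
    print "product fits in AL",13,10
    jmp   finished
needs_ax:
    print "product needs all of AX (OF=1)",13,10
finished:
    exit
end start
```

In the 16-bit example the product 00C0h has bit 15 clear, so DX = 0000h is exactly the sign extension of AX and the Overflow flag stays 0.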

dedndave

Quote from: masterori on February 11, 2016, 04:19:24 PM
I guess I'm not being clear on my question. I was wondering, if I push like say '0010' vs 0010, will it put '51' on the stack? If so for which - '0010' or 0010?

    push    "0010"     ;string operands are reversed, so 30303130h is pushed (as a string, "0100")
    push    0010       ;the masm default radix is 10 (decimal), so 0000000Ah is pushed
    push    0010h      ;00000010h is pushed


usually, strings need to be null-terminated
if you want to push a literal string onto the stack, you should take that into account

    push    "gfe"      ;the last byte (highest address) pushed will be 0
    push    "dcba"     ;you now have the string "abcdefg",0 on the stack

dedndave

as Hutch mentioned, the difference between signed values and unsigned values is how they are interpreted
i.e., the context is up to the programmer

as an example, let's say that EAX holds the value 00303030h

    mov     eax,303030h

if we want that to represent a null-terminated string, it is "000",0
if we want that to represent a signed integer, it is +3158064 decimal
if we want that to represent an unsigned integer, it is 3158064

unsigned dword integers have the range of 0 to 4294967295 decimal
signed dword integers have the range of -2147483648 to +2147483647

unsigned word integers have the range of 0 to 65535
signed word integers have the range of -32768 to +32767

unsigned byte integers have the range of 0 to 255
signed byte integers have the range of -128 to +127

simply, if the value is signed, and the high bit is 0, it is positive
if the value is signed and the high bit is 1, it is negative, and the order of count is reversed (two's complement)

0FFFFFFFFh may be treated as -1 (signed), or 4294967295 (unsigned)
080000000h may be treated as -2147483648 (signed), or 2147483648 (unsigned)
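the same bit pattern printed both ways, as a rough sketch
it assumes the MASM32 SDK default path and the sstr$ / ustr$ macros from its macro library; val is a made-up name

```masm
include \masm32\include\masm32rt.inc   ; assumes the MASM32 SDK default path

.data
val  dword  0FFFFFFFFh    ; hypothetical variable holding the bit pattern

.code
start:
    print sstr$(val),13,10    ; signed conversion of 0FFFFFFFFh
    print ustr$(val),13,10    ; unsigned conversion of the same bits
    exit
end start
```

the dword in memory never changes, only the conversion routine differs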

dedndave

the real trick to signed versus unsigned is how we treat them in our code
basically, we need to use the correct routine to convert integers to decimal strings (and vice versa)
AND, we need to use the right set of conditional branch instructions when dealing with signed/unsigned values

notice that signed and unsigned branches act on different flags in the EFlags register
a SUB instruction will yield the same result whether the values are signed or unsigned
however, a different set of flags might be used to branch
this is the elegance of the two's complement numbering system

------------------------------------------------------------------------
Group     Instruction  Description               Condition       Aliases
------------------------------------------------------------------------
Equality  JZ           Jump if equal             ZF=1            JE
          JNZ          Jump if not equal         ZF=0            JNE

Unsigned  JA           Jump if above             CF=0 and ZF=0   JNBE
          JAE          Jump if above or equal    CF=0            JNC JNB
          JB           Jump if below             CF=1            JC JNAE
          JBE          Jump if below or equal    CF=1 or ZF=1    JNA

Signed    JG           Jump if greater           SF=OF and ZF=0  JNLE
          JGE          Jump if greater or equal  SF=OF           JNL
          JL           Jump if less              SF<>OF          JNGE
          JLE          Jump if less or equal     SF<>OF or ZF=1  JNG
          JO           Jump if overflow          OF=1
          JNO          Jump if no overflow       OF=0
          JS           Jump if sign              SF=1
          JNS          Jump if no sign           SF=0

Parity    JP           Jump if parity            PF=1            JPE
          JNP          Jump if no parity         PF=0            JPO
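a short sketch of the same compare followed by an unsigned and a signed branch
it assumes the MASM32 SDK default path, and the labels are made up

```masm
include \masm32\include\masm32rt.inc   ; assumes the MASM32 SDK default path

.code
start:
    mov   eax,0FFFFFFFFh  ; -1 signed, 4294967295 unsigned
    cmp   eax,1
    ja    is_above        ; unsigned test: taken, 4294967295 > 1
    print "not above (unsigned)",13,10
    jmp   signed_test
is_above:
    print "above (unsigned)",13,10
signed_test:
    mov   eax,0FFFFFFFFh  ; reload, the print macro does not preserve EAX
    cmp   eax,1
    jg    is_greater      ; signed test: not taken, -1 < 1
    print "not greater (signed)",13,10
    jmp   finished
is_greater:
    print "greater (signed)",13,10
finished:
    exit
end start
```

CMP sets the flags identically both times, the only difference is which flags the jump looks at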

dedndave

here, we see two's complement at work
the CPU doesn't care whether the values are signed or unsigned
it performs the same operation, either way

        mov     eax,0FFFFFFFFh   ;signed value = -1, unsigned value = 4294967295
        mov     edx,000000100h   ;signed value = +256, unsigned value = 256
        sub     eax,edx          ;result = 0FFFFFEFFh, signed = -257, unsigned = 4294967039, EFlags = 00000286h

        mov     eax,0FFFFFFFFh   ;signed value = -1, unsigned value = 4294967295
        mov     edx,000000100h   ;signed value = +256, unsigned value = 256
        add     eax,edx          ;result = 0FFh, signed = +255, unsigned = 255 (with carry), EFlags = 00000207h


the EFLags register may be evaluated as follows:
EFlags Register
   Bit      Description
  31-22      unassigned, always 0 (as of Pentium IV)
   21        ID (Identification, may be toggled if CPUID supported)
   20        VIP (Virtual Interrupt Pending)
   19        VIF (Virtual Interrupt Flag)
   18        AC (Alignment Check, may be toggled if 486 or later)
   17        VM (Virtual 8086 Mode)
   16        RF (Resume Flag)
   15        unassigned, always 0 (as of Pentium IV)
   14        NT (Nested Task)
  13,12      IOPL (I/O Privilege Level)
   11        OF (Overflow Flag)
   10        DF (Direction Flag, 0 = up)
    9        IF (Interrupt Flag)
    8        TF (Trap Flag)
    7        SF (Sign Flag)
    6        ZF (Zero Flag)
    5        unassigned, always 0 (as of Pentium IV)
    4        AF (Auxiliary Flag)
    3        unassigned, always 0 (as of Pentium IV)
    2        PF (Parity Flag, 0 = odd parity)
    1        unassigned, always 1 (as of Pentium IV)
    0        CF (Carry Flag)


you are primarily interested in the math flags: OF, SF, ZF, CF

masterori

So based on dedndave's answer, it seems I have to convert the binary to ascii first then push? How can I do so such that the binary: 0b1001011001101011 (in dec: 38507) is stored like this in the runtime stack:


stack[0]    55    ; ascii of 7
stack[1]    48    ; ascii of 0
stack[2]    53    ; ascii of 5
stack[3]    56    ; ascii of 8
stack[4]    51    ; ascii of 3

dedndave

the assembler is perfectly capable of converting ASCII, decimal, hexadecimal, octal, etc to binary values
that means you can type

    mov     eax,"abcd"
    mov     eax,61626364h
    mov     eax,1633837924
    mov     eax,1100001011000100110001101100100b


they are all the same
you use the one that makes your code easy to understand

also, i will say this about PUSH and POP
in 32-bit code, the stack should always be 4-byte aligned
that means we don't generally push or pop bytes or words - always dwords

masterori

So they're all stored as ascii in the runtime stack by default? The reason I'm so persistent on knowing how to store the values as ascii on the stack, is because that's one of the review questions. Write code so that once it's done, the ascii values of the binary is in the runtime stack.
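No, a pushed value is stored as the plain binary number; nothing is converted to ASCII unless your code does it. One possible sketch for the review question, following qWord's repeated division by 10 from earlier in the thread. It assumes the MASM32 SDK default path, and the label and constant names are only for illustration:

```masm
include \masm32\include\masm32rt.inc   ; assumes the MASM32 SDK default path

.const
ten  dword  10

.code
start:
    mov   eax,1001011001101011b   ; 38507
    xor   ecx,ecx                 ; digit count
next_digit:
    xor   edx,edx                 ; DIV divides EDX:EAX, so clear the high half
    div   ten                     ; EAX = quotient, EDX = remainder (0..9)
    add   edx,'0'                 ; remainder -> ASCII digit
    push  edx                     ; one dword per digit keeps ESP 4-aligned
    inc   ecx
    test  eax,eax
    jnz   next_digit
    ; digits were pushed in the order '7','0','5','8','3', so now
    ; [esp] = '3' (51), [esp+4] = '8' (56), ... [esp+16] = '7' (55)

    lea   esp,[esp+ecx*4]         ; discard the digits again before exiting
    exit
end start
```

Division produces the least significant digit first, so '7' ends up deepest in the stack and '3' on top. If the exercise wants the opposite order, or one byte per digit as in the listing above, the digits would have to be collected in a buffer first and then pushed.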